[HN Gopher] Scalene: a high-performance, high-precision CPU and ...
___________________________________________________________________

 Scalene: a high-performance, high-precision CPU and memory
 profiler for Python

 Author : matt_d
 Score  : 78 points
 Date   : 2020-01-09 17:33 UTC (5 hours ago)

 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)

 | [deleted]
 |
 | bawr_hszm wrote:
 | Sadly, relying on Python's signal handling is not enough to get
 | robust profiling information the moment your code is spending a
 | significant chunk of time outside of simple Python calls. This is
 | because signals don't get delivered to the Python level until the
 | interpreter comes back from C land, and it's possible to get
 | stuck in C land even with pure Python code.
 |
 | To wit:
 |
 |   $ python -m scalene trace.py
 |   f1 1.3852 312499987500000
 |   f2 1.2420 2812499962500000
 |   f3 1.5018 1.5
 |   trace.py: % of CPU time = 33.66% out of 4.13s.
 |     Line | CPU %  | [trace.py]
 |        1 |        | #!/usr/bin/env python
 |        2 |        |
 |        3 |        | import time
 |        4 |        |
 |        5 |        | def timed(f):
 |        6 |        |     def f_timed(*args, **kwargs):
 |        7 |        |         t = time.time()
 |        8 |        |         r = f(*args, **kwargs)
 |        9 |        |         t = time.time() - t
 |       10 |        |         print('%s %.4f %s' % (f.__name__, t, r))
 |       11 |        |     return f_timed
 |       12 |        |
 |       13 |        | @timed
 |       14 |        | def f1(n):
 |       15 |        |     s = 0
 |       16 | 17.99% |     for i in range(n):
 |       17 | 81.29% |         s += i
 |       18 |        |     return s
 |       19 |        |
 |       20 |        | @timed
 |       21 |        | def f2(n):
 |       22 |  0.72% |     return sum(range(n))
 |       23 |        |
 |       24 |        | @timed
 |       25 |        | def f3(t):
 |       26 |        |     time.sleep(t)
 |       27 |        |     return t
 |       28 |        |
 |       29 |        | if __name__ == '__main__':
 |       30 |        |     f1(25_000_000)
 |       31 |        |     f2(75_000_000)
 |       32 |        |     f3(1.5)
 |
 | emeryberger wrote:
 | Good observation. My personal POV is that the best way to
 | optimize your Python code is to use native code (whether as
 | libraries or through pure Python code that is essentially a
 | thin wrapper over C) rather than living in the interpreter. I
 | want to see the parts of the program that are spending that
 | time in the interpreter.
 |
 | In short, a profiler that tells me that a program is spending a
 | lot of time in C is not generally providing me particularly
 | actionable information.
 |
 | (In any event, the top-line report is that the Python part of
 | the program only accounts for 33.66% of the execution time,
 | which looks just about right.)
 |
 | bawr_hszm wrote:
 | I can understand this approach, but I fundamentally disagree
 | with it - the first duty of a profiling tool is _not to
 | mislead the user_.
 |
 | In this example, the program spends a third of the time just
 | sleeping / blocked, a third of the time on CPU but at the C
 | level, and the remainder just evaluating Python loops. Unless
 | you already know how it's implemented, that 33.66% is not easy
 | to interpret, and the docs don't mention what _exactly_ is
 | being profiled. Specifically, these samples aren't a % of CPU
 | time; they're a % of real time at which we happened to be able
 | to deliver a Python-level interrupt, and even that isn't a
 | great explanation. I think most users would still expect line
 | 22 to get traced properly.
 |
 | I very much _do_ want to know about C time, too, because it's
 | very much actionable for most of the optimizations I end up
 | making in production systems.
 |
 | That said, I don't think this line of discussion is super
 | productive for either of us; we seem to have different goals
 | in mind, which is fine. ;)
 |
 | So I'll close by saying that I was impressed by your
 | LD_PRELOAD hacks for memory profiling, which isn't an approach
 | that I've ever seen in other Python profilers.
 |
 | jeanvalmarc wrote:
 | Another similar sampling profiler with decent performance is
 | pyinstrument: https://github.com/joerick/pyinstrument
 |
 | spott wrote:
 | Another Python profiler that isn't mentioned is pprofile[0].
 | It is also a statistical profiler with line-level granularity.
 | It doesn't have the memory profiling abilities (that is pretty
 | slick...), but it also doesn't require the dynamic library
 | injection.
 |
 | I don't know how its statistical profiling speed compares to
 | scalene, but it would be great to see the comparison.
 |
 | Also, does anyone know how the malloc interacts with pytorch?
 |
 | [0] https://github.com/vpelletier/pprofile
 |
 | JackC wrote:
 | I love the idea of fast Python profilers that don't require
 | modified source code to run.
 |
 | One profiler I used recently which isn't mentioned in the
 | readme is pyflame (developed at Uber):
 |
 | https://pyflame.readthedocs.io/
 |
 | pyflame likewise claims to run on unmodified source code and
 | to be fast enough to run in production, so it might be worth
 | adding to the comparison. It generates flamegraphs, which
 | greatly sped up debugging the other day when I needed to
 | figure out why something was slow somewhere in a Django
 | request-response callstack.
 |
 | albertzeyer wrote:
 | Similar to PyFlame is also py-spy:
 | https://github.com/benfred/py-spy
 |
 | > While pyflame is a great project, it doesn't support Python
 | 3.7 yet and doesn't work on OSX, Windows or FreeBSD.
 |
 | I wonder how the CPU profiling in Scalene is different. The
 | readme does not mention PyFlame or py-spy at all. Of course,
 | the memory profiler is a nice extra.
 |
 | emeryberger wrote:
 | Scalene author here. I was not aware of either tool - many
 | thanks for the pointers! I just tried py-spy (I will try
 | PyFlame on a Linux box momentarily). It's pretty cool, though
 | having to run it as root on OS X isn't great (scalene runs
 | without the need for root privileges; the CPU profiling part
 | is pure Python). Py-spy does appear to efficiently track CPU
 | perf at line granularity (and does a lot of other stuff). It
 | does a lot of things that scalene does not do, but not memory
 | profiling. Also, I personally prefer scalene's line-level
 | display of annotated source code to the flame graphs, but
 | YMMV.
 |
 | Znafon wrote:
 | The possibility to run py-spy on an already running Python
 | program in production is pretty awesome.
 | Do you think Scalene could do this?
 |
 | mlthoughts2018 wrote:
 | You can fairly easily write helper scripts using e.g. pkgutil
 | to automatically add profiler decorators (e.g. with kernprof),
 | so that profiling never requires modifying code.
 |
 | My team has a large body of profiling code written with
 | kernprof, and none of it modifies the underlying source.
 | Profiler annotations are added automatically by the little
 | profiler-runner tooling we wrote.
 |
 | Not to say other profiling tools aren't worth it.
___________________________________________________________________
 (page generated 2020-01-09 23:00 UTC)
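[Editor's note] The signal-delivery caveat raised at the top of the thread (Python-level signal handlers only run between bytecodes, so one long-running C call swallows profiling ticks) is easy to demonstrate directly. Below is a minimal, Unix-only sketch, not taken from Scalene itself: a hypothetical `on_tick` handler counts `SIGALRM` deliveries during a pure-Python loop versus during a single `sum(range(...))` call of comparable duration.

```python
import signal
import time

ticks = 0  # incremented by the Python-level signal handler


def on_tick(signum, frame):
    global ticks
    ticks += 1


signal.signal(signal.SIGALRM, on_tick)
signal.setitimer(signal.ITIMER_REAL, 0.01, 0.01)  # fire every 10 ms

# Pure-Python loop: the interpreter checks for pending signals
# between bytecodes, so the handler runs for roughly every tick.
ticks = 0
t0 = time.time()
s = 0
for i in range(5_000_000):
    s += i
loop_secs, loop_ticks = time.time() - t0, ticks

# One long C-level call: SIGALRM is still delivered to the process,
# but the Python-level handler is deferred until sum() returns, so
# the intervening ticks are coalesced into at most a handful.
ticks = 0
t0 = time.time()
sum(range(50_000_000))
c_secs, c_ticks = time.time() - t0, ticks

signal.setitimer(signal.ITIMER_REAL, 0)  # cancel the timer

print(f"python loop : {loop_secs:.2f}s wall, {loop_ticks} ticks seen")
print(f"sum(range()): {c_secs:.2f}s wall, {c_ticks} ticks seen")
```

Despite similar wall-clock times, the loop accumulates many ticks while the `sum()` call reports only one or two, which is why a signal-driven profiler attributes almost no samples to a line like `return sum(range(n))`.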