[HN Gopher] Scalene: a high-performance, high-precision CPU and ...
       ___________________________________________________________________
        
       Scalene: a high-performance, high-precision CPU and memory profiler
       for Python
        
       Author : matt_d
       Score  : 78 points
       Date   : 2020-01-09 17:33 UTC (5 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | bawr_hszm wrote:
       | Sadly, relying on Python's signal handling is not enough to get
       | robust profiling information the moment your code is spending a
       | significant chunk of time outside of simple Python calls. This is
       | because signals don't get delivered to the Python level until the
       | interpreter comes back from C land, and it's possible to get
       | stuck in C land even with pure Python code.
       | 
        | To wit:
        | 
        |     $ python -m scalene trace.py
        |     f1 1.3852 312499987500000
        |     f2 1.2420 2812499962500000
        |     f3 1.5018 1.5
        |     trace.py: % of CPU time =  33.66% out of 4.13s.
        |     Line  | CPU %    |  [trace.py]
        |        1  |          | #!/usr/bin/env python
        |        2  |          | 
        |        3  |          | import time
        |        4  |          | 
        |        5  |          | def timed(f):
        |        6  |          |     def f_timed(*args, **kwargs):
        |        7  |          |         t = time.time()
        |        8  |          |         r = f(*args, **kwargs)
        |        9  |          |         t = time.time() - t
        |       10  |          |         print('%s %.4f %s' % (f.__name__, t, r))
        |       11  |          |     return f_timed
        |       12  |          | 
        |       13  |          | @timed
        |       14  |          | def f1(n):
        |       15  |          |     s = 0
        |       16  |  17.99%  |     for i in range(n):
        |       17  |  81.29%  |         s += i
        |       18  |          |     return s
        |       19  |          | 
        |       20  |          | @timed
        |       21  |          | def f2(n):
        |       22  |   0.72%  |     return sum(range(n))
        |       23  |          | 
        |       24  |          | @timed
        |       25  |          | def f3(t):
        |       26  |          |     time.sleep(t)
        |       27  |          |     return t
        |       28  |          | 
        |       29  |          | if __name__ == '__main__':
        |       30  |          |     f1(25_000_000)
        |       31  |          |     f2(75_000_000)
        |       32  |          |     f3(1.5)
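        | 
        | (For reference, a minimal standalone sketch of that delivery
        | delay; nothing Scalene-specific, and the 0.1s timer is just
        | illustrative. The SIGALRM requested here fires while sum() is
        | still running, but the Python-level handler only gets to run
        | once sum() returns to the interpreter:)
        | 
        |     import signal
        |     import time
        | 
        |     t0 = time.time()
        | 
        |     def handler(signum, frame):
        |         # Runs only when the interpreter next checks for pending
        |         # signals, i.e. between bytecode instructions.
        |         print('handler ran at %.2fs' % (time.time() - t0))
        | 
        |     signal.signal(signal.SIGALRM, handler)
        |     signal.setitimer(signal.ITIMER_REAL, 0.1)  # signal in 0.1s
        | 
        |     sum(range(200_000_000))   # one long-running C-level call
        |     print('sum returned at %.2fs' % (time.time() - t0))
        |     # -> the handler line prints just before this one, long after
        |     #    the signal was actually raised by the kernel.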
        
         | emeryberger wrote:
         | Good observation. My personal POV is that the best way to
         | optimize your Python code is to use native code (whether as
         | libraries or through pure Python code that is essentially a
         | thin wrapper over C) rather than living in the interpreter. I
         | want to see the parts of the program that are spending that
         | time in the interpreter.
         | 
         | In short, a profiler that tells me that a program is spending a
         | lot of time in C is not generally providing me particularly
         | actionable information.
         | 
         | (In any event, the top-line report is that the Python part of
         | the program only accounts for 33.66% of the execution time,
         | which looks just about right.)
        
           | bawr_hszm wrote:
           | I can understand this approach, but I fundamentally disagree
           | with it - the first duty of a profiling tool is _not to
           | mislead the user_.
           | 
            | In this example, the program spends a third of the time just
            | sleeping / blocked, a third of the time on CPU but at the C
            | level, and the remainder just evaluating Python loops. Unless
            | you already know how it's implemented, that 33.66% is not
            | easy to interpret, and the docs don't mention what _exactly_
            | is being profiled. Specifically, these samples aren't a % of
            | CPU time; they're a % of real time during which we happened
            | to be able to deliver a Python-level interrupt, and even that
            | isn't a great explanation. I think most users would still
            | expect line 22 to get traced properly.
           | 
            | I very much _do_ want to know about C time, too, because it's
            | very much actionable for most of the optimizations I end up
            | making in production systems.
           | 
            | That said, I don't think this line of discussion is super
            | productive for either of us; we seem to have different goals
            | in mind, which is fine. ;)
           | 
           | So I'll close by saying that I was impressed by your
           | LD_PRELOAD hacks for memory profiling, which isn't an
           | approach that I've ever seen in other Python profilers.
        
       | jeanvalmarc wrote:
       | Another similar sampling profiler which has decent performance is
       | pyinstrument: https://github.com/joerick/pyinstrument
        
       | spott wrote:
        | Another Python profiler that isn't mentioned is pprofile [0]. It
        | is also a statistical profiler with line-level granularity. It
        | doesn't have the memory profiling abilities (that is pretty
        | slick...), but it also doesn't require the dynamic library
        | injection.
        | 
        | I don't know how its statistical profiling speed compares to
        | scalene, but it would be great to see the comparison.
       | 
        | Also, does anyone know how the malloc interception interacts with
        | PyTorch?
        | 
        | [0] https://github.com/vpelletier/pprofile
        
       | JackC wrote:
       | I love the idea of fast Python profilers that don't require
       | modified source code to run.
       | 
        | One profiler I used recently which isn't mentioned in the readme
        | is pyflame (developed at Uber):
       | 
       | https://pyflame.readthedocs.io/
       | 
       | pyflame likewise claims to run on unmodified source code and be
       | fast enough to run in production, so it might be worth adding to
       | the comparison. It generates flamegraphs, which greatly sped up
       | debugging the other day when I needed to figure out why something
       | was slow somewhere in a Django request-response callstack.
        
         | albertzeyer wrote:
          | Similar to PyFlame, there is also py-spy:
          | https://github.com/benfred/py-spy
         | 
         | > While pyflame is a great project, it doesn't support Python
         | 3.7 yet and doesn't work on OSX, Windows or FreeBSD.
         | 
          | I wonder how the CPU profiling in Scalene differs. The readme
          | does not mention PyFlame or py-spy at all. Of course, the
          | memory profiler is a nice extra.
        
           | emeryberger wrote:
           | Scalene author here. I was not aware of either tool - many
           | thanks for the pointers! I just tried py-spy (I will try
           | PyFlame on a Linux box momentarily). It's pretty cool, though
           | having to run it as root on OS X isn't great (scalene runs
           | without the need for root privileges; the CPU profiling part
            | is pure Python). Py-spy does appear to efficiently track CPU
            | performance at line granularity, and it does a lot of things
            | that scalene does not do, but not memory profiling. Also, I
            | personally prefer scalene's line-level display of annotated
            | source code vs. the flame graphs, but YMMV.
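            | 
            | (For the curious, signal-based, line-granularity CPU sampling
            | in pure Python looks roughly like the sketch below; this is
            | just the general idea, simplified relative to what scalene
            | actually does:)
            | 
            |     import signal
            |     from collections import Counter
            | 
            |     samples = Counter()
            | 
            |     def record_sample(signum, frame):
            |         # Attribute the sample to the file/line that was
            |         # executing when the timer fired.
            |         key = (frame.f_code.co_filename, frame.f_lineno)
            |         samples[key] += 1
            | 
            |     # Unix only: deliver SIGPROF every 10 ms of CPU time.
            |     signal.signal(signal.SIGPROF, record_sample)
            |     signal.setitimer(signal.ITIMER_PROF, 0.01, 0.01)
            | 
            |     def busy():
            |         s = 0
            |         for i in range(5_000_000):
            |             s += i
            |         return s
            | 
            |     busy()
            |     signal.setitimer(signal.ITIMER_PROF, 0)  # stop sampling
            | 
            |     for (filename, lineno), n in samples.most_common(5):
            |         print('%s:%d  %d samples' % (filename, lineno, n))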
        
             | Znafon wrote:
              | The ability to attach py-spy to an already running Python
              | program in production is pretty awesome. Do you think
              | Scalene could do this?
        
         | mlthoughts2018 wrote:
          | You can fairly easily write helper scripts using e.g. pkgutil
          | to automatically add profiler decorators (e.g. with kernprof)
          | so that profiling never requires modifying code.
         | 
         | My team has a large body of profiling code written with
         | kernprof and none of it modifies the underlying source.
         | Profiler annotations are solely added automatically by the
         | little profiler runner tooling we wrote.
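          | 
          | (A rough sketch of that kind of runner, using pkgutil plus
          | line_profiler's LineProfiler, which is what kernprof drives for
          | line-by-line profiling; the package name 'myapp' and its entry
          | point are hypothetical:)
          | 
          |     import importlib
          |     import inspect
          |     import pkgutil
          | 
          |     from line_profiler import LineProfiler
          | 
          |     profiler = LineProfiler()
          | 
          |     def instrument_package(package_name):
          |         # Walk every module in the package and register its
          |         # functions with the profiler; no source edits needed.
          |         package = importlib.import_module(package_name)
          |         prefix = package_name + '.'
          |         for info in pkgutil.walk_packages(package.__path__, prefix):
          |             module = importlib.import_module(info.name)
          |             for _, fn in inspect.getmembers(module, inspect.isfunction):
          |                 if fn.__module__ == module.__name__:
          |                     profiler.add_function(fn)
          | 
          |     if __name__ == '__main__':
          |         instrument_package('myapp')   # hypothetical package
          |         profiler.enable_by_count()
          |         importlib.import_module('myapp.main').main()  # hypothetical
          |         profiler.print_stats()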
         | 
         | Not to say other profiling tools aren't worth it.
        
       ___________________________________________________________________
       (page generated 2020-01-09 23:00 UTC)