[HN Gopher] I Tried to Reduce Pylint Memory Usage
       ___________________________________________________________________
        
       I Tried to Reduce Pylint Memory Usage
        
       Author : zdw
       Score  : 207 points
       Date   : 2020-10-12 14:01 UTC (8 hours ago)
        
 (HTM) web link (rtpg.co)
 (TXT) w3m dump (rtpg.co)
        
       | jnwatson wrote:
       | I recently went through a similar exercise using the same tools
       | on a large open source Python codebase. The solution is the same
       | as the author found: don't keep Exception objects around past the
       | actual lifetime of the exception.
       | 
       | An Exception has the traceback, and the traceback has all the
       | frames of the call stack, and each frame has a reference to the
       | local variables in that frame.
       | 
       | Keeping an Exception around past its time can yield a huge tangle
       | of circular references that puts a _lot_ of pressure on the GC.
       | 
       | I reduced memory utilization dramatically by deleting 1 line of
       | code.
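The failure mode jnwatson describes is easy to reproduce. A minimal sketch (`Payload` and `handler` are hypothetical stand-ins for real code): storing the exception also stores its `__traceback__`, which references the frame, which keeps every local in that frame alive; extracting just the text you need releases the whole chain.

```python
import gc
import traceback
import weakref

class Payload:
    """Stands in for a large object held in a local variable."""

def handler(store):
    payload = Payload()                       # big local in this frame
    ref = weakref.ref(payload)
    try:
        raise ValueError("boom")
    except ValueError as exc:
        # BAD: storing the exception also stores exc.__traceback__,
        # which references this frame, which keeps `payload` alive.
        store["exc"] = exc
        # Better: extract only the text you actually need.
        store["summary"] = traceback.format_exception_only(type(exc), exc)
    return ref

store = {}
ref = handler(store)
gc.collect()
assert ref() is not None     # payload survives via exc -> traceback -> frame

del store["exc"]             # drop the exception past its lifetime...
gc.collect()
assert ref() is None         # ...and the frame, and payload, are freed
```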
        
         | trupper-se wrote:
         | > 'don't keep Exception objects around past the actual lifetime
         | of the exception'
         | 
         | But, does it generate more read/write-cycles on your local
         | SSD/Flash-Memory ?
         | 
          | Maybe my main intention was: "Does it make 'computing' longer
          | lasting, or does it support obsolescence?" Or: "is it a typical
          | language-processing idea?"
        
           | seabrookmx wrote:
           | This impacts memory (RAM) usage and has nothing to do with
           | storage (SSD).
        
         | mikepurvis wrote:
         | Does it do the right thing if you unset the traceback portion
         | of the Exception object?
        
           | jnwatson wrote:
           | Sure. However, instead of reaching into objects to prune
           | them, the better approach is to ask why you're keeping the
           | object around at all. Extracting just the parts you want into
           | your own structure is usually the better way to go.
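For the narrower question above: yes, the traceback is the only link to the frames, so clearing it in place does release them. A short sketch (`Big` and `capture` are illustrative names):

```python
import gc
import weakref

class Big:
    """Stands in for a large local variable."""

def capture():
    big = Big()
    ref = weakref.ref(big)
    try:
        1 / 0
    except ZeroDivisionError as exc:
        return exc, ref       # deliberately keep the exception around

kept, ref = capture()
gc.collect()
assert ref() is not None      # frame (and `big`) alive via kept.__traceback__

kept.__traceback__ = None     # prune the traceback in place
gc.collect()
assert ref() is None          # the frame, and `big` with it, are released
```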
        
       | tus88 wrote:
       | Get more memory.
        
       | Borkdude wrote:
        | I had a similar issue with clj-kondo, a Clojure linter.
       | 
       | https://github.com/borkdude/clj-kondo/issues/1036
       | 
        | The reason for the memory leak: I used a memoized function on
        | an argument that should have been GC-ed after one run, but since
        | memoized functions store their arguments for future comparison,
        | it was kept in memory forever.
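The same leak is easy to produce in Python with `functools.lru_cache`, the standard library's memoizer (`Source` and `analyze` here are illustrative names): an unbounded cache pins every argument it has ever seen until the cache is cleared or bounded.

```python
import functools
import gc
import weakref

class Source:
    """Stands in for a large per-run input that should be collectable."""

@functools.lru_cache(maxsize=None)      # unbounded memoization
def analyze(source):
    return id(source)                   # placeholder for real work

src = Source()
ref = weakref.ref(src)
analyze(src)

del src
gc.collect()
assert ref() is not None    # the cache still holds the argument as a key

analyze.cache_clear()       # the fix: clear (or bound) the cache per run
gc.collect()
assert ref() is None
```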
        
       | PaulHoule wrote:
        | That's a very important insight: in many real applications,
        | memory is the limiting factor for performance.
        
         | smitty1e wrote:
         | It's awesome that the resources of the cloud are less limited.
         | 
         | We can work at a higher level.
         | 
         | However, the old salts who had to care about every byte are
         | rightly horrified at the lack of frugality today.
        
           | rcxdude wrote:
            | Cloud is, if anything, where memory hurts more, because it's
            | less shareable than CPU time and so renting it costs more.
            | One of the big reasons I've seen cited for teams with a
            | cloud-based application switching from a GC'd language like
            | Java to Rust is the memory usage cost of the GC, which can
            | be a 2x or even 3x multiplier compared to the same program
            | in Rust.
        
           | acdha wrote:
           | Seconding TFortunato: in my experience the people who didn't
           | ignore performance have much better cloud experiences because
           | they don't get a massive bill when a Lambda or auto scaled
           | system ramps up to handle that inefficient code. When you
           | were running your own servers you could ignore this somewhat
           | since the sunk cost had already been paid.
        
           | TFortunato wrote:
            | I wouldn't say they are less limited, so much as you are able
            | to trade your money for resources a lot faster. You can
            | scale quickly, but at some point the monthly cloud bill comes
            | due, and code optimization starts to look a little more
            | appealing :-)
        
             | WrtCdEvrydy wrote:
             | > code optimization starts to look a little more appealing
             | 
             | If you are on a cloud, you should always consider using
             | cloud native functions and services. Keeping your toe out
             | of the lock-in is expensive :(
        
         | [deleted]
        
       | taeric wrote:
        | I find it curious that "the power of python" is so heavily
        | credited here. Seems most of those tricks are easily done on the
        | JVM as well. I'd imagine any interpreter-based environment could
        | do similar. No?
        
         | hansvm wrote:
         | I'm not super familiar with the JVM, so correct me if I'm wrong
         | on that front, but I think the distinction is that Python
         | provides those kinds of introspection tools in the same
         | language, not just the runtime platform, so it can be easier on
         | tool writers and _much_ easier for individuals who only need to
         | peek into a few of the internals as a small component of some
         | other project.
        
           | taeric wrote:
           | The profiling tools on the jvm are quite good. Easily
           | comparable to what was presented in this article. I can see
           | some benefit to the idea that you can get a repl looking at
           | the results, though, again, I imagine any runtime based
           | language could give this. (I said interpreter based last
           | time, but it is the runtime that is important, I think.)
           | 
           | That said, I have not seen this done with lisp. I would
           | assume it would look a lot like this.
        
             | rtpg wrote:
             | Hey, I wrote the original post.
             | 
             | Though theoretically any interpreted language gives you all
             | of this, you actually might not have access to the
             | internals in practice. JS (well, node.js and browsers)
             | doesn't expose GC internals (partly due to a spec
             | requirement that the language be deterministic), and barely
             | gives you good exception introspection.
             | 
              | There is also the ergonomic advantage of no static typing:
              | you just get a reference and can figure out the type later.
              | Very useful when poking around unknown objects. I imagine
              | an API like this in Java would be much more verbose and
              | require a lot of casts, etc.
        
       | loeg wrote:
       | This is a great example of the sort of meandering process one
       | might take while examining a performance defect in a large and
       | not entirely familiar code base. You don't always hit the exact
       | right cause immediately, and it's not realistic to assume you
       | will do so without some digging and false starts. Kudos to the
       | author for explaining their work, including the things that
       | didn't pan out.
        
       | recursivecaveat wrote:
       | Coincidentally I went through a very similar process debugging
       | memory usage of an internal application with guppy recently. One
       | thing I learned is that file dumps and the profile browser are
        | kind of a trap. You discard a huge amount of information when you
       | dump the heap summary to file. The guppy docs aren't too great,
       | but you can poke around the entire heap in detail if you break
       | into a debugger after taking a heap snapshot. You can explore
       | references and referrers, group objects by various categories,
       | and inspect their values without the tedious ctypes cast trick.
        
         | heavenlyblue wrote:
         | You can just use 'gc.get_referrers' without any libraries.
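Indeed, for the "who is holding this object?" question no third-party library is needed; a minimal example:

```python
import gc

payload = {"data": list(range(100))}
holder = [payload]      # one of the objects keeping `payload` alive

# Ask the GC for every tracked object that directly references `payload`.
# (The result also includes incidental referrers such as module globals.)
refs = gc.get_referrers(payload)
assert any(r is holder for r in refs)
```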
        
       | nickdrozd wrote:
       | Pylint is a great tool, but it could definitely be faster. One
       | school of thought says: of course it's too slow, it's written in
       | Python, if you want it faster you should rewrite it in C. But I
       | don't think such a drastic and destabilizing change is necessary.
       | There are critical sites in the code where a slightly wrong move
       | is made and it gets magnified into something larger. This post
       | points out one such site. The 80/20 rule says that fixing those
       | errors will give most of the benefit of a full rewrite with way
       | less effort.
       | 
       | A few years ago I got fed up with how slow Pylint was, so I made
       | some changes to improve its performance. The main issue was that
       | the tree traversal code was written generically. It was nice and
       | elegant in terms of readability, but it meant that there was a
       | lot of unnecessary runtime type-checking. It also meant that a
       | lot of work was getting done for no reason. For example, if you
       | are trying to apply lint rules for assign statements, you only
        | want to check places where assign statements can legally occur.
       | But Pylint was checking every single node for assign statements,
       | including places where they cannot occur, such as inside function
       | calls. Breaking up that generic logic into specialized instances
       | had an enormous impact on performance.
       | 
       | Here are the PRs that implemented these changes:
       | https://github.com/PyCQA/astroid/pull/497
       | https://github.com/PyCQA/astroid/pull/519
       | https://github.com/PyCQA/astroid/pull/552
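The specialization described above can be sketched with the stdlib `ast` module standing in for astroid (the `register`/`checkers` machinery and the toy rule are illustrative, not Pylint's actual API): rules are registered under the node type they apply to, so the walk dispatches each node only to the rules that can possibly match it, instead of testing every rule against every node.

```python
import ast
from collections import defaultdict

# Map node type -> list of rules that apply to that type.
checkers = defaultdict(list)

def register(node_type):
    def wrap(fn):
        checkers[node_type].append(fn)
        return fn
    return wrap

@register(ast.Assign)
def check_assign(node, problems):
    # Toy rule: flag assignments to the ambiguous name 'l'.
    for target in node.targets:
        if isinstance(target, ast.Name) and target.id == "l":
            problems.append(f"line {node.lineno}: ambiguous name 'l'")

def run(source):
    problems = []
    for node in ast.walk(ast.parse(source)):
        for check in checkers[type(node)]:   # O(1) dispatch by node type
            check(node, problems)
    return problems
```

With this shape, only `Assign` nodes ever reach `check_assign`; nodes inside function calls and other non-assignment positions are never type-checked against the rule at all.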
        
         | user5994461 wrote:
         | > if you want it faster you should rewrite it in C. But I don't
         | think such a drastic and destabilizing change is necessary.
         | 
          | I've worked around the Python linter, and it is written in C.
          | The bits in Python (especially our custom rules) were horribly
          | slow.
         | 
          | A linter builds a tree of the code and walks it recursively.
          | Walking a large tree in pure Python is horribly slow: the
          | interpreter has to re-execute every line of code and re-resolve
          | every variable on every visit, because they could change at any
          | time. The overhead is insane.
          | 
          | It's the one thing Python cannot do well: tree traversal. If
          | Python had a JIT it might be doable, but currently it isn't.
        
           | xapata wrote:
           | Try PyPy.
        
             | PaulHoule wrote:
             | Now that PyPy is up to Python 3.6 and has the bugs from the
             | 3.5 series fixed I am using it for real work.
             | 
             | The contextvars polyfill brings in the one thing I really
             | need from Py 3.7.
             | 
             | I have a toolkit that puffs up XML, JSON, whatever files
             | into an RDF graph that has extra blank nodes that let you
             | annotate anything. A year ago I was complaining about how
             | slow it was, with PyPy it is 5 times faster and I am not
             | complaining.
             | 
             | I hear people get similar speed-ups for branchy monte carlo
             | simulations too.
        
           | Recursing wrote:
           | black (the python auto formatter) uses mypyc
           | 
           | https://github.com/python/mypy/tree/master/mypyc
        
             | rattray wrote:
             | That looks exciting!
             | 
             | Compiles mypy-annotated Python to a Python C extension.
             | Claims a ~4x speedup, but still quite unstable/buggy, so
             | who knows how much performance they'll have to give up for
             | working software. Fingers crossed!
        
         | hinkley wrote:
         | Smaller memory footprints can make code faster, but making code
         | faster often creates a bigger memory footprint.
         | 
         | For instance, pipelining an operation to allow multiple CPUs to
         | work on a problem at once can reduce the pressure on the tall
         | tent pole, be it IO bottlenecks or one phase. Time is still
         | dictated by the tallest tent poles, but not also by the fourth
         | through tenth tallest poles.
         | 
         | But now everything is happening at once, which means all of the
         | temporary data structures exist in memory in parallel, instead
         | of sequentially.
         | 
         | Heavily paraphrasing someone else: The simple things get taken
         | care of early. If we're standing around talking about problems
         | in a successful tool, they're complicated. Although I disagree
         | with that person on one point: we can also be talking about
         | problems that were complicated to diagnose (but straightforward
         | to fix).
        
       | nickcw wrote:
       | I did a similar exercise on a Python program a few years ago now.
       | 
        | In the end I ended up adding __slots__ to two classes, and the
        | memory usage shrank by 2/3 (there were a lot of instances of
        | those two classes).
       | 
       | I also managed to double the speed by re-writing (re-wording
       | really) a bit of string handling which just happened to be the
       | bottleneck. Can't remember the details now but it was a trivial
       | code change.
       | 
       | Profiling for the win!
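The `__slots__` saving is easy to see directly: a slotted instance stores its attributes in fixed slots and carries no per-instance `__dict__`, which is where most of the per-object overhead lives. A quick comparison on any CPython (`Plain`/`Slotted` are illustrative; exact byte counts vary by version):

```python
import sys

class Plain:
    def __init__(self, x, y):
        self.x, self.y = x, y

class Slotted:
    __slots__ = ("x", "y")
    def __init__(self, x, y):
        self.x, self.y = x, y

p, s = Plain(1, 2), Slotted(1, 2)

# A Plain instance pays for itself plus a per-instance __dict__;
# the Slotted instance has neither __dict__ nor __weakref__.
plain_size = sys.getsizeof(p) + sys.getsizeof(p.__dict__)
slotted_size = sys.getsizeof(s)

assert not hasattr(s, "__dict__")
assert slotted_size < plain_size
```

The saving is per instance, so it compounds when a program holds millions of such objects, as in the comment above.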
        
       ___________________________________________________________________
       (page generated 2020-10-12 23:00 UTC)