[HN Gopher] I Tried to Reduce Pylint Memory Usage
___________________________________________________________________

   I Tried to Reduce Pylint Memory Usage

   Author : zdw
   Score : 207 points
   Date : 2020-10-12 14:01 UTC (8 hours ago)

   (HTM) web link (rtpg.co)
   (TXT) w3m dump (rtpg.co)

| jnwatson wrote:
| I recently went through a similar exercise using the same tools
| on a large open source Python codebase. The solution is the same
| as the author found: don't keep Exception objects around past the
| actual lifetime of the exception.
|
| An Exception has the traceback, and the traceback has all the
| frames of the call stack, and each frame has a reference to the
| local variables in that frame.
|
| Keeping an Exception around past its time can yield a huge tangle
| of circular references that puts a _lot_ of pressure on the GC.
|
| I reduced memory utilization dramatically by deleting 1 line of
| code.
| trupper-se wrote:
| > 'don't keep Exception objects around past the actual lifetime
| of the exception'
|
| But, does it generate more read/write cycles on your local
| SSD/flash memory?
|
| Maybe my main intention was: "Does it make 'computing' longer
| lasting, or does it support obsolescence?", or 'is it a typical
| language processing idea'?
| seabrookmx wrote:
| This impacts memory (RAM) usage and has nothing to do with
| storage (SSD).
| mikepurvis wrote:
| Does it do the right thing if you unset the traceback portion
| of the Exception object?
| jnwatson wrote:
| Sure. However, instead of reaching into objects to prune
| them, the better approach is to ask why you're keeping the
| object around at all. Extracting just the parts you want into
| your own structure is usually the better way to go.
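
A minimal sketch of the pattern described above, with invented names
(process, FailureReport): storing the exception object pins its
__traceback__, every frame on that traceback, and each frame's locals;
copying out only the text you need lets all of that be collected.

    import traceback

    class FailureReport:
        """Keep only the text we need, not the exception object itself."""
        def __init__(self, exc):
            self.message = str(exc)
            self.details = "".join(
                traceback.format_exception(type(exc), exc, exc.__traceback__))

    def process(item):
        big_buffer = bytearray(10_000_000)  # a local pinned by any saved traceback
        raise ValueError(f"cannot process {item!r}")

    failures = []
    for item in ["a", "b"]:
        try:
            process(item)
        except ValueError as exc:
            # BAD:  failures.append(exc)
            # That keeps exc.__traceback__, every frame in it, and every
            # frame's locals (including big_buffer) alive as long as the
            # list lives.
            failures.append(FailureReport(exc))  # keep strings, drop the frames

If the exception object itself really has to be kept (the question above
about unsetting the traceback), traceback.clear_frames(exc.__traceback__)
releases each frame's locals while the formatted stack stays usable.
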
| tus88 wrote:
| Get more memory.
| Borkdude wrote:
| I had a similar issue with clj-kondo, a Clojure linter.
|
| https://github.com/borkdude/clj-kondo/issues/1036
|
| The reason for the memory leak was: I used a memoized function on
| some argument that should be GC-ed after one run, but as memoized
| functions store their arguments for future comparison, it was
| kept in memory forever.
| PaulHoule wrote:
| That's a very important insight that, in many real applications,
| memory is the limiting factor for performance.
| smitty1e wrote:
| It's awesome that the resources of the cloud are less limited.
|
| We can work at a higher level.
|
| However, the old salts who had to care about every byte are
| rightly horrified at the lack of frugality today.
| rcxdude wrote:
| Cloud is, if anything, where memory hurts more, because it's
| less shareable than CPU time and so renting it costs more.
| One of the big reasons I've seen cited for teams with a cloud-
| based application switching from a GCd language like Java to
| Rust is the memory usage cost of the GC, which can be a 2x or
| even 3x multiplier compared to the same program in Rust.
| acdha wrote:
| Seconding TFortunato: in my experience the people who didn't
| ignore performance have much better cloud experiences because
| they don't get a massive bill when a Lambda or auto-scaled
| system ramps up to handle that inefficient code. When you
| were running your own servers you could ignore this somewhat
| since the sunk cost had already been paid.
| TFortunato wrote:
| I wouldn't say they are less limited, so much as you are able
| to trade your money for resources a lot faster. You can
| scale quickly, but at some point the monthly cloud bill comes
| due, and code optimization starts to look a little more
| appealing :-)
| WrtCdEvrydy wrote:
| > code optimization starts to look a little more appealing
|
| If you are on a cloud, you should always consider using
| cloud-native functions and services. Keeping your toe out
| of the lock-in is expensive :(
| [deleted]
| taeric wrote:
| I find it curious that "the power of Python" is so heavily
| credited here. Seems most of those tricks are easily done on the
| JVM, as well. I'd imagine any interpreter-based environment could
| do similar. No?
| hansvm wrote:
| I'm not super familiar with the JVM, so correct me if I'm wrong
| on that front, but I think the distinction is that Python
| provides those kinds of introspection tools in the same
| language, not just the runtime platform, so it can be easier on
| tool writers and _much_ easier for individuals who only need to
| peek into a few of the internals as a small component of some
| other project.
| taeric wrote:
| The profiling tools on the JVM are quite good. Easily
| comparable to what was presented in this article. I can see
| some benefit to the idea that you can get a repl looking at
| the results, though, again, I imagine any runtime-based
| language could give this. (I said interpreter-based last
| time, but it is the runtime that is important, I think.)
|
| That said, I have not seen this done with Lisp. I would
| assume it would look a lot like this.
| rtpg wrote:
| Hey, I wrote the original post.
|
| Though theoretically any interpreted language gives you all
| of this, you actually might not have access to the
| internals in practice. JS (well, Node.js and browsers) doesn't
| expose GC internals (partly due to a spec requirement that the
| language be deterministic), and barely gives you good exception
| introspection.
|
| There is also the ergonomic advantage of no static typing:
| you just get a reference and can figure out the type later.
| Very useful when poking around unknown objects. I imagine
| an API like this in Java would be much more verbose and
| require a lot of casts, etc.
| loeg wrote:
| This is a great example of the sort of meandering process one
| might take while examining a performance defect in a large and
| not entirely familiar code base. You don't always hit the exact
| right cause immediately, and it's not realistic to assume you
| will do so without some digging and false starts. Kudos to the
| author for explaining their work, including the things that
| didn't pan out.
| recursivecaveat wrote:
| Coincidentally, I went through a very similar process debugging
| memory usage of an internal application with guppy recently. One
| thing I learned is that file dumps and the profile browser are
| kind of a trap. You discard a huge amount of information when you
| dump the heap summary to file. The guppy docs aren't too great,
| but you can poke around the entire heap in detail if you break
| into a debugger after taking a heap snapshot. You can explore
| references and referrers, group objects by various categories,
| and inspect their values without the tedious ctypes cast trick.
| heavenlyblue wrote:
| You can just use 'gc.get_referrers' without any libraries.
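
For reference, a bare-bones use of gc.get_referrers, with made-up names;
module globals and any other GC-tracked containers holding the object
will show up among the referrers.

    import gc

    class Widget:
        pass

    suspect = Widget()
    cache = {"kept-by-accident": suspect}   # the reference we forgot about

    # Ask the collector which containers still point at the object we
    # expected to have been freed already.
    for referrer in gc.get_referrers(suspect):
        print(type(referrer), repr(referrer)[:70])
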
| nickdrozd wrote:
| Pylint is a great tool, but it could definitely be faster. One
| school of thought says: of course it's too slow, it's written in
| Python; if you want it faster you should rewrite it in C. But I
| don't think such a drastic and destabilizing change is necessary.
| There are critical sites in the code where a slightly wrong move
| is made and it gets magnified into something larger. This post
| points out one such site. The 80/20 rule says that fixing those
| errors will give most of the benefit of a full rewrite with way
| less effort.
|
| A few years ago I got fed up with how slow Pylint was, so I made
| some changes to improve its performance. The main issue was that
| the tree traversal code was written generically. It was nice and
| elegant in terms of readability, but it meant that there was a
| lot of unnecessary runtime type-checking. It also meant that a
| lot of work was getting done for no reason. For example, if you
| are trying to apply lint rules for assign statements, you only
| want to check places where assign statements can legally occur.
| But Pylint was checking every single node for assign statements,
| including places where they cannot occur, such as inside function
| calls. Breaking up that generic logic into specialized instances
| had an enormous impact on performance.
|
| Here are the PRs that implemented these changes:
| https://github.com/PyCQA/astroid/pull/497
| https://github.com/PyCQA/astroid/pull/519
| https://github.com/PyCQA/astroid/pull/552
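
A toy sketch of that dispatch idea, using the stdlib ast module rather
than astroid (the linked PRs are the real changes); check_assign and the
two driver functions are invented for illustration. The point is that
grouping checkers by the node type they care about avoids calling every
checker on every node.

    import ast

    def check_assign(node):
        # Hypothetical rule: flag assignments to single-character names.
        if not isinstance(node, ast.Assign):   # guard every checker repeats
            return                             # in the generic scheme
        for target in node.targets:
            if isinstance(target, ast.Name) and len(target.id) == 1:
                print(f"line {node.lineno}: short name {target.id!r}")

    def lint_generic(tree, checkers):
        # Every checker runs on every node; most calls do nothing useful.
        for node in ast.walk(tree):
            for checker in checkers:
                checker(node)

    def lint_dispatched(tree, checkers_by_type):
        # Checkers are grouped by node type, so only relevant ones run.
        for node in ast.walk(tree):
            for checker in checkers_by_type.get(type(node), ()):
                checker(node)

    tree = ast.parse("x = 1\nresult = f(x)\n")
    lint_generic(tree, [check_assign])
    lint_dispatched(tree, {ast.Assign: [check_assign]})
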
| user5994461 wrote:
| > if you want it faster you should rewrite it in C. But I don't
| think such a drastic and destabilizing change is necessary.
|
| I've worked around the Python linter and it is written in C.
| The bits in Python (especially our custom rules) were horribly
| slow.
|
| Lint is building a tree of the code and going through it
| recursively. Going through a large tree in pure Python is
| horribly slow: it has to interpret every single line of code
| again and resolve every variable again, because they could
| change at any time. The overhead is insane.
|
| It's the one thing that Python cannot do: walking parse trees.
| If Python had a JIT it might be doable, but currently it isn't.
| xapata wrote:
| Try PyPy.
| PaulHoule wrote:
| Now that PyPy is up to Python 3.6 and has the bugs from the
| 3.5 series fixed, I am using it for real work.
|
| The contextvars polyfill brings in the one thing I really
| need from Py 3.7.
|
| I have a toolkit that puffs up XML, JSON, whatever files
| into an RDF graph that has extra blank nodes that let you
| annotate anything. A year ago I was complaining about how
| slow it was; with PyPy it is 5 times faster and I am not
| complaining.
|
| I hear people get similar speed-ups for branchy Monte Carlo
| simulations too.
| Recursing wrote:
| black (the Python auto-formatter) uses mypyc
|
| https://github.com/python/mypy/tree/master/mypyc
| rattray wrote:
| That looks exciting!
|
| Compiles mypy-annotated Python to a Python C extension.
| Claims a ~4x speedup, but is still quite unstable/buggy, so
| who knows how much performance they'll have to give up for
| working software. Fingers crossed!
| hinkley wrote:
| Smaller memory footprints can make code faster, but making code
| faster often creates a bigger memory footprint.
|
| For instance, pipelining an operation to allow multiple CPUs to
| work on a problem at once can reduce the pressure on the tallest
| tent pole, be it an IO bottleneck or one slow phase. Time is
| still dictated by the tallest tent poles, but not also by the
| fourth through tenth tallest poles.
|
| But now everything is happening at once, which means all of the
| temporary data structures exist in memory in parallel, instead
| of sequentially.
|
| Heavily paraphrasing someone else: the simple things get taken
| care of early. If we're standing around talking about problems
| in a successful tool, they're complicated. Although I disagree
| with that person on one point: we can also be talking about
| problems that were complicated to diagnose (but straightforward
| to fix).
| nickcw wrote:
| I did a similar exercise on a Python program a few years ago now.
|
| In the end I ended up adding __slots__ to two classes and the
| memory usage shrank by 2/3 (there were a lot of instances of
| those two classes).
|
| I also managed to double the speed by rewriting (rewording,
| really) a bit of string handling which just happened to be the
| bottleneck. Can't remember the details now but it was a trivial
| code change.
|
| Profiling for the win!
___________________________________________________________________
(page generated 2020-10-12 23:00 UTC)