[HN Gopher] Making Python fast - Adventures with mypyc
___________________________________________________________________
Making Python fast - Adventures with mypyc

Author : meadsteve
Score  : 177 points
Date   : 2022-09-27 12:42 UTC (10 hours ago)

(HTM) web link (blog.meadsteve.dev)
(TXT) w3m dump (blog.meadsteve.dev)

| meadsteve wrote:
| I recently experimented with using mypyc to make some of my
| python a little faster. I was pleasantly surprised with how well
| it worked for very little code change so I thought I'd share my
| experiences.
|
| The blog post wanders around a little because I had to add
| setuptools and wheel building as my project had previously
| skipped this.
| 4140tm wrote:
| I just found out about Lagom from this blog post and it's
| exactly what I have been looking for.
|
| All other Python options I've seen feel too involved or leak
| too much into your code. Lagom seems to balance everything just
| right.
|
| Thank you!
| cinntaile wrote:
| Haha, I can imagine Steve is quite pleased with this comment.
| You should look up the meaning of the (Swedish) word lagom.
| meadsteve wrote:
| Thanks 4140tm and thanks cinntaile. I was very pleased.
| That was very much the intention of the name
| raphaelrk wrote:
| I recently benchmarked "numpy vs js" matrix multiplication
| performance, and was surprised to find js significantly
| outperforming numpy. For multiplying two 512x512 matrices:
|
| python
|   numpy:            ~3.30ms
|   numpy with numba: ~2.90ms
| node
|   tfjs:             ~1.00ms
|   gpu.js:           ~4.00ms
|   ndarray:          ~118.00ms
|   vanilla loop:     ~138.00ms
|   mathjs:           ~1876.00ms
| browser
|   tfjs webgpu:      ~.16ms
|   tfjs webgl:       ~.76ms
|   tfjs wasm:        ~2.51ms
|   gpu.js:           ~6.00ms
|   tfjs cpu:         ~244.65ms
|   mathjs:           ~3469.00ms
| c
|   accelerate:       ~.06ms
|
| Source here: https://github.com/raphaelrk/matrix-mul-test
| bobsmooth wrote:
| What's with this fascination with making python fast? It's not
| supposed to be fast, it's supposed to be simple. If you want
| speed use a compiled language.
| Trying to make python fast is like
| trying to strap a turbocharger to a tricycle.
| alanwreath wrote:
| I agree with you -- but I also don't say no to free food.
|
| I mean regardless of whether mypy was going to make my code run
| faster I would have used it for the sheer confidence it gives
| wrt my code correctness. The fact that I can use that same
| code (untouched) to speed it up... that just means I get to
| have my cake and eat it too :P
| meadsteve wrote:
| Yeah this is exactly it for me. I already had type
| annotations and ran mypy to help with correctness. And I
| tried this out because it felt like a nice thing to get for
| free.
| intrepidhero wrote:
| I like the concept of using mypyc to leverage type hints to
| compile python. But I was pretty frustrated recently when I got
| bit by a bug in mypyc[1] while trying to use black. Especially
| since I wasn't using mypyc myself and so didn't realize it was
| even in my dependency tree. Beware adding "alpha" quality
| software as a dependency to your supposedly production ready
| tool.
|
| [1] https://github.com/psf/black/issues/2846
| BiteCode_dev wrote:
| Mind you, it still requires a C compiler, to be installed
| separately. It's very easy on Linux, but needs an Xcode install
| on Mac, and can be fiddly on Windows.
|
| Still nice, but not like golang or rust where you have a
| standalone solution.
|
| It's an alternative to nuitka, which I recommend trying out.
| atoav wrote:
| Anybody using Python and Rust should also check out maturin and
| pyo3. I run some (non public) Python modules created in Rust and
| both the performance and the testability are stellar.
| meadsteve wrote:
| Yeah these are great approaches too. I'd actually considered a
| rewrite of the core in rust before I went with mypyc. But it
| was nice not to have to do a rewrite.
| atoav wrote:
| Totally understandable. More options are better anyways.
| jblindsay wrote:
| I have the exact same experience.
| Both Maturin and PyO3 have
| been a game changer for the work that I have been doing lately.
| It works so seamlessly.
| kodablah wrote:
| We built the logic backing the Temporal Python SDK[0] in Rust
| and leverage PyO3 (and PyO3 Asyncio). Unfortunately Maturin
| didn't let us do some of the advanced things we needed to do
| for wheel creation (at the time, unsure now), so we use
| setuptools-rust with Poetry.
|
| 0 - https://github.com/temporalio/sdk-python
| atoav wrote:
| I had no issues with the standard maturin way of building
| wheels - but my requirements were not special at all. I also
| did this maybe 5 months ago, so maybe it has indeed gotten
| better, I cannot tell.
| wcdolphin wrote:
| Is anyone else using MyPyC in production and can share their
| experience? Did you attempt the compile-it-all approach, or
| add modules incrementally? What do compile times look like at
| scale?
|
| Would love to buy you a coffee and hear about your experience and
| the challenges and benefits.
| bsenftner wrote:
| Worth mentioning Taichi, a high-performance parallel programming
| language embedded in Python. I've experimented with it a bit, and
| the high-performance claim is very true. One can pretty much just
| write ordinary Python, plus enhancing existing Python is not that
| difficult either.
|
| From their docs:
|
| You can write computationally intensive tasks in Python while
| obeying a few extra rules imposed by Taichi to take advantage of
| the latter's high performance. Use decorators @ti.func and
| @ti.kernel as signals for Taichi to take over the implementation
| of the tasks, and Taichi's just-in-time (JIT) compiler would
| compile the decorated functions to machine code. All subsequent
| calls to them are executed on multi-CPU cores or GPUs. In a
| typical compute-intensive scenario (such as a numerical
| simulation), Taichi can lead to a 50x~100x speed up over native
| Python code.
|
| Taichi's built-in ahead-of-time (AOT) system also allows you to
| export your code as binary/shader files, which can then be
| invoked in C/C++ and run without the Python environment.
|
| https://www.taichi-lang.org/
| kingkongjaffa wrote:
| Can this work with pyinstaller to make an executable faster?
| Cyphase wrote:
| I can't see why not. I've packaged some complex dependencies
| with PyInstaller - on Windows. There is always a way. This
| wouldn't even be particularly difficult.
| ok_dad wrote:
| I'll be that guy who says I love Python but it's been shoved into
| too many spaces now. It's been a great tool for me for writing
| things that require a lot of I/O and aren't CPU bound.
|
| I am even rethinking that now because I was able to write a
| program in Go with an HTTP API and using JSON as the usual API
| interchange format in one night (all stdlib too), and it was so
| easy that I plan to pitch using it for several services we need
| to rewrite at work that are currently in Python. That would be
| very similar to what I wrote in a day.
|
| If Python doesn't fix its packaging, performance, and the
| massive expansion in the language, I think it's going to start
| losing ground to other languages.
| rkrzr wrote:
| I didn't know that you can compile individual modules with mypyc.
| That's very interesting since it allows a gradual adoption of the
| compiler, which really helps with big codebases.
|
| Do you know if there are any requirements for which modules can
| be compiled? E.g. can they be imported in other modules or do
| they have to be a leaf in the import tree/graph ?
| traverseda wrote:
| Having read through the docs, mypyc has a concept of "native
| classes" and python classes, and it looks like you can use a
| "native" (compiled) class from regular python and vice-versa.
|
| So my reading is that it should be pretty seamless.
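
To make the points above about compiling individual modules more concrete, here is a minimal sketch of the kind of module one might hand to mypyc. The module name `fastmath.py`, the class `Accumulator`, and the function `sum_of_squares` are hypothetical and not taken from the blog post or from Lagom. The code is ordinary type-annotated Python and runs unchanged under CPython; assuming mypyc and a C compiler are installed, running `mypyc fastmath.py` should build it as a C extension that other, uncompiled modules can import as usual.

```python
# fastmath.py (hypothetical module name, for illustration only)
from typing import List


class Accumulator:
    """A plain annotated class; mypyc would be expected to compile it
    as a "native class" that interpreted code can still instantiate."""

    def __init__(self) -> None:
        self.total: int = 0
        self.count: int = 0

    def add(self, value: int) -> None:
        # Attribute types are fixed by the annotations above.
        self.total += value
        self.count += 1

    def mean(self) -> float:
        # Guard against division by zero when nothing has been added.
        return self.total / self.count if self.count else 0.0


def sum_of_squares(values: List[int]) -> int:
    # A tight, fully annotated loop: the sort of code that tends to
    # benefit most from ahead-of-time compilation.
    acc = 0
    for v in values:
        acc += v * v
    return acc
```

Because the source stays valid Python, the same file can be tested uncompiled and only compiled at wheel-building time, which is what makes module-by-module adoption plausible.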
| an1sotropy wrote:
| I'm curious how to compare this with a PyPy FAQ:
| https://doc.pypy.org/en/latest/faq.html#would-type-annotatio...
| which describes a bit about why type hints aren't as helpful to
| optimize code under PyPy as one (including myself) might think.
|
| Can someone explain more about how mypyc is in a better position
| to produce better optimizations than pypy, or am I confused about
| this?
| detaro wrote:
| pypy argues that considering type annotations gives them less
| useful data than their existing tracing does, and thus pypy
| wouldn't be faster if it considered them. Something like mypyc
| by design has no chance of doing tracing, and thus has to work
| with annotations. (I also don't see where you get the claim
| from that mypyc has better optimizations than pypy? But
| the two also follow different designs, so they might be good at
| different things)
| an1sotropy wrote:
| sorry I didn't mean to claim that mypyc does have better
| optimizations, I meant to be asking if that was possible. My
| superficial read was: this post about mypyc goes from type
| hints to compiling to "faster", and then I remembered the
| pypy FAQ which says type hints didn't help with that.
|
| But if mypyc has no runtime information to go on (which pypy
| does have), then certainly having some type information is
| better than having none.
| peterkelly wrote:
| mypyc is cool and all, but I can't help thinking about how Node
| just JITs everything automatically without the need for any
| special steps like this.
| BiteCode_dev wrote:
| That's what Microsoft is paying Guido for, for the next
| versions of python.
| chrisseaton wrote:
| I think that's not really the plan - they're talking about
| just basic template compilation, nothing like V8
| https://github.com/markshannon/faster-cpython/blob/master/pl....
| chrisseaton wrote:
| That's not Node - that's V8.
| And it's possible to do the same
| thing for Python - there's nothing magic about JavaScript
| compared to Python - it's just a lot of engineering work to do
| it, which is beyond what this project's scope is. PyPy does it,
| but not inside standard Python.
| peterkelly wrote:
| I'm well aware of V8 and pypy. I also really like Python as a
| language, especially with mypy.
|
| It just makes me sad that in a world with multiple
| high-performance JIT engines (including pypy, for Python itself),
| the standard Python version that most people use is an
| interpreter. I know it's largely due to compatibility reasons
| (C extensions being deeply intertwined with CPython's API).
|
| There _is_ a really important (if not "magic") difference
| between JavaScript and Python. JS has always (well, since IE
| added support) been a language with multiple widely-used
| implementations in the wild, which has prevented the
| emergence of a third-party package ecosystem which is heavily
| tied to one particular implementation. Python on the other
| hand is for a large proportion of the userbase considered
| CPython, with alternate implementations being second class
| citizens, despite some truly impressive efforts on the
| latter.
|
| The fact that packages written in JS are not tied to (or at
| least work best with) a single implementation is also what
| made it possible for developers of JS engines to experiment
| with different implementation approaches, including JIT.
| While I'm not intimately familiar with writing native
| extension modules for Node (having dabbled only a little), my
| understanding is the API surface is much narrower than
| Python, allowing for changes in the engine that avoid
| breaking APIs. But there is less need for native modules in
| JS, because of the presence of JIT in all major engines.
| mkoubaa wrote:
| This is in the process of being addressed - look into the
| HPy project
| zzzeek wrote:
| > It just makes me sad that in a world with multiple
| high-performance JIT engines (including pypy, for Python
| itself), the standard Python version that most people use
| is an interpreter. I know it's largely due to compatibility
| reasons (C extensions being deeply intertwined with
| CPython's API).
|
| this is misleading, if one takes "interpreter" to mean that
| code is represented as syntax-derived trees or other
| datastructures which are then traversed at runtime to
| produce results - someone correct me if I'm wrong but this
| would apply to well known interpreted languages like Perl
| 5. CPython is a _bytecode_ interpreter, not conceptually
| unlike the Java VM before JITs were added. It just happens
| to compile scripts to bytecode on the fly.
| chrisseaton wrote:
| Bytecode is just another data structure that you traverse
| at runtime to produce results. It's a postfix
| transformation of the AST. It's still an interpreter.
| zzzeek wrote:
| so you'd call the pre-JIT JVM an "interpreter" and you'd
| call Java an interpreted language?
| chrisseaton wrote:
| > so you'd call the pre-JIT JVM an "interpreter"
|
| Yeah? I think almost everyone would?
|
| > and you'd call Java an interpreted language?
|
| Java is interpreted in many ways, and compiled in many
| ways, as I said it's complicated. It's compiled to
| bytecode, which is interpreted until it's time to be
| compiled... at which point it's abstract interpreted to a
| graph, which is compiled to machine code, until it needs
| to deoptimise at which point the metadata from the graph
| is interpreted again, allowing it to jump back into the
| original interpreter.
|
| But if it didn't have the JIT it'd always be an
| interpreter running.
| an1sotropy wrote:
| Well, ok, but then isn't a CPU also just an
| interpreter, traversing the object code text of compiled
| code?
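
The "bytecode interpreter" point in the exchange above can be observed directly with the standard library's `dis` module. The following is a small sketch, with `add_one` as a made-up example function: CPython compiles the function body to a bytes object of bytecode, and `dis.get_instructions` decodes it into named instructions that the interpreter loop then executes one by one. The exact opcode names differ between CPython versions, so none are assumed here.

```python
import dis


def add_one(x: int) -> int:
    # The annotations are ignored by CPython at runtime; they do not
    # change the bytecode that gets generated for the body.
    return x + 1


# The compiled form lives on the function's code object as raw bytes.
raw_bytecode = add_one.__code__.co_code

# Decode the bytes into instruction names for inspection.
opnames = [ins.opname for ins in dis.get_instructions(add_one)]

print(type(raw_bytecode).__name__, len(raw_bytecode))
print(opnames)
```

Calling `dis.dis(add_one)` prints the same information as a formatted listing, which is usually the more convenient form when exploring interactively.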
| chrisseaton wrote:
| We don't normally call hardware or firmware
| implementations an 'interpreter'.
|
| Almost all execution techniques include some combination
| of compilation and interpretation. Even some ASTs include
| aspects of transformation to construct them from the
| source code, which we could call a compiler. Native
| compilers sometimes have to interpret metadata to do
| things like roll forward for deoptimisation.
|
| But most people in the field would describe CPython
| firmly as an 'interpreter'.
| zzzeek wrote:
| I call it "bytecode interpreted" to distinguish it from
| traditional parse-tree interpretation such as Perl 5 and
| others
| [deleted]
| detaro wrote:
| That's not misleading, that's standard terminology. an
| interpreter using bytecode is still an interpreter.
| mixmastamyk wrote:
| Python is a bit more dynamic than JS, which makes it uniquely
| hard to optimize. There is more improvement to be done,
| however, and it is being done.
| chrisseaton wrote:
| Right, but I think we know how to optimise all these
| things. It's all solved problems.
| mixmastamyk wrote:
| A few things are impossible without changing/subsetting
| the language. What I was trying to get at.
| chrisseaton wrote:
| What things are you thinking of?
|
| (Not trying to interrogate you or prove you wrong, but
| I've got an interest in optimising very difficult
| meta-programming patterns.)
| mixmastamyk wrote:
| Nearly everything (or is it everything?) in memory can be
| modified at runtime. There are no real constants for
| example. The whole stack top to bottom can be
| monkeypatched on a whim.
|
| This means nothing is guaranteed and so every instruction
| must do multiple checks to make sure data structures are
| what is expected at the current moment.
|
| This is true of JS as well, but to a lesser extent.
| chrisseaton wrote:
| > so every instruction must do multiple checks
|
| Aren't all the things you mentioned already fixed by
| deoptimisation?
|
| You assume constants cannot be modified, and then get the
| code that wants to modify constants to do the work of
| stopping everyone who is assuming a constant value, and
| notify them that they need to pick up the new value?
|
| > To deoptimize means to jump from more optimised code to
| less optimized code. In practice that usually means to
| jump from just-in-time compiled machine code back into an
| interpreter. If we can do this at any point, and if we
| can perfectly restore the entire state of the
| interpreter, then we can start to throw away those checks
| in our optimized code, and instead we can deoptimize when
| the check would fail.
|
| https://chrisseaton.com/truffleruby/deoptimizing/
|
| I work on a compiler for Ruby, and mutable constants and
| the ability to monkey patch etc adds literally zero extra
| checks to optimised code.
| mixmastamyk wrote:
| No such thing as a constant in Python. You can optionally
| name a variable in uppercase to signal to others that it
| should be, but that's about it.
|
| You can write a new compiler if you'd like, as detailed
| on this page. But CPython doesn't work that way and 99%
| of the ecosystem is targeted there.
|
| There is some work on making more assumptions as it runs,
| now that the project has funding. This is about where my
| off-top-of-head knowledge ends however so someone else
| will want to chime in here. The HN search probably has a
| few blog posts and discussions as well.
| cozzyd wrote:
| I think it's more that CPython is so slow that a lot of
| things people use are implemented using the C API, and
| many optimizations will break a bunch of things. If
| everything was pure python the situation would be
| different.
| pmarreck wrote:
| Have they cleaned up Python's packaging/dependency problem yet?
| chrisseaton wrote:
| > for free ...
| this was a problem as a number of my tests rely on
| this [incompatible behaviour]
| meadsteve wrote:
| Maybe I should have said for cheap
| mrtranscendence wrote:
| Nice to see this. Do they have a project roadmap for mypyc?
|
| Doubling performance is nice, though it does still leave a lot of
| performance on the table. I'd be curious to see a comparison
| between this and Cython.
___________________________________________________________________
(page generated 2022-09-27 23:01 UTC)