[HN Gopher] Making Python fast - Adventures with mypyc
       ___________________________________________________________________
        
       Making Python fast - Adventures with mypyc
        
       Author : meadsteve
       Score  : 177 points
       Date   : 2022-09-27 12:42 UTC (10 hours ago)
        
 (HTM) web link (blog.meadsteve.dev)
        
       | meadsteve wrote:
       | I recently experimented with using mypyc to make some of my
       | python a little faster. I was pleasantly surprised with how well
       | it worked for very little code change so I thought I'd share my
       | experiences.
       | 
       | The blog post wanders around a little because I had to add
       | setuptools and wheel building as my project had previously
       | skipped this.
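
For readers new to mypyc, here is a minimal sketch of the kind of module it targets. It is ordinary, fully annotated Python that CPython runs unchanged, and that mypyc can compile to a C extension, for example with its command-line tool (`mypyc geometry.py`). The module, class and function names are hypothetical and are not taken from the blog post.

```python
# geometry.py: hypothetical module written in the style mypyc compiles well.
# CPython runs it as-is; mypyc uses the same annotations to generate C code.
from typing import List


class Point:
    # Declared attribute types; under mypyc this becomes a native class.
    x: float
    y: float

    def __init__(self, x: float, y: float) -> None:
        self.x = x
        self.y = y


def path_length(points: List[Point]) -> float:
    """Sum of straight-line distances between consecutive points."""
    total = 0.0
    for i in range(1, len(points)):
        dx = points[i].x - points[i - 1].x
        dy = points[i].y - points[i - 1].y
        total += (dx * dx + dy * dy) ** 0.5
    return total


if __name__ == "__main__":
    # Segment lengths are 5.0 (a 3-4-5 triangle) and 4.0.
    print(path_length([Point(0.0, 0.0), Point(3.0, 4.0), Point(3.0, 8.0)]))  # 9.0
```

Nothing in the code is mypyc-specific, so the annotations that mypy already checks are the same ones the compiler uses. For wheel builds, mypyc's documentation shows passing `mypycify([...])` from `mypyc.build` as the `ext_modules` argument to setuptools' `setup()`.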
        
         | 4140tm wrote:
         | I just found out about Lagom from this blog post and it's
         | exactly what I have been looking for.
         | 
         | All other Python options I've seen feel too involved or leak
         | too much into your code. Lagom seems to balance everything just
         | right.
         | 
         | Thank you!
        
           | cinntaile wrote:
           | Haha, I can imagine Steve is quite pleased with this comment.
           | You should look up the meaning of the (Swedish) word lagom.
        
             | meadsteve wrote:
             | Thanks 4140tm and thanks cinntaile. I was very pleased.
             | That was very much the intention of the name
        
       | raphaelrk wrote:
       | I recently benchmarked "numpy vs js" matrix multiplication
       | performance, and was surprised to find js significantly
       | outperforming numpy. For multiplying two 512x512 matrices:
       | python
       |     numpy:              ~3.30ms
       |     numpy with numba:   ~2.90ms
       | node
       |     tfjs:               ~1.00ms
       |     gpu.js:             ~4.00ms
       |     ndarray:            ~118.00ms
       |     vanilla loop:       ~138.00ms
       |     mathjs:             ~1876.00ms
       | browser
       |     tfjs webgpu:        ~0.16ms
       |     tfjs webgl:         ~0.76ms
       |     tfjs wasm:          ~2.51ms
       |     gpu.js:             ~6.00ms
       |     tfjs cpu:           ~244.65ms
       |     mathjs:             ~3469.00ms
       | c
       |     accelerate:         ~0.06ms
       | 
       | Source here: https://github.com/raphaelrk/matrix-mul-test
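
Here is a minimal timing sketch for the "vanilla loop" versus numpy rows above. It assumes only the standard library and uses numpy if it is installed. This is not the code from the linked repository, and the helper names are hypothetical. The default size is smaller than 512 because a pure-Python triple loop at 512x512 takes a long time.

```python
import random
import statistics
import time
from typing import Callable, List

Matrix = List[List[float]]


def random_matrix(n: int) -> Matrix:
    return [[random.random() for _ in range(n)] for _ in range(n)]


def matmul_loop(a: Matrix, b: Matrix) -> Matrix:
    """Naive O(n^3) multiply; b is transposed so the inner loop walks rows."""
    bt = list(zip(*b))  # columns of b as tuples
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def median_ms(fn: Callable[[], object], repeats: int = 5) -> float:
    """Median wall-clock time of fn() in milliseconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


if __name__ == "__main__":
    n = 128
    a, b = random_matrix(n), random_matrix(n)
    print(f"pure Python loop, {n}x{n}: {median_ms(lambda: matmul_loop(a, b)):.2f} ms")
    try:
        import numpy as np
    except ImportError:
        print("numpy not installed; skipping")
    else:
        na, nb = np.array(a), np.array(b)
        print(f"numpy @, {n}x{n}: {median_ms(lambda: na @ nb):.2f} ms")
```

Results depend heavily on the hardware and on which BLAS library numpy is linked against, so treat any single run as indicative only.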
        
       | bobsmooth wrote:
       | What's with this fascination with making python fast? It's not
       | supposed to be fast, it's supposed to be simple. If you want
       | speed use a compiled language. Trying to make python fast is like
       | trying to strap a turbocharger to a tricycle.
        
         | alanwreath wrote:
         | I agree with you -- but I also don't say no to free food.
         | 
         | I mean regardless of whether mypy was going to make my code run
         | faster I would have used it for the sheer confidence it gives
         | me in my code's correctness. The fact that I can use that same
         | code (untouched) to speed it up... that just means I get to
         | have my cake and eat it too :P
        
           | meadsteve wrote:
           | Yeah this is exactly it for me. I already had type
           | annotations and ran mypy to help with correctness. And I
           | tried this out because it felt like a nice thing to get for
           | free.
        
       | intrepidhero wrote:
       | I like the concept of using mypyc to leverage type hints to
       | compile python. But I was pretty frustrated recently when I got
       | bit by a bug in mypyc[1] while trying to use black. Especially
       | since I wasn't using mypyc myself and so didn't realize it was
       | even in my dependency tree. Beware adding "alpha" quality
       | software as a dependency to your supposedly production ready
       | tool.
       | 
       | [1] https://github.com/psf/black/issues/2846
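
mypyc output ships as ordinary extension modules inside binary wheels, so you can spot a compiled dependency by where its modules were loaded from. The sketch below is minimal. It assumes that checking `__file__` against `importlib.machinery.EXTENSION_SUFFIXES` is good enough for your purposes. Builtin modules have no `__file__` and are reported as not compiled. A package may also compile some submodules and not others, so check the submodule you care about. The function name is hypothetical.

```python
import importlib
import importlib.machinery
from types import ModuleType


def is_extension_module(module: ModuleType) -> bool:
    """True if the module was loaded from a C extension file (.so/.pyd)."""
    path = getattr(module, "__file__", None)
    if not path:
        return False  # builtins and namespace packages have no usable file
    return path.endswith(tuple(importlib.machinery.EXTENSION_SUFFIXES))


if __name__ == "__main__":
    for name in ("json", "black"):
        try:
            mod = importlib.import_module(name)
        except ImportError:
            print(f"{name}: not installed")
            continue
        where = getattr(mod, "__file__", None)
        print(f"{name}: extension={is_extension_module(mod)} ({where})")
```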
        
       | BiteCode_dev wrote:
       | Mind you, it still requires a C compiler, installed
       | separately. That's very easy on Linux, but it means an Xcode
       | install on Mac, and it can be fiddly on Windows.
       | 
       | Still nice, but not like Go or Rust where you have a standalone
       | solution.
       | 
       | It's an alternative to Nuitka, which I recommend trying out.
        
       | atoav wrote:
       | Anybody using Python and Rust should also check out maturin and
       | PyO3. I run some (non-public) Python modules written in Rust and
       | both the performance and the testability are stellar.
        
         | meadsteve wrote:
         | Yeah these are great approaches too. I'd actually considered a
         | rewrite of the core in rust before I went with mypyc. But it
         | was nice not to have to do a rewrite.
        
           | atoav wrote:
           | Totally understandable. More options are better anyways.
        
         | jblindsay wrote:
         | I have the exact same experience. Both Maturin and PyO3 have
         | been a game changer for the work that I have been doing lately.
         | It works so seamlessly.
        
         | kodablah wrote:
         | We built the logic backing the Temporal Python SDK[0] in Rust
         | and leverage PyO3 (and PyO3 Asyncio). Unfortunately Maturin
         | didn't let us do some of the advanced things we needed to do
         | for wheel creation (at the time, unsure now), so we use
         | setuptools-rust with Poetry.
         | 
         | 0 - https://github.com/temporalio/sdk-python
        
           | atoav wrote:
           | I had no issues with the standard maturin way of building
           | wheels - but my requirements were not special at all. I also
           | did this maybe 5 months ago, so maybe it has indeed gotten
           | better, I cannot tell.
        
       | wcdolphin wrote:
       | Is anyone else using mypyc in production who can share their
       | experience? Did you attempt the compile-it-all approach, or add it
       | incrementally? What do compile times look like at scale?
       | 
       | Would love to buy you a coffee and hear about your experience and
       | the challenges and benefits.
        
       | bsenftner wrote:
       | Worth mentioning Taichi, a high-performance parallel programming
       | language embedded in Python. I've experimented with it a bit, and
       | high-performance is very true. One can pretty much just write
       | ordinary Python, plus enhancing existing Python is not that
       | difficult either.
       | 
       | From their docs:
       | 
       | You can write computationally intensive tasks in Python while
       | obeying a few extra rules imposed by Taichi to take advantage of
       | the latter's high performance. Use decorators @ti.func and
       | @ti.kernel as signals for Taichi to take over the implementation
       | of the tasks, and Taichi's just-in-time (JIT) compiler would
       | compile the decorated functions to machine code. All subsequent
       | calls to them are executed on multi-CPU cores or GPUs. In a
       | typical compute-intensive scenario (such as a numerical
       | simulation), Taichi can lead to a 50x~100x speed up over native
       | Python code.
       | 
       | Taichi's built-in ahead-of-time (AOT) system also allows you to
       | export your code as binary/shader files, which can then be
       | invoked in C/C++ and run without the Python environment.
       | 
       | https://www.taichi-lang.org/
        
       | kingkongjaffa wrote:
       | Can this work with pyinstaller to make an executable faster?
        
         | Cyphase wrote:
         | I can't see why not. I've packaged some complex dependencies
         | with PyInstaller - on Windows. There is always a way. This
         | wouldn't even be particularly difficult.
        
       | ok_dad wrote:
       | I'll be that guy who says I love Python but it's been shoved into
       | too many spaces now. It's been a great tool for me for writing
       | things that require a lot of I/O and aren't CPU bound.
       | 
       | I am even rethinking that now because I was able to write a
       | program in Go with an HTTP API and using JSON as the usual API
       | interchange format in one night (all stdlib too), and it was so
       | easy that I plan to pitch using it for several services we need
       | to rewrite at work that are currently in Python. That would be
       | very similar to what I wrote in a day.
       | 
       | If Python doesn't fix their packaging, performance, and the
       | massive expansion in the language, I think it's going to start
       | losing ground to other languages.
        
       | rkrzr wrote:
       | I didn't know that you can compile individual modules with mypyc.
       | That's very interesting since it allows a gradual adoption of the
       | compiler, which really helps with big codebases.
       | 
       | Do you know if there are any requirements for which modules can
       | be compiled? E.g. can they be imported in other modules or do
       | they have to be a leaf in the import tree/graph?
        
         | traverseda wrote:
         | Having read through the docs, mypyc has a concept of "native
         | classes" and Python classes, and it looks like you can use a
         | "native" (compiled) class from regular Python and vice versa.
         | 
         | So my reading is that it should be pretty seamless.
        
       | an1sotropy wrote:
       | I'm curious how to compare this with a PyPy FAQ:
       | https://doc.pypy.org/en/latest/faq.html#would-type-annotatio...
       | which describes a bit about why type hints aren't as helpful to
       | optimize code under PyPy as one (including myself) might think.
       | 
       | Can someone explain more about how mypyc is in a better position
       | to produce better optimizations than pypy, or am I confused about
       | this?
        
         | detaro wrote:
         | pypy argues that considering type annotations gives them less
         | useful data than their existing tracing does, and thus pypy
         | wouldn't be faster if it considered them. Something like mypyc
         | by design has no chance of doing tracing, and thus has to work
         | with annotations. (I also don't see where you get the claim
         | that mypyc has better optimizations than pypy. But
         | the two also follow different designs, so they might be good at
         | different things.)
        
           | an1sotropy wrote:
           | sorry I didn't mean to claim that mypyc does have better
           | optimizations, I meant to be asking if that was possible. My
           | superficial read was: this post about mypyc goes from type
           | hints to compiling to "faster", and then I remembered the
           | pypy FAQ which says type hints didn't help with that.
           | 
           | But if mypyc has no runtime information to go on (which pypy
           | does have), then certainly having some type information is
           | better than having none.
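
The distinction can be seen in a few lines. CPython stores annotations but never checks them, so an interpreter or tracing JIT cannot trust them. mypyc's documentation, by contrast, describes compiled code as checking annotated types at runtime and raising `TypeError` on a mismatch. This minimal sketch shows only the CPython side, and the function is hypothetical.

```python
from typing import get_type_hints


def double(n: int) -> int:
    return n * 2


# The annotations are stored on the function object...
hints = get_type_hints(double)  # {'n': <class 'int'>, 'return': <class 'int'>}

# ...but CPython never enforces them, so any type supporting `* 2` gets through.
print(double(21))    # 42
print(double(2.5))   # 5.0
print(double("ab"))  # abab
```

Under mypyc, calling a compiled `double` with `2.5` from interpreted code would be expected to raise `TypeError` instead. That is one way compiled behaviour can differ from the interpreted original.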
        
       | peterkelly wrote:
       | mypyc is cool and all, but I can't help thinking about how Node
       | just JITs everything automatically without the need for any
       | special steps like this.
        
         | BiteCode_dev wrote:
         | That's what Microsoft is paying Guido for, for the next
         | versions of python.
        
           | chrisseaton wrote:
           | I think that's not really the plan - they're talking about
           | just basic template compilation, nothing like V8:
           | https://github.com/markshannon/faster-cpython/blob/master/pl....
        
         | chrisseaton wrote:
         | That's not Node - that's V8. And it's possible to do the same
         | thing for Python - there's nothing magic about JavaScript
         | compared to Python - it's just a lot of engineering work to do
         | it, which is beyond what this project's scope is. PyPy does it,
         | but not inside standard Python.
        
           | peterkelly wrote:
           | I'm well aware of V8 and pypy. I also really like Python as a
           | language, especially with mypy.
           | 
           | It just makes me sad that in a world with multiple high-
           | performance JIT engines (including pypy, for Python itself),
           | the standard Python version that most people use is an
           | interpreter. I know it's largely due to compatibility reasons
           | (C extensions being deeply intertwined with CPython's API).
           | 
           | There _is_ a really important (if not "magic") difference
           | between JavaScript and Python. JS has always (well, since IE
           | added support) been a language with multiple widely-used
           | implementations in the wild, which has prevented the
           | emergence of a third-party package ecosystem which is heavily
           | tied to one particular implementation. Python on the other
           | hand is for a large proportion of the userbase considered
           | CPython, with alternate implementations being second class
           | citizens, despite some truly impressive efforts on the
           | latter.
           | 
           | The fact that packages written in JS are not tied to (or at
           | least work best with) a single implementation is also what
           | made it possible for developers of JS engines to experiment
           | with different implementation approaches, including JIT.
           | While I'm not intimately familiar with writing native
           | extension modules for Node (having dabbled only a little), my
           | understanding is that the API surface is much narrower than
           | Python's, allowing for changes in the engine that avoid
           | breaking APIs. But there is less need for native modules in
           | JS, because of the presence of JIT in all major engines.
        
             | mkoubaa wrote:
             | This is in the process of being addressed - look into the
             | HPy project
        
             | zzzeek wrote:
             | > It just makes me sad that in a world with multiple high-
             | performance JIT engines (including pypy, for Python
             | itself), the standard Python version that most people use
             | is an interpreter. I know it's largely due to compatibility
             | reasons (C extensions being deeply intertwined with
             | CPython's API).
             | 
             | this is misleading, if one takes "interpreter" to mean that
             | code is represented as syntax-derived trees or other data
             | structures which are then traversed at runtime to produce
             | results - someone correct me if I'm wrong but this would
             | apply to well-known interpreted languages like Perl 5.
             | cPython is a _bytecode_ interpreter, not conceptually
             | unlike the Java VM before JITs were added. It just happens
             | to compile scripts to bytecode on the fly.
        
               | chrisseaton wrote:
               | Bytecode is just another data structure that you traverse
               | at runtime to produce results. It's a postfix
               | transformation of the AST. It's still an interpreter.
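
To make the point concrete, here is a toy, not CPython's design. It compiles a tiny arithmetic subset of Python's AST into postfix stack instructions, then interprets those instructions in a loop. CPython's real bytecode can be inspected with the standard `dis` module. The instruction names below are invented for the sketch.

```python
import ast
import operator
from typing import Callable, Dict, List, Tuple

Instr = Tuple[str, object]

BINOPS: Dict[str, Callable[[int, int], int]] = {
    "Add": operator.add,
    "Sub": operator.sub,
    "Mult": operator.mul,
}


def compile_expr(node: ast.expr, out: List[Instr]) -> List[Instr]:
    """Emit stack instructions in postfix order: operands first, then operator."""
    if isinstance(node, ast.Constant) and isinstance(node.value, int):
        out.append(("PUSH", node.value))
    elif isinstance(node, ast.Name):
        out.append(("LOAD", node.id))
    elif isinstance(node, ast.BinOp) and type(node.op).__name__ in BINOPS:
        compile_expr(node.left, out)
        compile_expr(node.right, out)
        out.append(("BINOP", type(node.op).__name__))
    else:
        raise ValueError(f"unsupported syntax: {ast.dump(node)}")
    return out


def run(code: List[Instr], env: Dict[str, int]) -> int:
    """The interpreter: a loop that walks the instruction list driving a stack."""
    stack: List[int] = []
    for op, arg in code:
        if op == "PUSH":
            stack.append(arg)
        elif op == "LOAD":
            stack.append(env[arg])
        else:  # "BINOP"
            right = stack.pop()
            left = stack.pop()
            stack.append(BINOPS[arg](left, right))
    return stack.pop()


if __name__ == "__main__":
    code = compile_expr(ast.parse("x * (y + 2)", mode="eval").body, [])
    print(code)  # [('LOAD', 'x'), ('LOAD', 'y'), ('PUSH', 2), ('BINOP', 'Add'), ('BINOP', 'Mult')]
    print(run(code, {"x": 3, "y": 4}))  # 18
```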
        
               | zzzeek wrote:
               | so you'd call the pre-JIT JVM an "interpreter" and you'd
               | call Java an interpreted language?
        
               | chrisseaton wrote:
               | > so you'd call the pre-JIT JVM an "interpreter"
               | 
               | Yeah? I think almost everyone would?
               | 
               | > and you'd call Java an interpreted language?
               | 
               | Java is interpreted in many ways, and compiled in many
               | ways, as I said it's complicated. It's compiled to
               | bytecode, which is interpreted until it's time to be
               | compiled... at which point it's abstract interpreted to a
               | graph, which is compiled to machine code, until it needs
               | to deoptimise at which point the metadata from the graph
               | is interpreted again, allowing it to jump back into the
               | original interpreter.
               | 
               | But if it didn't have the JIT it'd always be an
               | interpreter running.
        
               | an1sotropy wrote:
               | Well, ok, but then isn't a CPU also just an interpreter,
               | traversing the object code of compiled programs?
        
               | chrisseaton wrote:
               | We don't normally call hardware or firmware
               | implementations an 'interpreter'.
               | 
               | Almost all execution techniques include some combination
               | of compilation and interpretation. Even some ASTs include
               | aspects of transformation to construct them from the
               | source code, which we could call a compiler. Native
               | compilers sometimes have to interpret metadata to do
               | things like roll forward for deoptimisation.
               | 
               | But most people in the field would describe CPython
               | firmly as an 'interpreter'.
        
               | zzzeek wrote:
               | I call it "bytecode interpreted" to distinguish it from
               | traditional parse-tree interpretation such as Perl 5 and
               | others
        
               | [deleted]
        
               | detaro wrote:
               | That's not misleading, that's standard terminology. an
               | interpreter using bytecode is still an interpreter.
        
           | mixmastamyk wrote:
           | Python is a bit more dynamic than JS, which makes it uniquely
           | hard to optimize. There is still more improvement to be made,
           | however, and it is being done.
        
             | chrisseaton wrote:
             | Right, but I think we know how to optimise all these
             | things. It's all solved problems.
        
               | mixmastamyk wrote:
               | A few things are impossible without changing/subsetting
               | the language. What I was trying to get at.
        
               | chrisseaton wrote:
               | What things are you thinking of?
               | 
               | (Not trying to interrogate you or prove you wrong, but
               | I've got an interest in optimising very difficult meta-
               | programming patterns.)
        
               | mixmastamyk wrote:
               | Nearly everything (or is it everything?) in memory can be
               | modified at runtime. There are no real constants for
               | example. The whole stack top to bottom can be
               | monkeypatched on a whim.
               | 
               | This means nothing is guaranteed and so every instruction
               | must do multiple checks to make sure data structures are
               | what is expected at the current moment.
               | 
               | This is true of JS as well, but to a lesser extent.
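
Here is a small illustration of that point, using a hypothetical class. Both a standard-library function and a method can be replaced while the program runs, and every existing call site picks up the change. An optimiser has to account for this.

```python
import math


class Vec:
    def __init__(self, x: float, y: float) -> None:
        self.x = x
        self.y = y

    def norm(self) -> float:
        return math.sqrt(self.x * self.x + self.y * self.y)


def demo() -> list:
    v = Vec(3.0, 4.0)
    results = [v.norm()]             # 5.0, using the real math.sqrt

    real_sqrt = math.sqrt
    math.sqrt = lambda _: -1.0       # replace a stdlib function process-wide
    try:
        results.append(v.norm())     # same call site, different code: -1.0
    finally:
        math.sqrt = real_sqrt        # undo the patch

    real_norm = Vec.norm
    Vec.norm = lambda self: 0.0      # replace the method on the class itself
    try:
        results.append(v.norm())     # an instance created earlier sees it: 0.0
    finally:
        Vec.norm = real_norm
    return results


if __name__ == "__main__":
    print(demo())  # [5.0, -1.0, 0.0]
```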
        
               | chrisseaton wrote:
               | > so every instruction must do multiple checks
               | 
               | Aren't all the things you mentioned already fixed by
               | deoptimisation?
               | 
               | You assume constants cannot be modified, and then make the
               | code that wants to modify a constant do the work of
               | stopping everyone who is assuming its value, and telling
               | them that they need to pick up the new value?
               | 
               | > To deoptimize means to jump from more optimised code to
               | less optimized code. In practice that usually means to
               | jump from just-in-time compiled machine code back into an
               | interpreter. If we can do this at any point, and if we
               | can perfectly restore the entire state of the
               | interpreter, then we can start to throw away those checks
               | in our optimized code, and instead we can deoptimize when
               | the check would fail.
               | 
               | https://chrisseaton.com/truffleruby/deoptimizing/
               | 
               | I work on a compiler for Ruby, and mutable constants and
               | the ability to monkey patch etc adds literally zero extra
               | checks to optimised code.
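
Here is a toy model of the idea in the quote, written in plain Python with hypothetical class names. A real JIT works on machine code and must also rebuild interpreter frames mid-execution, which this sketch does not attempt. The fast path bakes in the current value with no per-call check. The code that reassigns the "constant" is responsible for invalidating it.

```python
from typing import Callable, List


class Constant:
    """A reassignable 'constant', like a Python module global."""

    def __init__(self, value: int) -> None:
        self.value = value
        self._dependents: List[Callable[[], None]] = []

    def depend(self, invalidate: Callable[[], None]) -> None:
        self._dependents.append(invalidate)

    def reassign(self, value: int) -> None:
        # The writer pays: every specialisation of the old value is invalidated.
        self.value = value
        dependents, self._dependents = self._dependents, []
        for invalidate in dependents:
            invalidate()


class Scaler:
    """Computes x * scale, specialised on the scale value seen at creation."""

    def __init__(self, scale: Constant) -> None:
        self.scale = scale
        self.deopts = 0
        baked = scale.value
        # Fast path: the value is a closure constant, with no check per call.
        self.impl: Callable[[int], int] = lambda x: x * baked
        scale.depend(self.deoptimise)

    def deoptimise(self) -> None:
        # Fall back to generic code that re-reads the current value every call.
        self.deopts += 1
        self.impl = lambda x: x * self.scale.value

    def __call__(self, x: int) -> int:
        return self.impl(x)


if __name__ == "__main__":
    scale = Constant(3)
    f = Scaler(scale)
    print(f(2))            # 6, via the specialised fast path
    scale.reassign(10)
    print(f(2), f.deopts)  # 20 1, via the generic path after deoptimisation
```

A real engine would typically re-specialise later if the value settles down again. The sketch stays on the generic path to keep the mechanism visible.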
        
               | mixmastamyk wrote:
               | No such thing as a constant in Python. You can optionally
               | name a variable in uppercase to signal to others that it
               | should be, but that's about it.
               | 
               | You can write a new compiler if you'd like, as detailed
               | on this page. But CPython doesn't work that way and 99%
               | of the ecosystem is targeted there.
               | 
               | There is some work on making more assumptions as it runs,
               | now that the project has funding. This is about where my
               | off-top-of-head knowledge ends however so someone else
               | will want to chime in here. The HN search probably has a
               | few blog posts and discussions as well.
        
               | cozzyd wrote:
               | I think it's more that cpython is so slow so a lot of
               | things people use are implemented using the C API, and
               | many optimizations will break a bunch of things. If
               | everything was pure python the situation would be
               | different.
        
       | pmarreck wrote:
       | Have they cleaned up Python's packaging/dependency problem yet?
        
       | chrisseaton wrote:
       | > for free ... this was a problem as a number of my tests rely on
       | this [incompatible behaviour]
        
         | meadsteve wrote:
         | Maybe I should have said for cheap
        
       | mrtranscendence wrote:
       | Nice to see this. Do they have a project roadmap for mypyc?
       | 
       | Doubling performance is nice, though it does still leave a lot of
       | performance on the table. I'd be curious to see a comparison
       | between this and Cython.
        
       ___________________________________________________________________
       (page generated 2022-09-27 23:01 UTC)