[HN Gopher] PyO3: Rust Bindings for the Python Interpreter
       PyO3: Rust Bindings for the Python Interpreter
       Author : batterylow
       Score  : 255 points
       Date   : 2021-01-29 12:17 UTC (10 hours ago)
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
       | mleonhard wrote:
       | I'm interested in running Python inside wasmtime. I think PyO3
       | would be useful. We could build a small Rust wasm binary that
       | exports an "execute_python_script" function. This would finally
       | be a way to run Python in a strong sandbox with memory [0] and
       | CPU [1] restrictions. (In 1999, I asked Guido for sandboxing
       | support in Python, but he refused.)
       | [0] https://github.com/bytecodealliance/wasmtime/issues/2273
       | [1] https://github.com/bytecodealliance/wasmtime/issues/2274
       | minimaxir wrote:
       | Huggingface Tokenizers
       | (https://github.com/huggingface/tokenizers), which are now used
       | by default in their Transformers Python library, use pyO3 and
       | became popular due to the pitch that it encoded text an order of
       | magnitude faster with zero config changes.
       | It lives up to that claim. (I had issues with return object
       | typing when going between Python/Rust at first but those are more
       | consistent now)
       | adsharma wrote:
       | There is another way to speed up python:
       | Write code in python and transpile to another language (could be
       | rust) and then import it back into python
       | https://github.com/adsharma/py2many/tree/main/tests/expected
       | Figuring out a mapping between a subset of a compiled language
       | and a subset of statically typed python should be possible.
       | The hard part is mapping the standard library. I suspect something
       | like nim might have an advantage there.
       | gukoff wrote:
       | With PyO3, I built a library that parses datetimes 10x faster than
       | `datetime.strptime` in just a few lines of code:
       | https://github.com/gukoff/dtparse
       | It just calls Rust's chrono library, which does the parsing, and
       | wraps the result in a Python object. You can do it for any Rust
       | library, it's very, very easy!
       | The only slightly complicated part is the distribution. You need
       | to use https://github.com/PyO3/maturin or
       | https://github.com/PyO3/setuptools-rust, and of course, you need
       | to have Rust installed on the wheel-building machine.
       | Feel free to use this repo as a reference if you want to build a
       | similar thing. The code is commented, and there's a working
       | GitHub action that builds the wheels for all platforms and
       | uploads them to PyPI:
       | https://github.com/gukoff/dtparse/tree/master/.github/workfl...
         | JPKab wrote:
         | Thank you thank you thank you!
         | I was looking at PyO3 a few months ago, after discovering the
         | orjson python (with rust inside) library and radically speeding
         | up an auto-ML app for work.
         | I really enjoyed starting to learn Rust, but found the process
         | to embed in Python to be rather intimidating. Looking forward
         | to using your repo as a reference, and love the dtparse work
         | you've done.
         | Rotareti wrote:
         | This is awesome, thanks for sharing! I think this should be
         | added to the PyO3 examples list :)
         | https://github.com/PyO3/pyo3#examples
         | japhyr wrote:
         | I was surprised to find out how slow strptime() can be. I was
         | working on a data-focused project that was finally starting to
         | slow down from the growing volume of data. I was looking at
         | river heights over time, and once I hit about 140,000 data
         | points the project got slow enough to make some profiling and
         | optimization worthwhile. I was quite surprised to find it was
         | spending more than two full seconds just running strptime(),
         | out of a total execution time of around 15 seconds.
         | I ended up looking at a bunch of different ways of processing
         | timestamps in Python: strptime(), string parsing, regex,
         | datetime.fromisoformat(), NumPy, Pandas, and more. I got a 46x
         | speedup using datetime.fromisoformat(). Other approaches got
         | anywhere from 4x to 40x speedup, and a couple approaches were
         | an order of magnitude slower than strptime().
         | My takeaway was there's no substitute for profiling the actual
         | code you're running, and focusing on the specific bottlenecks
         | in your own project. I wrote this up in a blog post if anyone's
         | interested, "What's faster than strptime()?"
         | https://ehmatthes.com/blog/faster_than_strptime/
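The comparison described above can be reproduced with a small timing harness. This is a minimal sketch, not the blog post's actual benchmark: it assumes the ISO-format parser meant here is `datetime.fromisoformat()` (Python 3.7+), and the sample data and helper names are made up for illustration. Timings depend on the machine and Python version, so no numbers are claimed.

```python
import timeit
from datetime import datetime

# Hypothetical sample data: 10,000 copies of one ISO 8601 timestamp.
STAMPS = ["2021-01-29 12:17:05"] * 10_000
FMT = "%Y-%m-%d %H:%M:%S"


def parse_strptime(values):
    """Parse every string with the general-purpose format-driven parser."""
    return [datetime.strptime(v, FMT) for v in values]


def parse_fromisoformat(values):
    """Parse every string with the ISO-8601-only fast path."""
    return [datetime.fromisoformat(v) for v in values]


if __name__ == "__main__":
    # Time whole batches so loop overhead is counted for both approaches.
    for fn in (parse_strptime, parse_fromisoformat):
        seconds = timeit.timeit(lambda: fn(STAMPS), number=5)
        print(f"{fn.__name__}: {seconds:.3f} s for 5 runs")
```

Profiling the real workload, as the comment recommends, is still the deciding step; a harness like this only ranks candidate parsers on one input shape.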
         | mrcarruthers wrote:
         | how does it compare against ciso8601 perf-wise?
         | https://pypi.org/project/ciso8601/
         | to be fair ciso8601 only parses iso8601 datetimes, but that's
         | enough for 90%+ of my use cases.
         | throwaway894345 wrote:
         | I'm very curious to hear the use case for which date time
         | parsing was the bottleneck! Also, I'm surprised that the
         | overhead of calling across the language boundary didn't dwarf
         | the gains from parsing...
           | pbecotte wrote:
           | I've certainly never been bottlenecked on date parsing :)
           | However, many/most of the high performance python libraries
           | are built in C code, and compiled down into something the
           | python interpreter can use directly. There are lots of python
           | bindings written in c++ to native c libraries as well, I know
           | I have used ZeroMQ pretty recently. Rust is done the same
           | way: the code is compiled down into objects that Python can
           | use directly. It's not like running a JavaScript interpreter
           | in your code.
           | oblvious-earth wrote:
           | I've had this situation a few times. Most recently
           | transforming large (1-50 GB) CSV files into a format that
           | can be digested by a proprietary bulk DB loader.
           | Because our problem was just about reformatting we ended up
           | reading the CSVs in binary mode and using struct to extract
           | the relevant values from the date time fields. But if we
           | needed to do actual date logic something like this would
           | perhaps be useful (but there are other fast date time
           | libraries out there, I've been a fan of pendulum for some
           | tasks).
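The binary-mode approach described above might look roughly like the following. It is a sketch under stated assumptions, not the commenter's code: the timestamp layout (`YYYY-MM-DD HH:MM:SS`), the compact output format, and the function names are all hypothetical, and the line splitter ignores quoted CSV fields.

```python
import struct

# Assumed layout: "YYYY-MM-DD HH:MM:SS" as exactly 19 ASCII bytes.
# "4s"/"2s" capture the digit groups; each "x" skips one separator byte.
TIMESTAMP = struct.Struct("4sx2sx2sx2sx2sx2s")


def compact_timestamp(field: bytes) -> bytes:
    """Rewrite b'2021-01-29 12:17:05' as b'20210129121705' with no date parsing."""
    return b"".join(TIMESTAMP.unpack(field))


def reformat_line(line: bytes, column: int) -> bytes:
    """Rewrite only the timestamp column of one comma-separated record.

    Naive split: assumes no quoted fields containing commas.
    """
    fields = line.rstrip(b"\r\n").split(b",")
    fields[column] = compact_timestamp(fields[column])
    return b",".join(fields) + b"\n"
```

Because nothing is ever converted to a `datetime`, this only helps when the job is pure reformatting, which is the situation the comment describes.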
             | throwaway894345 wrote:
             | That makes sense, but I have a hard time believing the
             | approach of calling into a date time parser O(n) times is
             | going to yield a significant performance gain no matter how
             | much faster the parser is. However, I'm being downvoted, so
             | perhaps I'm mistaken?
               | brundolf wrote:
               | Maybe they did it in bulk? i.e. send all the strings over
               | at once, parse them in a loop, send them back. Seems like
               | that would reduce overhead
               | throwaway894345 wrote:
               | Right, and that makes sense, but the context here is a
               | date parsing library for Python--unless said library has
               | a batch interface, I'm not sure how that would improve
               | performance, but maybe I'm misestimating something.
               | brundolf wrote:
               | Ah, I skimmed over the part where this is a library and
               | not application-code
               | lincolnq wrote:
               | My instinct is that the overhead is small. You need to
               | add a few C stack frames and do some string conversion on
               | each call, maybe an allocation to store the result. It's
               | not going to be as quick as doing it in pure Rust, but the
               | python-to-native code layer can be pretty lightweight I
               | think!
               | oblvious-earth wrote:
               | Sometimes it's about optimizing wall time not algorithmic
               | complexity.
               | If you have a batch SLA of 1 hour, and you're currently
               | spending 50-70 mins to complete the batch and 20 minutes
               | of that time is spent date parsing and you can reduce it
               | to 5 minutes, that's a big win.
               | throwaway894345 wrote:
               | No doubt, but if your date parsing saves you 1 second per
               | date parsed but each call into the faster library costs 2
               | seconds, then your performance actually suffers. The only
               | way around this is to make a batch call such that the
               | overhead is O(1).
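The argument above is a simple cost model, and it can be written down directly. In this sketch the function name and its parameters are illustrative, and the numbers are the deliberately exaggerated ones from the comment (a parser that saves 1 second per date behind a call that costs 2 seconds), not measurements of any real library.

```python
def total_seconds(n_dates, parse_cost, call_overhead, batch_size=1):
    """Total time when every call across the language boundary has a fixed cost."""
    calls = -(-n_dates // batch_size)  # ceiling division: number of boundary crossings
    return calls * call_overhead + n_dates * parse_cost


n = 100
# Baseline: 3 s per date, no boundary to cross.
baseline = total_seconds(n, parse_cost=3, call_overhead=0)
# "Faster" parser: 2 s per date, but 2 s of overhead on every single call.
per_item = total_seconds(n, parse_cost=2, call_overhead=2)
# Same parser behind a batch interface: the overhead is paid once.
batched = total_seconds(n, parse_cost=2, call_overhead=2, batch_size=n)

print(baseline, per_item, batched)  # 300 400 202
```

Under this model the per-item version only wins when `parse_cost + call_overhead` is below the baseline's cost per date. Whether that holds for a given extension is an empirical question, which is what the replies go on to argue about.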
               | minitech wrote:
               | I'm not going to install it to check, but when someone
               | writes "Fast datetime parser for Python written in Rust.
               | Parses 10x-15x faster than datetime.strptime." it seems
               | reasonable to assume that this is not the case.
               | throwaway894345 wrote:
               | Depends on whether or not the parent is including the
               | overhead in their statistic. Misinformation about
               | microbenchmarks is hardly a rarity.
         | dmw_ng wrote:
         | Another cheap trick if the time column is sequential is to
         | split the string into date and time components, cache the date
         | part and calculate the time part just with some multiplication
         | Major caveat is timezone handling, but this only applies in a
         | subset of situations
           | quietbritishjim wrote:
           | If you've got to that point of modifying the storage format
           | then you might as well just use an integer (microseconds
           | since the epoch) and be done with it. That seems cleaner
           | than using a string (or two strings) anyway.
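Storing integer microseconds since the epoch, as suggested above, needs only a pair of conversions. This sketch assumes UTC and timezone-aware datetimes, and the helper names are hypothetical. It uses integer `timedelta` arithmetic rather than `timestamp()` so that no float rounding is involved.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
ONE_MICROSECOND = timedelta(microseconds=1)


def to_epoch_micros(moment: datetime) -> int:
    """Convert an aware datetime to whole microseconds since the Unix epoch."""
    return (moment - EPOCH) // ONE_MICROSECOND


def from_epoch_micros(micros: int) -> datetime:
    """Convert microseconds since the Unix epoch back to an aware UTC datetime."""
    return EPOCH + timedelta(microseconds=micros)
```

Reading such a column back involves no string parsing at all, which sidesteps the `strptime()` cost discussed in this subthread.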
       | adkadskhj wrote:
       | I needed Blender integration a while back and wasn't sure what I
       | could write it in. PyO3 worked great with Blender with no
       | configuration. I was quite concerned that something about the
       | Python-embedded-Blender behavior would limit PyO3, but nope, so
       | far it's worked flawlessly.
       | Thanks PyO3 team :)
       | mynameisash wrote:
       | At work, I'm using PyO3 for a project that churns through a lot
       | of data (step 1) and does some pattern mining (step 2). This is
       | the second generation of the project and is on-demand compared
       | with the large, batch project in Spark that it is replacing. The
       | Rust+Python project has really good performance, and using Rust
       | for the core logic is such a joy compared with Scala or Python
       | that a lot of other pieces are written in.
       | Learning PyO3, I cobbled together a sample project[0] to
       | demonstrate how some functionality works. It's a little outdated
       | (uses PyO3 0.11.0 compared with the current 0.13.1) and doesn't
       | show everything, but I think it's reasonably clear.
       | One thing I noticed is that passing very large data from Rust and
       | into Python's memory space is a bit of a challenge. I haven't
       | quite grokked who owns what when and how memory gets correctly
       | dropped, but I think the issues I've had are with the amount of
       | RAM used at any moment and not with any memory leaks.
       | [0] https://github.com/aeshirey/CheeseShop
       | fulafel wrote:
       | Previously (2017): https://news.ycombinator.com/item?id=14859844
       | LockAndLol wrote:
       | If this works well, I'd rather use this over being forced to use
       | type hints and mypy.
       | Has anybody used this in conjunction with a python framework?
       | Django, fastapi or something?
         | uranusjr wrote:
         | Uh, how do you plan to use FastAPI while avoiding type hints?
         | edenhyacinth wrote:
         | I have! Used FastAPI as a frontend to do some minor data
         | modification, and passed the data for model inference in Rust.
         | Works really nicely, although given how little work I'm doing
         | in the Python side I honestly prefer using Rocket instead of
         | FastAPI and then using pyo3 to call the Python library in Rust,
         | rather than the other way around.
           | LockAndLol wrote:
           | Thanks for the response. That does sound pretty much like
           | what I would like to do. Have you by any chance open-sourced
           | your project?
           | I'm new to rust, but I'll check out Rocket. Cheers
         | pansa2 wrote:
         | How would PyO3 help you avoid type hints and mypy?
           | brundolf wrote:
           | I think the idea is that they move their business logic to
           | the Rust code, since Rust's type system is more powerful and
           | more sound, instead of trying to make do with MyPy
             | zerkten wrote:
             | Wouldn't it be more of a priority to move it for lower
             | memory use and higher request speed? A better type system
             | is good, but often these are a struggle with scaling
             | interpreted languages compared to other lower level
             | languages.
               | brundolf wrote:
               | For many people the primary appeal of Rust is its type
               | system and related features (declaring deep immutability,
               | pattern-matching, etc)
               | > often these are a struggle with scaling interpreted
               | languages compared to other lower level languages
               | Not sure what's meant by this
           | LockAndLol wrote:
           | It would minimize the python surface required to be covered
           | with type-hints and mypy. If possible, one could simply point
           | django to the modules generated from rust.
           | I'll give it a shot tonight and see how it goes. Now I'm
           | curious.
       | edeion wrote:
       | That's a really great name you came up with! Embodies both parts
       | of your focus, stays pronounceable. Does the 3 relate to the
       | Python version or are you mimicking some specific molecule that I
       | can't think of?
         | [deleted]
         | SnowflakeOnIce wrote:
         | My guess is that the name is derived from the `-O3` compiler
         | optimization level from many compilers.
           | fafhrd91 wrote:
           | name was chosen after `uranium trioxide`: pythonium trioxide
           | - pyo3
             | chc wrote:
             | If you're trying to figure out the origin of a Rust
             | project's name, the safest bet is always to choose the one
             | that's a reference to metal.
               | fafhrd91 wrote:
               | I am the original author of pyo3. Yury Selivanov (author of
               | uvloop and edgedb) suggested the pyo3 name.
               | chc wrote:
               | Oh, I know, I wasn't trying to correct you or anything. I
               | was just adding on to the correct answer to point out
               | that PyO3's naming scheme is part of a popular trend in
               | Rust libraries.
         | batterylow wrote:
         | It's indeed a cool name, but it's not my doing (this isn't a
         | Show HN)!
         | smlckz wrote:
         | Py (iv)
         |          O
         | O = Py < |
         |          O
         | or Py (vi)
         |      O
         |      ||
         | O = Py
         |      ||
         |      O
         | or Py (ii)
         |      O
         | Py <   > O
         |      O
         | heh!
           | auscompgeek wrote:
           | I think you might be missing an oxygen atom there.
         | Swenrekcah wrote:
         | I would guess it is derived from:
         | https://en.wikipedia.org/wiki/Iron(III)_oxide
           | smlckz wrote:
           | But that's Fe_2 O_3 !
             | ziml77 wrote:
             | I think calling it Py2O3 would be a bit confusing though.
               | smlckz wrote:
               | Just PyO or Py_3 O_4 could have been used as well, does
               | not matter that much.
           | OskarS wrote:
           | I thought it was like the compiler flag, -O3. "With full
           | optimization", basically.
       | benecollyridam wrote:
       | Another related project: Wasmtime and Rust+Python
       | Compile your Rust code to wasm to circumvent having to compile
       | for different architectures.
       | https://docs.wasmtime.dev/wasm-rust.html
       | ksm1717 wrote:
       | Between pyodide, pyo3, rust-cpython, and rustpython, I think PyO3
       | is the best way to drop in rust in a python project for a speed
       | up, if that is your goal. Some of the demos show using python
       | from rust, but to me the biggest feature is without a doubt
       | compiling rust code to native python modules. I'm using it to
       | speed up image manipulation backed by numpy arrays.
       | There's a setuptools rust [0] extension package that can be used
       | to hook the compilation of the rust into the wheel building or
       | install from source. Maturin [1] seems to be regarded as the new
       | and improved solution for this, but I found that it's angled
       | toward using python from rust.
       | There's also the rust numpy [2] package by the same org which is
       | fantastic in that it lets you pass a numpy matrix to a native
       | method written in rust and convert it to the rust equivalent data
       | structure, perform whatever transformation you want (in parallel
       | using rayon [3]), and return the array. When building for
       | release, I was seeing speed ups of 100x over numpy on the most
       | matrix mathable function imaginable, and numpy is no joke.
       | I think there is a lot of potential for these two ecosystems
       | together. If there's not a python package for something, there's
       | probably a rust crate.
       | If anyone is interested in the python package that I'm building
       | with a rust backend, it's called pyrogis [4], for making custom
       | image manipulations through numpy arrays.
       | [0] https://github.com/PyO3/setuptools-rust
       | [1] https://github.com/PyO3/maturin
       | [2] https://github.com/PyO3/rust-numpy
       | [3] https://github.com/rayon-rs/rayon
       | [4] https://github.com/pierogis/pierogis
         | cycomanic wrote:
         | > There's also the rust numpy [2] package by the same org which
         | is fantastic in that it lets you pass a numpy matrix to a
         | native method written in rust and convert it to the rust
         | equivalent data structure, perform whatever transformation you
         | want (in parallel using rayon [3]), and return the array. When
         | building for release, I was seeing speed ups of 100x over numpy
         | on the most matrix mathable function imaginable, and numpy is
         | no joke.
         | What sort of algorithm was that? Generally getting 100x speedup
         | on vectorized code is highly unusual even using handcoded c++.
         | So I suspect it was quite loop heavy? In those cases I have
         | also seen very significant speed ups.
         | I have been using pythran [1] for speeding up my python code.
         | It generally achieves extremely good performance. I have
         | blogged about it here [2] and recently a member used pythran to
         | speed up some nbody benchmarks [3] which was used in an article
         | to argue for using compiled languages.
         | That said I find pyO3 quite exciting and have been
         | contemplating trying it with some of my projects.
         | [1] https://github.com/serge-sans-paille/pythran
         | [2] https://jochenschroeder.com/blog/articles/DSP_with_Python2/
         | [3] https://github.com/paugier/nbabel
           | ksm1717 wrote:
           | Matrix of shape (rows, columns, 3). Average the last dim for
           | each point and change it to [0,0,0] if average less than a
           | value, [255,255,255] if greater. A brightness threshold. May
           | be remembering the speed up factor wrong so take it with a
           | grain of salt - fact of the matter is it was very impressive.
           | I'm checking out that post later, I'm trying to make my
           | package easy to build on, so being able to write extensions
           | with Pythran would be another great option for speed ups.
           | Thanks
             | cycomanic wrote:
             | Just for the fun of it I tested what speed up I could get
             | with a naive algorithm and pythran. Based on your
             | description it looks like I should do the following:
             |     import numpy as np
             |
             |     def threshold_pixel(img, thr):
             |         out = np.zeros_like(img)   # start all black
             |         o = np.mean(img, axis=-1)  # per-pixel brightness
             |         out[o > thr] = 255         # bright pixels go white
             |         return out
             | This runs in ~30ms for a (1024,1024,3) array using numpy on
             | my machine. Using pythran (note I had to explicitly write
             | out the loop for out[o>thr] = 255, due to a bug that I
             | found and just reported), I get a speed of 6.ms (with
             | openmp) and 9ms without (I did not tune the openmp, but
             | this should yield a much higher speedup).
             | P.S.: Just had a look at your project, very cool, I have to
             | try that
       | pansa2 wrote:
       | Related: RustPython - A Python interpreter written in Rust.
       | https://github.com/RustPython/RustPython
       | bluedays wrote:
       | Without looking at it I wonder if it's using the Python language
       | underneath, or the python vm. Either way this is pretty cool.
         | Nvorzula wrote:
         | Precisely, this is Rust that compiles to a C FFI that plugs
         | into CPython.
       | itamarst wrote:
       | I've been playing with PyO3 for prototyping, and wrapped some
       | Rust code to see if it's faster than Python. The experience was
       | very much like using Boost Python (which these days has an
       | alternative in https://github.com/pybind/pybind11). It's
       | _really_ easy to wrap code for Python, and it has nice APIs to
       | ensure the GIL is held. Being Rust, I'm much more confident I
       | won't suffer from memory unsafety issues which my C++ at the time
       | did.
       | Now I'm starting to use it as part of the Python memory profiler
       | I'm working on (https://pythonspeed.com/fil), in this case to
       | call in to the low-level Python C API which PyO3 includes
       | bindings for in addition to its high-level API. This kind of
       | usage is more like writing C, except with the benefit of having
       | high-level APIs (for GIL holding, but also object conversion)
       | available when I need it.
       | So basically you get safe, high-level, easy-to-use APIs, with
       | fallback to low-level unsafe APIs if you need them.
       | Highly recommend trying it out.
         | JPKab wrote:
         | Was just checking out your fil project. It looks really useful,
         | and I dig the jupyter kernel as well.
           | itamarst wrote:
           | Thank you! If you have any questions/problems/ideas, please
           | reach out via GitHub or email (itamar@pythonspeed.com).
         | brundolf wrote:
         | What's the data-conversion overhead look like at the boundary?
         | Which data structures can be passed back and forth without a
         | full clone, etc?
           | itamarst wrote:
           | There's definitely a conversion cost. For strings, Python
           | apparently caches the UTF-8 encoded string, so if you
           | _repeatedly_ transfer it to Rust I suspect (but haven't
           | checked) that the cost is much lower.
           | In general I suspect it's the usual "NumPy arrays are fast,
           | everything else you better be getting a sufficiently large
           | boost from the low-level code to justify conversion".
           | For the thing I prototyped in Rust, it was wrapping the
           | `ahocorasick` crate which was in fact faster than
           | `pyahocorasick` which is written in C or Cython or something.
           | Both have similar conversion costs, probably, so it came down
           | to "for lots of data the Rust version was faster".
             | burntsushi wrote:
             | Be sure to use auto configuration to get it to go even
             | faster, depending on your use case:
             | https://docs.rs/aho-corasick/0.7.15/aho_corasick/struct.AhoC...
             | Or just be sure to enable the DFA option if you can afford
             | it. It looks like the Python library is just the standard
             | NFA algorithm.
               | itamarst wrote:
               | Yeah, I was using DFA.
               | Next step is trying an alternative approach, but if that
               | alternative doesn't work I'm going to see about wrapping
               | your package for Python.
               | Thanks for all your work on it!
               | burntsushi wrote:
               | Nice! Reach out if there are any problems or if you need
               | something exposed in the API. Looking at the
               | pyahocorasick issue tracker, there are a number of
               | features/bugs that your wrapper package would resolve. :)
             | liuliu wrote:
             | NumPy also supports conversions without copying. One thing
             | I haven't found a good way to bridge from Python is the
             | pandas.DataFrame: it seems to be quite a Python-focused
             | object, and iterating through a DataFrame is particularly
             | slow.
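Copy-free exchange of this kind rests on Python's buffer protocol, which NumPy arrays also implement. The sketch below illustrates the idea using only the standard library (`array` and `memoryview`) in place of NumPy. It does not show how PyO3 or rust-numpy expose buffers on the Rust side, and the variable names are illustrative.

```python
from array import array

# A contiguous block of C doubles, standing in for a NumPy array's data.
samples = array("d", [1.0, 2.0, 3.0, 4.0])

view = memoryview(samples)   # shares the buffer; no elements are copied
tail = view[2:]              # slicing a memoryview does not copy either
tail[0] = 30.0               # writes through to the original array

copied = samples.tolist()    # by contrast, this builds a new list of floats
copied[0] = -1.0             # so changing it leaves the array untouched

print(samples[2], samples[0])
```

A native extension that accepts a buffer sees the same memory as `view` does here, which is why large numeric arrays can cross the boundary cheaply while per-object containers generally need a conversion.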
               | itamarst wrote:
               | Internally Pandas often uses NumPy arrays, especially for
               | numeric data, so might be able to pass things that way in
               | some cases?
               | E.g. `df["column_name"].values` will get you a NumPy
               | array.
         | shirakawasuna wrote:
         | Sounds great! Would so much rather drop into Rust than C or
         | C++.
       | dbrgn wrote:
       | If you're interested in publishing Rust libraries as Python
       | packages (or integrating Rust code into an existing Python
       | package), check out https://github.com/PyO3/maturin and
       | https://github.com/PyO3/setuptools-rust.
         | edenhyacinth wrote:
         | Been using Maturin for a little while professionally, and it's
         | surprisingly good. There's a few bugbears here and there - I
         | haven't found a way to have Cargo Test & a pyo3 library working
         | at the same time - but overall it's a lot more pleasant than
         | working with Rust and R was.
       (page generated 2021-01-29 23:00 UTC)