[HN Gopher] PyO3: Rust Bindings for the Python Interpreter ___________________________________________________________________ PyO3: Rust Bindings for the Python Interpreter Author : batterylow Score : 255 points Date : 2021-01-29 12:17 UTC (10 hours ago) (HTM) web link (github.com) (TXT) w3m dump (github.com) | mleonhard wrote: | I'm interested in running Python inside wasmtime. I think PyO3 | would be useful. We could build a small Rust wasm binary that | exports an "execute_python_script" function. This would finally | be a way to run Python in a strong sandbox with memory [0] and | CPU [1] restrictions. (In 1999, I asked Guido for sandboxing | support in Python, but he refused.) | | [0] https://github.com/bytecodealliance/wasmtime/issues/2273 | | [1] https://github.com/bytecodealliance/wasmtime/issues/2274 | minimaxir wrote: | Huggingface Tokenizers | (https://github.com/huggingface/tokenizers), which are now used | by default in their Transformers Python library, use pyO3 and | became popular due to the pitch that it encoded text an order of | magnitude faster with zero config changes. | | It lives up to that claim. (I had issues with return object | typing when going between Python/Rust at first but those are more | consistent now) | adsharma wrote: | There is another way to speed up python: | | Write code in python and transpile to another language (could be | rust) and then import it back into python | | https://github.com/adsharma/py2many/tree/main/tests/expected | | Figuring out a mapping between a subset of a compiled language | and a subset of statically typed python should be possible. | | The hard part is mapping standard library. I suspect something | like nim might have an advantage there. 
| gukoff wrote: | With PyO3, I built the library to parse datetimes 10x faster than | `datetime.strptime` in just a few lines of code: | https://github.com/gukoff/dtparse | | It just calls the Rust's chrono library that does the parsing and | wraps the result in a Python object. You can do it for any Rust | library, it's very, very easy! | | The only slightly complicated part is the distribution. You need | to use https://github.com/PyO3/maturin or | https://github.com/PyO3/setuptools-rust, and of course, you need | to have Rust installed on the wheel-building machine. | | Feel free to use this repo as a reference if you want to build a | similar thing. The code is commented, and there's a working | GitHub action that builds the wheels for all platforms and | uploads them to PyPi: | https://github.com/gukoff/dtparse/tree/master/.github/workfl... | JPKab wrote: | Thank you thank you thank you! | | I was looking at PyO3 a few months ago, after discovering the | orjson python (with rust inside) library and radically speeding | up an auto-ML app for work. | | I really enjoyed starting to learn Rust, but found the process | to embed in Python to be rather intimidating. Looking forward | to using your repo as a reference, and love the dtparse work | you've done. | Rotareti wrote: | This is awesome, thanks for sharing! I think this should be | added to the PyO3 examples list :) | | https://github.com/PyO3/pyo3#examples | japhyr wrote: | I was surprised to find out how slow strptime() can be. I was | working on a data-focused project that was finally starting to | slow down from the growing volume of data. I was looking at | river heights over time, and once I hit about 140,000 data | points the project got slow enough to make some profiling and | optimization worthwhile. I was quite surprised to find it was | spending more than two full seconds just running strptime(), | out of a total execution time of around 15 seconds. 
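A minimal sketch of how this kind of parsing cost might be measured with the standard library's timeit module. The sample timestamp, its format string, the sample size, and the comparison against datetime.fromisoformat() are assumptions for illustration, not taken from japhyr's project; absolute timings will vary by machine and Python version.

```python
import timeit
from datetime import datetime

# Hypothetical sample data: 10,000 identical ISO 8601 timestamps.
STAMPS = ["2021-01-29T12:17:00"] * 10_000
FMT = "%Y-%m-%dT%H:%M:%S"


def parse_with_strptime(stamps):
    """Parse each timestamp with an explicit format string."""
    return [datetime.strptime(s, FMT) for s in stamps]


def parse_with_fromisoformat(stamps):
    """Parse each timestamp; only valid when the input is ISO 8601."""
    return [datetime.fromisoformat(s) for s in stamps]


def time_parser(parser, stamps, repeat=3):
    """Return the best wall-clock time in seconds over `repeat` runs."""
    return min(timeit.repeat(lambda: parser(stamps), number=1, repeat=repeat))


if __name__ == "__main__":
    for parser in (parse_with_strptime, parse_with_fromisoformat):
        print(f"{parser.__name__}: {time_parser(parser, STAMPS):.4f} s")
```

Timing the parsing step in isolation like this shows how much of a run it accounts for, but it does not replace profiling the real program on real data.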
| | I ended up looking at a bunch of different ways of processing | timestamps in Python: strptime(), string parsing, regex, | datetime.fromisoformat(), NumPy, Pandas, and more. I got a 46x | speedup using datetime.fromisoformat(). Other approaches got | anywhere from 4x to 40x speedup, and a couple approaches were | an order of magnitude slower than strptime(). | | My takeaway was there's no substitute for profiling the actual | code you're running, and focusing on the specific bottlenecks | in your own project. I wrote this up in a blog post if anyone's | interested, "What's faster than strptime()?" | | https://ehmatthes.com/blog/faster_than_strptime/ | mrcarruthers wrote: | how does it compare against ciso8601 perf-wise? | https://pypi.org/project/ciso8601/ | | to be fair ciso8601 only parses iso8601 datetimes, but that's | enough for 90%+ of my use cases. | throwaway894345 wrote: | I'm very curious to hear the use case for which date time | parsing was the bottleneck! Also, I'm surprised that the | overhead of calling across the language boundary didn't dwarf | the gains from parsing... | pbecotte wrote: | I've certainly never been bottlenecked on date parsing :) | However, many/most of the high performance python libraries | are built in C code, and compiled down into something the | python interpreter can use directly. There are lots of python | bindings written in c++ to native c libraries as well, I know | I have used ZeroMQ pretty recently. Rust is done the same | way- the code is compiled down into objects that Python can | use directly- it's not like running a javascript interpreter | in your code. | oblvious-earth wrote: | I've had this situation a few times. Most recently | transforming large (1-50 GB) CSV files into a format that | can be digested by a proprietary bulk DB loader. | | Because our problem was just about reformatting we ended up | reading the CSVs in binary mode and using struct to extract | the relevant values from the date time fields.
But if we | needed to do actual date logic something like this would | perhaps be useful (but there are other fast date time libraries | out there, I've been a fan of pendulum for some tasks). | throwaway894345 wrote: | That makes sense, but I have a hard time believing the | approach of calling into a date time parser O(n) times is | going to yield a significant performance gain no matter how | much faster the parser is. However, I'm being downvoted, so | perhaps I'm mistaken? | brundolf wrote: | Maybe they did it in bulk? i.e. send all the strings over | at once, parse them in a loop, send them back. Seems like | that would reduce overhead | throwaway894345 wrote: | Right, and that makes sense, but the context here is a | date parsing library for Python--unless said library has | a batch interface, I'm not sure how that would improve | performance, but maybe I'm misestimating something. | brundolf wrote: | Ah, I skimmed over the part where this is a library and | not application-code | lincolnq wrote: | My instinct is that the overhead is small. You need to | add a few C stack frames and do some string conversion on | each call, maybe an allocation to store the result. It's | not going to be as quick as doing it in pure Rust, but the | python-to-native code layer can be pretty lightweight I | think! | oblvious-earth wrote: | Sometimes it's about optimizing wall time not algorithmic | complexity. | | If you have a batch SLA of 1 hour, and you're currently | spending 50-70 mins to complete the batch and 20 minutes | of that time is spent date parsing and you can reduce it | to 5 minutes that's a big win. | throwaway894345 wrote: | No doubt, but if your date parsing saves you 1 second per | date parsed but each call into the faster library costs 2 | seconds, then your performance actually suffers. The only | way around this is to make a batch call such that the | overhead is O(1).
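The disagreement above comes down to simple arithmetic: a faster parser only pays off if the per-call cost of crossing into native code is smaller than the parsing time it saves, or if that cost is paid once per batch instead of once per item. A rough cost model is sketched below. The function name and every number in it are hypothetical, chosen only to show the shape of the trade-off; none of them are measurements of dtparse or PyO3.

```python
import math


def total_time(n_items, per_item_parse, per_call_overhead=0.0, batch_size=1):
    """Estimate the total seconds needed to parse n_items.

    per_call_overhead is paid once per call across the language boundary;
    batch_size is the number of items handled by each call.
    """
    n_calls = math.ceil(n_items / batch_size)
    return n_calls * per_call_overhead + n_items * per_item_parse


N = 1_000_000  # hypothetical number of timestamps

# All costs below are made-up values in seconds.
pure_python = total_time(N, per_item_parse=10e-6)

# Cheap boundary crossing: the native parser wins even with one call per item.
native_cheap_calls = total_time(N, per_item_parse=1e-6, per_call_overhead=0.5e-6)

# Expensive boundary crossing: one call per item is slower than pure Python.
native_costly_calls = total_time(N, per_item_parse=1e-6, per_call_overhead=20e-6)

# Same expensive crossing, but paid once for the whole batch.
native_batched = total_time(
    N, per_item_parse=1e-6, per_call_overhead=20e-6, batch_size=N
)

for name, seconds in [
    ("pure python", pure_python),
    ("native, cheap calls", native_cheap_calls),
    ("native, costly calls", native_costly_calls),
    ("native, batched", native_batched),
]:
    print(f"{name}: {seconds:.2f} s")
```

Which of these cases applies to a given library is an empirical question, so the per-call overhead has to be measured rather than assumed.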
| minitech wrote: | I'm not going to install it to check, but when someone | writes "Fast datetime parser for Python written in Rust. | Parses 10x-15x faster than datetime.strptime." it seems | reasonable to assume that this is not the case. | throwaway894345 wrote: | Depends on whether or not the parent is including the | overhead in their statistic. Misinformation about | microbenchmarks is hardly a rarity. | dmw_ng wrote: | Another cheap trick if the time column is sequential is to | split the string into date and time components, cache the date | part and calculate the time part just with some multiplication | | Major caveat is timezone handling, but this only applies in a | subset of situations | quietbritishjim wrote: | If you've got to that point of modifying the storage format | then you might as well just use an integer (microseconds | since the epoch) and be done with it. That seems cleaner | than using a string (or two strings) anyway. | adkadskhj wrote: | I needed Blender integration a while back and wasn't sure what I | could write it in. PyO3 worked great with Blender with no | configuration. I was quite concerned that something about the | Python-embedded-Blender behavior would limit PyO3.. but nope, so | far it's worked flawlessly. | | Thanks PyO3 team :) | mynameisash wrote: | At work, I'm using PyO3 for a project that churns through a lot | of data (step 1) and does some pattern mining (step 2). This is | the second generation of the project and is on-demand compared | with the large, batch project in Spark that it is replacing. The | Rust+Python project has really good performance, and using Rust | for the core logic is such a joy compared with Scala or Python | that a lot of other pieces are written in. | | Learning PyO3, I cobbled together a sample project[0] to | demonstrate how some functionality works.
It's a little outdated | (uses PyO3 0.11.0 compared with the current 0.13.1) and doesn't | show everything, but I think it's reasonably clear. | | One thing I noticed is that passing very large data from Rust and | into Python's memory space is a bit of a challenge. I haven't | quite grokked who owns what when and how memory gets correctly | dropped, but I think the issues I've had are with the amount of | RAM used at any moment and not with any memory leaks. | | [0] https://github.com/aeshirey/CheeseShop | fulafel wrote: | Previously (2017): https://news.ycombinator.com/item?id=14859844 | LockAndLol wrote: | If this works well, I'd rather use this over being forced to use | type hints and mypy. | | Has anybody used this in conjunction with a python framework? | Django, fastapi or something? | uranusjr wrote: | Uh, how do you plan to use FastAPI while avoiding type hints? | edenhyacinth wrote: | I have! Used FastAPI as a frontend to do some minor data | modification, and passed the data for model inference in Rust. | | Works really nicely, although given how little work I'm doing | in the Python side I honestly prefer using Rocket instead of | FastAPI and then using pyo3 to call the Python library in Rust, | rather than the other way around. | LockAndLol wrote: | Thanks for the response. That does sound pretty much like | what I would like to do. Have you by any chance open-sourced | your project? | | I'm new to rust, but I'll check out Rocket. Cheers | pansa2 wrote: | How would PyO3 help you avoid type hints and mypy? | brundolf wrote: | I think the idea is that they move their business logic to | the Rust code, since Rust's type system is more powerful and | more sound, instead of trying to make do with MyPy | zerkten wrote: | Wouldn't it be more of a priority to move it for lower | memory use and higher request speed? A better type system | is good, but often these are a struggle with scaling | interpreted languages compared to other lower level | languages. 
| brundolf wrote: | For many people the primary appeal of Rust is its type | system and related features (declaring deep immutability, | pattern-matching, etc) | | > often these are a struggle with scaling interpreted | languages compared to other lower level languages | | Not sure what's meant by this | LockAndLol wrote: | It would minimize the python surface required to be covered | with type-hints and mypy. If possible, one could simply point | django to the modules generated from rust. | | I'll give it a shot tonight and see how it goes. Now I'm | curious. | edeion wrote: | That's a really great name you came up with! Embodies both parts | of your focus, stays pronounceable. Does the 3 relate to the | Python version or are you mimicking some specific molecule that I | can't think of? | [deleted] | SnowflakeOnIce wrote: | My guess is that the name is derived from the `-O3` compiler | optimization level from many compilers. | fafhrd91 wrote: | name was chosen after `uranium trioxide`, pythonium trioxide | - pyo3 | chc wrote: | If you're trying to figure out the origin of a Rust | project's name, the safest bet is always to choose the one | that's a reference to metal. | fafhrd91 wrote: | i am original author of pyo3. Yuri Selivanov (author of | uvloop and edgedb) suggested pyo3 name. | chc wrote: | Oh, I know, I wasn't trying to correct you or anything. I | was just adding on to the correct answer to point out | that PyO3's naming scheme is part of a popular trend in | Rust libraries. | batterylow wrote: | It's indeed a cool name, but it's not my doing (this isn't a | Show HN)! | smlckz wrote: | Py (iv) O O = Py < | | O | | or Py (vi) O || O = | Py || O | | or Py (ii) O Py < > O | O | | heh! | auscompgeek wrote: | I think you might be missing an oxygen atom there. | Swenrekcah wrote: | I would guess it is derived from: | https://en.wikipedia.org/wiki/Iron(III)_oxide | smlckz wrote: | But that's Fe_2 O_3 !
| ziml77 wrote: | I think calling it Py2O3 would be a bit confusing though. | smlckz wrote: | Just PyO or Py_3 O_4 could have been used as well, does | not matter that much. | OskarS wrote: | I thought it was like the compiler flag, -O3. "With full | optimization", basically. | benecollyridam wrote: | Another related project: Wasmtime and Rust+Python | | Compile your Rust code to wasm to circumvent having to compile | for different architectures. | | https://docs.wasmtime.dev/wasm-rust.html | ksm1717 wrote: | Between pyodide, pyo3, rust-cpython, and rustpython, I think Pyo3 | is the best way to drop in rust in a python project for a speed | up, if that is your goal. Some of the demos show using python | from rust, but to me the biggest feature is without a doubt | compiling rust code to native python modules. I'm using it to | speed up image manipulation backed by numpy arrays. | | There's a setuptools rust [0] extension package that can be used | to hook the compilation of the rust into the wheel building or | install from source. Maturin [1] seems to be regarded as the new | and improved solution for this, but I found that it's angled | toward the using python from rust. | | There's also the rust numpy [2] package by the same org which is | fantastic in that it lets you pass a numpy matrix to a native | method written in rust and convert it to the rust equivalent data | structure, perform whatever transformation you want (in parallel | using rayon [3]), and return the array. When building for | release, I was seeing speed ups of 100x over numpy on the most | matrix mathable function imaginable, and numpy is no joke. | | I think there is a lot of potential for these two ecosystems | together. If there's not a python package for something, there's | probably a rust crate. | | If anyone is interested, the python package that I'm building with | some rust backend is called pyrogis [4]; it's for making custom image | manipulations through numpy arrays.
| | [0] https://github.com/PyO3/setuptools-rust | | [1] https://github.com/PyO3/maturin | | [2] https://github.com/PyO3/rust-numpy | | [3] https://github.com/rayon-rs/rayon | | [4] https://github.com/pierogis/pierogis | cycomanic wrote: | > Between pyodide, pyo3, rust-cpython, and rustpython, I think | Pyo3 is the best way to drop in rust in a python project for a | speed up, if that is your goal. Some of the demos show using | python from rust, but to me the biggest feature is without a | doubt compiling rust code to native python modules. I'm using | it to speed up image manipulation backed by numpy arrays. | | > There's a setuptools rust [0] extension package that can be | used to hook the compilation of the rust into the wheel | building or install from source. Maturin [1] seems to be | regarded as the new and improved solution for this, but I found | that it's angled toward the using python from rust. | | > There's also the rust numpy [2] package by the same org which | is fantastic in that it lets you pass a numpy matrix to a | native method written in rust and convert it to the rust | equivalent data structure, perform whatever transformation you | want (in parallel using rayon [3]), and return the array. When | building for release, I was seeing speed ups of 100x over numpy | on the most matrix mathable function imaginable, and numpy is | no joke. | | What sort of algorithm was that? Generally getting 100x speedup | on vectorized code is highly unusual even using handcoded c++. | So I suspect it was quite loop heavy? In those cases I have | also seen very significant speed ups. | | I have been using pythran [1] for speeding up my python code. | It generally achieves extremely good performance. I have | blogged about it here [2] and recently a member used pythran to | speed up some nbody benchmarks [3] which was used in an article | to argue for using compiled languages. 
| | That said I find pyO3 quite exciting and have been | contemplating trying it with some of my projects. [1] | https://github.com/serge-sans-paille/pythran [2] | https://jochenschroeder.com/blog/articles/DSP_with_Python2/ [3] | https://github.com/paugier/nbabel | ksm1717 wrote: | Matrix of shape (rows, columns, 3). Average the last dim for | each point and change it to [0,0,0] if average less than a | value, [255,255,255] if greater. A brightness threshold. May | be remembering the speed up factor wrong so take it with a | grain of salt - fact of the matter is it was very impressive. | | I'm checking out that post later, I'm trying to make my | package easy to build on, so being able to write extensions | with Pythran would be another great option for speed ups. | Thanks | cycomanic wrote: | Just for the fun of it I tested what speed up I could get | with a naive algorithm and pythran. Based on your | description it looks like I should do the following: | | def threshold_pixel(img, thr): out = np.zeros_like(img); o = | np.mean(img, axis=-1); out[o>thr] = 255; return out | | This runs in ~30ms for a (1024,1024,3) array using numpy on | my machine. Using pythran (note I had to explicitly write | out the loop for out[o>thr] = 255, due to a bug, that I | found and just reported), I get a speed of 6.ms (with | openmp) and 9ms without (I did not tune the openmp, but | this should yield a much higher speedup). | | P.S.: Just had a look at your project, very cool, I have to | try that | pansa2 wrote: | Related: RustPython - A Python interpreter written in Rust. | | https://github.com/RustPython/RustPython | bluedays wrote: | Without looking at it I wonder if it's using the Python language | underneath, or the python vm. Either way this is pretty cool. | Nvorzula wrote: | Precisely, this is Rust that compiles to a C FFI that plugs | into CPython.
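For reference, the brightness threshold that ksm1717 describes and cycomanic implements above can be spelled out as an explicit per-pixel loop. This sketch uses plain nested lists instead of NumPy arrays so that it runs without dependencies; the function name and sample image are made up for illustration, and it shows the logic only, not the performance of any of the approaches discussed. Following cycomanic's snippet, a pixel whose mean is not strictly above the threshold becomes black.

```python
def threshold_pixels(img, thr):
    """Apply a brightness threshold to rows of [r, g, b] pixels.

    Pixels whose channel mean exceeds thr become white; all others
    become black. The input image is left unmodified.
    """
    out = []
    for row in img:
        out_row = []
        for pixel in row:
            mean = sum(pixel) / len(pixel)
            out_row.append([255, 255, 255] if mean > thr else [0, 0, 0])
        out.append(out_row)
    return out


# Hypothetical 2x2 image with channel means of 20, 210, 128 and 170.
image = [
    [[10, 20, 30], [200, 210, 220]],
    [[128, 128, 128], [0, 255, 255]],
]

print(threshold_pixels(image, 128))
```

A nested loop like this is slow in pure Python, which is the kind of code where compiling the inner loop (with Pythran, or in Rust via PyO3) tends to pay off.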
| itamarst wrote: | I've been playing with PyO3 for prototyping, and wrapped some | Rust code to see if it's faster than Python. The experience was | very much like using Boost Python (which these days has an | alternative in https://github.com/pybind/pybind11). It's | _really_ easy to wrap code for Python, and it has nice APIs to | ensure the GIL is held. Being Rust, I'm much more confident I won't | suffer from memory unsafety issues which my C++ at the time did. | | Now I'm starting to use it as part of the Python memory profiler | I'm working on (https://pythonspeed.com/fil), in this case to | call in to the low-level Python C API which PyO3 includes | bindings for in addition to its high-level API. This kind of | usage is more like writing C, except with the benefit of having | high-level APIs (for GIL holding, but also object conversion) | available when I need them. | | So basically you get safe, high-level, easy-to-use APIs, with | fallback to low-level unsafe APIs if you need them. | | Highly recommend trying it out. | JPKab wrote: | Was just checking out your fil project. It looks really useful, | and I dig the jupyter kernel as well. | itamarst wrote: | Thank you! If you have any questions/problems/ideas, please | reach out via GitHub or email (itamar@pythonspeed.com). | brundolf wrote: | What's the data-conversion overhead look like at the boundary? | Which data structures can be passed back and forth without a | full clone, etc? | itamarst wrote: | There's definitely a conversion cost. For strings, Python | apparently caches the UTF-8 encoded string, so if you | _repeatedly_ transfer it to Rust I suspect (but haven't | checked) that the cost is much lower. | | In general I suspect it's the usual "NumPy arrays are fast, | everything else you better be getting a sufficiently large | boost from the low-level code to justify conversion".
| | For the thing I prototyped in Rust, it was wrapping the | `ahocorasick` crate which was in fact faster than | `pyahocorasick` which is written in C or Cython or something. | Both have similar conversion costs, probably, so it came down | to "for lots of data the Rust version was faster". | burntsushi wrote: | Be sure to use auto configuration to get it to go even | faster, depending on your use case: | https://docs.rs/aho-corasick/0.7.15/aho_corasick/struct.AhoC... | | Or just be sure to enable the DFA option if you can afford | it. It looks like the Python library is just the standard | NFA algorithm. | itamarst wrote: | Yeah, I was using DFA. | | Next step is trying an alternative approach, but if that | alternative doesn't work I'm going to see about wrapping | your package for Python. | | Thanks for all your work on it! | burntsushi wrote: | Nice! Reach out if there are any problems or if you need | something exposed in the API. Looking at the | pyahocorasick issue tracker, there are a number of | features/bugs that your wrapper package would resolve. :) | liuliu wrote: | NumPy also supports conversions without copying. One thing I | haven't found a good way to bridge from Python is the | pandas.DataFrame; it seems to be quite a Python-focused | object and iterating through a DataFrame is particularly | slow. | itamarst wrote: | Internally Pandas often uses NumPy arrays, especially for | numeric data, so you might be able to pass things that way in | some cases? | | E.g. `df["column_name"].values` will get you a NumPy | array. | shirakawasuna wrote: | Sounds great! Would so much rather drop into Rust than C or | C++. | dbrgn wrote: | If you're interested in publishing Rust libraries as Python | packages (or integrating Rust code into an existing Python | package), check out https://github.com/PyO3/maturin and | https://github.com/PyO3/setuptools-rust. | edenhyacinth wrote: | Been using Maturin for a little while professionally, and it's | surprisingly good.
There's a few bugbears here and there - I | haven't found a way to have Cargo Test & a pyo3 library working | at the same time - but overall it's a lot more pleasant than | working with Rust and R was. ___________________________________________________________________ (page generated 2021-01-29 23:00 UTC)