[HN Gopher] WebAssembly: Adding Python support to WASM language ...
___________________________________________________________________
WebAssembly: Adding Python support to WASM language runtimes

Author : assambar
Score  : 134 points
Date   : 2023-01-30 16:02 UTC (1 day ago)

(HTM) web link (wasmlabs.dev)
(TXT) w3m dump (wasmlabs.dev)

| brrrrrm wrote:
| The issue right now with Python support in WASM (at least for
| machine learning, the main driver of the language) is that Python
| is largely a wrapper language and none of the utilities that make
| it so powerful (numpy, PyTorch, JAX) work particularly well in
| wasm, since it's so limited performance-wise (no FMA, no GPU
| support).
|
| I'm excited for pairing wasm with WebGPU, which will likely
| unblock these projects from building support for the
| web/untrusted ecosystem. A useful project would be one that makes
| this integration really easy to build today and a flip of the
| switch to turn on in the future.
|
| mjw1007 wrote:
| I've come across this notion that nowadays machine learning
| provides (in some sense) the biggest group of Python users a
| few times recently.
|
| What reason is there to suppose this is true? It seems
| surprising to me.
|
| claytonjy wrote:
| It's really hard to do much ML in anything _except_ python.
| Virtually everyone improving the ML ecosystems of other
| languages got their start in Python and is knowingly
| competing with Python (e.g. R, Julia). If you want to get
| started in ML today, python is the obvious easiest path
| forward.
|
| So, most ML users are python users. I don't know how that
| group compares to non-ML python users, but I have a feeling
| there isn't a flood of eager new Django devs the way there is
| for Pytorch users. Most non-ML things you could do with python
| can be done similarly well in Go/Rust/Typescript, but there's
| no other option for most ML stuff.
|
| mjw1007 wrote:
| I found a recentish (2021) survey at [1] which suggests
| that in 2021 ML was some way behind web development,
| sysadmin stuff, and data analysis among Python users (and
| didn't seem to be on the way up the list).
|
| [1] https://lp.jetbrains.com/python-developers-survey-2021/#Gene...
|
| claytonjy wrote:
| Great source; looks like I've quite underestimated the
| python-web-dev crowd's size.
|
| I'm curious what the longer-term trends look like; not
| much change between consecutive years.
|
| Data analysis is basically a prerequisite for ML, so the
| combined "data stuff" usage is quite a lot bigger than
| web dev usage!
|
| _visgean wrote:
| > What reason is there to suppose this is true? It seems
| > surprising to me.
|
| One reason is it's just super easy for input/output
| operations. ML is all about data, and getting the data to the
| right place is really easy in python compared to some other
| languages.
|
| still_grokking wrote:
| Which languages?
|
| Python is OOP; but the "classical" data-centric languages
| are actually all more or less in the FP space. (I count
| array languages and APL-likes as FP in this case.)
|
| Just an example: You don't have immutable data types by
| default in Python. This is actually a pretty bad default
| for data processing tasks.
|
| m00dy wrote:
| I have integrated pyodide + webgpu recently (you can do matmul
| using webgpu's compute pipeline). The real problem is that
| browser tabs have a 4gb max memory size. So, training neural
| networks on this stack is almost impossible. (I don't even
| want to mention pyTorch's dependency hell.)
|
| miohtama wrote:
| WebAssembly Memory64 is coming
|
| https://webassembly.org/roadmap/
|
| brrrrrm wrote:
| My claim is that it's not easy, not that it's impossible.
| There's little incentive to hack in JavaScript or maintain a
| Pyodide compatible build. The 4gb limit isn't a technical
| limitation, just a standards thing (it could change easily).
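For reference, the 4 GB figure in the comments above matches what a 32-bit index can address. A minimal sketch of that arithmetic, assuming the limit under discussion is the 32-bit linear memory of current WebAssembly and the 64 KiB page size from the core specification; the function names are illustrative and not part of any runtime API:

```python
# Assumption: the "4gb limit" refers to wasm32 linear memory, which is
# indexed with 32-bit offsets and grown in 64 KiB pages.
WASM_PAGE_SIZE = 64 * 1024  # bytes per WebAssembly page


def max_addressable_bytes(index_bits: int) -> int:
    """Number of bytes reachable with an index of the given bit width."""
    return 2 ** index_bits


def max_pages(index_bits: int) -> int:
    """Number of 64 KiB pages that fit in that address range."""
    return max_addressable_bytes(index_bits) // WASM_PAGE_SIZE


if __name__ == "__main__":
    print(max_addressable_bytes(32) // 2**30, "GiB addressable with 32-bit indices")
    print(max_pages(32), "pages of 64 KiB")
```

Under these assumptions, widening the index to 64 bits (what the Memory64 proposal linked above adds) removes the 32-bit bound, though engines and browsers may still enforce lower caps of their own.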
|
| c120 wrote:
| So for someone who has python installed locally, what's the
| point?
|
| Is it just the sandbox or is there anything else I'm missing?
|
| kasajian wrote:
| It's not for someone who only runs Python locally.
|
| angelmm wrote:
| You get an extra layer of isolation, even at your development
| environment level.
|
| I remember a NodeJs CVE that was caused by a poisoned
| dependency. It was affecting people when downloading it from
| npm.
|
| There's still a gap here to cover, but the benefits may be
| worth it :)
|
| ElectricalUnion wrote:
| I don't see how this would in any way prevent you from being
| affected by an equivalent poisoned pypi dependency; after all,
| your secrets/credentials are inside the sandbox anyway or
| your code can't work.
|
| angelmm wrote:
| With Wasm + WASI, you need to explicitly mount files and
| environment variables. Inside the Wasm VM, the Python
| interpreter, source code and dependencies only have access
| to a very reduced surface. Although you're right that if
| you mount credentials inside, they will be accessible too.
|
| The incident I was talking about was the event-stream[1]
| vulnerability. The attacker introduced code that looked for
| the data of a crypto wallet. This data was stored in the
| user's home directory.
|
| By default, interpreters may get access to the same
| resources as the user running the process. In Wasm, the
| resources are granted manually.
|
| [1] https://blog.npmjs.org/post/180565383195/details-about-the-e...
|
| still_grokking wrote:
| > By default, interpreters may get access to the same
| > resources as the user running the process. In Wasm, the
| > resources are granted manually.
|
| What's the difference from running the code under a different
| user (like for example `nobody` for "full sandboxing", or
| a "clone of nobody" with some additional access rights)?
|
| chc wrote:
| If you're just looking to run trusted scripts locally, there
| isn't much point.
| If you're running a system that uses wasm, this means you can
| now easily support Python.
|
| AshleysBrain wrote:
| How does this handle garbage collection? AFAIK the WebAssembly GC
| proposal is still in development. Does it implement GC in WASM
| code?
|
| amelius wrote:
| Perhaps it just uses Python's built-in garbage collector that
| just increases/decreases the data segment size as needed by
| calling sbrk()?
|
| ridruejo wrote:
| Correct, it is just CPython compiled to Wasm (similar to
| compiling to x86 or arm)
|
| robertlagrant wrote:
| The non-Docker version seems to require an external
| site-packages, unless I missed it. Is it possible to produce a
| single wasm binary with all dependencies compiled in?
|
| seddonm1 wrote:
| I have been following and playing with this repository:
| https://github.com/singlestore-labs/python-wasi/
|
| It builds a single Python WASM module with all dependencies
| included (they use VFS) and a Dockerfile to make the process
| easy (and actually worked first go). It does produce large
| files though: wasi-python3.11.wasm 110MB
|
| ridruejo wrote:
| Yes! SingleStore is a great team. We are currently using
| some of their work for this Python release, like libz
|
| angelmm wrote:
| Hey! Dev here :)
|
| For external libraries, it requires you to mount the libraries
| with WASI when running the python.wasm module. Another option
| we're exploring is to use wasi-vfs[1] to include some common
| modules in our pre-built binaries. For example, Ruby does
| require some extra libraries for common workloads (like JSON
| parsing). This is still in the exploration phase, but we may do
| something with it.
|
| [1] https://github.com/kateinoigakukun/wasi-vfs
|
| robertlagrant wrote:
| Very cool. We ship some Python as a Debian dependency and so
| this could become a really interesting way to package
| everything up.
|
| simonw wrote:
| This looks very promising!
|
| The thing I most want to solve right now is this: I want to write
| a regular Python application that can safely execute untrusted
| Python code in a WASM sandbox as part of its execution.
|
| I want to do this so I can let end users customize my web
| applications in weird and interesting ways by pasting their own
| Python code into a textarea - think features like "run this
| Python code to transform my stored data" - without them being
| able to break my system.
|
| This feels like it should be pretty easy with WebAssembly! It's
| the classic code sandboxing problem - long a big challenge in
| Python world - finally solved in a robust way.
|
| I've been finding it surprisingly hard to get a proof-of-concept
| of this working though.
|
| Essentially I want to be able to do this, in my regular Python
| code:
|
|     import some_webassembly_engine
|
|     python = some_webassembly_engine.load(
|         "python.wasm",
|         max_cpu_time_in_seconds=3.0,
|         max_allowed_memory_in_bytes=32000000
|     )
|     result = python.execute("3 + 5")
|
| I've not yet figured out the incantations I need to actually do
| this - in particular the limits on CPU time and memory.
|
| I posed this question on Mastodon recently and Jim Kring put
| together this demo, which gets most of the way there (albeit
| using an old Python 3.6 build):
| https://github.com/jimkring/python-sandbox-wasm
|
| It doesn't feel like this should be as hard to figure out as it
| is!
|
| irrational wrote:
| Why do this on the client? Why not pass it to the server and
| run it on Python there?
|
| simonw wrote:
| That's what I'm talking about: I want to run Python code on
| my server, but since it's from an untrusted source I want to
| make sure that it's in a sandbox with strict limits on what
| it can do, how much CPU it can use and how much RAM it has
| available to it - so malicious code can't be used to crash my
| server or steal data it shouldn't have access to.
|
| callahad wrote:
| The startup I'm working at is basically trying to do exactly
| that as a service, but a one-off thing for a regular Python
| application _shouldn't_ be as hard to figure out as it is. Can
| you link to the Mastodon thread (darn lack of search!) and we
| can continue there?
|
| simonw wrote:
| Here's the Mastodon conversation:
| https://fedi.simonwillison.net/@simon/109682777068881522
|
| (I'm so close to building my own search engine just against
| my own content there.)
|
| phickey wrote:
| Wasmtime's `wasmtime-py` embedding in python has support for
| Wasm Components:
| https://github.com/bytecodealliance/wasmtime-py#components
| (disclosure, I helped create it)
|
| The remaining piece of the puzzle would be to create a
| wit-bindgen guest generator
| https://github.com/bytecodealliance/wit-bindgen#guests for this
| build of the python interpreter. You could then seamlessly call
| back and forth between the host and guest pythons, without even
| knowing that wasmtime is under the hood.
|
| simonw wrote:
| If you could provide example code for how to do this - how to
| run a snippet of untrusted Python code using wasmtime-py with
| a CPU and RAM limit - I would shout it from the rooftops. I
| think a LOT of people would benefit from clear examples of
| how to actually achieve this.
|
| samsquire wrote:
| This would be great. And with an exposable API for safety: a
| memory-safe API that could be exposed to wasm applications. And
| rate limited.
|
| mritchie712 wrote:
| Have you tried to do it with pyodide? What issues did you hit
| using that?
|
| simonw wrote:
| Pyodide isn't currently supported outside of browsers, though
| that might change:
| https://github.com/pyodide/pyodide/issues/869
|
| Either way, I couldn't figure out how to do the above
| sequence of steps with any of the available Python WASM
| runtimes - they're all very under-documented at the moment,
| sadly.
| I tried all three of these:
|
| - https://github.com/wasmerio/wasmer-python
| - https://github.com/bytecodealliance/wasmtime-py
| - https://github.com/wasm3/pywasm3
|
| mike_hearn wrote:
| FWIW although it's not WebAssembly based, you can do that with
| GraalVM. It has a concept of language contexts which can be
| sandboxed including those constraints. There are two caveats:
|
| 1. Sandboxing for CPU time and max allowed memory requires the
|    enterprise edition, so you'd have to pay for it.
|
| 2. The Python engine isn't 100% compatible with regular Python,
|    although that may not matter for your use case as the
|    compatibility is pretty good and issues mostly show up around
|    extension modules.
|
| dayeye2006 wrote:
| Can anyone give me an ELI5 version of what the relationship is
| between this and Pyodide?
|
| ridruejo wrote:
| Pyodide is for the browser; this is intended for server-side
| environments, so it can interact with files, sockets etc. via
| the WASI standard
|
| assambar wrote:
| Ready-to-use python.wasm, also in a Docker+Wasm container image.
|
| still_grokking wrote:
| But please don't forget to wrap it in at least some VM! /s
|
| That's not even funny, as in real life people would actually run
| something like that in a VM.
|
| So we have now: HW memory protection -> HW virtualization -> VM
| -> OS -> Docker -> WASM -> language runtime -> some code
| snippet.
|
| Things become quite crazy these days, to be honest...
|
| dom96 wrote:
| There seem to be so many different variants of the same thing
| out there. What makes this unique? For example I know Pyodide
| exists and also runs CPython under WASM.
|
| ridruejo wrote:
| This one is designed to run on the server side and interface
| with the OS via WASI, so it can read/write files etc.
___________________________________________________________________
(page generated 2023-01-31 23:00 UTC)