[HN Gopher] WebAssembly: Adding Python support to WASM language ...
       ___________________________________________________________________
        
       WebAssembly: Adding Python support to WASM language runtimes
        
       Author : assambar
       Score  : 134 points
       Date   : 2023-01-30 16:02 UTC (1 day ago)
        
 (HTM) web link (wasmlabs.dev)
 (TXT) w3m dump (wasmlabs.dev)
        
       | brrrrrm wrote:
       | the issue right now with Python support in WASM (at least for
       | machine learning, the main driver of the language) is that Python
       | is largely a wrapper language and none of the utilities that make it
       | so powerful (numpy, PyTorch, JAX) work particularly well in wasm,
       | since it's so limited performance-wise (no FMA, no GPU support).
       | 
       | I'm excited for pairing wasm with WebGPU, which will likely
       | unblock these projects from building support for the
       | web/untrusted ecosystem. A useful project would be one that makes
       | this integration really easy to build today and a flip of the
       | switch to turn on in the future.
        
         | mjw1007 wrote:
         | I've come across this notion that nowadays machine learning
         | provides (in some sense) the biggest group of Python users a
         | few times recently.
         | 
         | What reason is there to suppose this is true? It seems
         | surprising to me.
        
           | claytonjy wrote:
           | It's really hard to do much ML in anything _except_ python.
           | Virtually everyone improving the ML ecosystems of other
           | languages got their start in Python and is knowingly
           | competing with Python (e.g. R, Julia). If you want to get
           | started in ML today, python is the obvious easiest path
           | forward.
           | 
           | So, most ML users are python users. I don't know how that
           | group compares to non-ML python users, but I have a feeling
           | there isn't a flood of eager new Django devs the way there is
           | of PyTorch users. Most non-ML things you could do with python
           | can be done similarly well in Go/Rust/Typescript, but there's
           | no other option for most ML stuff.
        
             | mjw1007 wrote:
             | I found a recentish (2021) survey at [1] which suggests
             | that in 2021 ML was some way behind web development,
             | sysadmin stuff, and data analysis among Python users (and
             | didn't seem to be on the way up the list).
             | 
             | [1] https://lp.jetbrains.com/python-developers-
             | survey-2021/#Gene...
        
               | claytonjy wrote:
               | Great source; looks like I've quite underestimated the
               | python-web-dev crowd's size.
               | 
               | I'm curious what the longer-term trends look like; not
               | much change between consecutive years.
               | 
               | Data analysis is basically a pre-requisite for ML, so the
               | combined "data stuff" usage is quite a lot bigger than
               | web dev usage!
        
           | _visgean wrote:
           | > What reason is there to suppose this is true? It seems
           | surprising to me.
           | 
           | One reason is that it's just super easy for input/output
           | operations. ML is all about data, and getting the data to the
           | right place is really easy in python compared to some other
           | languages.
        
             | still_grokking wrote:
             | Which languages?
             | 
             | Python is OOP; but the "classical" data-centric languages
             | are actually all more or less in the FP space. (I count
             | array languages and APL-likes as FP in this case).
             | 
             | Just an example: You don't have immutable data types by
             | default in Python. This is actually a pretty bad default
             | for data processing tasks.
        
         | m00dy wrote:
         | I have integrated pyodide + webgpu recently (you can do matmul
         | using webgpu's compute pipeline). The real problem is that
         | browser tabs have a 4 GB max memory size, so training neural
         | networks on this stack is almost impossible. (I don't even
         | want to mention PyTorch's dependency hell.)
        
           | miohtama wrote:
           | WebAssembly Memory64 is coming
           | 
           | https://webassembly.org/roadmap/
        
           | brrrrrm wrote:
           | My claim is that it's not easy, not impossible. There's
           | little incentive to hack in JavaScript or maintain a Pyodide
           | compatible build. The 4gb limit isn't a technical limitation,
           | just a standards thing (it could change easily).
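For context on the 4 GB figure in this sub-thread: with 32-bit memory indices and 64 KiB pages, as in the current WebAssembly spec, a single linear memory tops out at 65,536 pages. The back-of-the-envelope sketch below checks that arithmetic. The 16-bytes-per-parameter training cost is an illustrative assumption (float32 weights, gradients, and two optimizer buffers), not a figure from the thread, and it ignores activations and the interpreter itself.

```python
WASM_PAGE_BYTES = 64 * 1024   # page size fixed by the WebAssembly spec
ADDRESS_BITS = 32             # width of a wasm32 memory index

max_bytes = 2 ** ADDRESS_BITS
max_pages = max_bytes // WASM_PAGE_BYTES

# Assumption for illustration only: 4 float32 values (4 bytes each) per
# trainable parameter (weight, gradient, two optimizer moment buffers).
BYTES_PER_PARAM_TRAINING = 4 * 4


def max_trainable_params(budget_bytes: int) -> int:
    """Upper bound on parameters that fit in the budget under the assumption above."""
    return budget_bytes // BYTES_PER_PARAM_TRAINING


print(max_pages)                        # 65536
print(max_bytes / 2 ** 30)              # 4.0 (GiB)
print(max_trainable_params(max_bytes))  # 268435456
```

Under these assumptions the ceiling is roughly 268 million parameters before counting anything else, which is why the Memory64 proposal (64-bit indices) matters for training workloads.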
        
       | c120 wrote:
       | So for someone who has python installed locally, what's the
       | point?
       | 
       | Is it just the sandbox or is there anything else I'm missing?
        
         | kasajian wrote:
         | It's not for someone who only runs Python locally.
        
         | angelmm wrote:
         | You get an extra layer of isolation, even at your development
         | environment level.
         | 
         | I remember a Node.js CVE that was caused by a poisoned
         | dependency. It affected people who downloaded it from
         | npm.
         | 
         | There's still a gap here to cover, but the benefits may be
         | worth it :)
        
           | ElectricalUnion wrote:
           | I don't see how this would in any way prevent you from being
           | affected by an equivalent poisoned PyPI dependency; after all,
           | your secrets/credentials are inside the sandbox anyway, or
           | your code can't work.
        
             | angelmm wrote:
             | With Wasm + WASI, you need to explicitly mount files and
             | environment variables. Inside the Wasm VM, the Python
             | interpreter, source code and dependencies only have access
             | to a very reduced surface. Although you're right that if
             | you mount credentials inside, they will be accessible too.
             | 
             | The incident I was talking about was the event-stream[1]
             | vulnerability. The attacker introduced code that looked for
             | the data of a crypto wallet. This data was stored in the
             | user's home.
             | 
             | By default, interpreters may get access to the same
             | resources as the user running the process. In Wasm, the
             | resources are granted manually.
             | 
             | [1] https://blog.npmjs.org/post/180565383195/details-about-
             | the-e...
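The "inherit everything by default" versus "grant explicitly" contrast described above can be illustrated with environment variables alone, using only the standard library. This is an analogy, not WASI: a child process with a trimmed environment can still read the filesystem, whereas a WASI guest sees only the directories it was handed. `WALLET_KEY`, `ALLOW_LIST`, and `run_child` are hypothetical names chosen for this sketch.

```python
import os
import subprocess
import sys

# The child simply reports whether it can see the (hypothetical) secret.
CHILD_CODE = "import os; print(os.environ.get('WALLET_KEY', 'not visible'))"

# Variables the child may need in order to start at all; everything else is withheld.
ALLOW_LIST = ("PATH", "LD_LIBRARY_PATH", "SYSTEMROOT", "PYTHONHOME")


def run_child(env=None):
    """Run a child interpreter and return what it printed."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


os.environ["WALLET_KEY"] = "secret"  # stand-in for data the user can access

# Default behaviour: the child inherits the parent's whole environment.
inherited = run_child()

# Explicit grant: the child only receives what is on the allow-list.
granted_env = {k: os.environ[k] for k in ALLOW_LIST if k in os.environ}
granted = run_child(env=granted_env)

print(inherited)  # secret
print(granted)    # not visible
```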
        
               | still_grokking wrote:
               | > By default, interpreters may get access to the same
               | resources as the user running the process. In Wasm, the
               | resources are granted manually.
               | 
               | What's the difference from running the code under a
               | different user (for example `nobody` for "full sandboxing",
               | or a "clone of nobody" with some additional access rights)?
        
         | chc wrote:
         | If you're just looking to run trusted scripts locally, there
         | isn't much point. If you're running a system that uses wasm,
         | this means you can now easily support Python.
        
       | AshleysBrain wrote:
       | How does this handle garbage collection? AFAIK the WebAssembly GC
       | proposal is still in development. Does it implement GC in WASM
       | code?
        
         | amelius wrote:
         | Perhaps it just uses Python's built-in garbage collector that
         | just increases/decreases the data segment size as needed by
         | calling sbrk()?
        
           | ridruejo wrote:
           | Correct, it is just CPython compiled to Wasm (similar to
           | compiling to x86 or arm)
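Since the collector ships inside the interpreter, nothing here depends on a host-provided GC. A minimal sketch of the two mechanisms CPython itself uses: reference counting frees most objects immediately, and the `gc` module's cycle collector handles reference cycles. The class and function names are illustrative.

```python
import gc
import weakref


class Node:
    """Tiny object that can point at another Node to form a cycle."""

    def __init__(self):
        self.other = None


def freed_by_refcount():
    """An object with no cycles is freed as soon as its last reference goes away."""
    obj = Node()
    probe = weakref.ref(obj)
    del obj
    return probe() is None


def freed_only_by_cycle_collector():
    """Two objects referencing each other survive until the cycle collector runs."""
    was_enabled = gc.isenabled()
    gc.disable()  # keep the automatic collector out of the way for the demo
    try:
        a, b = Node(), Node()
        a.other, b.other = b, a
        probe = weakref.ref(a)
        del a, b
        alive_before = probe() is not None  # refcounts never reached zero
        gc.collect()
        dead_after = probe() is None
        return alive_before, dead_after
    finally:
        if was_enabled:
            gc.enable()


print(freed_by_refcount())              # True
print(freed_only_by_cycle_collector())  # (True, True)
```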
        
       | robertlagrant wrote:
       | The non-Docker version seems to require an external site-
       | packages, unless I missed it. Is it possible to produce a single
       | wasm binary with all dependencies compiled in?
        
         | seddonm1 wrote:
         | I have been following and playing with this repository:
         | https://github.com/singlestore-labs/python-wasi/
         | 
         | It builds a single Python WASM module with all dependencies
         | included (they use VFS) and a Dockerfile to make the process
         | easy (and actually worked first go). It does produce large
         | files though: wasi-python3.11.wasm 110MB
        
           | ridruejo wrote:
           | Yes! SingleStore is a great team. We are currently using
           | some of their work for this Python release, like libz
        
         | angelmm wrote:
         | Hey! Dev here :)
         | 
         | For external libraries, it requires you to mount the libraries
         | with WASI when running the python.wasm module. Another option
         | we're exploring is to use wasi-vfs[1] to include some common
         | modules in our pre-built binaries. For example, Ruby does
         | require some extra libraries for common workloads (like JSON
         | parsing). This is still in the exploration phase, but we may do
         | something with it.
         | 
         | [1] https://github.com/kateinoigakukun/wasi-vfs
        
           | robertlagrant wrote:
           | Very cool. We ship some Python as a Debian dependency and so
           | this could become a really interesting way to package
           | everything up.
        
       | simonw wrote:
       | This looks very promising!
       | 
       | The thing I most want to solve right now is this: I want to write
       | a regular Python application that can safely execute untrusted
       | Python code in a WASM sandbox as part of its execution.
       | 
       | I want to do this so I can let end users customize my web
       | applications in weird and interesting ways by pasting their own
       | Python code into a textarea - think features like "run this
       | Python code to transform my stored data" - without them being
       | able to break my system.
       | 
       | This feels like it should be pretty easy with WebAssembly! It's
       | the classic code sandboxing problem - long a big challenge in
       | the Python world - finally solved in a robust way.
       | 
       | I've been finding it surprisingly hard to get a proof-of-concept
       | of this working though.
       | 
       | Essentially I want to be able to do this, in my regular Python
       | code:
       | 
       |     import some_webassembly_engine
       | 
       |     python = some_webassembly_engine.load(
       |         "python.wasm",
       |         max_cpu_time_in_seconds=3.0,
       |         max_allowed_memory_in_bytes=32000000
       |     )
       |     result = python.execute("3 + 5")
       | 
       | I've not yet figured out the incantations I need to actually do
       | this - in particular the limits on CPU time and memory.
       | 
       | I posed this question on Mastodon recently and Jim Kring put
       | together this demo, which gets most of the way there (albeit
       | using an old Python 3.6 build):
       | https://github.com/jimkring/python-sandbox-wasm
       | 
       | It doesn't feel like this should be as hard to figure out as it
       | is!
        
         | irrational wrote:
         | Why do this on the client? Why not pass it to the server and
         | run it on Python there?
        
           | simonw wrote:
           | That's what I'm talking about: I want to run Python code on
           | my server, but since it's from an untrusted source I want to
           | make sure that it's in a sandbox with strict limits on what
           | it can do, how much CPU it can use and how much RAM it has
           | available to it - so malicious code can't be used to crash my
           | server or steal data it shouldn't have access to.
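To make the "limits on CPU and RAM" requirement concrete, here is a rough sketch of what such ceilings look like when applied to a child interpreter with the standard library's `resource` module. This swaps in OS-level resource limits: it is NOT a WebAssembly sandbox and is not safe for untrusted code on its own, because the child keeps full filesystem and network access. `run_limited` and its parameters are hypothetical names, and the sketch is Unix-only.

```python
import resource
import subprocess
import sys


def _cap(which, value):
    """Set both soft and hard limits so the child cannot raise them again."""
    _, hard = resource.getrlimit(which)
    if hard != resource.RLIM_INFINITY:
        value = min(value, hard)
    resource.setrlimit(which, (value, value))


def run_limited(code, cpu_seconds=3, memory_bytes=512 * 1024 * 1024, wall_seconds=10):
    """Evaluate an expression in a child interpreter; return (ok, output)."""

    def apply_limits():
        # Runs in the child process just before the interpreter is exec'd.
        _cap(resource.RLIMIT_CPU, cpu_seconds)
        _cap(resource.RLIMIT_AS, memory_bytes)

    try:
        proc = subprocess.run(
            # -I: isolated mode, ignores PYTHON* variables and user site-packages.
            [sys.executable, "-I", "-c", "import sys; print(eval(sys.argv[1]))", code],
            preexec_fn=apply_limits,
            capture_output=True, text=True, timeout=wall_seconds,
        )
    except subprocess.TimeoutExpired:
        return False, "wall-clock timeout"
    if proc.returncode != 0:
        return False, proc.stderr.strip() or f"exit status {proc.returncode}"
    return True, proc.stdout.strip()


print(run_limited("3 + 5"))  # (True, '8')
```

A Wasm runtime would enforce comparable ceilings from inside the host process, and would also deny file and network access unless it is granted, which is the part this sketch does not cover.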
        
         | callahad wrote:
         | The startup I'm working at is basically trying to do exactly
         | that as a service, but a one-off thing for a regular Python
         | application _shouldn't_ be as hard to figure out as it is. Can
         | you link to the Mastodon thread (darn lack of search!) and we
         | can continue there?
        
           | simonw wrote:
           | Here's the Mastodon conversation:
           | https://fedi.simonwillison.net/@simon/109682777068881522
           | 
           | (I'm so close to building my own search engine just against
           | my own content there.)
        
         | phickey wrote:
         | Wasmtime's `wasmtime-py` embedding in python has support for
         | Wasm Components: https://github.com/bytecodealliance/wasmtime-
         | py#components (disclosure, I helped create it)
         | 
         | The remaining piece of the puzzle would be to create a wit-
         | bindgen guest generator
         | https://github.com/bytecodealliance/wit-bindgen#guests for this
         | build of the python interpreter. You could then seamlessly call
         | back and forth between the host and guest pythons, without even
         | knowing that wasmtime is under the hood.
        
           | simonw wrote:
           | If you could provide example code for how to do this - how to
           | run a snippet of untrusted Python code using wasmtime-py with
           | a CPU and RAM limit - I would shout it from the rooftops. I
           | think a LOT of people would benefit from clear examples of
           | how to actually achieve this.
        
         | samsquire wrote:
         | This would be great, especially with a memory-safe API that
         | could be safely exposed to wasm applications. And rate
         | limited.
        
         | mritchie712 wrote:
         | Have you tried to do it with pyodide? What issues did you hit
         | using that?
        
           | simonw wrote:
           | Pyodide isn't currently supported outside of browsers, though
           | that might change:
           | https://github.com/pyodide/pyodide/issues/869
           | 
           | Either way, I couldn't figure out how to do the above
           | sequence of steps with any of the available Python WASM
           | runtimes - they're all very under-documented at the moment,
           | sadly. I tried all three of these:
           | 
           | - https://github.com/wasmerio/wasmer-python
           | 
           | - https://github.com/bytecodealliance/wasmtime-py
           | 
           | - https://github.com/wasm3/pywasm3
        
         | mike_hearn wrote:
         | FWIW although it's not WebAssembly based, you can do that with
         | GraalVM. It has a concept of language contexts which can be
         | sandboxed including those constraints. There are two caveats:
         | 
         | 1. Sandboxing for CPU time and max allowed memory requires the
         | enterprise edition, so you'd have to pay for it.
         | 
         | 2. The Python engine isn't 100% compatible with regular Python,
         | although that may not matter for your use case as the
         | compatibility is pretty good and issues mostly show up around
         | extension modules.
        
       | dayeye2006 wrote:
       | Can anyone give me an ELI5 version of what the relationship is
       | between this and Pyodide?
        
         | ridruejo wrote:
         | Pyodide is for the browser; this is intended for server-side
         | environments, so it can interact with files, sockets, etc. via
         | the WASI standard.
        
       | assambar wrote:
       | Ready-to-use python.wasm, also in a Docker+Wasm container image.
        
         | still_grokking wrote:
         | But please don't forget to wrap it in at least some VM! /s
         | 
         | That's not even funny, as in real life people would run
         | something like that actually in a VM.
         | 
         | So we have now: HW memory protection -> HW virtualization -> VM
         | -> OS -> Docker -> WASM -> language runtime -> some code
         | snippet.
         | 
         | Things are becoming quite crazy these days, to be honest...
        
       | dom96 wrote:
       | There seem to be so many different variants of the same thing
       | out there. What makes this unique? For example I know Pyodide
       | exists and also runs CPython under WASM.
        
         | ridruejo wrote:
         | This one is designed to run on the server side and interface
         | with the OS via WASI, so it can read/write files etc
        
       ___________________________________________________________________
       (page generated 2023-01-31 23:00 UTC)