[HN Gopher] Oxidizing bmap-tools: rewriting a Python project in ...
       ___________________________________________________________________
        
       Oxidizing bmap-tools: rewriting a Python project in Rust
        
       Author : glenngillen
       Score  : 57 points
       Date   : 2023-03-04 11:43 UTC (11 hours ago)
        
 (HTM) web link (www.collabora.com)
 (TXT) w3m dump (www.collabora.com)
        
       | UncleEntity wrote:
       | > Usually a project is oxidised into Rust because of many
       | reasons, the main usually being memory safety.
       | 
       | What about python's memory model is unsafe?
        
         | pohl wrote:
         | Does Python do anything to enforce mutually exclusive access
         | when mutating? If not, that's a hole you could drive a truck
         | through, isn't it?
        
           | gpm wrote:
           | Isn't Python still run under a single global interpreter
           | lock? Can't have simultaneous access while mutating if only
           | one thing is running at a time...
        
             | pohl wrote:
             | Yeah, that's something, at least. Wouldn't the order that
             | mutations happen still matter, even though they have to
             | acquire a lock? Not a pythoneer, myself.
        
               | masklinn wrote:
               | Not in the context of memory safety. You can still have
               | race conditions up the ass, but not data races, unless
               | you're using a native library which 1. releases the GIL
               | and 2. is broken.
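
To make the distinction above concrete, here is a minimal Python sketch (the function name and parameters are illustrative, not from the thread). Several threads append to one shared list: the order in which items arrive is a race, but each `list.append` is atomic in CPython, so the list itself never ends up in a corrupt state.

```python
import threading

def append_concurrently(num_threads=4, items_per_thread=1000):
    """Have several threads append to one shared list and return that list."""
    shared = []

    def worker(thread_id):
        for i in range(items_per_thread):
            # list.append is a single atomic operation in CPython,
            # so the list's internals cannot be corrupted here.
            shared.append((thread_id, i))

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared

result = append_concurrently()
# Every item arrives exactly once; only the interleaving between threads varies.
print(len(result))
```

The interleaving can differ from run to run (a race condition), yet the set of items is always complete (no data race on the list).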
        
               | pohl wrote:
               | Interesting, I hadn't realized how much the phrase
               | "memory safety" understates what is desirable.
        
               | gpm wrote:
               | The thing is that races are _good_ a lot of the time. If
               | I have a set of tasks running in parallel that take an
               | unknown/variable amount of time and I want to tell the
               | users which ones are finished, my output _needs_ to be
               | based on a race between the tasks. If I'm scraping a
               | website, I (may) want to have multiple connections going
               | in parallel, and as soon as one of those connections
               | spots a new link I (may) want to open a new connection to
               | start scraping it, but I don't know which connection is
               | going to spot a new link first, so there's a (benign)
               | race condition.
               | 
               | Making a language that banned them outright would be
               | making a language that couldn't do things that people
               | wanted to do.
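
The "tell users which tasks are finished" case above could be sketched in Python roughly as follows. `run_task` is a hypothetical stand-in for real work, and the random sleep only simulates an unknown duration; the point is that the reporting order is decided by whichever task wins the race.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_task(name):
    """Stand-in for real work that takes an unknown amount of time."""
    time.sleep(random.uniform(0.0, 0.05))
    return name

def report_in_completion_order(names):
    """Run all tasks in parallel and return names in the order they finished."""
    finished = []
    with ThreadPoolExecutor(max_workers=len(names)) as pool:
        futures = [pool.submit(run_task, n) for n in names]
        # as_completed yields each future as soon as it is done,
        # so the result order depends on which task finishes first.
        for future in as_completed(futures):
            finished.append(future.result())
    return finished

order = report_in_completion_order(["a", "b", "c", "d"])
print(order)
```

The printed order may change between runs, which is exactly the benign race being described; the set of reported tasks does not.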
        
               | Yoric wrote:
               | I figure that you could very easily mark which race
               | conditions are good.
        
               | gpm wrote:
               | Matter? Sure, there can be race conditions.
               | 
               | But allow for memory unsafety? No, not if every ordering
               | of the "critical sections" (chunks of code run as a unit
               | while the interpreter is locked) is valid and upholds the
               | invariants Python expects.
        
             | tialaramex wrote:
             | For so long as the GIL persists, you are correct, and thus
             | Python does not have data races and is able to achieve
             | memory safety in this regard.
             | 
             | It is conceivable (but extremely unlikely, 'cos it was
             | really, really hard) that after a GILectomy Python follows
             | the Java path, in which data races are technically safe+.
             | However it is most likely Python with a GILectomy will
             | behave like Go or C# or numerous other languages and lose
             | memory safety properties if a data race occurs.
             | 
             | + Data races can happen in Java, and _astonishing things_
             | might happen, but objects always remain in some valid
             | state, so there is no loss of memory safety whereas in most
             | languages with data races you can e.g. race a hash table
             | and mess up its internals and cause chaos.
        
         | brundolf wrote:
         | The article felt kind of disjointed, I think that statement was
         | just meant generally and not meant to suggest it applies here
        
           | UncleEntity wrote:
           | Yeah... It's becoming a pet peeve that the Rustafarians
           | believe they have a monopoly on "memory safety" and need to
           | point it out all the time.
        
             | shaunsingh wrote:
             | It's worth pointing out because Rust has a monopoly on easy-
             | to-write GC-less memory safety, with the alternatives being
             | modern C++ or higher-level languages where you run into a
             | garbage collector.
        
           | masklinn wrote:
           | Also how I interpreted it, though even there it's quite weird
           | (e.g. better performance is also a common reason to convert
           | things to Rust, especially when "easy binding" tools like
           | pyo3, neon, or rustler are available and take care of the
           | unsafe bits between the two).
        
       | creddit wrote:
       | Wow I really like the terminology "oxidizing" for re-writing
       | something in Rust.
       | 
       | Sorry for the unsubstantive comment.
        
       | claytonjy wrote:
       | Which Rust-written Python tools are folks using?
       | 
       | I know of two big ones: ruff (linting) and pyflow (dependency
       | management). The standard lib crypto module uses rust, too.
       | 
       | Are there other ones I should know about? Maybe replacements for
       | mypy, pre-commit, tox/nox?
        
         | 1f60c wrote:
         | > The standard lib crypto module uses rust, too.
         | 
         | This couldn't matter less, but I think you're confused with the
         | third-party Cryptography package, which uses Rust.
        
           | claytonjy wrote:
           | My bad, thanks for the correction!
        
         | bogeholm wrote:
         | polars, pydantic and deltalake come to mind
         | 
         | - https://pypi.org/project/polars/
         | 
         | - https://pypi.org/project/pydantic/
         | 
         | - https://pypi.org/project/deltalake/
        
           | jamincan wrote:
           | Does Pydantic use rust? When I check the github repo, it
           | shows 100% python.
        
             | claytonjy wrote:
             | He's working on a rust rewrite, to be used in Pydantic 2.0
        
       | ilovecaching wrote:
       | I'm confused... you're talking about avoiding a local copy of sparse
       | regions... Linux already does that at the level of the inode.
       | There's also a seek operation to move past the next hole. Not
       | sure why you would carry around metadata the filesystem is
       | already tracking for you.
        
         | masklinn wrote:
         | > Not sure why you would carry around metadata the filesystem
         | is already tracking for you.
         | 
         | Because bmap files are independent of the filesystem _and_ OS,
         | and thus would probably like to work even with filesystems
         | which don't support sparse files, and OS which don't expose
         | holes?
         | 
         | For instance until NFS 4.2 in 2016 you could write sparse files
         | to an NFS volume, but there was no way to detect holes when
         | reading. exfat doesn't support sparse files at all. And
         | according to their man pages, OpenBSD and NetBSD have yet to
         | support SEEK_HOLE/SEEK_DATA (which are non-standard extensions
         | of POSIX lseek(2)).
         | 
         | Plus according to its history the bmaptools project was created
         | about a year after the release of kernel 3.1, which introduced
         | support for SEEK_HOLE and SEEK_DATA. Doesn't take much of a
         | leap to assume that the project's creator didn't consider that
         | widespread enough to be reliable (Debian wouldn't release a
         | 3.x-based version until the following year).
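
For reference, hole detection via SEEK_HOLE/SEEK_DATA might look like the Python sketch below. `data_extents` is a hypothetical helper, not part of bmap-tools. It assumes the platform exposes `os.SEEK_DATA` and `os.SEEK_HOLE`, and falls back to treating the whole file as data when it does not, which is the portability gap the comment describes.

```python
import errno
import os

def data_extents(path):
    """Return (start, end) byte ranges that the filesystem reports as data."""
    size = os.path.getsize(path)
    if not (hasattr(os, "SEEK_DATA") and hasattr(os, "SEEK_HOLE")):
        # Platform does not expose holes: treat the whole file as data.
        return [(0, size)] if size else []
    extents = []
    fd = os.open(path, os.O_RDONLY)
    try:
        offset = 0
        while offset < size:
            try:
                start = os.lseek(fd, offset, os.SEEK_DATA)
            except OSError as exc:
                if exc.errno == errno.ENXIO:
                    break  # no more data after this offset
                raise
            # The end of the file counts as an implicit hole.
            end = os.lseek(fd, start, os.SEEK_HOLE)
            extents.append((start, end))
            offset = end
    finally:
        os.close(fd)
    return extents
```

On a filesystem without sparse-file support, this reports a single extent covering the whole file, so a tool relying on it alone would have no way to skip empty regions.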
        
           | andrewshadura wrote:
           | Seeking through holes also doesn't work very well for
           | compressed images, since usually there is no way to tell
           | apart an insignificant hole from a long sequence of zeroes or
           | other filler data.
        
         | agildehaus wrote:
         | An example from my work:
         | 
         | We have a Yocto build that results in about 120MB worth of
         | files that make up our app and Yocto. Originally we had a
         | script that would write a bootloader, partition and format ext4
         | our target's eMMC, and decompress a 120MB tarball to that
         | filesystem.
         | 
         | That worked well, but we wanted our script to become OS-
         | independent, as our field team ran Windows laptops. It's quite
         | difficult to get Windows to do an ext4 format, and I wanted our
         | tool to have a minimal number of dependencies (e2fsprogs
         | requirement? some proprietary thing from Paragon? no thanks)
         | 
         | So instead, have Yocto produce an image containing the
         | bootloader and all four pre-formatted ext4 filesystems. No
         | operating system needs to do the format if the filesystem
         | already exists within, it's just a raw block write. But now the
         | image is 4GB, the size of our eMMC, and writing all of it would
         | be painfully slow.
         | 
         | Thankfully Yocto also outputs a bmap file which maps the parts
         | of that 4GB which are empty space -- blocks we don't need to
         | write when commissioning our target device. So our
         | commissioning tool was rewritten in Go, and I wrote a bmap
         | implementation in Go to do the write. Flashing our target is as
         | fast as it used to be, but now that tool can be easily made to
         | work on multiple operating systems.
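
The write step described above can be sketched as follows, in Python rather than the commenter's Go. This is not the bmap file format itself: `copy_mapped_blocks` is a hypothetical helper, and it assumes the block map has already been parsed into a list of inclusive `(first_block, last_block)` pairs.

```python
import io

def copy_mapped_blocks(src, dst, block_size, ranges):
    """Copy only the given inclusive block ranges from src to dst; return bytes written."""
    written = 0
    for first, last in ranges:
        offset = first * block_size
        length = (last - first + 1) * block_size
        src.seek(offset)
        dst.seek(offset)
        data = src.read(length)  # may be short if the image ends mid-block
        dst.write(data)
        written += len(data)
    return written

# Four 4-byte blocks; block 1 is empty space and is not listed in the map.
image = io.BytesIO(b"AAAA" + b"\0\0\0\0" + b"CCCC" + b"DDDD")
target = io.BytesIO()
copied = copy_mapped_blocks(image, target, 4, [(0, 0), (2, 3)])
print(copied, target.getvalue() == image.getvalue())
```

A real tool would read each range in bounded chunks instead of one `read` call, and would verify checksums if the map provides them; both are left out here to keep the sketch short.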
        
       | masklinn wrote:
       | There's very little content to the article sadly, aside from
       | links to the artefacts.
       | 
       | It also seems to have been done independently of the upstream, so
       | it's not really an "oxidation" in the usual terms, more of a
       | pseudo-fork of the specific `bmaptool copy` subcommand (though
       | TBF it only has one other subcommand which is `create`, and the
       | implementation in the upstream is about 1/3rd that of copy, so
       | copy is clearly the "meat" of the project).
        
       ___________________________________________________________________
       (page generated 2023-03-04 23:00 UTC)