[HN Gopher] Oxidizing bmap-tools: rewriting a Python project in ...
___________________________________________________________________
Oxidizing bmap-tools: rewriting a Python project in Rust

Author : glenngillen
Score  : 57 points
Date   : 2023-03-04 11:43 UTC (11 hours ago)

(HTM) web link (www.collabora.com)
(TXT) w3m dump (www.collabora.com)

| UncleEntity wrote:
| > Usually a project is oxidised into Rust because of many
| reasons, the main usually being memory safety.
|
| What about python's memory model is unsafe?
| pohl wrote:
| Does Python do anything to enforce mutually exclusive access
| when mutating? If not, that's a hole you could drive a truck
| through, isn't it?
| gpm wrote:
| Isn't Python still run under a single global interpreter
| lock? Can't have simultaneous access while mutating if only
| one thing is running at a time...
| pohl wrote:
| Yeah, that's something, at least. Wouldn't the order that
| mutations happen still matter, even though they have to
| acquire a lock? Not a pythoneer, myself.
| masklinn wrote:
| Not in the context of memory safety. You can still have
| race conditions up the ass, but not data races, unless
| you're using a native library which 1. releases the GIL
| and 2. is broken.
| [deleted]
| pohl wrote:
| Interesting, I hadn't realized how much the phrase
| "memory safety" understates what is desirable.
| [deleted]
| [deleted]
| gpm wrote:
| The thing is that races are _good_ a lot of the time. If
| I have a set of tasks running in parallel that take an
| unknown/variable amount of time and I want to tell the
| users which ones are finished, my output _needs_ to be
| based on a race between the tasks. If I'm scraping a
| website, I (may) want to have multiple connections going
| in parallel, and as soon as one of those connections
| spots a new link I (may) want to open a new connection to
| start scraping it, but I don't know which connection is
| going to spot a new link first, so there's a (benign)
| race condition.
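The benign race gpm describes can be sketched in Python. This is only an illustration, not code from the thread: `task` and `run_tasks` are hypothetical names, and the random sleep stands in for work of unknown duration. The completion order can differ from run to run, yet nothing is corrupted.

```python
import concurrent.futures
import random
import time


def task(name: str) -> str:
    # Stand-in for work that takes an unpredictable amount of time.
    time.sleep(random.uniform(0.01, 0.05))
    return name


def run_tasks(names):
    """Return the task names in the order they happened to finish."""
    finished = []
    workers = max(1, len(names))
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(task, n) for n in names]
        # as_completed yields each future when it finishes, so the
        # resulting order is decided by a race between the tasks.
        for fut in concurrent.futures.as_completed(futures):
            finished.append(fut.result())
    return finished


if __name__ == "__main__":
    print(run_tasks(["a", "b", "c", "d"]))
```

Every task is reported exactly once; only the order is left to the race.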
|
| Making a language that banned them outright would be
| making a language that couldn't do things that people
| wanted to do.
| Yoric wrote:
| I figure that you could very easily mark which race
| conditions are good.
| gpm wrote:
| Matter? Sure, there can be race conditions.
|
| But allow for memory unsafety? No, not if every ordering
| of the "critical sections" (chunks of code run as a unit
| while the interpreter is locked) is valid and upholds the
| invariants Python expects.
| tialaramex wrote:
| For so long as the GIL persists, you are correct, and thus
| Python does not have data races and is able to achieve
| memory safety in this regard.
|
| It is conceivable (but extremely unlikely, 'cos it was
| really, really hard) that after a GILectomy Python follows
| the Java path, in which data races are technically safe+.
| However it is most likely Python with a GILectomy will
| behave like Go or C# or numerous other languages and lose
| memory safety properties if a data race occurs.
|
| + Data races can happen in Java, and _astonishing things_
| might happen, but objects always remain in some valid
| state, so there is no loss of memory safety whereas in most
| languages with data races you can e.g. race a hash table
| and mess up its internals and cause chaos.
| brundolf wrote:
| The article felt kind of disjointed, I think that statement was
| just meant generally and not meant to suggest it applies here
| UncleEntity wrote:
| Yeah... It's becoming a pet peeve that the Rustafarians
| believe they have a monopoly on "memory safety" and need to
| point it out all the time.
| shaunsingh wrote:
| It's worth pointing out because rust has a monopoly on
| easy-to-write gc-less memory safety, with the alternatives
| being modern c++ or higher level languages where you run
| into a garbage collector
| masklinn wrote:
| Also how I interpreted it, though even there it's quite weird
| (e.g.
| better performance is also a common reason to convert
| things to Rust, especially when "easy binding" tools like
| pyo3, neon, or rustler are available and take care of the
| unsafe bits between the two).
| creddit wrote:
| Wow I really like the terminology "oxidizing" for re-writing
| something in Rust.
|
| Sorry for the unsubstantive comment.
| claytonjy wrote:
| Which Rust-written Python tools are folks using?
|
| I know of two big ones: ruff (linting) and pyflow (dependency
| management). The standard lib crypto module uses rust, too.
|
| Are there other ones I should know about? Maybe replacements for
| mypy, pre-commit, tox/nox?
| 1f60c wrote:
| > The standard lib crypto module uses rust, too.
|
| This couldn't matter less, but I think you're confused with the
| third-party Cryptography package, which uses Rust.
| claytonjy wrote:
| My bad, thanks for the correction!
| bogeholm wrote:
| polars, pydantic and deltalake come to mind
|
| - https://pypi.org/project/polars/
|
| - https://pypi.org/project/pydantic/
|
| - https://pypi.org/project/deltalake/
| jamincan wrote:
| Does Pydantic use rust? When I check the github repo, it
| shows 100% python.
| claytonjy wrote:
| He's working on a rust rewrite, to be used in Pydantic 2.0
| ilovecaching wrote:
| I'm confused... you're talking about avoiding a local copy of sparse
| regions... Linux already does that at the level of the inode.
| There's also a seek operation to move past the next hole. Not
| sure why you would carry around metadata the filesystem is
| already tracking for you.
| masklinn wrote:
| > Not sure why you would carry around metadata the filesystem
| is already tracking for you.
|
| Because bmap files are independent of the filesystem _and_ OS,
| and thus would probably like to work even with filesystems
| which don't support sparse files, and OS which don't expose
| holes?
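For reference, the seek operation mentioned above is exposed on Linux through lseek(2) with SEEK_DATA and SEEK_HOLE, which Python surfaces as `os.SEEK_DATA` and `os.SEEK_HOLE` on platforms that provide them. Below is a minimal sketch, assuming such a platform; `data_extents` is a hypothetical helper name and not part of bmap-tools.

```python
import errno
import os


def data_extents(path):
    """Return (start, end) byte ranges the filesystem reports as data."""
    extents = []
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        offset = 0
        while offset < size:
            try:
                # Jump to the next byte that belongs to a data region.
                start = os.lseek(fd, offset, os.SEEK_DATA)
            except OSError as exc:
                if exc.errno == errno.ENXIO:
                    break  # nothing but a hole until end of file
                raise
            # Find where that data region ends (end of file counts as a hole).
            end = os.lseek(fd, start, os.SEEK_HOLE)
            extents.append((start, end))
            offset = end
    finally:
        os.close(fd)
    return extents
```

On a filesystem that does not track holes, the whole file is typically reported as a single data extent, which is the situation the comment above is worried about.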
|
| For instance until NFS 4.2 in 2016 you could write sparse files
| to an NFS volume, but there was no way to detect holes when
| reading. exfat doesn't support sparse files at all. And
| according to their man pages, OpenBSD and NetBSD have yet to
| support SEEK_HOLE/SEEK_DATA (which are non-standard extensions
| of POSIX lseek(2)).
|
| Plus according to its history the bmaptools project was created
| about a year after the release of kernel 3.1, which introduced
| support for SEEK_HOLE and SEEK_DATA. Doesn't take much of a
| leap to assume that the project's creator didn't consider that
| widespread enough to be reliable (Debian wouldn't release a
| 3.x-based version until the following year).
| andrewshadura wrote:
| Seeking through holes also doesn't work very well for
| compressed images, since usually there is no way to tell
| apart an insignificant hole from a long sequence of zeroes or
| other filler data.
| hummus_bae wrote:
| [dead]
| agildehaus wrote:
| An example from my work:
|
| We have a Yocto build that results in about 120MB worth of
| files that make up our app and Yocto. Originally we had a
| script that would write a bootloader, partition and format ext4
| our target's eMMC, and decompress a 120MB tarball to that
| filesystem.
|
| That worked well, but we wanted our script to become
| OS-independent, as our field team ran Windows laptops. It's quite
| difficult to get Windows to do an ext4 format, and I wanted our
| tool to have a minimal number of dependencies (e2fsprogs
| requirement? some proprietary thing from Paragon? no thanks)
|
| So instead, have Yocto produce an image containing the
| bootloader and all four pre-formatted ext4 filesystems. No
| operating system needs to do the format if the filesystem
| already exists within, it's just a raw block write. But now the
| image is 4GB, the size of our eMMC, and writing all of it would
| be painfully slow.
|
| Thankfully Yocto also outputs a bmap file which maps the parts
| of that 4GB which are empty space -- blocks we don't need to
| write when commissioning our target device. So our
| commissioning tool was rewritten in Go, and I wrote a bmap
| implementation in Go to do the write. Flashing our target is as
| fast as it used to be, but now that tool can be easily made to
| work on multiple operating systems.
| masklinn wrote:
| There's very little content to the article sadly, aside from
| links to the artefacts.
|
| It also seems to have been done independently of the upstream, so
| it's not really an "oxidation" in the usual terms, more of a
| pseudo-fork of the specific `bmaptool copy` subcommand (though
| TBF it only has one other subcommand which is `create`, and the
| implementation in the upstream is about 1/3rd that of copy, so
| copy is clearly the "meat" of the project).
___________________________________________________________________
(page generated 2023-03-04 23:00 UTC)