[HN Gopher] Improve Git monorepo performance with a file system ...
___________________________________________________________________
Improve Git monorepo performance with a file system monitor

Author : chmaynard
Score  : 61 points
Date   : 2022-06-29 19:20 UTC (3 hours ago)

(HTM) web link (github.blog)
(TXT) w3m dump (github.blog)

| blopker wrote:
| Having a cross-platform file watcher built into a ubiquitous tool
| like git is pretty awesome. I could see build tools integrating
| with this and making more aspects of development faster without
| having to run a bunch of file watcher services. They all seem to
| have issues.
|
| I have tried Watchman, but setting it up is a pain. There are so
| many ways to use it. I also welcome running less Facebook code on
| my systems.
| rektide wrote:
| > _[FSMonitor] is currently available on macOS and Windows._
|
| Are there any other git features with this limitation? Wild to me
| that we're here.
|
| Thankfully the article covers the semi-longstanding "hooks" that
| existing (& very high performance) tools like Watchman (which are
| cross-platform) can use.
|
| Great in-depth read. Good stuff! From the 2.37 release[1].
|
| [1] https://github.blog/2022-06-27-highlights-from-git-2-37/
| https://news.ycombinator.com/item?id=31898261 (34 points, 2 days
| ago, 7 comments)
| milliams wrote:
| My assumption was that on Linux it's just been using inotify or
| something for a while and so hasn't needed a bespoke monitor. I
| have no idea if that's true or makes sense though.
| gpderetta wrote:
| More likely the linux fs is fast enough not to need the
| optimization. Unsurprisingly git was designed to run well on
| linux.
| [deleted]
| bobkazamakis wrote:
| yeah it uses famous linux-exclusive data structures, like
| hashes and strings.
| nasretdinov wrote:
| Not sure why the response got downvoted.
| I personally found Git performance to be, well, okay on macOS
| (but depends) and absolutely horrible on Windows due to very
| slow stat() calls on NTFS.
|
| Of course, in a large enough monorepo Linux performance
| would also suffer, but to a much lesser degree.
|
| Also, conveniently, both Windows and macOS have an API for
| recursive directory watch, whereas Linux doesn't (in the
| vanilla kernel). Inotify can only watch the immediate
| directory you're observing + there's a pretty low default
| limit on the number of inotify descriptors that you're
| allowed to have on top of that.
| staticassertion wrote:
| My guess is that inotify is so slow with large directories that
| it wasn't worth it. Plus inotify has cumbersome user limits.
|
| inotify has a number of other relevant limitations, like not
| being able to create recursive notifications or handle "move"
| operations. Implementation effort is going to be way higher for
| an inotify-based system, and of course that's made far worse by
| the numerous file systems in linux - I imagine any
| implementation would probably start first with ext4.
|
| I suspect an ideal solution would be via ebpf, but I'm not
| sure.
| est31 wrote:
| I've been wondering about why there was no linux support, and
| found an e-mail from the author of the subcommand (as well as
| the github.blog post) explaining the situation.
|
| Apparently an older implementation using inotify was dropped
| because inotify does not work recursively, so you would have to
| do an inotify call for all directories of the hierarchy, which
| is obviously very inefficient. There are system-wide limits on
| the number of directories you can listen to, and even if you
| increase the limit you would probably cause a lot of overhead.
|
| Newer linuxes support the fanotify system call, which does
| allow recursive listening. They haven't implemented something
| using fanotify yet however.
|
| https://lore.kernel.org/git/e1442a04-7c68-0a7a-6e95-304854ad...
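To put a number on the per-directory watch cost described in the comment above, here is a minimal sketch (not anything Git itself does) that counts the directories in a working tree and compares the count with the per-user inotify watch limit. The function names are made up for illustration, and the `/proc/sys/fs/inotify/max_user_watches` path is Linux-specific; on other systems the limit is simply reported as unavailable.

```python
import os
import sys

# Linux-only file exposing the per-user inotify watch limit.
WATCH_LIMIT_PATH = "/proc/sys/fs/inotify/max_user_watches"


def count_directories(root):
    """Count root plus every directory below it.

    A non-recursive watcher such as inotify needs one watch per
    directory, so this approximates the number of watches required.
    """
    # os.walk yields exactly one tuple per directory it visits.
    return sum(1 for _ in os.walk(root))


def read_watch_limit(path=WATCH_LIMIT_PATH):
    """Return the inotify watch limit, or None if it cannot be read."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    needed = count_directories(root)
    limit = read_watch_limit()
    print(f"directories to watch: {needed}")
    print(f"max_user_watches: {limit if limit is not None else 'unavailable'}")
```

Running it against the root of a large monorepo gives a feel for how close a one-watch-per-directory approach would come to the limit before any other tool on the machine registers its own watches.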
| tex0 wrote:
| This serves as an example to me that git is - maybe - not the
| right tool for the job.
| elpakal wrote:
| so what is?
| staticassertion wrote:
| Moving to a continuous, asynchronous strategy versus a
| point-in-time synchronous strategy seems like a perfectly
| reasonable way to improve performance.
| BudaDude wrote:
| As with a lot of developer tools, the most adopted solutions
| are rarely the best tool for the job. But because everyone
| knows them, that's what continues to be used.
| eurasiantiger wrote:
| I wonder if this will cause issues in repos where changes can
| come from containerized apps syncing their runtime config to
| disk. Depending on the platform and the container framework, a
| lot of different things could potentially break here, from
| NFS-related to number of open files.
| saagarjha wrote:
| I have a healthy suspicion of the performance of file-watchers. I
| hope this feature doesn't make Git faster at the expense of "all
| filesystem operations crawl".
| mpawelski wrote:
| This is awesome. Especially the fact that it's built-in and easy
| to turn on.
|
| Seems like quite a complex solution though. I guess some big
| company (Microsoft?) implemented it internally for their own use
| and later tried to move it to upstream git. I wonder if there was
| some pushback from git maintainers against having this
| functionality built-in.
|
| Also, why do they use named pipes on Windows when in theory
| Windows also supports Unix domain sockets?
| (https://devblogs.microsoft.com/commandline/af_unix-comes-to-...)
|
| BTW, to the author of this article. It is very good. It was an
| interesting read. There are some small issues:
|
| - "markdown" link didn't get converted to html:
| "[core.untrackedcache](https://git-scm.com/docs/git-config#Documentation/git-config...)"
|
| - the link to "philosophy" of Scalar doesn't work:
| https://github.com/microsoft/git/blob/HEAD/contrib/scalar/do...
| infogulch wrote:
| What's the current state of git tooling for large files and
| partial clones?
| infogulch wrote:
| My holy grail implementation would be a "partial clone" that
| downloads desired files like normal, but creates stubs for
| selected files that are not stored on the device but downloaded
| on-demand upon opening them, like the OneDrive Files On-Demand
| [1] or Google Drive File Stream.
|
| [1]: https://support.microsoft.com/en-us/office/save-disk-space-w...
| minimalist wrote:
| Have you looked into git-annex?
___________________________________________________________________
(page generated 2022-06-29 23:00 UTC)