[HN Gopher] Improve Git monorepo performance with a file system ...
       ___________________________________________________________________
        
       Improve Git monorepo performance with a file system monitor
        
       Author : chmaynard
       Score  : 61 points
       Date   : 2022-06-29 19:20 UTC (3 hours ago)
        
 (HTM) web link (github.blog)
 (TXT) w3m dump (github.blog)
        
       | blopker wrote:
       | Having a cross-platform file watcher built into a ubiquitous tool
       | like git is pretty awesome. I could see build tools integrating
       | with this and making more aspects of development faster without
       | having to run a bunch of file watcher services. They all seem to
       | have issues.
       | 
       | I have tried Watchman, but setting it up is a pain. There are so
       | many ways to use it. I also welcome running less Facebook code on
       | my systems.
        
       | rektide wrote:
       | > _[FSMonitor] is currently available on macOS and Windows._
       | 
       | Are there any other git features with this limitation? Wild to me
       | that we're here.
       | 
       | Thankfully the article covers the semi-longstanding "hooks" that
       | existing (& very high performance) tools like Watchman (which are
       | cross platform) can use.
       | 
       | Great in-depth read. Good stuff! From the 2.37 release[1].
       | 
       | [1] https://github.blog/2022-06-27-highlights-from-git-2-37/
       | https://news.ycombinator.com/item?id=31898261 (34 points, 2 days
       | ago, 7 comments)
        
         | milliams wrote:
         | My assumption was that on Linux it's just been using inotify or
         | something for a while and so hasn't needed a bespoke monitor. I
         | have no idea if that's true or makes sense though.
        
           | gpderetta wrote:
           | More likely the linux fs is fast enough not to need the
           | optimization. Unsurprisingly git was designed to run well on
           | linux.
        
             | bobkazamakis wrote:
             | yeah it uses famous linux-exclusive data structures, like
             | hashes and strings.
        
             | nasretdinov wrote:
             | Not sure why the response got downvoted. I personally found
             | Git performance to be, well, okay on macOS (but depends)
             | and absolutely horrible on Windows due to very slow stat()
             | calls on NTFS.
             | 
             | Of course, in a large enough monorepo Linux performance
             | would also suffer, but to a much lesser degree.
             | 
             | Also, conveniently, both Windows and macOS have an API for
             | recursive directory watching, whereas Linux doesn't (in the
             | vanilla kernel). Inotify can only watch the immediate
             | directory you're observing, and on top of that there's a
             | pretty low default limit on the number of inotify watch
             | descriptors that you're allowed to have.
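
To get a rough feel for the per-file stat() cost discussed in this comment, here is a minimal sketch that walks a working tree and calls lstat() on every file, which is approximately the work a status check has to do when nothing is tracking changes for it. The name scan_worktree is hypothetical, and skipping .git is an assumption about what such a scan would ignore; this is not Git's own code.

```python
import os
import time


def scan_worktree(root):
    """lstat() every file under root; return (file_count, elapsed_seconds)."""
    start = time.perf_counter()
    count = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Assumption: skip repository metadata, as a working-tree scan would.
        if ".git" in dirnames:
            dirnames.remove(".git")
        for name in filenames:
            os.lstat(os.path.join(dirpath, name))  # one metadata call per file
            count += 1
    return count, time.perf_counter() - start
```

Calling scan_worktree(".") at the top of a large checkout on different operating systems should give a crude comparison of how expensive these metadata calls are on each platform.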
        
         | staticassertion wrote:
         | My guess is that inotify is so slow with large directories that
         | it wasn't worth it. Plus inotify has cumbersome user limits.
         | 
         | inotify has a number of other relevant limitations, like not
         | being able to create recursive notifications or handle "move"
         | operations. Implementation effort is going to be way higher for
         | an inotify-based system, and of course that's made far worse by
         | the numerous file systems in linux - I imagine any
         | implementation would probably start first with ext4.
         | 
         | I suspect an ideal solution would be via ebpf, but I'm not
         | sure.
        
         | est31 wrote:
         | I've been wondering about why there was no linux support, and
         | found an e-mail from the author of the subcommand (as well as
         | the github.blog post) explaining the situation.
         | 
         | Apparently an older implementation using inotify was dropped
         | because inotify does not work recursively, so you would have to
         | do an inotify call for all directories of the hierarchy which
         | is obviously very inefficient. There are system-wide limits on
         | the number of directories you can listen to, and even if you
         | increase the limit you would probably cause a lot of overhead.
         | 
         | Newer linuxes support the fanotify system call, which does
         | allow recursive listening. They haven't implemented something
         | using fanotify yet however.
         | 
         | https://lore.kernel.org/git/e1442a04-7c68-0a7a-6e95-304854ad...
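
The one-watch-per-directory problem described in this comment can be estimated with a short sketch: count the directories in a tree and compare that number with the per-user inotify watch limit that Linux exposes under /proc. The function names are hypothetical, and the sketch assumes a non-recursive watcher needs exactly one watch per directory; it does not call inotify itself.

```python
import os

# Linux-only procfs file holding the per-user inotify watch limit.
WATCH_LIMIT_PATH = "/proc/sys/fs/inotify/max_user_watches"


def count_directories(root):
    """Number of directories under root, including root itself."""
    # os.walk yields one tuple per directory it visits.
    return sum(1 for _ in os.walk(root))


def read_watch_limit(path=WATCH_LIMIT_PATH):
    """Return the watch limit as an int, or None if it cannot be read."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def watches_fit(root, limit):
    """True if one watch per directory would stay within the limit."""
    return count_directories(root) <= limit
```

On a non-Linux system read_watch_limit() simply returns None, so a caller would need to supply its own limit to watches_fit().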
        
       | tex0 wrote:
       | This serves as an example to me that git is - maybe - not the
       | right tool for the job.
        
         | elpakal wrote:
         | so what is?
        
         | staticassertion wrote:
         | Moving to a continuous, asynchronous strategy versus a point-
         | in-time synchronous strategy seems like a perfectly reasonable
         | way to improve performance.
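
As a toy illustration of the continuous strategy, the sketch below keeps an in-memory journal that a watcher would append to as changes happen, so a later query costs time proportional to the number of changes rather than the size of the tree. ChangeJournal and its methods are hypothetical names, and the integer token is an assumption made for simplicity; this is not how Git's monitor is implemented.

```python
class ChangeJournal:
    """Hypothetical in-memory journal fed by file-change events."""

    def __init__(self):
        self._events = []  # paths in the order they were reported

    def record(self, path):
        """Called by a (hypothetical) watcher whenever a path changes."""
        self._events.append(path)

    def token(self):
        """Marker for 'now'; a client stores it between queries."""
        return len(self._events)

    def changed_since(self, token):
        """Paths reported after the token, deduplicated and sorted."""
        return sorted(set(self._events[token:]))
```

A client would save token() after each query and pass it to changed_since() next time, instead of rescanning every file at that point in time.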
        
         | BudaDude wrote:
         | As with a lot of developer tools, the most adopted solutions
         | are rarely the best tool for the job. But because everyone
         | knows them, that's what continues to be used.
        
       | eurasiantiger wrote:
       | I wonder if this will cause issues in repos where changes can
       | come from containerized apps syncing their runtime config to
       | disk. Depending on the platform and the container framework, a
       | lot of different things could potentially break here, from NFS-
       | related issues to the number of open files.
        
       | saagarjha wrote:
       | I have a healthy suspicion of the performance of file-watchers. I
       | hope this feature doesn't make Git faster at the expense of "all
       | filesystem operations crawl".
        
       | mpawelski wrote:
       | This is awesome. Especially the fact that it's built-in and easy
       | to turn on.
       | 
       | Seems like quite a complex solution though. I guess some big
       | company (Microsoft?) implemented it internally for their own use
       | and later tried to move it to upstream git. I wonder if there was
       | some pushback from git maintainers from having this functionality
       | built-in.
       | 
       | Also, why do they use named pipes on Windows when in theory
       | Windows also supports Unix domain sockets?
       | (https://devblogs.microsoft.com/commandline/af_unix-comes-to-...)
       | 
       | BTW, to the author of this article. It is very good. It was an
       | interesting read. There are some small issues:
       | 
       | - "markdown" link didn't get converted to html:
       | "[core.untrackedcache](https://git-scm.com/docs/git-
       | config#Documentation/git-config...)"
       | 
       | - the link to "philosophy" of Scalar doesn't work:
       | https://github.com/microsoft/git/blob/HEAD/contrib/scalar/do...
        
       | infogulch wrote:
       | What's the current state of git tooling for large files and
       | partial clones?
        
         | infogulch wrote:
         | My holy grail implementation would be a "partial clone" that
         | downloads desired files like normal, but creates stubs for
         | selected files that are not stored on the device but downloaded
         | on-demand upon opening them, like the OneDrive Files On-Demand
         | [1] or Google Drive File Stream.
         | 
         | [1]: https://support.microsoft.com/en-us/office/save-disk-
         | space-w...
        
           | minimalist wrote:
           | Have you looked into git-annex?
        
       ___________________________________________________________________
       (page generated 2022-06-29 23:00 UTC)