[HN Gopher] Improving large monorepo performance on GitHub ___________________________________________________________________ Improving large monorepo performance on GitHub Author : todsacerdoti Score : 224 points Date : 2021-03-16 16:37 UTC (6 hours ago) (HTM) web link (github.blog) (TXT) w3m dump (github.blog) | seattle_spring wrote: | What geographic feature is pictured in the hero shot of this | blogpost? At first I thought it was the Golden Throne in Capitol | Reef but I now think it's something else. I'm 90% sure it's | either in Capitol Reef or Grand Staircase. | david_allison wrote: | https://www.flickr.com/photos/23155134@N06/7132776459 | | Forest Mountains of Zion National Park, Utah | seattle_spring wrote: | Close but so far! Thanks so much. | chris_wot wrote: | Did they contribute the repack optimizations upstream? | iliekcomputers wrote: | Nice work, really interesting blog post! | | On a sidenote, git itself can also get painfully slow with large | monorepos. Hope GitHub can push some changes there as well. | | I know FB moved off git to mercurial because of performance | issues. | klodolph wrote: | My understanding is that neither Git nor Mercurial can do this | well out of the box, and FB and Google both have their own | extensions to Mercurial to make this possible (because even | though Mercurial is often slower than Git, it's extensible) | | e.g. https://facebook.github.io/watchman/ - used as part of | Facebook's Mercurial solution, I think. | vtbassmatt wrote: | Git also has a file system monitor interface which can use | Watchman. We (GitHub) are working on a native file system | monitor implementation in addition - | https://github.com/gitgitgadget/git/pull/900. | jauer wrote: | And then from mercurial extensions to our own server, | mononoke, which apparently has been moved under the Eden | umbrella: https://github.com/facebookexperimental/eden | jayd16 wrote: | I thought Google used some custom fork of perforce. 
| klodolph wrote: | From what I understand, Piper is not a fork of Perforce, | but instead a completely different system with the same | interface. You know, built on top of a BigTable or Spanner | cluster instead of whatever Perforce uses. | | The Mercurial extensions are then an _alternative client_ | for Piper. | pitaj wrote: | You might be interested in scalar [1] developed by Microsoft | for handling large repos. | | [1]: https://github.com/microsoft/scalar | WorldMaker wrote: | It's also interesting to note how much of Microsoft's work | for handling large repos in git has merged upstream directly | into git itself. | | One very interesting part of that is the effort that has gone | into the git commit-graph: https://git-scm.com/docs/commit- | graph. | | It's part of what makes scalar interesting compared to some | of the projects you hear mentioned used inside the FB and | Google gates: not only is scalar itself open source, but a | lot of what scalar does is tune configuration flags to turn | on optional git features such as the commit-graph, sparse | checkout "cones", etc that are all themselves directly | supported by the git client. Even if you aren't at the scale | where it makes sense to use all of the tools that scalar | provides, you can get some interesting baby steps by | following scalar's "advice" on git configuration. | rmasters wrote: | If you are willing to adapt to a different structure and | workflow, you can filter the scope of git down dramatically | with sparse checkouts (as @WorldMaker also mentioned). | | https://github.blog/2020-01-17-bring-your-monorepo-down-to-s... | chgibb wrote: | Sparse-checkouts are amazing. I wrote some small tools that | use dependency information in Flutter packages to drive a | sparse-checkout. We use it at $dayjob now. | tuyiown wrote: | Offtopic, but I often wonder if there are people using `git | worktree` to have several related code trees within the same | repo. 
| | Technically it works mostly the same as multiple repos, but
| theoretically allows you to have something like a bootstrap script
| with everything self contained in the same repo. Looks like an
| alternative tradeoff between a monorepo with shared history and
| multiple repositories.
| numbsafari wrote:
| I'm not sure worktree works exactly as you think.
| 
| I use worktree locally so that, for example, I can have my
| working copy that I am doing development in and then a separate
| working copy where I can do code review for someone else,
| without having to interrupt what I am doing in my own worktree.
| 
| My own experience is that if you are using branches with
| radically different content for different purposes in the same
| tree, it's going to end up a mess at some point. Worktrees, as
| far as I am aware, do not help with that in any special way.
| pjc50 wrote:
| I tried that, and discovered that it won't let you have two
| worktrees with the same branch checked out?
| hiq wrote:
| Offtopic (I think), but I just learned that "For instance,
| git worktree add -d <path> creates a new working tree with a
| detached HEAD at the same commit as the current branch."
| (from the manpage).
| 
| It's offtopic because the second worktree has a detached
| HEAD, so that doesn't help in the case you mention.
| ivanbakel wrote:
| Which is very sane - what should git do if you modify the
| branch in one tree but not in the other? The least painful
| solution would require something like
| multiple-refs-per-remote-branch, which would be (to my
| understanding) a re-architecture.
| pjc50 wrote:
| Well, I wanted the same effect as two separate checkouts
| (ie entirely separate branch structures) but with a bit of
| the disk space shared between them, but that's not how it
| works.
| oftenwrong wrote:
| Are you thinking of git subtree?
| 
| https://www.atlassian.com/git/tutorials/git-subtree
| hiq wrote:
| Do you have an example of this setup?
| | As far as I know git worktree is just to have different
| branches of the same repo checked out in different locations.
| At least, that's the only way I use it (and it's great!). Are
| you suggesting to have different projects on different
| branches? So an empty "master", then "project1", "project2"
| etc. as branches?
| adeltoso wrote:
| Just 10 years too late, I remember when Facebook switched to
| Mercurial because the Git community wouldn't care about big
| monorepos. Mercurial is great!
| kevincox wrote:
| I'm slightly surprised that GitHub is still basically storing a
| git repo on a regular filesystem using the Git CLI. I would have
| expected that the repos were broken up into individual objects
| and stored in an object store. This should make pushes much
| faster as you have basically infinitely scalable writes. It does
| make pulls more difficult, but computing packfiles could still
| be done (asynchronously), and with some attention to
| data-locality it should be possible.
| 
| This would be a huge rewrite of some internals but seems like it
| would be a lot easier to manage. It would also provide some
| benefits as objects could be shared between repos (although some
| care would probably be necessary for hash collisions) and it
| would remove some of the oddness about forks (as IIUC they
| effectively share the repo with the "parent" repo).
| 
| I would love to know if something like this has been considered
| and why they decided against it.
| hobofan wrote:
| > I'm slightly surprised that GitHub is still basically storing
| a git repo on a regular filesystem using the Git CLI.
| 
| Maybe I'm a bit dense, but how did you get that from the
| article? I'm fairly certain that in other pieces of writing
| they showed that they are using an object store, and I'm
| guessing that's what the "file servers" in the article are.
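For the worktree questions above, a minimal runnable sketch may help. All paths and branch names here are invented for the demo; `git worktree` gives you a second working copy (numbsafari's review-copy use case) that shares the same object store as the original:

```shell
# Toy repo with one commit and a side branch to review.
set -e
work=$(mktemp -d) && cd "$work"
git init -q main && cd main
echo 'hello' > file.txt
git add file.txt
git -c user.name=demo -c user.email=demo@example.com commit -qm init
git branch review

# Check out the "review" branch into a sibling directory without
# disturbing the main working copy; both share one .git object store.
git worktree add ../review-copy review
git worktree list
```

As pjc50 notes, `git worktree add` refuses to check out a branch that is already checked out in another worktree, so this only works with distinct branches (or detached HEADs via `-d`).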
| stuhood wrote:
| `git repack` is an operation that is fairly specific to git's
| default file format: if they were storing objects in any
| (other) database, it is very unlikely that they would
| experience blocking repack operations, as that is an area
| where databases are highly optimized to execute
| incrementally.
| lumost wrote:
| I am not a github employee, but my 2 cents.
| 
| An object store lacks an index which your typical FS will
| provide with a relatively high degree of efficiency. FS's can
| be distributed to arbitrary write velocity given an
| appropriately distributed block storage solution (which will
| provide the k/v API of an object store that you're looking
| for). Distributed FS's are conveniently compatible with most
| POSIX operations rather than requiring bespoke integration.
| Most object stores are optimized for largish objects and lack
| the ability to condense records into an individual write (via
| the block API) or pre-emptively pre-fetch the most likely next
| set of requested blocks.
| 
| In GitHub's case, the choice of diverging from GitCLI/FS
| based storage APIs could lead to long term support issues and
| an implicit "github" flavor of git rather than improving the
| core git toolchain.
| 
| Object stores are great, but if you need some form of index
| they get slow and painful really fast.
| ddorian43 wrote:
| You should be able to split the object store into 2 systems:
| 1 metadata (think rdbms/nosql/etc) and a blob-data service,
| keeping large files, think 10KB+. Both systems should be able
| to be more efficient than the current method.
| 
| Example: you can add erasure coding to the blob-data service
| for better efficiency. You can add fancy indexing to your
| metadata store. etc etc.
| 
| But somebody has to create it, that's the issue.
| lumost wrote:
| That's exactly how distributed filesystems are built.
| | Systems such as HDFS use the NameNode for this task, but | depending on the exact characteristics of the fileSystem a | multi-master setup is often used. I know of at least one | NFS implementation which uses postgres as its metadata | layer. | Denvercoder9 wrote: | > This should make pushes much faster as you have basically | infinitely scalable writes. However it does make pulls more | difficult. | | I bet GitHub has much more read traffic than write traffic, so | this trade-off does not make sense. | random5634 wrote: | Seriously, imagine the compute and requests costs to assemble | a large git pull. | cordite wrote: | Sounds a lot like this here | https://github.com/Homebrew/brew/pull/9383 | kevincox wrote: | I said "difficult" not expensive. Once you assembled the | packfiles (much like they do today) it should be roughly the | same cost. | WorldMaker wrote: | Yes and I would also imagine that the trade-offs between | writing a proprietary object storage and reusing the battle- | tested object storage that everyone else uses would have been | considered as well. | | It seems like the sort of thing that would be an interesting | open source research topic if you could build an object | database for git that performs better than its packed in | filesystem object store. But it's probably not something you | want to do as a proprietary project with fewer eyeballs on | its performance trade-offs and more engineering work every | time git slightly changes its object storage behavior which | would remain tuned for the filesystem object store because it | was entirely unaware of your efforts. | oconnor663 wrote: | I think the GitHub folks have written more than one article | about this. I'm not sure I can find the one I'm thinking of, | but here's another one: | https://github.blog/2016-04-05-introducing-dgit/ | | > Perhaps it's surprising that GitHub's repository-storage | tier, DGit, is built using the same technologies. Why not a | SAN? 
A distributed file system? Some other magical cloud
| technology that abstracts away the problem of storing bits
| durably? The answer is simple: it's fast and it's robust.
| parhamn wrote:
| Have you ever tried it? It's not remotely performant and
| wouldn't make sense since GH is read heavy. Plus I'm sure they
| spend a lot of time thinking about this stuff, no?
| 
| If you want to get your feet wet, check out go-git[1]. It's a
| native golang implementation of git. They have a storage layer
| abstracted over a lean interface that you can quickly create
| alternative drivers for in golang. You'll effectively be
| implementing a poorly sharded file system on a database, and
| then it becomes obvious why scaling the FS is just easier.
| 
| [1] https://github.com/go-git/go-git/tree/master/storage
| brown9-2 wrote:
| > Plus I'm sure they spend a lot of time thinking about this
| stuff, no?
| 
| I think this is unfair - the author was not insinuating that
| the people who designed this system at Github are stupid in
| some way, but just asking if other architectures have been
| considered.
| parhamn wrote:
| > ...Github are stupid in some way, but just asking if
| other architectures have been considered.
| 
| To me, asking an engineering org if they've considered
| alternative architectures for their main engineering
| problem is silly at best, overconfident at worst.
| ben0x539 wrote:
| I think the main point of the comment was asking _why_
| they decided against it. At one point the wording mildly
| suggests the possibility that no one at github has
| thought about it:
| 
| > I would love to know if something like this has been
| considered and why they decided against it.
| 
| ... but that still sounds more like a grammatical hedge
| than an actual suggestion that github didn't think it
| through.
| | imo it's fair to lay out why you're surprised about some
| decision in the hopes that someone will enlighten you,
| even if it can be tricky to phrase that without coming
| off like a "why didn't you just..." comment.
| tkiolp4 wrote:
| > but just asking if other architectures have been
| considered.
| 
| That's the polite way of calling them stupids ;)
| JeremyBanks wrote:
| They're just expressing curiosity. Jeeze.
| ben0x539 wrote:
| The problem with recognizing that a lot of phrasings are
| just the polite way to call someone stupid/tell someone
| to fuck off/etc is that you start seeing assholes
| whenever someone is just trying to be polite. :(
| mvzvm wrote:
| This kind of comment is why every single project needs to be
| justified with "What problems are you solving?" and "What
| usecase are you supporting?". Because I could 150% imagine
| somebody getting excited about this and then:
| 
| 1) Framing it as such with poor justification "a lot easier to
| manage"
| 
| 2) "This would be a huge rewrite of some internals" becoming a
| multi-year migration quagmire
| 
| 3) The dawning realization that you have used a write-heavy
| architecture in a read-heavy system
| Ericson2314 wrote:
| I always had the impression GitHub was not preemptively
| investing in the fundamentals like that. So yeah, agree it's a
| bummer but also not surprised.
| 
| And hey, at least that means a post-GitHub FOSS world won't be
| leaving fundamental improvements behind!
| lamontcg wrote:
| In addition to everything else in this thread, it'd be nice to
| see better support for monorepos in the github UI as well.
| 
| Something like the ability to have
| github.com/<org>/<repo>/<subproject>/issues be a shard of all the
| issues for a subproject.
| 
| You can do that with tagging, but that's a bit of a PITA because
| tagging makes for a fairly clunky, unscalable UI.
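The sparse-checkout approach rmasters and chgibb mention upthread can be seen in a toy, self-contained demo (git >= 2.25; directory names invented). The repo keeps full history, but the working tree only materializes the directories you ask for:

```shell
# Build a toy "monorepo" with three top-level projects.
set -e
work=$(mktemp -d) && cd "$work"
git init -q mono && cd mono
mkdir projectA projectB sharedUtils
echo a > projectA/main.txt
echo b > projectB/main.txt
echo s > sharedUtils/util.txt
git add .
git -c user.name=demo -c user.email=demo@example.com commit -qm init

# Restrict the working tree to two "cone" directories; projectB's
# files disappear from disk but remain in history and in the index.
git sparse-checkout init --cone
git sparse-checkout set projectA sharedUtils
ls
```

On a real monorepo you would combine this with a partial clone (`git clone --filter=blob:none --no-checkout …`) so that unneeded blobs are never fetched in the first place, which is the workflow the linked GitHub blog post walks through.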
| masklinn wrote:
| > Improving repository maintenance
| 
| There's one thing I'd really like to see there: the ability to
| lock out the repository and perform a _really_ aggressive repack.
| I'm talking `-AdF --window=500` or somesuch. On $dayjob's
| repository, the base checkout is several gigs. Aggressively
| repacking it reduces its size by 60%.
| 
| There's also a git-level thing which would greatly benefit large
| repositories: for packs to be easier to craft and more reliably
| kept, so it's easier to e.g. segregate assets into packs and not
| bother compressing that, or segregate l10n files separately from
| the code and run a more expensive compression scheme on _that_.
| tasuki wrote:
| > On $dayjob's repository, the base checkout is several gigs.
| 
| Why is it several gigs? Is that really necessary?
| chrisseaton wrote:
| > Why is it several gigs? Is that really necessary?
| 
| A lot of code written by a lot of engineers over a lot of
| years.
| 
| I'm not sure what other answer you're expecting?
| 
| I work with a compiler that has had a team of tens on it over
| a decade or so and even that's 5 GB. No binary assets. I really
| don't think it's that unusual.
| wikibob wrote:
| When is GitHub going to finally add support for Microsoft's
| VFSforGit?
| 
| https://github.com/microsoft/VFSForGit
| 
| https://vfsforgit.org/
| hyperrail wrote:
| I'm not sure that will ever happen [1] as Microsoft itself is
| limiting active development of VFS for Git in favor of Scalar
| [2] by the same team, which aims to improve client-side big
| repo performance _without_ having to use OS-level file system
| virtualization hooks.
| 
| I don't believe VFS for Git will ever be abandoned by
| Microsoft, but I'm doubtful it will ever get any more major
| improvements from them.
| | Scalar does use the VFS for Git client-server protocol, and | both Scalar and VFS for Git rely on the same improvements to | the git app itself, so I could imagine that GitHub would adopt | the GVFS protocol and support Scalar without formally | supporting GVFS itself. | | [1] GitHub did announce future GVFS support in 2017 - | https://venturebeat.com/2017/11/15/github-adopts-microsofts-... | - but if anything came out of that I don't see it in GitHub | help today. | | [2] https://github.com/microsoft/scalar | vitorgrs wrote: | You know Microsoft runs windows repo with VFS right? | hyperrail wrote: | Yes, I do know the Windows os git repo uses GVFS. In fact, | I shared my personal experience with git in the os repo | some time ago: | https://news.ycombinator.com/item?id=20748778 | | When I left Microsoft about half a year ago, GVFS and | Scalar were both in heavy use there. | hyperrail wrote: | I should clarify that Scalar does not _require_ a VFS for Git | server to work correctly, even though it can get significant | benefits if a VFS server is available. This means you can use | Scalar today with GitHub, but not VFS. | | Scalar also supports Windows and macOS, while VFS only | supports Windows: https://github.com/microsoft/VFSForGit/blob | /v1.0.21014.1/doc... | vtbassmatt wrote: | Hey, I'm the product manager for Git Systems at GitHub. Can you | share more about how you'd use VFS for Git / GVFS protocol if | we had it on GitHub? | | Right now we don't plan on supporting it; most of our work is | focused on upstreamable changes and opinionated defaults. But | that could change if we're missing some important use cases. | | Feel free to email me - my HN alias @github.com - if you prefer | to discuss privately. | jmull wrote: | I'm curious what counts as a "large" monorepo? | bob1029 wrote: | This is a very subjective evaluation. 
You could look at # of
| files versioned, total bytes of the repository on disk, # of
| logical business apps contained within, total # of commits,
| etc.
| 
| For me, it's any repository where I would think "damnit, I'm
| going to have to do a fresh clone" if the situation comes up.
| There isn't a hard line in the sand, but there is certainly some
| abstract sensation of "largeness" around git repos when things
| start to slow down a bit.
| crecker wrote:
| I can bet whatever you want they did this improvement for the
| microsoft/windows repo.
| noahl wrote:
| Microsoft/windows is hosted on Azure DevOps, and they have also
| blogged about what they've done to improve its performance!
| 
| Here's a recent post:
| https://devblogs.microsoft.com/devops/introducing-scalar/
| WorldMaker wrote:
| Rumors are Azure DevOps and GitHub are converging "soon", and
| maybe "Project Cyclops" wasn't specifically to improve
| Microsoft/Windows repo performance, but it seems reasonable
| given the convergence rumors that it could be a step in the
| direction of preparing for/migrating the repo to GitHub. Of
| course Microsoft doesn't want to panic Enterprise developers
| on Azure DevOps just yet, so they are extremely quiet right
| now about any convergence efforts, so I take the rumors with
| a grain of salt. It is something that I wish Microsoft would
| properly announce sooner rather than later, as it might
| provide momentum towards GitHub in the capital-E Enterprise
| development world (even if it will panic those that are still
| afraid of GitHub for whatever reasons).
| endisneigh wrote:
| kind of an aside, but what's the best practice for pushing and
| building separate projects in a monorepo?
| 
| say you have a structure like: projectA projectB sharedUtils
| 
| Each time you push you might have a build for projectA and
| projectB but it builds both each time you push to master.
Ideally
| you could use Git to see if anything in projectA or sharedUtils
| changed to trigger projectA's build, and the same for projectB,
| but I'm curious what others are doing.
| numbsafari wrote:
| Perhaps check out a tool like please[1]. There are other tools
| in this space, but that one has worked well for me without the
| complexity of some other, similar tools.
| 
| [1] https://please.build
| oftenwrong wrote:
| I can't speak for using it in a massive monorepo, but I
| started using https://please.build for some of my personal
| projects recently just as an alternative to the dominant Java
| build systems (Ant/Maven/Gradle). It's far more
| straightforward to use, and incremental builds actually work
| reliably.
| zdw wrote:
| Monorepos require much more care to be put into the
| integration/CI side of the process.
| 
| This is worth a read: https://yosefk.com/blog/dont-ask-if-a-
| monorepo-is-good-for-y...
| alfalfasprout wrote:
| As a few others have mentioned, this is something that build
| systems handle since they understand the dependency graph. For
| example, Bazel is often used to this end.
| 
| However... I would _strongly_ advise not going for a monorepo.
| No, I don't mean something like tensorflow where you have a
| bunch of related tools and projects in a single repo. I mean
| one repo for the entire org where totally unrelated projects
| live.
| 
| Every company I've been at that used a monorepo found
| itself struggling to make it work, since you need a ton of
| full time engineers just to keep things working and scaling.
| Many of the problems that monorepos try to solve (simplifying
| dependency and version management) are traded for 10x as many
| problems, and many of them are hard (incremental builds,
| dependency resolution).
| 
| Google has a huge team in charge of helping their monorepo
| scale and work efficiently. You are not google... don't be
| tempted.
| jschwartzi wrote:
| Well, sure.
if you have a pile of totally unrelated things | that never need to change in lock-step, then you don't need a | "monorepo." But on the other hand if you're building an | entire software system such as a collection of API services, | a database schema, embedded device firmware, and a website, | and all of these things are interdependent and incompatible | across versions then please for the love of god use a | monorepo. | | At my job our cloud team uses multiple separate repositories | which makes sense, but it also moves the burden of versioning | to run-time. This is because they have to interface with | multiple different versions of the device firmware. So they | deploy different run-time versions of the APIs to support | legacy and current production firmware versions. But our | firmware repository is a monorepo in that the sources and | build system builds the artifacts for multiple devices from | the same source tree. | | So it's not so cut and dried as "never use a monorepo" or | "always use a monorepo." It involves engineering tradeoffs | and decisions that are made in a context, and you can't | extract your advice from the context in which it exists. What | works for our cloud team would be a terrible mess on the | embedded side simply because of how the software is deployed | and managed. | jayd16 wrote: | "You are not google" is also an argument for why you don't | have to worry about scaling a monorepo. | benreesman wrote: | I'll try to tread at least a little lightly here because this | topic does tend to be a bit flammable, but caveat emptor. | | My contrasting anecdotal experience is that whether at BigCo | or on a small team monorepo is almost always the right answer | until your requirements get exotic enough that you're in | special-case land anyways (like a separate repo for machine- | initiated commits, or something that's security-sensitive | enough to wall off some contributors). 
| | Both `git` and `hg` scale easily to really big projects if
| you're storing text in them (at FB our C++ code was in a
| `git` monorepo on stock software until like 2014 or something
| before it started bogging down, I'll gloss over the numbers
| but, big): the monorepo-scaling argument is brought out a lot
| but rarely quantified.
| 
| The multi-repo problem that gets you is dependency
| management, which in the general case requires a SAT-solver
| (https://research.swtch.com/version-sat), but of course you
| don't have a SAT-solver in your build script for your small-
| to-medium organization, so you get some half-assed thing like
| what `pip` and `npm` do.
| 
| Again purely anecdotal, but in my personal experience multi-
| repo too often gets pushed by folks who want to make their
| own rules for part of the codebase ("the braces go _here_"),
| push an agenda around unnecessary RPCs, or both. That's not
| true of all cases of course, but it's a common enough
| antipattern to be memorable.
| adsfoiu1 wrote:
| I personally have seen the opposite problem - the friction of
| making small changes to "utility" libraries becomes a huge
| pain point for developers when you have to make changes, test
| locally, push to the package manager, update all consumers to
| use the new version... It's much easier, in my experience, to
| just consume a class that's already in the same project /
| repo.
| agency wrote:
| I have also experienced this pain where a company I worked
| for went too hard on splitting everything into separate
| repos, such that updating something deep in the dependency
| tree becomes very painful and involves a protracted
| "version bump dance" on dependent repos. There's no silver
| bullet here.
| TechBro8615 wrote:
| > Google has a huge team in charge of helping their monorepo
| scale and work efficiently. You are not google... don't be
| tempted.
| 
| It's funny, I've heard this exact same argument for why you
| should not use micro services.
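endisneigh's question upthread — trigger projectA's build only when projectA/ or sharedUtils/ changed — can be approximated with plain git before reaching for a full build system. A self-contained sketch (directory names and dependency edges invented for the demo):

```shell
# Toy monorepo with two commits; the second touches only sharedUtils/.
set -e
work=$(mktemp -d) && cd "$work"
git init -q mono && cd mono
mkdir projectA projectB sharedUtils
echo a > projectA/a.txt
echo b > projectB/b.txt
git add .
git -c user.name=demo -c user.email=demo@example.com commit -qm "commit one"
echo s > sharedUtils/util.txt
git add .
git -c user.name=demo -c user.email=demo@example.com commit -qm "commit two"

# projectA depends on sharedUtils, so rebuild it if either changed;
# projectB only when its own directory changed.
changed=$(git diff --name-only HEAD~1 HEAD)
echo "$changed" | grep -qE '^(projectA|sharedUtils)/' && echo "build projectA" || echo "skip projectA"
echo "$changed" | grep -qE '^projectB/' && echo "build projectB" || echo "skip projectB"
```

This prints "build projectA" and "skip projectB". Tools like Bazel or please generalize exactly this idea: a declared dependency graph plus change detection, with caching of already-built artifacts.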
| brown9-2 wrote: | The other half of needing to use a build system that | understands the dependency graph like Bazel is that Bazel | _keeps state_, so that it knows which part of the graph does | not need to be re-built when you push commit B because it was | already built in commit A. | simias wrote: | If separate projects have independent builds maybe a monorepo | was not a great idea to begin with? | | I have a big monorepo at work but whenever anything changes I | want to rebuild everything to generate a new firmware image. I | have ccache setup to speedup the process given that obviously | only a tiny fraction of the code actually needs to be rebuilt. | | It's a bit wasteful, sure, but if I were to optimize it I'd be | worried about ending up with buggy, non-reproducible builds. | Easier to just recompile everything every time and make sure | everything still works the way you expect. | | So basically my approach is KISS, even if it means longer build | times. | jrockway wrote: | That's what build systems aim to do, and there are many of | them. In general, I've found all the tooling required around | monorepos to be a job for a full-time team. Shortcuts (as | suggested in other replies) or full builds on every commit tend | to stop scaling relatively quickly. If you take shortcuts, you | will find that it becomes "tribal knowledge" to do a full build | every time you edit a single line of code, and people who were | once making multiple changes a day start making one change a | week, or they start committing code without ever having run it. | (It happened to me on a 4 person team. We had so many things | that needed to be running to test your app, that people just | started committing and pushing to production without ever | having run their changes locally! That is the kind of thing | that happens if you stop caring about tooling, and it happens | fast. 
I addressed it by taking a couple of days to start up the
| exact environment locally in a few seconds, without a docker
| build, and people started running their code again.) Be very,
| very careful.
| 
| If you do a full build on every commit, it gets slow much
| sooner than you'd expect, and people are going to do less work
| while they context switch to posting to HN while waiting for
| their 15 minute build for a 1 line code change.
| 
| I worked at Google and we had a monorepo, and there were
| hundreds if not thousands of engineers working on build speed
| and developer productivity, and it was still significantly
| slower to "bazel run my-go-binary" versus "go run cmd/my-go-
| binary". In many cases, it was worth it, but in very isolated
| applications, it was definitely not worth it. (And people did
| work around it, by just setting up Git somewhere and using
| Makefiles or whatever, and that ended up being even worse. But
| it gets worse incrementally over time, and you're kind of the
| frog getting boiled alive.)
| 
| Where I'm going with this is to advise you to be very careful.
| The tools to support real productivity in a monorepo are
| expensive in terms of your org's time. If you can get by with
| a repo per app and a common modules repo, and just update the
| app to refer to a version of the modules repo as though it's
| some random open source project you depend on, you're going to
| get much farther with much less tooling work than you would
| with a monorepo. But, the modules repo is going to break apps
| without knowing, and that's going to be a pain. Monorepos do
| exist for a good reason.
| 
| (The other thing I like about monorepos is that you do less
| per-project setup work. Want to make some new app? You can just
| start writing it, and you get the build, deploy, framework,
| etc. for free. It can be very productive if you're finding
| yourself starting new projects regularly.
In my spare time, I
| write software, and I really regret splitting it up into
| multiple projects. But, it's kind of necessary for open source
| stuff -- people don't want to download ekglue if they want to
| just run jlog. So I split them, but it costs me my valuable
| free time to do something I've already done ;)
| 
| My TL;DR is that you will be tempted to take shortcuts and the
| shortcuts will suck. If your project has the resources to have
| someone set up Bazel, distribute the right version of Bazel and
| the JRE to developer workstations, set up CI that is aware of
| Bazel artifact caching, and SREs to be around 24/7 to support
| your now-custom build environment, you will have a good
| experience. Be aware that a monorepo is that level of
| investment.
| 
| Meanwhile, if you just have a frontend and a backend in the
| same repo, you can probably get away with a full build for
| every commit. And you don't need that shadow team of tooling
| engineers to make it work, you just need a docker build, and a
| script that runs "go test ./... && npm test" or whatever ;)
| dgellow wrote:
| IIRC you can specify filters "paths" and "paths-ignore" when
| you define a github action that should only be triggered when a
| subdirectory changes.
| 
| See this documentation page:
| https://docs.github.com/en/actions/reference/workflow-syntax...
| 
| Their example is:
| 
|     on:
|       push:
|         paths:
|           - '*.js'
| 
| but I believe you can also specify the subdirectory you care
| about.
| kroolik wrote:
| > We made a change to compute these replica checksums prior to
| taking the lock. By precomputing the checksums, we've been able
| to reduce the lock to under 1 second, allowing more write
| operations to succeed immediately.
| 
| Isn't this changeset introducing a race condition? One of the
| replicas' checksums could change between when the checksum is
| computed and when the lock is taken. Otherwise, there is no need
| for the lock at all.
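The scheme kroolik is asking about can be sketched in miniature: checksums are maintained outside the lock (here, refreshed on every write), so the locks are held only for a cheap comparison. All names are invented, and Python stands in purely for illustration — this is not GitHub's actual implementation:

```python
import hashlib
import threading

class Replica:
    """Toy replica: every write refreshes a cached checksum, so the
    lock-holding comparison below never pays for hashing."""
    def __init__(self, data: bytes):
        self.lock = threading.Lock()   # blocks writes during verification
        self._data = data
        self._checksum = hashlib.sha256(data).hexdigest()

    def write(self, data: bytes):
        with self.lock:                # a write can't race the comparison
            self._data = data
            self._checksum = hashlib.sha256(data).hexdigest()

    @property
    def checksum(self) -> str:
        return self._checksum

def verify_in_sync(replicas) -> bool:
    # Take every replica's lock, but only long enough to *compare*
    # checksums that were already computed outside the lock.
    for r in replicas:
        r.lock.acquire()
    try:
        return len({r.checksum for r in replicas}) == 1
    finally:
        for r in replicas:
            r.lock.release()

a, b = Replica(b"packfile-v1"), Replica(b"packfile-v1")
print(verify_in_sync([a, b]))   # True: checksums agree under the lock
b.write(b"packfile-v2")
print(verify_in_sync([a, b]))   # False: b has diverged
```

The lock is still needed for the reason the replies below give: without it, a write could land between reading one replica's checksum and another's, so the comparison would be over an inconsistent snapshot.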
| jacoblambda wrote:
| You can compute the checksums outside of the lock. You just
| need to compare them inside the lock.
| 
| The key thing here is that prior to the lock, if data changes,
| you recompute the checksums. As long as any change outside the
| lock triggers a recompute of the corresponding checksums and no
| changes can occur during the lock, there is no race condition.
| 
| I imagine that this may result in data getting
| de-synced/failing the checksum comparisons more often; however,
| it's still a net performance increase as long as the aggregate
| time spent re-syncing the data is less than the extra time
| spent waiting for checksums in the lock.
| alexhutcheson wrote:
| No, it's just switching from a pessimistic locking approach to
| an optimistic one:
| https://en.wikipedia.org/wiki/Optimistic_concurrency_control
___________________________________________________________________
(page generated 2021-03-16 23:00 UTC)