[HN Gopher] ML Experiments Management with Git
___________________________________________________________________

ML Experiments Management with Git

Author : shcheklein
Score  : 15 points
Date   : 2023-11-02 21:33 UTC (1 hours ago)

(HTM) web link (github.com)
(TXT) w3m dump (github.com)

| vinni2 wrote:
| I gave up on dvc and instead switched to huggingface and wandb
| because of the way it handled large files and large local cache
| it downloaded.

  | icyfox wrote:
  | I have a similar harness going for my recent experiments,
  | except instead of hosting with huggingface I have a dataframe
  | with pointers to the files on S3 and then just download them
  | during local preprocessing.
  |
  | Every time I see DVC mentioned I always feel like the idea was
  | so close (and perhaps right in intuition to use git for
  | everything) but the execution had just enough friction that I
  | looked elsewhere. Small DX improvements really do cascade
  | pretty far.

  | nerdponx wrote:
  | DVC is great for medium-scale projects in small teams, but
  | that's where I'd stop with it. It only really makes sense for
  | work that you're doing on your own machine, or an old-school
  | Linux server type of setup, not something you'd use for
  | modern-day ML work in a cloud environment.
  |
  | Also I always thought the idea of using Git branches to track
  | experiments was a bad idea. I would never want to only have
  | one experiment "active" at a time. Even if I'm only running
  | one process at a time, I still want to be able to look at
  | outputs and such all side-by-side. Maybe there's some magic
  | tooling they created that makes it workable.

    | unsynced wrote:
    | FYI, you can use git worktrees [1] to work on multiple
    | branches simultaneously
    |
    | [1] https://git-scm.com/docs/git-worktree
___________________________________________________________________
(page generated 2023-11-02 23:00 UTC)