[HN Gopher] ML Experiments Management with Git
       ___________________________________________________________________
        
       ML Experiments Management with Git
        
       Author : shcheklein
       Score  : 15 points
       Date   : 2023-11-02 21:33 UTC (1 hour ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | vinni2 wrote:
       | I gave up on DVC and switched to Hugging Face and wandb because
       | of the way DVC handled large files and the large local cache it
       | downloaded.
        
         | icyfox wrote:
         | I have a similar harness going for my recent experiments,
         | except instead of hosting with huggingface I have a dataframe
         | with pointers to the files on S3 and then just download them
         | during local preprocessing.
         | 
         | Every time I see DVC mentioned I feel like the idea was so
         | close (and perhaps right in its intuition to use git for
         | everything), but the execution had just enough friction that I
         | looked elsewhere. Small DX improvements really do cascade
         | pretty far.
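
The pointer-table setup icyfox describes could look roughly like the sketch below. It is not icyfox's actual harness: a plain list of records stands in for the dataframe to keep the example dependency-free, and the column names (`name`, `s3_uri`), the cache layout, and all helper names are hypothetical. The default fetcher assumes boto3 is installed and AWS credentials are configured.

```python
from pathlib import Path
from urllib.parse import urlparse


def parse_s3_uri(uri):
    """Split 's3://bucket/some/key' into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def fetch_with_boto3(bucket, key, dest):
    """Default fetcher; boto3 is imported lazily so the rest runs without it."""
    import boto3

    boto3.client("s3").download_file(bucket, key, str(dest))


def download_missing(manifest, cache_dir, fetch=fetch_with_boto3):
    """Download every manifest row not already cached; return local paths.

    `manifest` is an iterable of dict-like rows with an 's3_uri' field
    (a hypothetical schema, e.g. the rows of a dataframe).
    """
    cache_dir = Path(cache_dir)
    local_paths = []
    for row in manifest:
        bucket, key = parse_s3_uri(row["s3_uri"])
        dest = cache_dir / bucket / key
        if not dest.exists():  # skip files fetched by an earlier run
            dest.parent.mkdir(parents=True, exist_ok=True)
            fetch(bucket, key, dest)
        local_paths.append(dest)
    return local_paths
```

Because only the small pointer table needs to be versioned, the local cache holds just the files that preprocessing actually touches, and re-running the step skips anything already on disk.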
        
           | nerdponx wrote:
           | DVC is great for medium-scale projects in small teams, but
           | that's where I'd stop with it. It only really makes sense for
           | work that you're doing on your own machine, or an old-school
           | Linux server type of setup, not something you'd use for
           | modern-day ML work in a cloud environment.
           | 
           | Also, I always thought using Git branches to track experiments
           | was a bad idea. I would never want to have only one experiment
           | "active" at a time. Even if I'm only running
           | one process at a time, I still want to be able to look at
           | outputs and such all side-by-side. Maybe there's some magic
           | tooling they created that makes it workable.
        
             | unsynced wrote:
             | FYI, you can use git worktrees [1] to work on multiple
             | branches simultaneously.
             | 
             | [1] https://git-scm.com/docs/git-worktree
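
As a rough illustration of the worktree suggestion, the sketch below checks out each experiment branch into its own directory so outputs can be inspected side by side. It assumes `git` is on the PATH and the repository already has at least one commit. The helper names and the one-directory-per-branch layout are hypothetical, and this is not how DVC itself manages experiments.

```python
import subprocess
from pathlib import Path


def git(repo, *args):
    """Run a git command inside `repo` and return its stdout."""
    result = subprocess.run(
        ["git", "-C", str(repo), *args],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def add_experiment_worktree(repo, branch, parent_dir):
    """Create `branch` and check it out in its own directory.

    Pass an absolute `parent_dir`; a relative one would be resolved
    against `repo` because of the `-C` flag.
    """
    dest = Path(parent_dir) / branch
    git(repo, "worktree", "add", "-b", branch, str(dest))
    return dest


def list_worktrees(repo):
    """Return the checkout paths git knows about for this repository."""
    out = git(repo, "worktree", "list", "--porcelain")
    return [line.split(" ", 1)[1] for line in out.splitlines()
            if line.startswith("worktree ")]
```

Each worktree shares the same object store as the main checkout, so extra experiment directories cost only the size of their working files.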
        
       ___________________________________________________________________
       (page generated 2023-11-02 23:00 UTC)