[HN Gopher] Reflow, a language for distributed, incremental data...
       ___________________________________________________________________
        
       Reflow, a language for distributed, incremental data processing in
       the cloud
        
       Author : krab
       Score  : 67 points
       Date   : 2021-05-18 04:49 UTC (2 days ago)
        
 (HTM) web link (github.com)
        
       | all2 wrote:
       | From the README:
       | 
       | Reflow comprises:
       | 
       | - a functional, lazy, type-safe domain specific language for
       | writing workflow programs;
       | 
       | - a runtime for evaluating Reflow programs incrementally,
       | coordinating cluster execution, and transparent memoization;
       | 
       | - a cluster scheduler to dynamically provision and tear down
       | resources from a cloud provider (AWS currently supported).
       | 
       | and
       | 
       | Reflow was designed to support sophisticated, large-scale
       | bioinformatics workflows, but should be widely applicable to
       | scientific and engineering computing workloads. It was built
       | using Go.
       | 
       | Reflow joins a long list of systems designed to tackle
       | bioinformatics workloads, but differs from these in important
       | ways:
       | 
       | - it is a vertically integrated system with a minimal set of
       | external dependencies; this allows Reflow to be "plug-and-play":
       | bring your cloud credentials, and you're off to the races;
       | 
       | - it defines a strict data model which is used for transparent
       | memoization and other optimizations;
       | 
       | - it takes workflow software seriously: the Reflow DSL provides
       | type checking, modularity, and other constructs that are
       | commonplace in general purpose programming languages; because of
       | its high level data model and use of caching, Reflow computes
       | incrementally: it is always able to compute the smallest set of
       | operations given what has been computed previously.
        
       | dannykwells wrote:
       | If you're into workflow runners, Reflow and Cromwell
       | (https://github.com/broadinstitute/cromwell) are the only two
       | really worth considering. Having tried them all, these two are by
       | far the best and the best supported (and there are 100s!)
       | 
       | Cromwell is great because it is Google Cloud native and supported
       | within the Terra ecosystem (https://app.terra.bio/), meaning you
       | do not need to host it yourself - you can just connect your
       | Google account and go.
       | 
       | Reflow, I've heard, is a little more "professional" given that
       | the Grail team is heavily ex-Google. But both can scale to
       | massively parallel workloads (1000+ parallel analyses).
        
         | aednichols wrote:
         | Thanks for the shoutout. Cromwell/Terra developer here in an
         | informal capacity, can answer Qs.
        
       | epistasis wrote:
       | Interesting to see Grail share this, I'm excited to try it out.
       | 
       | I'm perpetually unsatisfied with bioinformatics workflow
       | software. Snakemake and GNU make remain my favorites so far in
       | terms of developing novel analysis. However, making GNU make into
       | a reusable pipeline always feels like an awful and ugly hack. And
       | GNU make requires a shared file system among nodes, which is
       | problematic on AWS...
       | 
       | This seems to have potential both for recording the steps for
       | reproducible science and for turning that set of steps into a
       | reusable pipeline easily.
        
         | fwip wrote:
         | My personal favorite is Nextflow (http://nextflow.io/). Quick
         | to start up a one-off script in, and it's ready to run in
         | production without too much tweaking.
         | 
         | Edit: I especially appreciate the wide range of supported
         | systems for both dependency management (running the gamut from
         | GNU modules or conda to docker/singularity containers) and
         | execution environments (local, SLURM, SGE, AWS, Azure, etc.)
        
       | The_Amp_Walrus wrote:
       | Is Reflow Go-only?
       | 
       | Is the .rf file format a DSL or an existing language?
        
         | prb2 wrote:
         | Reflow is implemented in Go, but it can be used to run programs
         | in any language.
         | 
         | The Reflow language (.rf files) is a DSL, the language is
         | described in more detail here:
         | https://github.com/grailbio/reflow/blob/master/LANGUAGE.md
        
         | mariusae wrote:
         | If you want to work with Go, check out bigslice [1] (and
         | bigmachine [2]), which is built on a similar architecture.
         | 
         | [1] https://github.com/grailbio/bigslice/
         | [2] https://github.com/grailbio/bigmachine
        
       ___________________________________________________________________
       (page generated 2021-05-20 23:00 UTC)