[HN Gopher] Reflow, a language for distributed, incremental data...
___________________________________________________________________
Reflow, a language for distributed, incremental data processing in the cloud

Author : krab
Score  : 67 points
Date   : 2021-05-18 04:49 UTC (2 days ago)

(HTM) web link (github.com)
(TXT) w3m dump (github.com)

| all2 wrote:
| From the README:
|
| Reflow comprises:
|
| - a functional, lazy, type-safe domain specific language for
| writing workflow programs;
|
| - a runtime for evaluating Reflow programs incrementally,
| coordinating cluster execution, and transparent memoization;
|
| - a cluster scheduler to dynamically provision and tear down
| resources from a cloud provider (AWS currently supported).
|
| and
|
| Reflow was designed to support sophisticated, large-scale
| bioinformatics workflows, but should be widely applicable to
| scientific and engineering computing workloads. It was built
| using Go.
|
| Reflow joins a long list of systems designed to tackle
| bioinformatics workloads, but differ from these in important
| ways:
|
| - it is a vertically integrated system with a minimal set of
| external dependencies; this allows Reflow to be "plug-and-play":
| bring your cloud credentials, and you're off to the races;
|
| - it defines a strict data model which is used for transparent
| memoization and other optimizations;
|
| - it takes workflow software seriously: the Reflow DSL provides
| type checking, modularity, and other constructors that are
| commonplace in general purpose programming languages; because of
| its high level data model and use of caching, Reflow computes
| incrementally: it is always able to compute the smallest set of
| operations given what has been computed previously.

| dannykwells wrote:
| If you're into workflow runners, Reflow and Cromwell
| (https://github.com/broadinstitute/cromwell) are the only two
| really to consider.
| Having tried them all, these two are by far
| the best and most supported (and there are 100s!)
|
| Cromwell is great because it is Google Cloud native and supported
| within the Terra ecosystem (https://app.terra.bio/), meaning you
| do not need to host it yourself: you can just connect your
| Google account and go.
|
| Reflow, I've heard, is a little more "professional" given that
| the Grail team is heavily ex-Google. But both can scale to
| massively parallel (1000+ parallel analyses).

| aednichols wrote:
| Thanks for the shoutout. Cromwell/Terra developer here in an
| informal capacity, can answer Qs.

| epistasis wrote:
| Interesting to see Grail share this, I'm excited to try it out.
|
| I'm perpetually unsatisfied with bioinformatics workflow
| software. Snakemake and GNU make remain my favorites so far in
| terms of developing novel analysis. However, making GNU make into
| a reusable pipeline always feels like an awful and ugly hack. And
| GNU make requires a shared file system among nodes, which is
| problematic on AWS...
|
| This seems to have potential both for recording the steps for
| reproducible science and for turning that set of steps into a
| reusable pipeline easily.

| fwip wrote:
| My personal favorite is Nextflow (http://nextflow.io/). Quick
| to start up a one-off script in, and it's ready to run in
| production without too much tweaking.
|
| Edit: I especially appreciate the wide range of supported
| systems for both dependency management (running the gamut from
| GNU modules or conda to docker/singularity containers) and
| execution environments (local, SLURM, SGE, AWS, Azure, etc.)

| The_Amp_Walrus wrote:
| Is Reflow Go only?
|
| Is the .rf file format a DSL or an existing language?

| prb2 wrote:
| Reflow is implemented in Go, but it can be used to run programs
| in any language.
|
| The Reflow language (.rf files) is a DSL; the language is
| described in more detail here:
| https://github.com/grailbio/reflow/blob/master/LANGUAGE.md

| mariusae wrote:
| If you want to work with Go, check out bigslice [1] (and
| bigmachine [2]), which is built on a similar architecture.
|
| [1] https://github.com/grailbio/bigslice/
| [2] https://github.com/grailbio/bigmachine
___________________________________________________________________
(page generated 2021-05-20 23:00 UTC)