Nextflow: System for creating scalable, portable, reproducible workflows

_Wintermute

The choice of Groovy was unfortunate, yet Nextflow still seems more popular than Snakemake, which I can only attribute to the nf-core set of curated workflows.

I dislike Nextflow because it submits tens of thousands of separate jobs to our HPC scheduler, which causes a number of issues, though they've now added support for array jobs, which should hopefully solve that.
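To illustrate why array jobs help with the scheduler load described above, here is a minimal sketch in Python. It is not Nextflow's implementation: the function name `batch_into_arrays`, the task names, and the array size of 1,000 are all hypothetical, chosen only to show how grouping tasks cuts the number of submissions the scheduler has to handle.

```python
from typing import List, Sequence


def batch_into_arrays(task_ids: Sequence[str], array_size: int) -> List[List[str]]:
    """Group tasks into chunks; each chunk stands for one array-job submission."""
    if array_size < 1:
        raise ValueError("array_size must be at least 1")
    return [
        list(task_ids[i:i + array_size])
        for i in range(0, len(task_ids), array_size)
    ]


# Hypothetical workload: 25,000 per-sample tasks (names are made up).
tasks = [f"sample_{n:05d}" for n in range(25_000)]

# Without batching, every task is its own submission.
# With batching, the scheduler sees one submission per chunk.
batches = batch_into_arrays(tasks, array_size=1_000)
print(len(tasks), "tasks ->", len(batches), "scheduler submissions")
```

Under these assumptions, the scheduler receives 25 submissions instead of 25,000, while the same work still runs.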

armedgorilla

At a previous biotech, we used Cromwell/WDL because the DSL was the most intuitive to our bioinformatics scientists. But seeing as that doesn't work as nicely on AWS (and is also supported by an organization that is imploding), we opted for Argo on our K8s cluster to process RNAseq data en masse. Getting the scientists to use YAML has been an uphill struggle, but the same issues would apply to learning Groovy, I guess. We've found that the Argo engine is easier to maintain, and we also only have to support one orchestrator across our Bioinformatics and ML teams.

For industrial purposes, I've started to approach these pipelines as a special case of feature extraction and so I'm reusing our ML infrastructure as much as possible.

totalperspectiv

I would rather write Groovy than YAML any day of the week.

Why did you rule out Nextflow or Snakemake? I believe they both work with k8s clusters.

Argo doesn’t look great from my standpoint as a workflow author.

totalperspectiv

Cool seeing a workflow language pop up on HN!

Nextflow and Snakemake are the two most-used options in bioinformatics these days, with WDL trailing those two.

I really wish Nextflow was based on Scala and not Groovy, but so it goes.

There is a draft up for DSL3 that adds static types to the channels, which I’m very excited about. https://github.com/nf-core/fetchngs/pull/309

azan_

I've used Snakemake my whole life, can someone experienced with both systems share whether jumping to nextflow is worth it?

Protostome

I have pipelines written in both frameworks. Nextflow (despite the questionable selection of groovy as the language of choice) is more powerful and enables greater flexibility in terms of information flow.

For example, snakemake makes it very difficult if not impossible to create pipelines that deviate from a DAG architecture. In cases where you need loops, conditionals and so on, Nextflow is a better option.
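As a rough illustration of the kind of control flow meant here, the Python sketch below repeats a step until its output stops changing. It is neither Nextflow nor Snakemake code; `refine` and `run_until_converged` are hypothetical names, and the numeric step is a stand-in for a real pipeline stage. The point is that the number of rounds depends on the data, so the full task graph cannot be drawn before the run starts.

```python
from typing import Tuple


def refine(value: float) -> float:
    """Hypothetical processing step: one Newton-style iteration toward sqrt(2)."""
    return 0.5 * (value + 2.0 / value)


def run_until_converged(
    start: float, tolerance: float = 1e-9, max_rounds: int = 50
) -> Tuple[float, int]:
    """Repeat the step until the result stops changing; return (result, rounds)."""
    current = start
    for rounds in range(1, max_rounds + 1):
        updated = refine(current)
        # Conditional exit: whether to loop again is decided at run time.
        if abs(updated - current) < tolerance:
            return updated, rounds
        current = updated
    raise RuntimeError("did not converge within max_rounds")


result, rounds = run_until_converged(1.0)
print(f"converged to {result:.6f} after {rounds} rounds")
```

A different starting value can change how many rounds are needed, which is exactly what a fixed, acyclic graph of rules has trouble expressing.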

One thing that I didn't like about Nextflow is that all processes have to run under either Apptainer or Docker; you can't mix and match Docker/Apptainer like you can in Snakemake rules.

totalperspectiv

NF Tower / Seqera would be the selling points. They offer a nice UX for managing pipelines and abstract over AWS.

Technically snakemake can do it all. But in practice NF seems to scale up a bit better.

That said, if you don’t need the UI for scientists, I’d stick to snakemake.
