Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Workflow output definition #4784

Merged
merged 54 commits into from
May 17, 2024
Merged

Workflow output definition #4784

merged 54 commits into from
May 17, 2024

Conversation

bentsherman
Copy link
Member

Close #4670

This PR adds a DSL for defining workflow outputs. It is meant to completely replace the publishDir directive, and centralize the definition of published outputs into one location.

This first iteration is essentially a builder DSL for a directory structure, which also collects the essential elements of the existing publish mechanism: process selectors to select process outputs, and publish options like pattern and enabled (other options like mode can also be supported this way).

Under the hood, there is now a WorkflowPublisher which contains all of the publish "selectors" and uses PublishDir just like the directive. So the publishing itself is also handled in one place rather than during the task finalization.

See nf-core/fetchngs#275 for a prototype with nf-core/fetchngs

Signed-off-by: Ben Sherman <bentshermann@gmail.com>
Signed-off-by: Ben Sherman <bentshermann@gmail.com>
Signed-off-by: Ben Sherman <bentshermann@gmail.com>
Copy link

netlify bot commented Feb 28, 2024

Deploy Preview for nextflow-docs-staging canceled.

Name Link
🔨 Latest commit 0db73c8
🔍 Latest deploy log https://app.netlify.com/sites/nextflow-docs-staging/deploys/664665b20da96700083892ac

Signed-off-by: Ben Sherman <bentshermann@gmail.com>
@bentsherman

This comment was marked as outdated.

@pditommaso

This comment was marked as outdated.

@bentsherman

This comment was marked as outdated.

@pditommaso

This comment was marked as outdated.

@bentsherman

This comment was marked as outdated.

@pditommaso

This comment was marked as outdated.

@bentsherman

This comment was marked as outdated.

@bentsherman

This comment was marked as outdated.

@marcodelapierre

This comment was marked as outdated.

@bentsherman

This comment was marked as outdated.

@pinin4fjords

This comment was marked as outdated.

Signed-off-by: Ben Sherman <bentshermann@gmail.com>
Signed-off-by: Ben Sherman <bentshermann@gmail.com>
Signed-off-by: Ben Sherman <bentshermann@gmail.com>
Signed-off-by: Ben Sherman <bentshermann@gmail.com>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
@marcodelapierre
Copy link
Member

Let me try and clarify, in the case of rnaseq-nf we need to give an emit name to a module channel to be able to publish it, e.g. see this diff:

      output:
-    path "fastqc_${sample_id}_logs" 
+    path "fastqc_${sample_id}_logs", emit: logs
+
+    publish:
+    logs >> '.'

So my thinking point is whether we should have a syntax for the publish section that does not require giving the emit name logs to the channel to be published. As I was saying above, my personal answer is that we don't need it, but still wanted to leave this for your consideration.

@bentsherman
Copy link
Member Author

With static types we are moving towards a syntax where every process output will have a name by definition (<type> <name> = <value>), so in time this will not be an issue

@marcodelapierre
Copy link
Member

thanks for the reminder Ben

@marcodelapierre
Copy link
Member

marcodelapierre commented May 8, 2024

I am happy with the testing of the feature with the Fetchngs POC (both linux vm and Slurm cluster); there I disabled nf-validation.

For the rnaseq POC, how do I get around the following error?

ERROR ~ Install is not supported on dev mode - Missing plugin nf-validation 1.1.3

(nf-validation here seems needed to use fromSamplesheet)

@bentsherman
Copy link
Member Author

You can install the local build with make install. It will replace the installation of your current Nextflow version and allow you to run pipelines with third-party plugins. It won't use the local build of core plugins like nf-amazon, but that's okay because this PR doesn't modify any core plugins.

@marcodelapierre
Copy link
Member

Testing Rnaseq was also successful. As Ben expected, output directory layout is not identical, but output files are there.

pditommaso and others added 5 commits May 12, 2024 14:58
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
Signed-off-by: Ben Sherman <bentshermann@gmail.com>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
@pditommaso
Copy link
Member

While testing the output index I've got this

ERROR ~ Index file record must be a list or map: /Users/pditommaso/Projects/rnaseq-nf/work/10/f200de889f31fcba450bab2d2964a3/fastqc_ggal_gut_logs [UnixPath]

 -- Check '.nextflow.log' file for details

What about instead creating a simple index as "file name", "file path" ?

@pditommaso
Copy link
Member

Made a few tests and works like a charm. I was expecting a much bigger impact in the codebase, great work Ben.

I've commented a few things to address above

Signed-off-by: Ben Sherman <bentshermann@gmail.com>
Signed-off-by: Ben Sherman <bentshermann@gmail.com>
Signed-off-by: Ben Sherman <bentshermann@gmail.com>
@pditommaso
Copy link
Member

Better, but it's not an index file

"/Users/pditommaso/Projects/rnaseq-nf/results/fastqc/fastqc_ggal_gut_logs"

I was expecting as

"fastqc_ggal_gut_logs", "/Users/pditommaso/Projects/rnaseq-nf/results/fastqc/fastqc_ggal_gut_logs"

@bentsherman
Copy link
Member Author

That can be done with the index mapper:

index {
  mapper { file -> [file.name, file] }
}

@pditommaso
Copy link
Member

I know, but it would be nice to have it as default behaviour

Signed-off-by: Ben Sherman <bentshermann@gmail.com>
@pditommaso
Copy link
Member

and it's ready!

@pditommaso pditommaso merged commit cf0546b into master May 17, 2024
22 checks passed
@pditommaso pditommaso deleted the 4670-workflow-outputs branch May 17, 2024 10:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Workflow output definitions and schema
5 participants