Nextflow users are usually very concerned with reproducibility and, at the
same time, like their output files well organized. The publishDir process
directive is very useful for that, as it lets you choose which files to store,
how, and where. However, if you're really worried about reproducibility, it's
also worth knowing which configuration files were used in the specific pipeline
run that generated these output files. The snippet below solves this problem
using workflow.configFiles, a Nextflow built-in variable that holds the list of
all configuration files used by the run.
params.outdir = 'results'

process FOO {
    publishDir "${params.outdir}/FOO/", mode: 'copy'

    output:
    path 'my_output_file'

    script:
    """
    echo FOO > my_output_file
    """
}

workflow {
    FOO()

    // Copy every configuration file used by this run into the output directory
    Channel
        .fromList(workflow.configFiles)
        .collectFile(storeDir: "${params.outdir}/configs")
}
Save the snippet above as save_conf_files.nf and run the pipeline with the
command below:
nextflow run save_conf_files.nf
If you have a nextflow.config file in your current directory, the tree command
below will show the following output:
tree results
results
├── FOO
│   └── my_output_file
└── configs
    └── nextflow.config

3 directories, 2 files
Let's say you have another file called another_conf.config and you provide it
with the -c Nextflow option, as in the command line below:
nextflow run save_conf_files.nf -c another_conf.config
The tree command should then show the following:
tree results
results
├── FOO
│   └── my_output_file
└── configs
    ├── another_conf.config
    └── nextflow.config

3 directories, 3 files
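If you want more than a visual check of the tree output, the saved copies can be compared against the original files. The block below is only a minimal sketch: check_saved_configs is a hypothetical helper written for this example (it is not part of Nextflow), and it assumes each saved copy keeps the base name of the original file, as in the listing above.

```shell
# Hypothetical helper, not part of Nextflow: checks that every configuration
# file given as an argument has a byte-identical copy in the saved directory.
# Assumption: each saved copy keeps the base name of the original file.
check_saved_configs() {
    saved_dir=$1
    shift
    status=0
    for conf in "$@"; do
        copy="$saved_dir/$(basename "$conf")"
        # cmp -s is silent and exits non-zero if the files differ or one is missing
        if cmp -s "$conf" "$copy" 2>/dev/null; then
            printf 'OK %s\n' "$conf"
        else
            printf 'MISMATCH %s\n' "$conf"
            status=1
        fi
    done
    return "$status"
}
```

For the run above, it could be called as: check_saved_configs results/configs nextflow.config another_conf.config. It prints one line per file and returns a non-zero status if any copy is missing or differs.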
The trick here is to combine the workflow.configFiles variable with the
collectFile channel operator, so that the configuration files are stored under
the same output directory (params.outdir) used by the publishDir directive.
Note
If no configuration file was used for a run, the configs folder will still be
there, but empty.