symlinks for staged files in directories are not removed in .command.run #4971

Open
nick-youngblut opened this issue May 3, 2024 · 7 comments

@nick-youngblut
Contributor

Bug report

For a failed job in my Nextflow pipeline, I'm manually running bash .command.run and I'm getting ln: failed to create symbolic link 'DIRECTORY_NAME/FILE_NAME.txt': File exists.

The nxf_stage() function includes:

nxf_stage() {
    true
    # stage input files
    mkdir -p 164164 && ln -s /home/nickyoungblut/tmp/work/ad/d0abcbad4c7b9137844e3ba48c8af4/KAPA_mRNA-enrichment_HumanRefRNA_500ng_1e-2dilution_20240417_C01_R1_001/summary.txt 164164/summary.txt
    mkdir -p 6262 && ln -s /home/nickyoungblut/tmp/work/53/f75d0ff2aa376187fafae661a5b400/DJv3_NT1_ctrl_rep1_031524_R1_001/fastqc_data.txt 6262/fastqc_data.txt
    mkdir -p 284284 && ln -s /home/nickyoungblut/tmp/work/be/b0b67f565d4ae5a0230e453acaa236/DJv2_FTH1_kd_rep2_031524_R2_001/fastqc_data.txt 284284/fastqc_data.txt 
    [...]
}

The symlinks are not removed via rm -f prior to recreating them in the nxf_stage() function, and ln -s is used instead of ln -sf. This results in the error when manually re-running .command.run. This makes troubleshooting failed jobs harder, since I have to manually delete the existing symlinks or comment out all of the ln -s commands in nxf_stage().

This issue does not occur for files not in staged directories, just for mkdir -p new_directory && ln -s new_directory/new_file.txt.
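The failure can be reproduced outside Nextflow with a minimal sketch like the one below. The directory and file names are hypothetical, and the stage function is only a simplified stand-in for a single line of nxf_stage(), not the generated code itself.

```shell
#!/usr/bin/env bash
# Work in a throwaway directory so nothing real is touched (hypothetical names).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
echo "data" > source.txt

# Simplified stand-in for one staging line: create the directory, then the link.
stage() {
    mkdir -p staged_dir && ln -s "$workdir/source.txt" staged_dir/summary.txt
}

# First run: the link does not exist yet, so ln succeeds.
first_status=0
stage || first_status=$?

# Second run: mkdir -p is a no-op, but ln -s refuses to overwrite the link.
second_status=0
stage 2>/dev/null || second_status=$?

echo "first run exit status: $first_status"
echo "second run exit status: $second_status"
```

The second call exits with a non-zero status because the link from the first call is still in place, which is the same "File exists" condition hit when re-running .command.run.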

Expected behavior and actual behavior

See above

Steps to reproduce the problem

This should occur for any pipeline that creates staged files in directories: mkdir -p new_directory && ln -s new_directory/new_file.txt

Program output

See above

Environment

  • Nextflow version: 23.10.1
  • Java version: openjdk 21
  • Operating system: Linux
  • Bash version: 5.2.15

Additional context

See this slack thread

@pditommaso
Member

This is likely because you have many staged files, see here

// Delete all previous files with the same name
// Note: the file deletion is only needed to prevent
// file name collisions when re-running the runner script
// for debugging purpose. However, this can cause the creation
// of a very big runner script when a large number of files is
// given due to the file name duplication. Therefore the rationale
// here is to keep the deletion only when a file input number is
// given (which is more likely during pipeline development) and
// drop in any case when they are more than 100
if( len<100 )
    delete << "rm -f ${Escape.path(stageName)}"

@nick-youngblut
Contributor Author

Thanks @pditommaso for pointing that out! What is the problem with including possibly a few thousand more lines in the runner script?

@pditommaso
Member

It's explained in the comment: to contain the script file size. You can delete all symlinks using a Bash one-liner like find . -type l -delete or something similar.

@nick-youngblut
Contributor Author

Why does the file size need to be contained to <100 lines of removing symlinks? Extending to thousands of lines will not add much size to the file.

You can delete all symlinks using a Bash one-liner like find . -type l -delete or something similar

Why not just use find . -type l -delete instead of removing each symlink individually in the runner script?
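As a rough illustration of that suggestion, the sketch below clears every symlink under a scratch directory with a single find call. The file names are hypothetical, and -delete is a GNU/BSD find extension rather than a POSIX feature, so its availability on a given system is an assumption.

```shell
#!/usr/bin/env bash
# Build a scratch directory with one regular file and two symlinks (hypothetical names).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
echo "data" > source.txt
mkdir -p staged_dir
ln -s "$workdir/source.txt" staged_dir/summary.txt
ln -s "$workdir/source.txt" top_level.txt

# -type l matches the symlinks themselves, not their targets,
# so regular files and directories are left alone.
find . -type l -delete

# Count what is left; tr strips the padding some wc implementations add.
remaining_links=$(find . -type l | wc -l | tr -d ' ')
echo "symlinks remaining: $remaining_links"
```

This removes every symlink below the current directory, whether or not the staging step created it, so it is only safe if nothing else in the task directory relies on a link.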

@pditommaso
Member

lol. Need to think if there could be other links. @bentsherman, opinion?

@nick-youngblut
Contributor Author

need to think if there could be other links

I thought all symlinks were (re)created by the runner script, but maybe I'm mistaken?

@bentsherman
Member

Deleting all links should be fine; I can't think of any other links that are created. But Nick also suggested using ln -sf instead of deleting the links; maybe that would be better.
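For comparison, here is a minimal sketch of the ln -sf variant, again with hypothetical names and a simplified stand-in for one staging line rather than the script Nextflow actually generates.

```shell
#!/usr/bin/env bash
# Throwaway directory with a single source file (hypothetical names).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
echo "data" > source.txt

# Same staging pattern as in the report, but with -f so an existing
# link at the destination is replaced instead of causing an error.
stage_force() {
    mkdir -p staged_dir && ln -sf "$workdir/source.txt" staged_dir/summary.txt
}

# Run the staging step twice, recording each exit status.
first_status=0
stage_force || first_status=$?
second_status=0
stage_force || second_status=$?

echo "first run exit status: $first_status"
echo "second run exit status: $second_status"
```

Both calls succeed, and no extra rm -f lines are needed in the script. One caveat: if the destination is itself a symlink to a directory, ln -sf follows it and creates the new link inside that directory unless a no-dereference option (-n in GNU ln) is also given, so staged directories may need separate handling.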


3 participants