publishDir compatibility with Azure Blob Storage #4683

Open
endre-seqera opened this issue Jan 24, 2024 · 2 comments · May be fixed by #4692

endre-seqera commented Jan 24, 2024

Bug report

Expected behavior and actual behavior

Expected behavior: the publishDir directive works with Azure paths written in any of the common formats.
Actual behavior: publishDir fails for:

  • Azure links starting with https://
  • Azure links whose path includes the storage account name: az://<storage-account>.<bucket>

Steps to reproduce the problem

  • set up Nextflow with Azure Cloud (basic set up)
  • run the nf-canary pipeline
  • pass in differently formatted paths for params.outdir

Working example:

nextflow run https://github.com/seqeralabs/nf-canary -r main -w az://nf-scratch/work --outdir "az://test-public" # succeeds

Failing example 1 - storage account in the path:

nextflow run https://github.com/seqeralabs/nf-canary -r main -w az://nf-scratch/work --outdir "az://nfazurestore.test-public" # fail
ERROR ~ Error executing process > 'NF_CANARY:TEST_PUBLISH_FOLDER'

Caused by:
  /nfazurestore.test-public: Unable to determine if root directory exists

Failing example 2 - https path used:

nextflow run https://github.com/seqeralabs/nf-canary -r main -w az://nf-scratch/work --outdir "https://nfazurestore.blob.core.windows.net/test-public" # fail
ERROR ~ Error executing process > 'NF_CANARY:TEST_PUBLISH_FOLDER'

Caused by:
  Create directory not supported by HTTPS file system provider

The root cause of the failures is:

  • first, in FileHelper.groovy, paths are transformed into a canonicalPath (for example, into /<storage-account>.<bucket>)
  • then Files.createDirectories(this.path) fails with the error messages shown above

Environment

  • Nextflow version: 23.12.0-edge build 5901
  • Java version: openjdk 21.0.1 2023-10-17 LTS
  • Operating system: macOS Sonoma - 14.2.1 (23C71)
  • Bash version: zsh 5.9 (x86_64-apple-darwin23.0)

Additional context

Reasoning for supporting paths that include the storage account name:
Azure bucket/container names are not globally unique; they are only unique within a storage account. To identify them unambiguously, Seqera Platform uses the path format az://<storage-account>.<bucket>. Because Nextflow already knows the storage account name (it has to be set in the config), this part could easily be removed from the path, fixing the issue.
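
The idea of dropping the account prefix can be sketched as follows. This is only an illustration in Python, not Nextflow's actual code (which is Groovy) and not necessarily what the linked pull request does; the function name `strip_storage_account` is hypothetical, and it assumes the configured account name is available as a plain string.

```python
def strip_storage_account(path: str, account_name: str) -> str:
    """Drop a leading '<account_name>.' from the container part of an az:// path.

    Hypothetical helper for illustration only; assumes `account_name` is the
    storage account already set in the Nextflow config.
    """
    scheme = "az://"
    if not path.startswith(scheme):
        return path  # not an Azure path, leave it untouched

    rest = path[len(scheme):]
    prefix = account_name + "."
    # Only strip when the path really starts with the configured account name.
    if account_name and rest.startswith(prefix):
        rest = rest[len(prefix):]
    return scheme + rest


print(strip_storage_account("az://nfazurestore.test-public", "nfazurestore"))
```

With the account name from the failing example, `az://nfazurestore.test-public` becomes `az://test-public`, which is the form that already works. Paths without the prefix pass through unchanged.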

Reasoning for supporting https paths:
The Azure docs on referencing blobs suggest using a URL like this: https://<storage-account>.blob.core.windows.net/<bucket>.
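
As a possible user-side workaround until this is resolved, a blob URL of that shape could be split into its account and an az:// path before being passed to --outdir. The sketch below is an assumption-laden illustration in Python, not part of Nextflow; `split_blob_url` is a hypothetical name, and it only handles the public `blob.core.windows.net` endpoint.

```python
from urllib.parse import urlparse

# Assumption: only the public Azure Blob endpoint suffix is handled.
BLOB_HOST_SUFFIX = ".blob.core.windows.net"


def split_blob_url(url: str) -> tuple[str, str]:
    """Split an https blob URL into (storage account, az:// path).

    Hypothetical helper for illustration only.
    """
    parsed = urlparse(url)
    host = parsed.hostname or ""
    if parsed.scheme != "https" or not host.endswith(BLOB_HOST_SUFFIX):
        raise ValueError(f"not an Azure Blob Storage URL: {url}")

    account = host[: -len(BLOB_HOST_SUFFIX)]
    container_path = parsed.path.lstrip("/")  # '<bucket>' or '<bucket>/<blob path>'
    if not account or not container_path:
        raise ValueError(f"missing storage account or container: {url}")
    return account, "az://" + container_path


account, az_path = split_blob_url(
    "https://nfazurestore.blob.core.windows.net/test-public"
)
print(account, az_path)
```

For the URL from failing example 2, this gives the account `nfazurestore` and the path `az://test-public`, matching the working example.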

@bentsherman
Member

Don't think it makes sense to support http-based URLs when you are specifying an upload destination, maybe for downloading only

@endre-seqera
Author

> Don't think it makes sense to support http-based URLs when you are specifying an upload destination, maybe for downloading only

Agreed!
I've created a pull request with a quick fix that removes the storage account name from the path, if present.
