Test on some bigger DANDI datasets using datalad-fuse #6

Open
yarikoptic opened this issue Apr 15, 2024 · 8 comments

Comments

@yarikoptic
Member

Ultimately, this tool should work on any DANDI dataset with NWB files. To start, though, we can concentrate on those which have (only?) extracellular electrophysiology (ecephys) data.

@bendichter could recommend some specific ones. Meanwhile, https://dandiarchive.org/dandiset/search?search=ecephys could give a starting point.

@yarikoptic
Member Author

use https://github.com/datalad/datalad-fuse/

datalad install -r -R 1 https://github.com/dandisets
datalad fuse-mount dandisets /tmp/dandisets-fuse

and then you have access to dandisets-fuse/

For an example of use, see https://github.com/dandi/dandisets-healthstatus; you could borrow the code from there: https://github.com/search?q=repo%3Adandi%2Fdandisets-healthstatus%20fuse&type=code
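
Once a mount like the one above exists, a first step for this issue would be finding NWB files under it. Below is a minimal sketch, assuming files under the mount appear as regular files ending in `.nwb`. The helper names `iter_nwb_files` and `first_nwb_files` are made up for illustration and are not part of datalad-fuse. The walk is lazy and capped, since traversing the whole tree of dandisets could take a long time.

```python
from itertools import islice
from pathlib import Path


def iter_nwb_files(root):
    """Lazily yield regular files ending in .nwb anywhere below `root`."""
    for path in Path(root).rglob("*.nwb"):
        # is_file() is False for directories and for dangling symlinks
        # (e.g. annexed files whose content is not present).
        if path.is_file():
            yield path


def first_nwb_files(root, limit=5):
    """Return at most `limit` NWB paths without walking the entire tree."""
    return list(islice(iter_nwb_files(root), limit))


if __name__ == "__main__":
    # /tmp/dandisets-fuse is the mount point used in the comment above.
    for nwb in first_nwb_files("/tmp/dandisets-fuse"):
        print(nwb)
```

Run against a plain (non-FUSE) clone, this would likely list nothing for files whose content was never fetched, because those are typically dangling symlinks.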

@TheChymera
Collaborator

TheChymera commented May 6, 2024

The first command fails with:

[deco]/mnt/data ❱ datalad install -r -R 1 https://github.com/dandisets
Cloning: 100%|████████████████████| 2.00/2.00 [00:00<00:00, 8.51 candidates/s]
Username for 'https://github.com': TheChymera
Password for 'https://TheChymera@github.com':
install(error): /mnt/data/dandisets (dataset) [Failed to clone from any candidate source URL. Encountered errors per each url were:
- https://github.com/dandisets
  CommandError: 'git -c diff.ignoreSubmodules=none -c core.quotepath=false clone --progress https://github.com/dandisets /mnt/data/dandisets' failed with exitcode 128 [err: 'Cloning into '/mnt/data/dandisets'...
remote: Not Found
fatal: repository 'https://github.com/dandisets/' not found']
- https://github.com/dandisets/.git
  CommandError: 'git -c diff.ignoreSubmodules=none -c core.quotepath=false clone --progress https://github.com/dandisets/.git /mnt/data/dandisets' failed with exitcode 128 [err: 'Cloning into '/mnt/data/dandisets'...
remote: Repository not found.
fatal: repository 'https://github.com/dandisets/.git/' not found']]
[ERROR  ] NoDatasetFound(No installed dataset found at /mnt/data/dandisets) (NoDatasetFound)
usage: datalad install [-h] [-s URL-OR-PATH] [-d DATASET] [-g] [-D DESCRIPTION] [-r] [-R LEVELS] [--reckless [auto|ephemeral|shared-...]] [-J NJOBS] [--branch BRANCH] [--version] [URL-OR-PATH ...]

However, this worked: datalad install -r -R 1 git@github.com:dandisets/000628.git

Following that, I think the fuse-mount extension wasn't properly installed:

(mydev) [deco]/mnt/data/datalad ❱ datalad fuse-mount 000628/ /tmp/000628
datalad: Unknown command 'fuse-mount'.  See 'datalad --help'.

(mydev) [deco]/mnt/data/datalad ❱ datalad --help | rg fuse -C 5
  aggregate-metadata
      Aggregate metadata of one or more datasets for later query

*DataLad FUSE command suite*

  fusefs
      FUSE File system providing transparent access to files under DataLad
  fsspec-head
      Show leading lines/bytes of an annexed file by fetching its data from a
  fsspec-cache-clear
      Clear fsspec cache

Any idea how I can check from the Python console whether it's installed? I tried installing it both via the package manager and via pip, and neither seems to work.

better move that part of the discussion to issue tracker as you suggested → datalad/datalad-fuse#111
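
On the question of checking from the Python console: one option is the standard-library `importlib.metadata`. The sketch below assumes the distribution is named `datalad-fuse` and that DataLad extensions register themselves under the `datalad.extensions` entry-point group. Both are my understanding, not something confirmed in this thread. Note that the help output above already lists a "DataLad FUSE command suite" with `fusefs`, which suggests the extension was installed and only the `fuse-mount` command name was off.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def entry_point_names(group):
    """List the entry-point names registered under `group` in this environment."""
    eps = metadata.entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: selectable entry points
        selected = eps.select(group=group)
    else:
        # Python 3.8/3.9: plain dict mapping group name -> entry points
        selected = eps.get(group, [])
    return sorted(ep.name for ep in selected)


if __name__ == "__main__":
    # "datalad-fuse" and "datalad.extensions" are assumed names (see above).
    print("datalad-fuse version:", installed_version("datalad-fuse"))
    print("datalad extensions:", entry_point_names("datalad.extensions"))
```

This only reports on the Python environment the console runs in. If `datalad` on the PATH belongs to a different environment (system package vs. a virtualenv), the two can disagree.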

@TheChymera changed the title from "Test on some DANDI datasets" to "Test on some bigger DANDI datasets using datalad-fuse" on May 6, 2024
@yarikoptic
Member Author

use datalad install -r -R 1 https://github.com/dandi/dandisets -- that is the superdataset for all dandisets. Then run datalad fusefs -d dandisets dandisets-fuse or similar (check datalad fusefs --help)
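
For scripting, the two steps above could be wrapped roughly as follows. This is only a sketch: the commands are copied from this comment, while the helper names, the creation of the mount directory, and the assumption that `fusefs` can simply be run after the install are mine. `datalad fusefs` may stay in the foreground depending on its options, so check `datalad fusefs --help` before relying on it.

```python
import os
import shutil
import subprocess


def fuse_workflow_commands(superdataset_url, clone_dir, mount_dir):
    """Build the two command lines from the comment above as argv lists."""
    return [
        ["datalad", "install", "-r", "-R", "1", superdataset_url],
        ["datalad", "fusefs", "-d", clone_dir, mount_dir],
    ]


def datalad_available():
    """True if a `datalad` executable is found on PATH."""
    return shutil.which("datalad") is not None


def run_workflow(superdataset_url, clone_dir, mount_dir):
    """Run both commands in order; stop at the first failure."""
    if not datalad_available():
        raise RuntimeError("datalad executable not found on PATH")
    # Assumption: the mount point has to exist before mounting.
    os.makedirs(mount_dir, exist_ok=True)
    for cmd in fuse_workflow_commands(superdataset_url, clone_dir, mount_dir):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_workflow("https://github.com/dandi/dandisets", "dandisets", "dandisets-fuse")
```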

@TheChymera
Collaborator

oh, ok, got the first one as well now.

@TheChymera
Collaborator

@yarikoptic even if I use the ssh clone URI, I get prompts asking me for my GitHub password. Even if I enter it, they fail. I assume these are embargoed datasets? In any case, is there any way to skip them?

datalad install -r -R 1 git@github.com:dandi/dandisets.git

@TheChymera
Collaborator

Here's an example:

[INFO   ] Remote origin not usable by git-annex; setting annex-ignore
[INFO   ] https://github.com/dandisets/000222.git/config download failed: Not Found
[INFO   ] access to 2 dataset siblings dandi-dandisets-dropbox, dandiapi not auto-enabled, enable with:
| 		datalad siblings -d "/mnt/data/datalad/dandisets/000222" enable -s SIBLING
[INFO   ] Remote origin not usable by git-annex; setting annex-ignore
[INFO   ] https://github.com/dandisets/000223.git/config download failed: Not Found
[INFO   ] access to 2 dataset siblings dandiapi, dandi-dandisets-dropbox not auto-enabled, enable with:
| 		datalad siblings -d "/mnt/data/datalad/dandisets/000223" enable -s SIBLING
Installing:  25%|████████████████████▋                                                              | 155/621 [06:09<10:37, 1.37s/ datasetsUsername for 'https://github.com': TheChymera                                                           | 0.00/3.00 [00:00<?, ? candidates/s]
Password for 'https://TheChymera@github.com':
  [146 similar messages have been suppressed; disable with datalad.ui.suppress-similar-results=off]
install(error): /mnt/data/datalad/dandisets/000224 (dataset) [Failed to clone from any candidate source URL. Encountered errors per each url were:
- https://github.com/dandisets/000224.git
  CommandError: 'git -c diff.ignoreSubmodules=none -c core.quotepath=false clone --progress https://github.com/dandisets/000224.git /mnt/data/datalad/dandisets/000224' failed with exitcode 128 [err: 'Cloning into '/mnt/data/datalad/dandisets/000224'...
remote: Support for password authentication was removed on August 13, 2021.
remote: Please see https://docs.github.com/get-started/getting-started-with-git/about-remote-repositories#cloning-with-https-urls for information on currently recommended modes of authentication.
fatal: Authentication failed for 'https://github.com/dandisets/000224.git/'']
- https://github.com/dandisets/000224.git/.git
  CommandError: 'git -c diff.ignoreSubmodules=none -c core.quotepath=false clone --progress https://github.com/dandisets/000224.git/.git /mnt/data/datalad/dandisets/000224' failed with exitcode 128 [err: 'Cloning into '/mnt/data/datalad/dandisets/000224'...
remote: Not Found
fatal: repository 'https://github.com/dandisets/000224.git/.git/' not found']
- git@github.com:dandi/dandisets.git/000224
  CommandError: 'git -c diff.ignoreSubmodules=none -c core.quotepath=false clone --progress git@github.com:dandi/dandisets.git/000224 /mnt/data/datalad/dandisets/000224' failed with exitcode 128 [err: 'Cloning into '/mnt/data/datalad/dandisets/000224'...
fatal: remote error:
 dandi/dandisets.git/000224 is not a valid repository name
Visit https://support.github.com/ for help
CommandError: 'ssh -o ControlPath=/home/chymera/.cache/datalad/sockets/7b668231 -o SendEnv=GIT_PROTOCOL git@github.com 'git-upload-pack '"'"'dandi/dandisets.git/000224'"'"''' failed with exitcode 1']]

@yarikoptic
Member Author

just datalad install -r -R 1 https://github.com/dandi/dandisets

those subdatasets which fail to clone - just ignore them; they must be private since they are embargoed

@TheChymera
Collaborator

@yarikoptic even with the https link it still waits on every dataset I can't download. Is there any auto-skip feature? I looked in the help, nothing stood out 🤔
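
One possible workaround, not confirmed by anyone in this thread: the `Username for 'https://github.com':` prompt comes from git itself, and git honours the `GIT_TERMINAL_PROMPT=0` environment variable by failing instead of asking on the terminal. The sketch below runs the install with that variable set, so inaccessible subdatasets should fail quickly rather than wait for input. The helper names are made up for illustration. A configured credential helper or askpass program could still prompt, and I have not tried this against the dandisets superdataset.

```python
import os
import subprocess


def noninteractive_env(base=None):
    """Return a copy of the environment that tells git not to prompt on the terminal."""
    env = dict(os.environ if base is None else base)
    # With this set, git fails immediately where it would otherwise ask for credentials.
    env["GIT_TERMINAL_PROMPT"] = "0"
    return env


def install_superdataset(url="https://github.com/dandi/dandisets", cwd=None):
    """Run the install command from this thread without terminal prompts."""
    cmd = ["datalad", "install", "-r", "-R", "1", url]
    # check=False: a non-zero exit is expected when some subdatasets are inaccessible.
    return subprocess.run(cmd, env=noninteractive_env(), cwd=cwd, check=False)


if __name__ == "__main__":
    result = install_superdataset()
    print("datalad exited with", result.returncode)
```

The equivalent from a shell would be prefixing the command with `GIT_TERMINAL_PROMPT=0`.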
