

Ability to operate solely on already extracted metadata of DANDI dandisets #7

Open
yarikoptic opened this issue Apr 15, 2024 · 1 comment

@yarikoptic
Member

We already have the metadata of all dandisets' assets extracted and made available, both in

Ideally, the tool should be able to operate solely on these metadata records (perhaps via a mode option of some kind) and produce as output, e.g., a JSON/TSV list of records with the target filename for each asset.
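
As a rough illustration of such output, the Python sketch below serializes records as TSV or JSON using only the standard library. The record layout (an `asset_path` and a `target_filename` column) and the function name `format_records` are hypothetical, not an existing nwb2bids interface.

```python
import csv
import io
import json

# Hypothetical record layout: one row per asset.
FIELDS = ["asset_path", "target_filename"]


def format_records(records: list[dict[str, str]], fmt: str = "tsv") -> str:
    """Serialize asset -> target-filename records as TSV or JSON text."""
    if fmt == "json":
        return json.dumps(records, indent=2)
    if fmt == "tsv":
        buf = io.StringIO()
        # lineterminator="\n" avoids csv's default "\r\n" line endings.
        writer = csv.DictWriter(buf, fieldnames=FIELDS, delimiter="\t", lineterminator="\n")
        writer.writeheader()
        writer.writerows(records)
        return buf.getvalue()
    raise ValueError(f"unsupported format: {fmt!r}")
```

TSV keeps the mapping easy to review by eye, while the JSON variant is convenient if more fields per record get added later.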

If some metadata is missing, we should extend the schema at the https://github.com/dandi/dandi-schema/ level, and extend https://github.com/dandi/dandi-cli to support its extraction/harmonization where needed.

Even before that, the internal code should be designed with this use case in mind, with a clear separation of the following steps:

  1. metadata extraction/harmonization, e.g. get_metadata_from_files(files: list[Path]) -> list[AssetMetadata]
  2. analysis for constructing BIDS files from the metadata, e.g. get_bids_filenames(list[AssetMetadata]) -> list[BIDSFile]
    • the tricky part is that some files would be "generated": rather than corresponding to a specific asset, they are often a "summary" over assets, e.g. dataset_description.json, participants.tsv
      • this could be handled by a ConcreteBIDSFile subclass of BIDSFile that simply stores the content of the target file
  3. BIDS dataset instantiation, e.g. populate_bids_files(list[BIDSFile]) -> None, which would instantiate the dataset from the list of files produced above. It could have options of various kinds or have different implementations (e.g. one creating a DataLad dataset via https://docs.datalad.org/en/stable/generated/man/datalad-addurls.html when originally operating on a list of URLs, or another one that downloads the files, etc.)
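
A minimal sketch of steps 2 and 3 in Python, to make the proposed separation concrete. Everything here is hypothetical: the `AssetMetadata` fields, the placeholder naming rule, the `BIDSVersion` value, and the `place_asset` callback are illustrative assumptions. Step 1 is omitted because in a metadata-only mode the records would be loaded from the already extracted metadata instead of from files.

```python
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Optional


@dataclass(frozen=True)
class AssetMetadata:
    """Hypothetical minimal asset record; real fields would come from dandi-schema."""
    path: str  # asset path within the dandiset
    subject_id: str
    session_id: str


@dataclass(frozen=True)
class BIDSFile:
    """A file in the target BIDS dataset, optionally backed by a source asset."""
    target: str  # path relative to the BIDS root
    source: Optional[AssetMetadata] = None  # None for generated files


@dataclass(frozen=True)
class ConcreteBIDSFile(BIDSFile):
    """A generated "summary" file that carries its own content."""
    content: str = ""


def get_bids_filenames(assets: list[AssetMetadata]) -> list[BIDSFile]:
    """Step 2: decide target filenames from metadata alone (no file access)."""
    files: list[BIDSFile] = []
    for asset in assets:
        # Placeholder naming rule, not a validated BIDS entity layout.
        stem = f"sub-{asset.subject_id}_ses-{asset.session_id}"
        target = f"sub-{asset.subject_id}/ses-{asset.session_id}/{stem}.nwb"
        files.append(BIDSFile(target=target, source=asset))
    # Summary files aggregate over all assets instead of mapping to one.
    subjects = sorted({a.subject_id for a in assets})
    participants = "participant_id\n" + "".join(f"sub-{s}\n" for s in subjects)
    files.append(ConcreteBIDSFile(target="participants.tsv", content=participants))
    # Name and BIDSVersion are the REQUIRED keys of dataset_description.json.
    description = {"Name": "placeholder", "BIDSVersion": "1.9.0"}
    files.append(
        ConcreteBIDSFile(target="dataset_description.json", content=json.dumps(description, indent=2))
    )
    return files


def populate_bids_files(
    files: list[BIDSFile],
    root: Path,
    place_asset: Callable[[AssetMetadata, Path], None],
) -> None:
    """Step 3: instantiate the dataset; how assets land on disk is pluggable."""
    for f in files:
        dest = root / f.target
        dest.parent.mkdir(parents=True, exist_ok=True)
        # Check the subclass first: generated files are written directly.
        if isinstance(f, ConcreteBIDSFile):
            dest.write_text(f.content)
        elif f.source is not None:
            place_asset(f.source, dest)
```

Passing `place_asset` as a callback is one way to get the "different implementations" of step 3: one callback could download each asset, another could symlink a local copy, and another could merely record (URL, target) pairs for a later `datalad addurls` call, while steps 1 and 2 stay unchanged.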
@TheChymera
Collaborator

Do I understand correctly that this would require nwb2bids to depend on DANDI?
If so, I think that would be a problem, because neuroconv should depend on this.
