[question] expose programmatic access to dandiset metadata? #1361

Open
sneakers-the-rat opened this issue Nov 21, 2023 · 4 comments

Comments

@sneakers-the-rat
Contributor

hello again :)

question for y'all, gauging interest in a PR - looking through the docs and the source to see the best way to query dandiset metadata and having trouble finding the "right way" that it's supposed to be done.

It seems like this package is primarily oriented around the click cli commands, and the internal functions aren't necessarily expected to be used except in 'advanced' cases - eg. dandi.download.download doesn't have a docstring, etc. I can't tell if y'all intend this to be an SDK for downstream packages to build off, but it would be nice to be able to do that!

So it seems like, to get the dandiset metadata, one would do

dandi download <DANDI:ID> --download dandiset.yaml

and then parse the resulting yaml file.

That's not terribly convenient for programmatic use, where I might want to do

from dandi import get_metadata
meta = get_metadata('DANDI:ID') # type: dandischema.models.Dandiset

I see one previous issue ( #1205 ) that shows what might be the recommended method:

from dandi.dandiapi import DandiAPIClient

client = DandiAPIClient()
dandisets = list(client.get_dandisets())

# this works
dandisets[-3].get_metadata()

and it seems like this also works:

from dandi.dandiarchive import parse_dandi_url

url = parse_dandi_url('DANDI:0000N')
dset = url.get_dandiset(url.get_client()) # type: dandi.dandiapi.RemoteDandiset
meta = dset.get_metadata() # type: dandischema.models.Dandiset

so I'm curious if I can help with either some docs or some helper functions. Let me know which, if any, of these y'all would be interested to have in a PR:

  • Documenting how to get metadata programmatically using the above syntax - this could go in an 'examples' or 'guide' directory that we might also leave some dangling stubs to entertain future documentation PRs?
  • Documenting some of the surrounding objects and how they're used - there are already docs for some of the relevant classes (eg. https://dandi.readthedocs.io/en/latest/modref/dandiarchive.html#dandi.dandiarchive.ParsedDandiURL ) but I wouldn't really know how to use them if not for reading the source code. eg. it's not altogether obvious that a DandisetURL would yield a RemoteDandiset which can get_metadata, but it might be more obvious with some high-level description of how these objects are intended to be used
  • Convenience function for get_metadata that wraps the above steps - not sure how applicable this would be to the more general download function, since typically that will download a series of files, but at least for the metadata it seems like this would be a common want / wouldn't incur any additional maintenance burden since it just wraps existing code.
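A rough sketch of what that third option could look like, just chaining the `parse_dandi_url` steps from above. The `get_metadata` name and the `parse` parameter are hypothetical (nothing like this exists in dandi today); `parse` is only there so the wrapper can be tried out with a stand-in parser and no network access.

```python
from typing import Any, Callable, Optional


def get_metadata(identifier: str, parse: Optional[Callable[[str], Any]] = None) -> Any:
    """Hypothetical helper: return the metadata for a dandiset identifier or URL.

    Not part of dandi. `parse` defaults to dandi.dandiarchive.parse_dandi_url.
    """
    if parse is None:
        # Imported lazily so this module loads even where dandi is not installed.
        from dandi.dandiarchive import parse_dandi_url as parse

    parsed = parse(identifier)
    # get_client() returns a DandiAPIClient; using it as a context manager
    # closes the underlying HTTP session once the metadata has been fetched.
    with parsed.get_client() as client:
        dandiset = parsed.get_dandiset(client)  # dandi.dandiapi.RemoteDandiset
        return dandiset.get_metadata()  # dandischema.models.Dandiset
```

With dandi installed and network access, usage would presumably be `meta = get_metadata("DANDI:000540")`, after which fields like `meta.name` should be available on the returned model.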

This also could be handled at the level of the web API docs, which I couldn't find (but might be around here somewhere!) eg. rather than using the python API at all, one could just GET https://api.dandiarchive.org/api/dandisets/000540/versions/0.230515.0530/ for example, so that might be another option but out of scope for this repo.
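For the plain web API route, a minimal standard-library sketch might look like the following. The URL pattern is copied from the example above; the function names are made up for illustration, and I'm assuming the endpoint answers with a JSON body without having checked its exact shape.

```python
import json
from urllib.request import urlopen

API_ROOT = "https://api.dandiarchive.org/api"


def version_url(dandiset_id: str, version: str) -> str:
    """Build the URL for one version of a dandiset (pattern from the example above)."""
    return f"{API_ROOT}/dandisets/{dandiset_id}/versions/{version}/"


def fetch_version_metadata(dandiset_id: str, version: str) -> dict:
    """GET the version endpoint and decode whatever JSON it returns."""
    with urlopen(version_url(dandiset_id, version), timeout=30) as response:
        return json.load(response)
```

Calling `fetch_version_metadata("000540", "0.230515.0530")` would hit the same URL as the GET above; it skips the dandischema models entirely, so you get a plain dict back.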

anyway lmk what would be useful, or if i missed something!

@sneakers-the-rat
Contributor Author

Seeing this issue - #1363
That seems like a reasonable way to get metadata. Would y'all like me to draft a PR with a few examples of how to get metadata in the docs?

@yarikoptic
Member

yarikoptic commented Nov 22, 2023

I guess it might not be a question of where to place examples but of how to make them easier to find? At the top of the description at https://dandi.readthedocs.io/en/latest/modref/dandiapi.html we do have an example which is a prototypical "exploration" script. Should we maybe point to it from a few locations, e.g. https://www.dandiarchive.org/handbook/10_using_dandi/#dandi-python-client ?

edit: on the same page in the handbook, also add a little section on the API server https://api.dandiarchive.org/, which provides various "interactive" interfaces like https://api.dandiarchive.org/swagger/ and https://api.dandiarchive.org/redoc/ for interacting with the API.

@yarikoptic
Member

yarikoptic commented Nov 22, 2023

> It seems like this package is primarily oriented around the click cli commands, and the internal functions aren't necessarily expected to be used except in 'advanced' cases - eg. dandi.download.download doesn't have a docstring, etc. I can't tell if y'all intend this to be an SDK for downstream packages to build off, but it would be nice to be able to do that!

well - at https://dandi.readthedocs.io/en/latest/ we point to various levels of the API:

[image]

and above you just keep digging into the "Command line interface", which is the oldest and possibly most archaic code in here ;-)

@sneakers-the-rat
Contributor Author

> I guess it might not be a question of where to place examples but of how to make them easier to find? At the top of the description at https://dandi.readthedocs.io/en/latest/modref/dandiapi.html we do have an example which is a prototypical "exploration" script.

ah yes, that is very useful, but i definitely would not have thought to look for it there. I think the top-level ToC is pretty sparse, and moving (or copying, or linking) that to an examples section in the top-level ToC would make it a lot more visible.

> Should we maybe point to it from a few locations, e.g. https://www.dandiarchive.org/handbook/10_using_dandi/#dandi-python-client ?
>
> edit: on the same page in the handbook, also add a little section on the API server https://api.dandiarchive.org/, which provides various "interactive" interfaces like https://api.dandiarchive.org/swagger/ and https://api.dandiarchive.org/redoc/ for interacting with the API.

definitely a big fan of dense linking - so yes! a clear navigation path to where one would find examples would fit the bill. Since this kind of stuff is examples of how to use this package in particular, i think it would fit here.

> and above you just keep digging into the "Command line interface", which is the oldest and possibly most archaic code in here ;-)

Fair enough! I'm playing the role here of "naive user trying to help with docs" since i'm sure this all makes sense to y'all and the question is probably annoying. I promise i'm not just being dense on purpose, and am trying to help smooth some of those documentation desire paths :). My baseless assertion "seems like it's primarily oriented around the cli commands" was mostly from the way they are highlighted in the docs, and when i go to the high-, mid-, or low-level interface links i just get a big wall of links, which signaled to me "not the usual thing we intend people to look at" - eg. the download function is this lonely soul at the bottom of the page and I assumed i wasn't supposed to be touching it.

[screenshot: Screen Shot 2023-11-22 at 6 15 31 PM]
