Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

client.dandi_authenticate blocks during programmatic use #1362

Open
sneakers-the-rat opened this issue Nov 21, 2023 · 4 comments
Open

client.dandi_authenticate blocks during programmatic use #1362

sneakers-the-rat opened this issue Nov 21, 2023 · 4 comments

Comments

@sneakers-the-rat
Copy link
Contributor

Hello!

I'm downloading datasets programmatically, and ran into a stumbling block with auth.

Problem

Currently, when trying to download an embargoed dataset, the client (when invoked with the cli download command or the underlying download function) will stop and wait for user authentication input. This happens in the ParsedDandiURL.navigation method:

try:
dandiset = self.get_dandiset(client, lazy=not strict)
except requests.HTTPError as e:
if (
e.response is not None
and e.response.status_code == 401
and authenticate is not False
):
lgr.info("Resource requires authentication; authenticating ...")
client.dandi_authenticate()
dandiset = self.get_dandiset(client, lazy=not strict)

called by the Downloader.download_generator method:

with self.url.navigate(strict=True) as (client, dandiset, assets):

This makes sense for interactive use, but makes programmatic use difficult - since the exception is not thrown, there is no way to catch it. So eg. when downloading a list of datasets where we aren't certain in advance whether we have access to them, i would want to skip the ones we don't have access to, but in order to do that I had to pre-check each dataset like this:

def check_auth(url:ParsedDandiURL) -> bool:
    with url.get_client() as client:
        try:
            dandiset = url.get_dandiset(client, lazy=False)
        except requests.HTTPError as e:
            if e.response.status_code == 401:
                return False
            else:
                raise
        return True

Options

Detect click

click allows you to check whether you are inside of a click context, so the dandi_authenticate method could check if we're in an interactive session and only ask for credentials if so (eg. before getting to the interactive block here: https://github.com/dandi/dandi-cli/blob/master/dandi/dandiapi.py#L499

import click

def is_interactive() -> bool:
    try:
        _ = click.get_current_context()
        return True
    except RuntimeError as e:
        if 'no active click context' in str(e).lower():
            return False
        else:
            raise

def dandi_authenticate(self) -> None:
    # ... 
    if is_interactive():
        while True:
            # ...
    else:
        raise UnauthorizedError('need to provide credentials...') # or whatever the exception would be named

I see that there already is a is_interactive function, so not sure if that is the more appropriate check or whether looking for a click context is.

Provide Auth explicitly

Currently the authenticate arg to navigate is a bool, and it's not possible to pass auth information from the top-level download function. If instead the download function had a credentials or key or whatever you want to call it arg, then absence of credentials could always raise an exception rather than prompting when not invoked via the click download command.

It is currently possible to provide credentials to the DandiAPIClient object, though it is not possible to provide an already-instantiated client to the download method, so it is possible to authenticate properly in a few ways, but it is not possible to purposely skip authentication when we intend/expect to fail auth. Propagating creds from the top-level download function would be one way of a) allowing explicit auth while also b) allowing explicit non-auth, depending on whether creds are provided.


as usual, lmk if there is something i'm missing, or if there is an intended way to skip unauthorized downloads when using the API programmatically. I'd be happy to draft PRs for either of the above options or for docs if i'm just missing some intended use.

@jwodder
Copy link
Member

jwodder commented Nov 21, 2023

@sneakers-the-rat Do you actually have a Dandi API key? If you do, you can set it via the DANDI_API_KEY environment variable or the keyring, and dandi_authenticate() will pick it up.

@sneakers-the-rat
Copy link
Contributor Author

sneakers-the-rat commented Nov 21, 2023

I do not - would that fix this problem even when my API key wouldnt grant me access to an embargoed dataset, and is that the expected usage? This issue is mostly about not blocking for interactive input during non-interactive use, rather than how to input auth, but if one is always expected to use an API key that would be nice to have in the docs in a "getting started" page

@yarikoptic
Copy link
Member

I do not - would that fix this problem even when my API key wouldnt grant me access to an embargoed dataset, and is that the expected usage?

no -- if it is embargoed and you are not among its authors, you do not have access with or without key.

Overall it feels not unlike discussions we are (still) having in datalad on telling an exactly desired behavior from it - should we interact or not. Or here more even -- should we try to authenticate or not. What you are really looking here is not to force authentication on you if you do not have the key, right @sneakers-the-rat ?

@sneakers-the-rat
Copy link
Contributor Author

Right right - for a programmatic mode where I can catch the auth error from a call to download, but also keep interactive mode when invoked from click since interactive prompt makes sense there. A clean separation between interactive and noninteractive would be v useful for tools built on top of DANDI - rather than needing to write something separate to access API, would be nice to use this package.

there are ways to work around the problem, but they feel like hacks / prevent using the straightforward download method. So as a mild motivating example - if im writing a graph crawler, as I am, and I attempt to dereference some link to an embargoed dataset, I want to be able to handle that as a crawl failure as a standard part of the crawler API (ie. DANDI as one of many crawlers that have shared logic) rather than writing special workaround logic to handle that.

It might be an opportunity to standardize some of the args pass through the relevant functions too, and like I said im happy to draft any PR depending on what approach yall like best

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants