`client.dandi_authenticate` blocks during programmatic use #1362

sneakers-the-rat · 2023-11-21T03:31:46Z

Hello!

I'm downloading datasets programmatically, and ran into a stumbling block with auth.

Problem

Currently, when trying to download an embargoed dataset, the client (when invoked with the cli download command or the underlying download function) will stop and wait for user authentication input. This happens in the ParsedDandiURL.navigation method:

dandi-cli/dandi/dandiarchive.py

Lines 170 to 180 in 10d076e

    
           try: 
        
               dandiset = self.get_dandiset(client, lazy=not strict) 
        
           except requests.HTTPError as e: 
        
               if ( 
        
                   e.response is not None 
        
                   and e.response.status_code == 401 
        
                   and authenticate is not False 
        
               ): 
        
                   lgr.info("Resource requires authentication; authenticating ...") 
        
                   client.dandi_authenticate() 
        
                   dandiset = self.get_dandiset(client, lazy=not strict)

called by the Downloader.download_generator method:

dandi-cli/dandi/download.py

Line 229 in 10d076e

with self.url.navigate(strict=True) as (client, dandiset, assets):

This makes sense for interactive use, but makes programmatic use difficult - since the exception is not thrown, there is no way to catch it. So eg. when downloading a list of datasets where we aren't certain in advance whether we have access to them, i would want to skip the ones we don't have access to, but in order to do that I had to pre-check each dataset like this:

def check_auth(url:ParsedDandiURL) -> bool:
    with url.get_client() as client:
        try:
            dandiset = url.get_dandiset(client, lazy=False)
        except requests.HTTPError as e:
            if e.response.status_code == 401:
                return False
            else:
                raise
        return True

Options

Detect click

click allows you to check whether you are inside of a click context, so the dandi_authenticate method could check if we're in an interactive session and only ask for credentials if so (eg. before getting to the interactive block here: https://github.com/dandi/dandi-cli/blob/master/dandi/dandiapi.py#L499

import click

def is_interactive() -> bool:
    try:
        _ = click.get_current_context()
        return True
    except RuntimeError as e:
        if 'no active click context' in str(e).lower():
            return False
        else:
            raise

def dandi_authenticate(self) -> None:
    # ... 
    if is_interactive():
        while True:
            # ...
    else:
        raise UnauthorizedError('need to provide credentials...') # or whatever the exception would be named

I see that there already is a is_interactive function, so not sure if that is the more appropriate check or whether looking for a click context is.

Provide Auth explicitly

Currently the authenticate arg to navigate is a bool, and it's not possible to pass auth information from the top-level download function. If instead the download function had a credentials or key or whatever you want to call it arg, then absence of credentials could always raise an exception rather than prompting when not invoked via the click download command.

It is currently possible to provide credentials to the DandiAPIClient object, though it is not possible to provide an already-instantiated client to the download method, so it is possible to authenticate properly in a few ways, but it is not possible to purposely skip authentication when we intend/expect to fail auth. Propagating creds from the top-level download function would be one way of a) allowing explicit auth while also b) allowing explicit non-auth, depending on whether creds are provided.

as usual, lmk if there is something i'm missing, or if there is an intended way to skip unauthorized downloads when using the API programmatically. I'd be happy to draft PRs for either of the above options or for docs if i'm just missing some intended use.

The text was updated successfully, but these errors were encountered:

jwodder · 2023-11-21T13:48:25Z

@sneakers-the-rat Do you actually have a Dandi API key? If you do, you can set it via the DANDI_API_KEY environment variable or the keyring, and dandi_authenticate() will pick it up.

sneakers-the-rat · 2023-11-21T21:55:56Z

I do not - would that fix this problem even when my API key wouldnt grant me access to an embargoed dataset, and is that the expected usage? This issue is mostly about not blocking for interactive input during non-interactive use, rather than how to input auth, but if one is always expected to use an API key that would be nice to have in the docs in a "getting started" page

yarikoptic · 2023-11-21T22:27:59Z

I do not - would that fix this problem even when my API key wouldnt grant me access to an embargoed dataset, and is that the expected usage?

no -- if it is embargoed and you are not among its authors, you do not have access with or without key.

Overall it feels not unlike discussions we are (still) having in datalad on telling an exactly desired behavior from it - should we interact or not. Or here more even -- should we try to authenticate or not. What you are really looking here is not to force authentication on you if you do not have the key, right @sneakers-the-rat ?

sneakers-the-rat · 2023-11-21T22:54:20Z

Right right - for a programmatic mode where I can catch the auth error from a call to download, but also keep interactive mode when invoked from click since interactive prompt makes sense there. A clean separation between interactive and noninteractive would be v useful for tools built on top of DANDI - rather than needing to write something separate to access API, would be nice to use this package.

there are ways to work around the problem, but they feel like hacks / prevent using the straightforward download method. So as a mild motivating example - if im writing a graph crawler, as I am, and I attempt to dereference some link to an embargoed dataset, I want to be able to handle that as a crawl failure as a standard part of the crawler API (ie. DANDI as one of many crawlers that have shared logic) rather than writing special workaround logic to handle that.

It might be an opportunity to standardize some of the args pass through the relevant functions too, and like I said im happy to draft any PR depending on what approach yall like best

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`client.dandi_authenticate` blocks during programmatic use #1362

`client.dandi_authenticate` blocks during programmatic use #1362

sneakers-the-rat commented Nov 21, 2023

jwodder commented Nov 21, 2023

sneakers-the-rat commented Nov 21, 2023 •

edited

yarikoptic commented Nov 21, 2023

sneakers-the-rat commented Nov 21, 2023

client.dandi_authenticate blocks during programmatic use #1362

client.dandi_authenticate blocks during programmatic use #1362

Comments

sneakers-the-rat commented Nov 21, 2023

Problem

Options

Detect click

Provide Auth explicitly

jwodder commented Nov 21, 2023

sneakers-the-rat commented Nov 21, 2023 • edited

yarikoptic commented Nov 21, 2023

sneakers-the-rat commented Nov 21, 2023

`client.dandi_authenticate` blocks during programmatic use #1362

`client.dandi_authenticate` blocks during programmatic use #1362

sneakers-the-rat commented Nov 21, 2023 •

edited