Permission denied while getting drive credentials: ADC with impersonation #1204

Open
adamcunnington-mlg opened this issue Nov 29, 2022 · 17 comments
Labels
priority: p2 Moderately-important priority. Fix may not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.


adamcunnington-mlg commented Nov 29, 2022

I consider myself pretty familiar with the various google auth flows available via the python SDK - and how this interacts with gcloud-generated credentials.

We are using the bq SDK in the typical way; client = bigquery.Client() and we make use of ADC so our code is interoperable between dev and prod. Our code interacts with external tables that are sourced from sheets on google drive. We know that we need to provide the necessary scopes (and of course, permission to the underlying sheets).

The following works fine for a user identity with the necessary permissions:
gcloud auth application-default login --scopes=https://www.googleapis.com/auth/drive,https://www.googleapis.com/auth/cloud-platform

However, the following does not:
gcloud auth application-default login --scopes=https://www.googleapis.com/auth/drive,https://www.googleapis.com/auth/cloud-platform --impersonate-service-account=hand-of-god@mlg-apollo-data-prod.iam.gserviceaccount.com

We receive google.api_core.exceptions.Forbidden: 403 Access Denied: BigQuery BigQuery: Permission denied while getting Drive credentials.

I can replicate the same issue with my user credential if either of the following is true:

  1. I don't pass google drive scopes.
  2. I don't have access to the underlying file.

The service account that I am impersonating definitely has access to the file, and I can see the BigQuery job fail with a nondescript error message (a feature request has been raised for this with the BigQuery REST API team). My suspicion is that, when the ADC was generated using service account impersonation (and only then), the scopes that are presumably buried in the credential are not passed through or not read correctly by the SDK. Maybe a similar issue is happening with my above note when the project cannot be inferred from the environment.

See below screenshot proof of correct permissions being in place:
image

Very grateful for some direction here...

product-auto-label bot added the "api: bigquery" label (Issues related to the BigQuery API.) Nov 29, 2022
Author

adamcunnington-mlg commented Nov 29, 2022

For the next poor soul that encounters this, I have concluded that the scopes are indeed ignored by the Python SDK (this might not be isolated to just here) for ADC credentials generated using service account impersonation.

Interestingly, the same issue does NOT happen when using googleapiclient (google-api-python-client), so I think that library does something smarter than google-cloud-core does.

This can be worked around in various ways by explicitly setting the project and scopes within the code but this makes for a brittle implementation that is not interoperable with different credential types and environments.

I found the best way to work around this is to pass the lesser-known client_options object (https://googleapis.dev/python/google-api-core/latest/client_options.html#google.api_core.client_options.ClientOptions), which supports explicit scopes.

An alternative is to create an ADC object explicitly with scopes; e.g. google.auth.default(scopes=...)
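A minimal sketch of the two workarounds mentioned above, assuming `google-cloud-bigquery` and `google-auth` are installed. The helper names `client_with_option_scopes` and `client_with_scoped_adc` are made up for illustration, and the imports are deferred so the module loads even without the SDK. Note that a later comment in this thread reports the `google.auth.default(scopes=...)` route not working for them, so treat the second helper as something to try rather than a guaranteed fix.

```python
DRIVE_SCOPE = "https://www.googleapis.com/auth/drive"
CLOUD_PLATFORM_SCOPE = "https://www.googleapis.com/auth/cloud-platform"
SCOPES = [DRIVE_SCOPE, CLOUD_PLATFORM_SCOPE]


def client_with_option_scopes(project=None):
    """Workaround 1: pass the scopes through client_options."""
    # Imported lazily so this module can be loaded without the SDK installed.
    from google.api_core.client_options import ClientOptions
    from google.cloud import bigquery

    return bigquery.Client(
        project=project,
        client_options=ClientOptions(scopes=SCOPES),
    )


def client_with_scoped_adc(project=None):
    """Workaround 2: load ADC with explicit scopes and pass the credentials in."""
    import google.auth
    from google.cloud import bigquery

    # google.auth.default returns a (credentials, project_id) tuple.
    credentials, adc_project = google.auth.default(scopes=SCOPES)
    return bigquery.Client(credentials=credentials, project=project or adc_project)
```

Either helper keeps the rest of the code unchanged: the returned object is an ordinary `bigquery.Client`.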

@adamcunnington-mlg
Author

I've raised a case with Google Cloud support to confirm this bug

Contributor

tswast commented Dec 8, 2022

Thanks for raising this issue. I see you have already discovered the client_options and credentials via google.auth.default workarounds.

Some related code for further investigation. We set default scopes here:

https://github.com/googleapis/python-bigquery/blob/40e4da78bb690ff4c94832321377bb1590e2eeaf/google/cloud/bigquery/client.py#L210-L213

These scopes are used here:

https://github.com/googleapis/python-cloud-core/blob/8ca0faa17e87aa842d154b965be5ef39f1f7490d/google/cloud/client/__init__.py#L169

Potentially there's a difference between an impersonated service account and user credentials, where the former can be scoped down? I recall that user credentials aren't really affected by the scopes parameter after they're created.

Contributor

tswast commented Dec 8, 2022

Looking at

def with_scopes_if_required(credentials, scopes, default_scopes=None):
I think perhaps we should be setting default_scopes here https://github.com/googleapis/python-cloud-core/blob/8ca0faa17e87aa842d154b965be5ef39f1f7490d/google/cloud/client/__init__.py#L181 instead of scopes.
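The hypothesis discussed in these comments can be illustrated with a toy model. This is not the google-auth implementation: `ToyCredentials` and `toy_with_scopes_if_required` are hypothetical stand-ins, and the client default scopes listed here are an assumption to be checked against the linked `client.py`. The model only encodes the idea that user credentials keep whatever scopes they were granted at login, while credentials that arrive without scopes get the client's defaults applied.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple

BIGQUERY_SCOPE = "https://www.googleapis.com/auth/bigquery"
CLOUD_PLATFORM_SCOPE = "https://www.googleapis.com/auth/cloud-platform"
DRIVE_SCOPE = "https://www.googleapis.com/auth/drive"

# Assumed client-side defaults, for illustration only; see the linked client.py.
CLIENT_DEFAULT_SCOPES = (BIGQUERY_SCOPE, CLOUD_PLATFORM_SCOPE)


@dataclass(frozen=True)
class ToyCredentials:
    """Hypothetical stand-in for a credentials object (not a google-auth class)."""

    kind: str
    scopes: Optional[Tuple[str, ...]]
    can_be_rescoped: bool

    @property
    def requires_scopes(self) -> bool:
        # Only re-scopable credentials that carry no scopes yet ask for them.
        return self.can_be_rescoped and not self.scopes

    def with_scopes(self, scopes):
        return replace(self, scopes=tuple(scopes))


def toy_with_scopes_if_required(credentials, scopes):
    """Apply the client's scopes only when the credentials ask for them."""
    if credentials.requires_scopes:
        return credentials.with_scopes(scopes)
    return credentials


# User ADC: the scopes were granted at login and cannot be changed afterwards.
user = ToyCredentials("authorized_user", (DRIVE_SCOPE, CLOUD_PLATFORM_SCOPE), False)
# Impersonated ADC: modelled as arriving with no scopes at all.
impersonated = ToyCredentials("impersonated_service_account", None, True)

user_after = toy_with_scopes_if_required(user, CLIENT_DEFAULT_SCOPES)
impersonated_after = toy_with_scopes_if_required(impersonated, CLIENT_DEFAULT_SCOPES)

print(DRIVE_SCOPE in user_after.scopes)          # True
print(DRIVE_SCOPE in impersonated_after.scopes)  # False
```

In this model the impersonated credentials end up with the client defaults and therefore without the Drive scope, which matches the symptom reported at the top of the issue.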

tswast added the "type: bug" and "priority: p2" labels Dec 8, 2022
Contributor

tswast commented Dec 8, 2022

On second thought, this may not be a bug. I think no drive scope is the correct default, so clients that need these scopes should be passing it in via the client_options.

Perhaps we reclassify this as a documentation issue to update the code sample at https://cloud.google.com/bigquery/docs/external-data-drive#python now that client_options are available?

Author

adamcunnington-mlg commented Dec 8, 2022

Thanks so much for looking at this, but I don't quite agree it's a docs issue. The key point here is that the scopes are correctly extracted from ADC when the ADC is of type authorized_user but NOT when it is of type impersonated_service_account. I think this requires a fix in google.auth.

Contributor

tswast commented Dec 8, 2022

> The key point here is that the scopes are correctly extracted from ADC when ADC is of type authorized user but NOT when they are of type impersonated_service_account

I suppose there's a subtlety here. We don't want to downscope credentials that already have cloud-platform or bigquery scopes. The only reason we're not doing that for authorized user is that downscoping isn't supported in google-auth. If it were supported, we wouldn't want to be downscoping in that case, either.

Author

adamcunnington-mlg commented Dec 8, 2022

But as far as google.auth is concerned, an authorised user or an authorised user that is impersonating a service account, is the same category of thing. It's still an authorised user credential, and when I'm generating ADC, I'm providing explicit scopes, which in this case are wider than what is coming from python-bigquery (cloud platform PLUS drive scopes) but bigquery is irrelevant in the discussion here - this issue should probably be ported to google.auth repo. It's not BQ specific at all.

@rafael-guevara-ONE

This comment was marked as off-topic.

@krampepampe

This comment was marked as off-topic.

@adamcunnington-mlg
Author

@rafael-guevara-ONE @krampepampe I doubt you are having the same issue.

How are you authenticating? You are probably missing the google drive scope. That is not what this issue is about.

@rafael-guevara-ONE

This comment was marked as off-topic.

Author

adamcunnington-mlg commented Dec 14, 2022

Ok, this is muddying the water of this issue. Thanks for that!

@tswast please can you advise following my previous response?

Contributor

tswast commented Dec 14, 2022

> But as far as google.auth is concerned, an authorised user or an authorised user that is impersonating a service account, is the same category of thing. It's still an authorised user credential, and when I'm generating ADC, I'm providing explicit scopes, which in this case are wider than what is coming from python-bigquery (cloud platform PLUS drive scopes) but bigquery is irrelevant in the discussion here - this issue should probably be ported to google.auth repo. It's not BQ specific at all.

Oh, I agree that it's a subtlety that shouldn't exist. I'm not 100% sure how google-auth would detect that it shouldn't try to downscope, but it's probably worth moving over to that repo.

Alternatively, the issue may be in google-cloud-core, because it isn't using the default_scopes argument for scopes that come from the client class definition. https://github.com/googleapis/python-cloud-core/blob/8ca0faa17e87aa842d154b965be5ef39f1f7490d/google/cloud/client/__init__.py#L181

One thing that will make this more difficult with respect to bigquery is that cloud-platform is a superset of the permissions in the bigquery scope. I've filed googleapis/python-bigquery#1444 to standardize the scopes to avoid potential confusion.

tswast transferred this issue from googleapis/python-bigquery Dec 14, 2022
Author

adamcunnington-mlg commented Dec 15, 2022

@tswast thanks for the response. Also, I raised a support case in Google Cloud - I'm not sure if you ended up inputting there, but just to bring this all together, here is the response I got there - which honestly, I'm a little dubious about. It centres on there not being a "scope" argument in the ADC JSON, but there never is. Presumably the scope information is baked into the refresh token or something.

All the same, here it is:
I got an update from the Product Team regarding your issue. When you use the gcloud command[1], it adds the scopes to the source credential instead of the impersonated creds, and the dumped ADC file doesn't have a scope field, so this info is lost when users load from the ADC file. Hence you’re facing this issue. I would like to inform you that we won't support the scope + impersonate-service-account flags at the moment.

I also discussed with Eng team for possibility of supporting scope with impersonated service account and they replied as below:

Adding scopes + impersonate-service-account support in gcloud for this command is not in the current road map. This is a major effort so we don't think this will happen in a short time (we need to add scopes to the ADC file, but any ADC file change has a big impact: we not only need to update gcloud, but also auth libraries in every supported language).

However, the Product Team informed us that they will add a warning message for command [1] saying scopes will be ignored but there is no promised ETA at the moment.
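To check what the support response describes on your own machine, a small stdlib-only sketch like the following can summarise the local ADC file. The helper names are made up for illustration. The file locations are the usual gcloud defaults (a custom gcloud config directory would change them), and the field names `type`, `source_credentials` and `scopes` are assumptions about the file layout rather than a documented schema.

```python
import json
import os
from pathlib import Path

ADC_FILENAME = "application_default_credentials.json"


def default_adc_path() -> Path:
    """Best-guess location of the ADC file (assumes default gcloud config dir)."""
    override = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if override:
        return Path(override)
    if os.name == "nt":
        return Path(os.environ.get("APPDATA", "")) / "gcloud" / ADC_FILENAME
    return Path.home() / ".config" / "gcloud" / ADC_FILENAME


def summarize_adc(info: dict) -> dict:
    """Report the credential type and whether any scopes field is present."""
    source = info.get("source_credentials") or {}
    return {
        "type": info.get("type"),
        "source_type": source.get("type"),
        "has_scopes_field": "scopes" in info or "scopes" in source,
    }


if __name__ == "__main__":
    path = default_adc_path()
    if path.is_file():
        # Only the summary is printed, never the secrets in the file.
        print(summarize_adc(json.loads(path.read_text())))
    else:
        print(f"No ADC file found at {path}")
```

If the summary reports no scopes field for a file generated with `--scopes` and `--impersonate-service-account`, that would be consistent with the explanation quoted above.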

@mau21mau

Using google.auth.default(scopes=['https://www.googleapis.com/auth/drive', 'https://www.googleapis.com/auth/cloud-platform']) doesn't work for me. For some reason, the requested scopes are ignored and the scopes on the returned credentials are set to None.
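A small diagnostic sketch for checking what a credentials object actually reports. `describe_credentials` is a hypothetical helper; it reads attributes with `getattr` so it works on any object, and the stand-in object below only mimics the situation described in this comment (it is not real output from google-auth).

```python
from types import SimpleNamespace


def describe_credentials(credentials) -> dict:
    """Collect the scope-related attributes a credentials object exposes."""
    return {
        "class": type(credentials).__name__,
        "scopes": getattr(credentials, "scopes", None),
        "default_scopes": getattr(credentials, "default_scopes", None),
        "requires_scopes": getattr(credentials, "requires_scopes", None),
    }


# Usage with real ADC (needs google-auth and local credentials):
#   import google.auth
#   credentials, project = google.auth.default(scopes=[...])
#   print(describe_credentials(credentials))

# Stand-in object so the helper can be exercised without any credentials.
fake = SimpleNamespace(scopes=None, default_scopes=None, requires_scopes=True)
print(describe_credentials(fake))
```

Printing this before and after constructing the client makes it easier to report exactly which credential type and scopes were in play.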

Linchin removed the "api: bigquery" label May 8, 2024

EricSeastrand commented May 27, 2024

For anyone else facing this, here's the exact code that worked for me:

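from google.cloud import bigquery

# Request the Drive scope alongside cloud-platform via client_options.
client = bigquery.Client(client_options={
    "scopes": ['https://www.googleapis.com/auth/drive', 'https://www.googleapis.com/auth/cloud-platform']
})
sql = "SELECT 1"  # placeholder; substitute a query against the Sheets-backed table
results = client.query_and_wait(sql)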

While I understand the nuance here (and how this might not be a "bug" per se), it certainly can be unexpected behavior that costs manhours. I think it's a deeper issue with how GCP handles permissions on the server side and can't really be addressed in the language SDKs (other than with one-off hacks as described here).
My rationale: The authentication between BQ to GDrive happens on the server side. It should be up to the server side to look up the service account's permissions, see that it can access a GSheet-backed table, know that it's allowed to talk to Gdrive, and make the connection.
The way it works now: the service account making the API call into BigQuery API needs to somehow "know" that a table is Gsheet-backed, so that it can include the right access scopes. That feels problematic to me. It's a leaky abstraction at best.

But it sounds like GCP isn't interested in addressing it. I guess I get it: that's a really big change with huge implications. So for now, at least we have Google Search to help the next poor dev find this GH issue and get past this odd behavior.

7 participants