Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using Xray with stscreds is difficult and awkward #3213

Closed
copumpkin opened this issue Mar 18, 2020 · 4 comments · Fixed by #3223 or #3245
Closed

Using Xray with stscreds is difficult and awkward #3213

copumpkin opened this issue Mar 18, 2020 · 4 comments · Fixed by #3223 or #3245

Comments

@copumpkin
Copy link

copumpkin commented Mar 18, 2020

The Xray integration with this SDK seems to rely a lot on passing Contexts around, which is easy in most cases, except when you need to go through AssumeRole with stscreds, where the type signature doesn't really give stscreds enough power to be able to call AssumeRoleWithContext on the underlying STS client.

It would be nice if there could be an endorsed way use stscreds with Xray, because if you just leave it alone right now, you get the following panic:

panic: failed to begin subsegment named 'sts': segment cannot be found.

In an ideal world, it seems like if an API call using the stscreds provider needs to refresh the credentials, then the AssumeRole should use the context of the API triggering the refresh. That would allow it to respect timeouts and such, as well as obviously letting Xray see into the bowels of a request that triggers an AssumeRole.

I'd welcome ideas for workarounds. Right now we've had to just make an AssumeRoler that looks into a global variable for the context to use, and then we have remember to set the global properly.

@varming
Copy link

varming commented Mar 18, 2020

I would go a little further and say that stscreds should use the context object passed into *WithContext methods when refreshing credentials.

@copumpkin
Copy link
Author

It looks like @jasdel just recently added a GetWithContext to credentials that seemingly appears to help with this (assuming stscreds had an AssumeRolerWithContext or similar), except it has a comment on it saying that it can't pass the context further down. I don't quite follow the justification though:

func (c *Credentials) GetWithContext(ctx Context) (Value, error) {
if curCreds := c.creds.Load(); !c.isExpired(curCreds) {
return curCreds.(Value), nil
}
// Cannot pass context down to the actual retrieve, because the first
// context would cancel the whole group when there is not direct
// association of items in the group.

@skmcgrail
Copy link
Member

skmcgrail commented Mar 19, 2020

The reason GetWithContext does not pass down the context to the providers' retrieve implementation is to prevent the cancellation of the context causing a failure to refresh the providers' credentials. The update that introduced GetWithContext introduced an underlying implementation change where the refreshing and waiting on credentials to be refreshed is no longer gated around a sync.Mutex lock, but now occurs using a singleflight.Group. The previous solution could cause issues with client side memory exhaustion and additionally the client could issue large numbers of credential refresh attempts outside the normal retry strategy.

The design change ensures that if a request to refresh credentials is currently in flight, all other calls to Get will wait on that same request, and will receive the same output or error response from said request. Thus to ensure a single context does not force the cancellation of a refresh for other Get requests we've opted not to pass in the original context down during retrieval. We are still able to honor a context cancellation though, and the individual call to Get will appropriately cancel and return from the Get regardless of whether there is an inflight refresh occurring.

@agacek
Copy link

agacek commented Mar 20, 2020

Thanks Sean. In that case, what's the recommended way to use X-Ray in the Go SDK when using credentials from an AssumeRoler?

aws-sdk-go-automation pushed a commit that referenced this issue Apr 2, 2020
===

### Service Client Updates
* `service/gamelift`: Updates service API and documentation
  * Public preview of GameLift FleetIQ as a standalone feature. GameLift FleetIQ makes it possible to use low-cost Spot instances by limiting the chance of interruptions affecting game sessions. FleetIQ is a feature of the managed GameLift service, and can now be used with game hosting in EC2 Auto Scaling groups that you manage in your own account.
* `service/medialive`: Updates service API, documentation, and waiters
  * AWS Elemental MediaLive now supports Automatic Input Failover. This feature provides resiliency upstream of the channel, before ingest starts.
* `service/monitoring`: Updates service API and documentation
  * Amazon CloudWatch Contributor Insights adds support for tags and tagging on resource creation.
* `service/rds`: Updates service documentation
  * Documentation updates for RDS: creating read replicas is now supported for SQL Server DB instances
* `service/redshift`: Updates service documentation
  * Documentation updates for redshift

### SDK Enhancements
* `aws/credentials`: `ProviderWithContext` optional interface has been added to support passing contexts on credential retrieval ([#3223](#3223))
  * Credential providers that implement the optional `ProviderWithContext` will have context passed to them
  * `ec2rolecreds.EC2RoleProvider`, `endpointcreds.Provider`, `stscreds.AssumeRoleProvider`, `stscreds.WebIdentityRoleProvider` have been updated to support the `ProviderWithContext` interface
  * Fixes [#3213](#3213)
* `aws/ec2metadata`: Context aware operations have been added `EC2Metadata` client ([#3223](#3223))
aws-sdk-go-automation added a commit that referenced this issue Apr 2, 2020
Release v1.30.3 (2020-04-02)
===

### Service Client Updates
* `service/gamelift`: Updates service API and documentation
  * Public preview of GameLift FleetIQ as a standalone feature. GameLift FleetIQ makes it possible to use low-cost Spot instances by limiting the chance of interruptions affecting game sessions. FleetIQ is a feature of the managed GameLift service, and can now be used with game hosting in EC2 Auto Scaling groups that you manage in your own account.
* `service/medialive`: Updates service API, documentation, and waiters
  * AWS Elemental MediaLive now supports Automatic Input Failover. This feature provides resiliency upstream of the channel, before ingest starts.
* `service/monitoring`: Updates service API and documentation
  * Amazon CloudWatch Contributor Insights adds support for tags and tagging on resource creation.
* `service/rds`: Updates service documentation
  * Documentation updates for RDS: creating read replicas is now supported for SQL Server DB instances
* `service/redshift`: Updates service documentation
  * Documentation updates for redshift

### SDK Enhancements
* `aws/credentials`: `ProviderWithContext` optional interface has been added to support passing contexts on credential retrieval ([#3223](#3223))
  * Credential providers that implement the optional `ProviderWithContext` will have context passed to them
  * `ec2rolecreds.EC2RoleProvider`, `endpointcreds.Provider`, `stscreds.AssumeRoleProvider`, `stscreds.WebIdentityRoleProvider` have been updated to support the `ProviderWithContext` interface
  * Fixes [#3213](#3213)
* `aws/ec2metadata`: Context aware operations have been added `EC2Metadata` client ([#3223](#3223))
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants