Workload Identity Federation only working with uniform bucket level access? #3556

Closed
Dadavan opened this issue Apr 10, 2024 · 2 comments

Dadavan commented Apr 10, 2024

We are running Tempo on GKE with a GCS bucket as the storage backend; the bucket has Uniform Bucket Level Access set to 'false'. This setup has been running without problems for quite some time. Recently we switched from the "old" form of Workload Identity Federation (creating both a Kubernetes service account and an IAM service account, linking them with an annotation on the Kubernetes account, and granting the necessary roles to the IAM account) to the updated form (see here), which grants Kubernetes principals permissions on GCP resources directly, without linking them to IAM service accounts.

To allow Tempo to access the bucket, we granted the following principal the storage.objectAdmin role on the bucket, as well as permission to list all buckets in the project:

principal://iam.googleapis.com/projects/XXX/locations/global/workloadIdentityPools/XXX.svc.id.goog/subject/ns/tracing/sa/tempo

We are now seeing errors like the following in the logs every few minutes:

level=error ts=2024-04-10T09:27:41.032325318Z caller=tempodb.go:462 msg="failed to poll blocklist. using previously polled lists" err="googleapi: got HTTP response code 412 with body: <?xml version='1.0' encoding='UTF-8'?><Error><Code>PreconditionFailed</Code><Message>The operation requires that Uniform Bucket Level Access be enabled.</Message><Details>The type of authentication token used for this request requires that Uniform Bucket Level Access be enabled.</Details></Error>"

And also:

level=error ts=2024-04-09T16:01:09.385524514Z caller=poller.go:195 msg="failed to write tenant index" tenant=single-tenant err="googleapi: Error 412: The type of authentication token used for this request requires that Uniform Bucket Level Access be enabled., conditionNotMet"
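Both log lines report the same condition: an HTTP 412 (PreconditionFailed / conditionNotMet) stating that the type of authentication token requires Uniform Bucket Level Access. When triaging logs, a crude text match is enough to separate this from ordinary permission errors. The helper below is hypothetical and not something Tempo provides; matching on message text is fragile and only meant as a quick filter.

```go
package main

import (
	"fmt"
	"strings"
)

// isUBLAPrecondition reports whether an error string looks like the GCS 412
// returned when the caller's token type requires Uniform Bucket Level Access.
// Heuristic for log triage only: it relies on the wording of the message.
func isUBLAPrecondition(msg string) bool {
	has412 := strings.Contains(msg, "412")
	mentionsUBLA := strings.Contains(msg, "Uniform Bucket Level Access")
	return has412 && mentionsUBLA
}

func main() {
	lines := []string{
		`err="googleapi: Error 412: The type of authentication token used for this request requires that Uniform Bucket Level Access be enabled., conditionNotMet"`,
		`err="googleapi: Error 403: Access denied."`,
	}
	for _, l := range lines {
		fmt.Println(isUBLAPrecondition(l))
	}
}
```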

We had no issues at all with the previous form of Workload Identity, and we could not find any documentation stating that this type of authorization requires Uniform Bucket Level Access to be enabled. Is this an issue with Tempo, or a general issue with how the updated Workload Identity Federation works?

Steps to reproduce the behavior:

  1. Run Tempo 2.0.1 (the same problem also occurs on 2.4.1) on Kubernetes with a GCS bucket storage backend and Workload Identity enabled on the cluster.
  2. Grant the principal of Tempo's Kubernetes service account permissions on the GCS bucket.
  3. Wait for Tempo to start up and view the logs.

Expected Behaviour:
Tempo reports no errors regarding the GCS bucket.

Environment:

  • Infrastructure: GKE
  • Deployment tool: Custom Helm Chart

Additional Context:

tempo.yaml:

target: scalable-single-binary

multitenancy_enabled: false

server:
  http_listen_port: 3100
  log_level: info

distributor:
  log_received_spans:
    enabled: false # for debugging only, should be set to false on production

  receivers:
    jaeger:
      protocols:
        thrift_compact:
          endpoint: 0.0.0.0:6831
        thrift_binary:
          endpoint: 0.0.0.0:6832
        grpc:
          endpoint: 0.0.0.0:14250
        thrift_http:
          endpoint: 0.0.0.0:14268

ingester:
  lifecycler:
    heartbeat_period: 100ms
    ring:
      kvstore:
        store: memberlist

compactor:
  ring:
    kvstore:
      store: memberlist
  compaction:
    compacted_block_retention: 24h

memberlist:
  abort_if_cluster_join_fails: false
  bind_port: 7946
  join_members:
    - tempo-headless.tracing.svc.cluster.local:7946

storage:
  trace:
    backend: gcs
    gcs:
      bucket_name: XXX-tempo
    local:
      path: /var/tempo/traces
    wal:
      path: /var/tempo/wal

query_frontend:
  search:
    max_duration: 720h1m0s

overrides:
  max_search_bytes_per_trace: 100000

querier:
  frontend_worker:
    frontend_address: tempo-headless.tracing.svc.cluster.local:9095

Dadavan changed the title from "Workload Identity Federation only working with unifrom bucket level access?" to "Workload Identity Federation only working with uniform bucket level access?" on Apr 10, 2024
joe-elliott (Member) commented:

Thanks for the report. I honestly don't know enough about Uniform Bucket Level Access to comment directly, but let's dig in a bit.

Let's focus on the second of the two errors. It's logged here:

level.Error(p.logger).Log("msg", "failed to write tenant index", "tenant", tenantID, "err", err)

Which is passed through here:

func (w *writer) WriteTenantIndex(ctx context.Context, tenantID string, meta []*BlockMeta, compactedMeta []*CompactedBlockMeta) error {

And ultimately lands on this call:

func (rw *readerWriter) Write(ctx context.Context, name string, keypath backend.KeyPath, data io.Reader, _ int64, _ *backend.CacheInfo) error {

I'm not sure which underlying GCS call that maps to, but I believe we're using the standard SDK appropriately. What's interesting is that Tempo writes block data through this same call all the time. Are you seeing any issues with your ingesters or compactors flushing or creating blocks?

The main difference I can think of is where in the object hierarchy the objects are written:

Broken:

gs://<bucket>/<tenant>/index.json.gz

Seemingly works:

gs://<bucket>/<tenant>/<block guid>/<various block files>

Dadavan (Author) commented May 15, 2024

Google have updated their docs and now explain why this is happening: setting Uniform Bucket Level Access to true is a requirement when using IAM principals with Workload Identity Federation. Their suggested solution is the one I described in the issue: use the 'old' method of linking a Kubernetes service account to an IAM service account. See here: https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity#kubernetes-sa-to-iam
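For reference, a minimal sketch of the two pieces the linked-service-account method needs, assuming the formats from the GKE documentation linked above: the iam.gke.io/gcp-service-account annotation on the Kubernetes ServiceAccount, and the serviceAccount:PROJECT_ID.svc.id.goog[NAMESPACE/KSA_NAME] member that is granted roles/iam.workloadIdentityUser on the IAM service account. The helper name, the IAM service account name and the project ID are hypothetical, and the sketch assumes the cluster and the IAM service account live in the same project.

```go
package main

import "fmt"

// legacyLink holds what the "link a Kubernetes SA to an IAM SA" method needs.
type legacyLink struct {
	AnnotationKey   string // set on the Kubernetes ServiceAccount
	AnnotationValue string // email of the IAM service account to act as
	Role            string // granted on the IAM service account...
	Member          string // ...to this Workload Identity member
}

// newLegacyLink builds the link for one Kubernetes ServiceAccount
// (hypothetical helper; assumes a single project for cluster and IAM SA).
func newLegacyLink(projectID, namespace, ksa, gsaName string) legacyLink {
	return legacyLink{
		AnnotationKey:   "iam.gke.io/gcp-service-account",
		AnnotationValue: fmt.Sprintf("%s@%s.iam.gserviceaccount.com", gsaName, projectID),
		Role:            "roles/iam.workloadIdentityUser",
		Member:          fmt.Sprintf("serviceAccount:%s.svc.id.goog[%s/%s]", projectID, namespace, ksa),
	}
}

func main() {
	l := newLegacyLink("my-project", "tracing", "tempo", "tempo-gcs")
	fmt.Printf("%s: %s\n", l.AnnotationKey, l.AnnotationValue)
	fmt.Printf("grant %s to %s\n", l.Role, l.Member)
}
```

With this method, the bucket roles (such as storage.objectAdmin) are granted to the IAM service account rather than to the Kubernetes principal.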

@joe-elliott Thanks for your assistance and time!

Dadavan closed this as completed May 15, 2024