Kubernetes client rate-limiting #2942

Open
EronWright opened this issue Apr 11, 2024 · 2 comments
Labels
area/inner-dev-loop, area/tools, impact/performance, kind/bug, kind/engineering

Comments

EronWright (Contributor) commented Apr 11, 2024

What happened?

I was running the Kubernetes provider in a debugger and attaching to it with PULUMI_DEBUG_PROVIDERS. I reused the same provider process across numerous deployments, and eventually the provider transitioned to a failure state, apparently due to client-side rate limiting. Restarting the provider process fixed the problem.

I'm filing this issue because, although my specific case is exotic, there may be a deeper scalability problem in the provider related to rate-limiting in the Kubernetes client.
See kubernetes/kubernetes#111880 for more background.

Diagnostics:
  kubernetes:apps/v1:Deployment (deployment):
    error: update of resource "urn:pulumi:dev::issue-xyz::kubernetes:apps/v1:Deployment::deployment" failed 
    because the Kubernetes API server reported that it failed to fully initialize or become live: 
    client rate limiter Wait returned an error: context canceled

  pulumi:pulumi:Stack (issue-xyz-dev):
    error: update failed

Here's the update made just prior to the first rate-limit error; I had deliberately used an invalid image, nginxfoo.

Diagnostics:
  kubernetes:apps/v1:Deployment (deployment):
    warning: Refreshed resource is in an unhealthy state:
    * Resource 'mydeployment' was created but failed to initialize
    * Minimum number of Pods to consider the application live was not attained
    * [Pod eron/mydeployment-65df56c569-dnqzh]: containers with unready status: [nginx]
    error: update of resource "urn:pulumi:dev::issue-2455::kubernetes:apps/v1:Deployment::deployment" failed because the Kubernetes API server reported that it failed to fully initialize or become live: Resource operation was cancelled for "mydeployment"

Example

name: issue-2942
runtime: yaml
description: A minimal Kubernetes Pulumi YAML program
config:
  pulumi:tags:
    value:
      pulumi:template: kubernetes-yaml
outputs:
  name: ${deployment.metadata.name}
resources:
  deployment:
    properties:
      metadata:
        name: mydeployment
      spec:
        replicas: 1
        selector:
          matchLabels: ${appLabels}
        template:
          metadata:
            labels: ${appLabels}
          spec:
            containers:
            - image: nginx
              name: nginx
              env:
              - name: DEMO_GREETING
                value: "16"
    type: kubernetes:apps/v1:Deployment
variables:
  appLabels:
    app: nginx

N/A

Output of pulumi about

CLI          
Version      3.108.1
Go Version   go1.22.0
Go Compiler  gc

Plugins
NAME        VERSION
kubernetes  unknown
yaml        unknown

Host     
OS       darwin
Version  14.4.1
Arch     arm64

Additional context

No response

Contributing

Vote on this issue by adding a 👍 reaction.
To contribute a fix for this issue, leave a comment (and link to your pull request, if you've opened one already).

EronWright added the impact/performance, kind/bug, and needs-triage labels on Apr 11, 2024
EronWright (Contributor, Author) commented Apr 11, 2024

Here's what happened in my case: the provider was sent a Cancel RPC, which canceled the provider's internal context. On subsequent requests, the Kubernetes client's rate limiter is the first code path to hit the canceled context.

Two possible follow-ups:

  1. Double-check the QPS settings.
  2. Teach the provider to reset the cancellation signal when it receives a Configure RPC (see the sketch below).
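
For (2), here's a minimal sketch of what re-arming the signal could look like; the type and field names are illustrative rather than the provider's actual code:

package provider

import (
    "context"
    "sync"
)

// Illustrative sketch only, not the real provider implementation.
type kubeProvider struct {
    mu     sync.Mutex
    ctx    context.Context    // context threaded into Kubernetes client calls
    cancel context.CancelFunc
}

// Cancel cancels in-flight operations by canceling the shared context. After
// this, any client call that waits on the rate limiter with this context
// fails with "context canceled".
func (k *kubeProvider) Cancel() {
    k.mu.Lock()
    defer k.mu.Unlock()
    if k.cancel != nil {
        k.cancel()
    }
}

// Configure re-arms the cancellation signal so that a provider process that
// was previously canceled can serve subsequent deployments again.
func (k *kubeProvider) Configure() {
    k.mu.Lock()
    defer k.mu.Unlock()
    k.ctx, k.cancel = context.WithCancel(context.Background())
}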

The low-level throttling code is here:

https://github.com/kubernetes/client-go/blob/46588f2726fa3e25b1704d6418190f424f95a990/rest/request.go#L986-L991
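
Paraphrasing that logic (not a verbatim copy): every request first waits on the client-side rate limiter with the request's context, so a context that is already canceled comes back wrapped exactly as in the diagnostics above.

package provider

import (
    "context"
    "fmt"

    "k8s.io/client-go/util/flowcontrol"
)

// tryThrottle paraphrases the linked client-go behavior. Wait blocks until the
// client-side limiter grants a token, but returns the context's error
// immediately if the context has already been canceled; the caller then wraps
// it into "client rate limiter Wait returned an error: context canceled".
func tryThrottle(ctx context.Context, limiter flowcontrol.RateLimiter) error {
    if limiter == nil {
        return nil
    }
    if err := limiter.Wait(ctx); err != nil {
        return fmt.Errorf("client rate limiter Wait returned an error: %w", err)
    }
    return nil
}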

blampe added the area/tools, kind/engineering, and area/inner-dev-loop labels and removed the needs-triage label on Apr 12, 2024
blampe (Contributor) commented Apr 12, 2024

Is there another alternative where we generously bump the QPS ceiling if running under debug? A quick workaround like that might be prudent if this is impacting the debug loop but not end-users.
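
A rough sketch of that workaround, assuming the provider builds a client-go rest.Config somewhere; the PULUMI_DEBUG_PROVIDERS check and the numbers are only illustrative:

package provider

import (
    "os"

    "k8s.io/client-go/rest"
)

// bumpThrottleForDebug raises the client-side rate limit when the provider is
// being run under the debug workflow. client-go's defaults are QPS=5 and
// Burst=10.
func bumpThrottleForDebug(cfg *rest.Config) {
    if os.Getenv("PULUMI_DEBUG_PROVIDERS") != "" {
        cfg.QPS = 100   // sustained requests per second once the burst bucket drains
        cfg.Burst = 200 // headroom for bursts of await/refresh traffic
    }
}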

Related #1748
