Need help: KSVCs are not scaling down to zero #15191

Closed
kiranmenon opened this issue May 7, 2024 · 5 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments


kiranmenon commented May 7, 2024

I deployed the sample autoscale app on Kubernetes 1.28.2 clusters I created on OCI. The ksvc has the annotation autoscaling.knative.dev/min-scale: "0" set, and I am using the default values in the config-autoscaler ConfigMap. The autoscale-go sample deployment is not scaling down to zero. Are there any other configs we need to set to enable scale down to zero? Also, how can we debug these kinds of issues?

In the config-autoscaler ConfigMap I tried explicitly adding the values below, but that did not help (a minimal Service manifest showing where the min-scale annotation goes is sketched after the list).

enable-scale-to-zero: "true"
min-scale: "0"
panic-threshold-percentage: "150.0"
panic-window-percentage: "50.0"
pod-autoscaler-class: kpa.autoscaling.knative.dev
requests-per-second-target-default: "500"
stable-window: 30s
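
For reference, a minimal sketch of a Service manifest with that annotation; the annotation goes on the revision template's metadata, and the image reference here is illustrative:

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: autoscale-go
  namespace: dx-system
spec:
  template:
    metadata:
      annotations:
        # Per-revision setting; "0" allows the revision to scale to zero
        autoscaling.knative.dev/min-scale: "0"
    spec:
      containers:
        - image: ghcr.io/knative/autoscale-go:latest   # illustrative image reference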

What version of Knative?

1.12.3

Output of git describe --dirty
knative-v1.12.3

Expected Behavior

The autoscale-go sample deployment should scale down to zero after some period of inactivity.

Actual Behavior

The sample app is not scaling down to zero after a couple of minutes of zero traffic.

kiranmenon added the kind/bug label on May 7, 2024
kiranmenon (Author)

I have some logs from the autoscaler pod, in case they help.

{"severity":"DEBUG","timestamp":"2024-05-08T12:26:15.105875396Z","logger":"autoscaler","caller":"metrics/stats_scraper.go:252","message":"|OldPods| = 1, |YoungPods| = 0","commit":"f1bd929","knative.dev/key":"dx-system/autoscale-go-00003"}
{"severity":"DEBUG","timestamp":"2024-05-08T12:26:15.574772129Z","logger":"autoscaler","caller":"scaling/autoscaler.go:190","message":"For metric concurrency observed values: stable = 0.000; panic = 0.000; target = 7.000 Desired StablePodCount = 1, PanicPodCount = 1, ReadyEndpointCount = 1, MaxScaleUp = 1000, MaxScaleDown = 0","commit":"f1bd929","knative.dev/key":"dx-system/autoscale-go-00003"}
{"severity":"DEBUG","timestamp":"2024-05-08T12:26:15.574804962Z","logger":"autoscaler","caller":"scaling/autoscaler.go:247","message":"Operating in stable mode.","commit":"f1bd929","knative.dev/key":"dx-system/autoscale-go-00003"}
{"severity":"DEBUG","timestamp":"2024-05-08T12:26:15.574821794Z","logger":"autoscaler","caller":"scaling/autoscaler.go:286","message":"PodCount=1 Total1PodCapacity=10.000 ObsStableValue=0.000 ObsPanicValue=0.000 TargetBC=211.000 ExcessBC=-202.000","commit":"f1bd929","knative.dev/key":"dx-system/autoscale-go-00003"}
{"severity":"DEBUG","timestamp":"2024-05-08T12:26:16.105992928Z","logger":"autoscaler","caller":"metrics/stats_scraper.go:252","message":"|OldPods| = 1, |YoungPods| = 0","commit":"f1bd929","knative.dev/key":"dx-system/autoscale-go-00003"}
{"severity":"DEBUG","timestamp":"2024-05-08T12:26:17.105149099Z","logger":"autoscaler","caller":"metrics/stats_scraper.go:252","message":"|OldPods| = 1, |YoungPods| = 0","commit":"f1bd929","knative.dev/key":"dx-system/autoscale-go-00003"}
{"severity":"DEBUG","timestamp":"2024-05-08T12:26:17.575085132Z","logger":"autoscaler","caller":"scaling/autoscaler.go:190","message":"For metric concurrency observed values: stable = 0.000; panic = 0.000; target = 7.000 Desired StablePodCount = 1, PanicPodCount = 1, ReadyEndpointCount = 1, MaxScaleUp = 1000, MaxScaleDown = 0","commit":"f1bd929","knative.dev/key":"dx-system/autoscale-go-00003"}
{"severity":"DEBUG","timestamp":"2024-05-08T12:26:17.575129516Z","logger":"autoscaler","caller":"scaling/autoscaler.go:247","message":"Operating in stable mode.","commit":"f1bd929","knative.dev/key":"dx-system/autoscale-go-00003"}
{"severity":"DEBUG","timestamp":"2024-05-08T12:26:17.575149254Z","logger":"autoscaler","caller":"scaling/autoscaler.go:286","message":"PodCount=1 Total1PodCapacity=10.000 ObsStableValue=0.000 ObsPanicValue=0.000 TargetBC=211.000 ExcessBC=-202.000","commit":"f1bd929","knative.dev/key":"dx-system/autoscale-go-00003"}
{"severity":"DEBUG","timestamp":"2024-05-08T12:26:18.105966467Z","logger":"autoscaler","caller":"metrics/stats_scraper.go:252","message":"|OldPods| = 1, |YoungPods| = 0","commit":"f1bd929","knative.dev/key":"dx-system/autoscale-go-00003"}
{"severity":"DEBUG","timestamp":"2024-05-08T12:26:19.106160855Z","logger":"autoscaler","caller":"metrics/stats_scraper.go:252","message":"|OldPods| = 1, |YoungPods| = 0","commit":"f1bd929","knative.dev/key":"dx-system/autoscale-go-00003"}
{"severity":"DEBUG","timestamp":"2024-05-08T12:26:19.574716593Z","logger":"autoscaler","caller":"scaling/autoscaler.go:190","message":"For metric concurrency observed values: stable = 0.000; panic = 0.000; target = 7.000 Desired StablePodCount = 1, PanicPodCount = 1, ReadyEndpointCount = 1, MaxScaleUp = 1000, MaxScaleDown = 0","commit":"f1bd929","knative.dev/key":"dx-system/autoscale-go-00003"}
{"severity":"DEBUG","timestamp":"2024-05-08T12:26:19.574744596Z","logger":"autoscaler","caller":"scaling/autoscaler.go:247","message":"Operating in stable mode.","commit":"f1bd929","knative.dev/key":"dx-system/autoscale-go-00003"}
{"severity":"DEBUG","timestamp":"2024-05-08T12:26:19.574755507Z","logger":"autoscaler","caller":"scaling/autoscaler.go:286","message":"PodCount=1 Total1PodCapacity=10.000 ObsStableValue=0.000 ObsPanicValue=0.000 TargetBC=211.000 ExcessBC=-202.000","commit":"f1bd929","knative.dev/key":"dx-system/autoscale-go-00003"}
{"severity":"DEBUG","timestamp":"2024-05-08T12:26:20.105121105Z","logger":"autoscaler","caller":"metrics/stats_scraper.go:252","message":"|OldPods| = 1, |YoungPods| = 0","commit":"f1bd929","knative.dev/key":"dx-system/autoscale-go-00003"}
{"severity":"DEBUG","timestamp":"2024-05-08T12:26:21.10635343Z","logger":"autoscaler","caller":"metrics/stats_scraper.go:252","message":"|OldPods| = 1, |YoungPods| = 0","commit":"f1bd929","knative.dev/key":"dx-system/autoscale-go-00003"}
{"severity":"DEBUG","timestamp":"2024-05-08T12:26:21.574547051Z","logger":"autoscaler","caller":"scaling/autoscaler.go:190","message":"For metric concurrency observed values: stable = 0.000; panic = 0.000; target = 7.000 Desired StablePodCount = 1, PanicPodCount = 1, ReadyEndpointCount = 1, MaxScaleUp = 1000, MaxScaleDown = 0","commit":"f1bd929","knative.dev/key":"dx-system/autoscale-go-00003"}
{"severity":"DEBUG","timestamp":"2024-05-08T12:26:21.574568401Z","logger":"autoscaler","caller":"scaling/autoscaler.go:247","message":"Operating in stable mode.","commit":"f1bd929","knative.dev/key":"dx-system/autoscale-go-00003"}
{"severity":"DEBUG","timestamp":"2024-05-08T12:26:21.574579782Z","logger":"autoscaler","caller":"scaling/autoscaler.go:286","message":"PodCount=1 Total1PodCapacity=10.000 ObsStableValue=0.000 ObsPanicValue=0.000 TargetBC=211.000 ExcessBC=-202.000","commit":"f1bd929","knative.dev/key":"dx-system/autoscale-go-00003"}
{"severity":"DEBUG","timestamp":"2024-05-08T12:26:22.105742773Z","logger":"autoscaler","caller":"metrics/stats_scraper.go:252","message":"|OldPods| = 1, |YoungPods| = 0","commit":"f1bd929","knative.dev/key":"dx-system/autoscale-go-00003"}
{"severity":"DEBUG","timestamp":"2024-05-08T12:26:23.106140867Z","logger":"autoscaler","caller":"metrics/stats_scraper.go:252","message":"|OldPods| = 1, |YoungPods| = 0","commit":"f1bd929","knative.dev/key":"dx-system/autoscale-go-00003"}
{"severity":"DEBUG","timestamp":"2024-05-08T12:26:23.575112472Z","logger":"autoscaler","caller":"scaling/autoscaler.go:190","message":"For metric concurrency observed values: stable = 0.000; panic = 0.000; target = 7.000 Desired StablePodCount = 1, PanicPodCount = 1, ReadyEndpointCount = 1, MaxScaleUp = 1000, MaxScaleDown = 0","commit":"f1bd929","knative.dev/key":"dx-system/autoscale-go-00003"}
{"severity":"DEBUG","timestamp":"2024-05-08T12:26:23.575132559Z","logger":"autoscaler","caller":"scaling/autoscaler.go:247","message":"Operating in stable mode.","commit":"f1bd929","knative.dev/key":"dx-system/autoscale-go-00003"}
{"severity":"DEBUG","timestamp":"2024-05-08T12:26:23.575144551Z","logger":"autoscaler","caller":"scaling/autoscaler.go:286","message":"PodCount=1 Total1PodCapacity=10.000 ObsStableValue=0.000 ObsPanicValue=0.000 TargetBC=211.000 ExcessBC=-202.000","commit":"f1bd929","knative.dev/key":"dx-system/autoscale-go-00003"}
{"severity":"DEBUG","timestamp":"2024-05-08T12:26:24.105558593Z","logger":"autoscaler","caller":"metrics/stats_scraper.go:252","message":"|OldPods| = 1, |YoungPods| = 0","commit":"f1bd929","knative.dev/key":"dx-system/autoscale-go-00003"}
{"severity":"DEBUG","timestamp":"2024-05-08T12:26:25.104668295Z","logger":"autoscaler","caller":"metrics/stats_scraper.go:252","message":"|OldPods| = 1, |YoungPods| = 0","commit":"f1bd929","knative.dev/key":"dx-system/autoscale-go-00003"}
{"severity":"DEBUG","timestamp":"2024-05-08T12:26:25.574780873Z","logger":"autoscaler","caller":"scaling/autoscaler.go:190","message":"For metric concurrency observed values: stable = 0.000; panic = 0.000; target = 7.000 Desired StablePodCount = 1, PanicPodCount = 1, ReadyEndpointCount = 1, MaxScaleUp = 1000, MaxScaleDown = 0","commit":"f1bd929","knative.dev/key":"dx-system/autoscale-go-00003"}
{"severity":"DEBUG","timestamp":"2024-05-08T12:26:25.574821329Z","logger":"autoscaler","caller":"scaling/autoscaler.go:247","message":"Operating in stable mode.","commit":"f1bd929","knative.dev/key":"dx-system/autoscale-go-00003"}
{"severity":"DEBUG","timestamp":"2024-05-08T12:26:25.574845234Z","logger":"autoscaler","caller":"scaling/autoscaler.go:286","message":"PodCount=1 Total1PodCapacity=10.000 ObsStableValue=0.000 ObsPanicValue=0.000 TargetBC=211.000 ExcessBC=-202.000","commit":"f1bd929","knative.dev/key":"dx-system/autoscale-go-00003"}
{"severity":"DEBUG","timestamp":"2024-05-08T12:26:26.106526085Z","logger":"autoscaler","caller":"metrics/stats_scraper.go:252","message":"|OldPods| = 1, |YoungPods| = 0","commit":"f1bd929","knative.dev/key":"dx-system/autoscale-go-00003"}
{"severity":"DEBUG","timestamp":"2024-05-08T12:26:27.104674857Z","logger":"autoscaler","caller":"metrics/stats_scraper.go:252","message":"|OldPods| = 1, |YoungPods| = 0","commit":"f1bd929","knative.dev/key":"dx-system/autoscale-go-00003"}
{"severity":"DEBUG","timestamp":"2024-05-08T12:26:27.5747907Z","logger":"autoscaler","caller":"scaling/autoscaler.go:190","message":"For metric concurrency observed values: stable = 0.000; panic = 0.000; target = 7.000 Desired StablePodCount = 1, PanicPodCount = 1, ReadyEndpointCount = 1, MaxScaleUp = 1000, MaxScaleDown = 0","commit":"f1bd929","knative.dev/key":"dx-system/autoscale-go-00003"}
{"severity":"DEBUG","timestamp":"2024-05-08T12:26:27.574811329Z","logger":"autoscaler","caller":"scaling/autoscaler.go:247","message":"Operating in stable mode.","commit":"f1bd929","knative.dev/key":"dx-system/autoscale-go-00003"}
{"severity":"DEBUG","timestamp":"2024-05-08T12:26:27.574826227Z","logger":"autoscaler","caller":"scaling/autoscaler.go:286","message":"PodCount=1 Total1PodCapacity=10.000 ObsStableValue=0.000 ObsPanicValue=0.000 TargetBC=211.000 ExcessBC=-202.000","commit":"f1bd929","knative.dev/key":"dx-system/autoscale-go-00003"}
{"severity":"DEBUG","timestamp":"2024-05-08T12:26:28.105982514Z","logger":"autoscaler","caller":"metrics/stats_scraper.go:252","message":"|OldPods| = 1, |YoungPods| = 0","commit":"f1bd929","knative.dev/key":"dx-system/autoscale-go-00003"}
{"severity":"DEBUG","timestamp":"2024-05-08T12:26:29.10561138Z","logger":"autoscaler","caller":"metrics/stats_scraper.go:252","message":"|OldPods| = 1, |YoungPods| = 0","commit":"f1bd929","knative.dev/key":"dx-system/autoscale-go-00003"}

{"severity":"DEBUG","timestamp":"2024-05-08T12:26:29.574439514Z","logger":"autoscaler","caller":"scaling/autoscaler.go:190","message":"For metric concurrency observed values: stable = 0.000; panic = 0.000; target = 7.000 Desired StablePodCount = 1, PanicPodCount = 1, ReadyEndpointCount = 1, MaxScaleUp = 1000, MaxScaleDown = 0","commit":"f1bd929","knative.dev/key":"dx-system/autoscale-go-00003"}
{"severity":"DEBUG","timestamp":"2024-05-08T12:26:29.574460874Z","logger":"autoscaler","caller":"scaling/autoscaler.go:247","message":"Operating in stable mode.","commit":"f1bd929","knative.dev/key":"dx-system/autoscale-go-00003"}
{"severity":"DEBUG","timestamp":"2024-05-08T12:26:29.574471504Z","logger":"autoscaler","caller":"scaling/autoscaler.go:286","message":"PodCount=1 Total1PodCapacity=10.000 ObsStableValue=0.000 ObsPanicValue=0.000 TargetBC=211.000 ExcessBC=-202.000","commit":"f1bd929","knative.dev/key":"dx-system/autoscale-go-00003"}

{"severity":"DEBUG","timestamp":"2024-05-08T12:26:30.105266887Z","logger":"autoscaler","caller":"metrics/stats_scraper.go:252","message":"|OldPods| = 1, |YoungPods| = 0","commit":"f1bd929","knative.dev/key":"dx-system/autoscale-go-00003"}

kiranmenon (Author)

One observation: we are using Istio along with a custom stack that includes Prometheus and Jaeger tracing.
When I scaled down that stack, the autoscale-go sample deployment also scaled down to zero as expected.
If I'm not wrong, Prometheus scrapes the services for metrics, so does that count as real API traffic, and is that why it was not scaling down to zero before?

skonto (Contributor) commented May 15, 2024

Hi @kiranmenon

If I'm not wrong, Prometheus scrapes the services for metrics, so does that count as real API traffic, and is that why it was not scaling down to zero before?

If your metrics are exposed on the same port as your normal app requests, that is probably the case. See a similar issue for probes: #14581.
Could you enable request logging and report back what the queue-proxy receives?
In the config-observability ConfigMap you can set logging.enable-request-log: "true".
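
For reference, a minimal sketch of that setting applied to the config-observability ConfigMap, assuming Knative Serving is installed in the knative-serving namespace:

apiVersion: v1
kind: ConfigMap
metadata:
  name: config-observability
  namespace: knative-serving
data:
  # queue-proxy will then emit a log line for every request it receives
  logging.enable-request-log: "true"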

kiranmenon (Author)

Hi @skonto,
I see the logs below repeating periodically.

{"httpRequest": {"requestMethod": "GET", "requestUrl": "/metrics", "requestSize": "0", "status": 200, "responseSize": "762", "userAgent": "Prometheus/2.34.0", "remoteIp": "127.0.0.6:59061", "serverIp": "10.244.1.190", "referer": "", "latency": "0.000619453s", "protocol": "HTTP/1.1"}, "traceId": "[]"}

kiranmenon (Author)

For now I have increased the Prometheus scrape interval to be longer than the stable-window of 30s, and things seem to be OK now (a sketch of such a change is shown below).
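
For anyone hitting the same issue, a minimal sketch of what such a Prometheus scrape configuration change looks like; the job name and target are illustrative and will differ per setup:

scrape_configs:
  - job_name: autoscale-go            # illustrative job name
    scrape_interval: 60s              # longer than the 30s stable-window, so idle windows observe zero requests
    static_configs:
      - targets: ["autoscale-go.dx-system.svc.cluster.local:80"]   # illustrative target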
