
Autoscaling with multiple metrics does not work #3638

Open
shazinahmed opened this issue Apr 26, 2024 · 3 comments

Comments


shazinahmed commented Apr 26, 2024

/kind bug

What steps did you take and what happened:
Tried the following:

  1. Tried creating memory-based autoscaling using the Knative annotations below. A CPU-based HPA was created instead, with averageUtilization set to 80.
     autoscaling.knative.dev/class: "hpa.autoscaling.knative.dev"
     autoscaling.knative.dev/metric: "memory"
     autoscaling.knative.dev/target: "75"
  2. Tried creating memory-based autoscaling using the KServe annotations below. A CPU-based HPA was created instead, with averageUtilization set to 80.
     serving.kserve.io/autoscalerClass: hpa
     serving.kserve.io/deploymentMode: RawDeployment
     serving.kserve.io/metric: memory
     serving.kserve.io/targetUtilizationPercentage: "70"
  3. Tried creating memory-based autoscaling by setting predictor.scaleMetric to memory, and a memory-based HPA was created. Yaay!

In scenarios 1 and 2, HPAs are created as expected if the metric is set to cpu.

Now, I want my HPA to be controlled by both memory and CPU. I tried setting predictor.scaleMetric to memory with a corresponding scaleTarget, and also set a CPU threshold using serving.kserve.io/metric: cpu, but only predictor.scaleMetric is respected.

What did you expect to happen:
I want HPA to have both memory and CPU based triggers.
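For comparison, this is what the desired result would look like as a hand-written autoscaling/v2 HPA with two resource triggers (a sketch; the utilization targets are illustrative):

```yaml
# Sketch of a plain Kubernetes autoscaling/v2 HPA with two resource
# metrics; the controller computes a desired replica count per metric
# and scales to the largest of them.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: custom-model-predictor-default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: custom-model-predictor-default
  minReplicas: 1
  maxReplicas: 4
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # illustrative
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 75   # illustrative
```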

What's the InferenceService yaml:

apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  annotations:
    serving.kserve.io/autoscalerClass: hpa
    serving.kserve.io/deploymentMode: RawDeployment
    serving.kserve.io/metric: cpu
    serving.kserve.io/targetUtilizationPercentage: "70"
  name: custom-model
  namespace: dev-xyz-multimodal-ner
spec:
  predictor:
    containers:
    - env:
      - name: AWS_BUCKET
        value: xyz-xyz-xyz-xyz-xyz-xyz
      image: <custom image>
      name: kserve-container
      resources:
        limits:
          cpu: "4"
          memory: 6Gi
        requests:
          cpu: "1"
          memory: 2Gi
    maxReplicas: 4
    nodeSelector:
      nodetype: modelserving
    scaleMetric: memory
    scaleTarget: 75
    tolerations:
    - effect: NoSchedule
      key: dedicated
      operator: Equal
      value: modelserving

Anything else you would like to add:
The HPA YAML the InferenceService generates:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  annotations:
    serving.kserve.io/autoscalerClass: hpa
    serving.kserve.io/deploymentMode: RawDeployment
    serving.kserve.io/metric: cpu
    serving.kserve.io/targetUtilizationPercentage: "80"
  name: custom-model-predictor-default
  namespace: dev-xyz-multimodal-ner
  ownerReferences:
  - apiVersion: serving.kserve.io/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: InferenceService
    name: custom-model
spec:
  behavior:
    scaleDown:
      policies:
      - periodSeconds: 15
        type: Percent
        value: 100
      selectPolicy: Max
    scaleUp:
      policies:
      - periodSeconds: 15
        type: Pods
        value: 4
      - periodSeconds: 15
        type: Percent
        value: 100
      selectPolicy: Max
      stabilizationWindowSeconds: 0
  maxReplicas: 4
  metrics:
  - resource:
      name: memory
      target:
        averageUtilization: 75
        type: Utilization
    type: Resource
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: custom-model-predictor-default

Environment:

  • Istio Version: N/A (Using nginx)
  • Knative Version: N/A (Using Kubernetes deployment)
  • KServe Version: 0.11.0
  • Kubeflow version:
  • Cloud Environment: AWS EKS
  • Minikube/Kind version:
  • Kubernetes version: v1.28.7-eks-b9c9ed7
  • OS (e.g. from /etc/os-release): Amazon Linux 2

yuzisun commented May 4, 2024

@shazinahmed Scaling based on multiple metrics is not supported. Can you elaborate on how you want to scale with both of these metrics?


shazinahmed commented May 31, 2024

@yuzisun Sorry, I missed this one. I want an HPA created with two triggers, one for CPU and one for memory, like we can have in a normal Kubernetes Deployment. This would let us scale on both CPU and memory, depending on which is over-utilized.


spolti commented Jun 3, 2024

Based on options 1 and 2: it seems the annotations are not used when the metrics are created; it defaults to CPU and 80% if predictor.scaleMetric/scaleTarget are not set:
https://github.com/kserve/kserve/blob/master/pkg/controller/v1beta1/inferenceservice/reconcilers/hpa/hpa_reconciler.go#L60

I also didn't find a doc link for HPA; we might be missing that part.

On the other hand, the Kubernetes API does support it:
https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#scaling-on-multiple-metrics

Maybe we could evaluate bringing this functionality to KServe when HPA is enabled.

wdyt?
