Constant Drift in HPA Resource #3277

Closed
1 task done
rocktavious opened this issue Nov 2, 2022 · 7 comments · Fixed by fluxcd/pkg#449
Labels
area/diff Diff related issues and pull requests

Comments

@rocktavious

Describe the bug

I'm trying to have Flux apply HPA resources to our cluster and I'm getting constant drift. We are using EKS 1.23 and an autoscaling/v2 HorizontalPodAutoscaler with a Datadog external metric.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: runner
  namespace: opslevel
spec:
  minReplicas: 1
  maxReplicas: 10
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: runner
  metrics:
  - type: External
    external:
      metric:
        name: datadogmetric@opslevel:runner
      target:
        type: AverageValue
        averageValue: "3"

Running flux diff, I always get the following output:

✓  Kustomization diffing...
► HorizontalPodAutoscaler/opslevel/runner drifted

spec.metrics
  + one list entry added:
    - type: External
    │ external:
    │ │ metric:
    │ │ │ name: "datadogmetric@opslevel:runner"
    │ │ target:
    │ │ │ type: AverageValue
    │ │ │ averageValue: 3

⚠️ identified at least one change, exiting with non-zero exit code

The resource in the cluster is:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"autoscaling/v2","kind":"HorizontalPodAutoscaler","metadata":{"annotations":{},"name":"runner","namespace":"opslevel"},"spec":{"maxReplicas":10,"metrics":[{"external":{"metric":{"name":"datadogmetric@opslevel:runner"},"target":{"averageValue":"3","type":"AverageValue"}},"type":"External"}],"minReplicas":1,"scaleTargetRef":{"apiVersion":"apps/v1","kind":"Deployment","name":"runner"}}}
  creationTimestamp: "2022-08-24T16:31:57Z"
  labels:
    kustomize.toolkit.fluxcd.io/name: flux-system
    kustomize.toolkit.fluxcd.io/namespace: flux-system
  name: runner
  namespace: opslevel
  resourceVersion: "147477184"
  uid: d39239bf-6a3a-4fd9-8ae2-e862a2748a8d
spec:
  maxReplicas: 10
  metrics:
  - external:
      metric:
        name: datadogmetric@opslevel:runner
      target:
        averageValue: "3"
        type: AverageValue
    type: External
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: runner
status:
  conditions:
  - lastTransitionTime: "2022-08-24T16:32:12Z"
    message: recommended size matches current size
    reason: ReadyForNewScale
    status: "True"
    type: AbleToScale
  - lastTransitionTime: "2022-10-26T22:45:46Z"
    message: the HPA was able to successfully calculate a replica count from external
      metric datadogmetric@opslevel:runner(nil)
    reason: ValidMetricFound
    status: "True"
    type: ScalingActive
  - lastTransitionTime: "2022-11-02T11:04:54Z"
    message: the desired count is within the acceptable range
    reason: DesiredWithinRange
    status: "False"
    type: ScalingLimited
  currentMetrics:
  - external:
      current:
        averageValue: 2500m
        value: "0"
      metric:
        name: datadogmetric@opslevel:runner
    type: External
  currentReplicas: 2
  desiredReplicas: 2
  lastScaleTime: "2022-11-02T11:50:45Z"
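
For reference, the spec.metrics entry in the Git manifest and the one in the live object above appear to differ only in key order. The following is a minimal Python sketch that checks this. It is not how Flux computes drift; the helper name differing_paths is hypothetical, and both dictionaries are transcribed by hand from the YAML in this report.

```python
# Illustrative only: this is NOT how Flux computes drift.
# Both entries are transcribed by hand from the YAML shown in this report.

# spec.metrics[0] as written in the Git manifest (authoring key order)
desired = {
    "type": "External",
    "external": {
        "metric": {"name": "datadogmetric@opslevel:runner"},
        "target": {"type": "AverageValue", "averageValue": "3"},
    },
}

# spec.metrics[0] as returned by the cluster (keys sorted alphabetically)
live = {
    "external": {
        "metric": {"name": "datadogmetric@opslevel:runner"},
        "target": {"averageValue": "3", "type": "AverageValue"},
    },
    "type": "External",
}


def differing_paths(a, b, prefix=""):
    """Return the dotted paths whose values differ between two nested dicts."""
    paths = []
    for key in sorted(set(a) | set(b)):
        path = f"{prefix}.{key}" if prefix else key
        if key not in a or key not in b:
            # Key exists on one side only
            paths.append(path)
        elif isinstance(a[key], dict) and isinstance(b[key], dict):
            # Recurse into nested mappings
            paths.extend(differing_paths(a[key], b[key], path))
        elif a[key] != b[key]:
            paths.append(path)
    return paths


print(differing_paths(desired, live))  # prints []
```

An empty result means the two entries carry the same fields and values, which is consistent with the expectation that no drift should be reported for this list entry.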

Steps to reproduce

The flux bootstrap command we used

export GITLAB_TOKEN=XXXX
flux bootstrap gitlab \
  --components=source-controller,kustomize-controller,notification-controller,image-reflector-controller,image-automation-controller \
  --owner=jklabsinc \
  --repository=opslevel-kubernetes \
  --branch=main \
  --path=clusters/dev \
  --reconcile \
  --private \
  --read-write-key

Then ensure you have the Datadog metric CRD and an instance of that resource in your Flux repo.
Then add the HPA and you get the constant drift.

Expected behavior

No drift

Screenshots and recordings

No response

OS / Distro

macOS

Flux version

flux: v0.27.3

Flux check

► checking prerequisites
✗ flux 0.27.3 <0.36.0 (new version is available, please upgrade)
✔ Kubernetes 1.23.10-eks-15b7512 >=1.20.6-0
► checking controllers
✔ image-automation-controller: deployment ready
► ghcr.io/fluxcd/image-automation-controller:v0.18.0
✔ image-reflector-controller: deployment ready
► ghcr.io/fluxcd/image-reflector-controller:v0.14.0
✔ kustomize-controller: deployment ready
► ghcr.io/fluxcd/kustomize-controller:v0.18.2
✔ notification-controller: deployment ready
► ghcr.io/fluxcd/notification-controller:v0.19.0
✔ source-controller: deployment ready
► ghcr.io/fluxcd/source-controller:v0.19.2
✔ all checks passed

Git provider

GitLab

Container Registry provider

GitLab

Additional context

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct
@stefanprodan
Member

stefanprodan commented Nov 2, 2022

Can you please run flux reconcile and then post the output of kubectl get hpa --show-managed-fields here? Flux should remove the last-applied-configuration annotation.

@souleb added the area/diff (Diff related issues and pull requests) label on Nov 22, 2022
@mceronja

Hi @stefanprodan !

I just came across this yesterday, and this doesn't seem like a Flux issue.
It seems that the combination of HorizontalPodAutoscaler with apiVersion: autoscaling/v2 and with metrics: - type: External is the issue:
https://github.com/kubernetes/design-proposals-archive/blob/main/autoscaling/hpa-v2.md#external

When I described the resource, I noticed this warning: "autoscaling/v2beta2 HorizontalPodAutoscaler is deprecated in v1.23+, unavailable in v1.26+; use autoscaling/v2 HorizontalPodAutoscaler". It appears even though I explicitly have apiVersion: autoscaling/v2 in the manifest, and I believe this is why Flux is constantly trying to configure the resource.

Options:

  1. use apiVersion: autoscaling/v2beta2 for HorizontalPodAutoscaler (I will use this)
  2. use Custom Metrics API with apiVersion: autoscaling/v2 and with metrics: - type: Object (didn't test this)

If you agree with all of this feel free to close the ticket 😃

@stefanprodan
Member

For apiVersion: autoscaling/v2 we're going to fix the diff in fluxcd/pkg#449

@mceronja

Great news. I will test this when it comes out, until then I will use autoscaling/v2beta2.

@rocktavious
Author

Thanks @mceronja for jumping in - This appears to be our issue as well.

Sorry @stefanprodan - I'm not sure why I never saw your reply back in Nov. Because of the issue we decided to set the HPAs aside, but I'm happy to try again once the fix is out.

@rocktavious
Author

@stefanprodan - What release of flux will this end up in?

@stefanprodan
Member

Flux v0.39.0

4 participants