Update NodePool.Status with node drift information. #6166

Open
shreyas-badiger opened this issue May 7, 2024 · 4 comments
Assignees: jonathan-innis
Labels: feature (New feature or request), metrics-audit (Issues that should be captured as part of the upcoming v1beta1 Karpenter metrics audit)

Comments

@shreyas-badiger
shreyas-badiger commented May 7, 2024

Description

What problem are you trying to solve?
Currently, there is no clear way to identify how many nodes have drifted from the current hash (NodePool hash and EC2NodeClass hash). To determine node rotation progress, we have to inspect individual Node objects, NodeClaims, the NodePool, and the EC2NodeClass.

Since the Karpenter controller identifies and rotates drifted nodes, I am assuming it already maintains a list of drifted nodes (or, if not, identifies them on every reconciliation). It would be helpful to surface this information in the NodePool status.

For example:

status:
  resources:
    cpu: "64"
    ephemeral-storage: 134205420Ki
    memory: 258565188Ki
    pods: "640"
    nodes:
      totalNodes: 10
      driftedNodes: 2

How important is this feature to you?
This feature would be very useful for tracking the progress of node rotation whenever we change AMIs or trigger any other form of upgrade by updating the NodePool or EC2NodeClass.
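
As a rough illustration of how a watcher could use the proposed fields, here is a minimal Python sketch that turns the example status above into a rotation-progress fraction. It assumes the `totalNodes` and `driftedNodes` fields are nested under `resources.nodes` exactly as in the example; none of these fields exist in Karpenter today, and the function name `rotation_progress` is hypothetical.

```python
def rotation_progress(status: dict) -> float:
    """Return the fraction of nodes that are NOT drifted (0.0 to 1.0).

    Assumes the proposed (not yet existing) layout:
    status.resources.nodes.totalNodes / driftedNodes.
    """
    nodes = (status.get("resources") or {}).get("nodes") or {}
    total = int(nodes.get("totalNodes", 0))
    drifted = int(nodes.get("driftedNodes", 0))
    if total == 0:
        # No nodes at all means there is nothing left to rotate.
        return 1.0
    return (total - drifted) / total


# The values from the example status in this issue.
example = {"resources": {"nodes": {"totalNodes": 10, "driftedNodes": 2}}}
print(rotation_progress(example))  # 0.8
```

With the example values, 8 of 10 nodes are already on the current hash, so rotation is 80% complete.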

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
@shreyas-badiger shreyas-badiger added feature New feature or request needs-triage Issues that need to be triaged labels May 7, 2024
@jonathan-innis
Contributor

Does a metric work for you here? Or would you like a constant update of rolled-up information directly in the status? This is definitely something we are considering as we look at how to improve Karpenter's observability for v1.

@jonathan-innis
Contributor

Does this request belong in the kubernetes-sigs/karpenter repo since it's about the neutral concept of drift?

@jonathan-innis jonathan-innis added metrics-audit Issues that should be captured as part of the upcoming v1beta1 Karpenter metrics audit and removed needs-triage Issues that need to be triaged labels May 9, 2024
@jonathan-innis jonathan-innis self-assigned this May 9, 2024
@vgunapati

It would be advantageous to have this data accessible in both metrics and CR status. Adding it to the CR status would greatly benefit other watchers in the cluster. We should also consider adding the number of nodes that are restricted because of PDB violations.

@jonathan-innis
Contributor

> It would be advantageous to have this data accessible in both metrics and CR status

We do have to be a little thoughtful about the number of updates that this would generate. I'm not saying that it's out of the question, but a metric is a bit easier to swallow, only because metrics are pull-based and not push-based.

> there is no clear way to identify how many nodes have drifted from the current hash

You could take a look at the NodeClaim status conditions to see if a NodeClaim has a "Drifted" condition. Counting these up across the cluster (or by label) should give you the info you want. Yeah, you have to construct it, but given that this doesn't currently exist in Karpenter, this is a possible workaround.
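
The workaround above could be sketched roughly as follows in Python. This is only an illustration, not something Karpenter ships: it assumes the input is the JSON list returned by `kubectl get nodeclaims -o json`, that drift shows up as a status condition with type "Drifted" and status "True", and that each NodeClaim carries the `karpenter.sh/nodepool` label. The function names `is_drifted` and `drift_summary` are hypothetical.

```python
from collections import Counter

# Assumption: NodeClaims are labeled with the NodePool that owns them.
NODEPOOL_LABEL = "karpenter.sh/nodepool"


def is_drifted(nodeclaim: dict) -> bool:
    """Return True if the NodeClaim has a "Drifted" condition with status "True"."""
    conditions = (nodeclaim.get("status") or {}).get("conditions") or []
    return any(
        c.get("type") == "Drifted" and c.get("status") == "True"
        for c in conditions
    )


def drift_summary(nodeclaim_list: dict) -> dict:
    """Count total and drifted NodeClaims per NodePool.

    `nodeclaim_list` is the parsed output of `kubectl get nodeclaims -o json`.
    """
    total = Counter()
    drifted = Counter()
    for nc in nodeclaim_list.get("items") or []:
        labels = (nc.get("metadata") or {}).get("labels") or {}
        pool = labels.get(NODEPOOL_LABEL, "<unknown>")
        total[pool] += 1
        if is_drifted(nc):
            drifted[pool] += 1
    # Mirror the field names proposed in this issue for easy comparison.
    return {
        pool: {"totalNodes": total[pool], "driftedNodes": drifted[pool]}
        for pool in total
    }
```

To try it, load the `kubectl` output with `json.load` and pass the result to `drift_summary`. Because this runs on demand rather than on every change, it does not add any write load to the API server.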
