I'm not sure whether this is the correct place to report this; please redirect me if it isn't.
Goal:
I want an EKS cluster with working observability, using the Bottlerocket AMI on GPU nodes (g5* instances).
I use this Helm chart by enabling the amazon-cloudwatch-observability EKS add-on for my cluster.
Steps to reproduce:

1. Create a cluster with the latest EKS version and GPU nodes running the latest Bottlerocket AMI.
2. Enable the latest version of the amazon-cloudwatch-observability EKS add-on (it uses the dcgm-exporter image `602401143452.dkr.ecr.us-east-2.amazonaws.com/eks/observability/dcgm-exporter:3.3.3-3.3.1-ubuntu22.04`).
3. All related DaemonSets work well except dcgm-exporter.
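For anyone trying to reproduce this outside Terraform, here is a minimal sketch of the same steps using the boto3 EKS client. It is an illustration under assumptions, not my actual setup: the cluster name, node group name `gpu-bottlerocket`, subnets, node role ARN and single-node sizing are hypothetical placeholders, and the functions `nodegroup_params`, `addon_params` and `reproduce` are names I made up for this sketch. The client is passed in rather than created, so the parameter builders can be checked without AWS access.

```python
# Hypothetical sketch of the reproduction steps as boto3 EKS calls.
# Cluster name, node group name, subnets, node role ARN and sizing are
# placeholders; the original setup was created with Terraform.
from typing import Any, Dict, List, Optional

ADDON_NAME = "amazon-cloudwatch-observability"


def nodegroup_params(
    cluster_name: str,
    nodegroup_name: str,
    subnets: List[str],
    node_role_arn: str,
    release_version: Optional[str] = None,
) -> Dict[str, Any]:
    """Keyword arguments for create_nodegroup: Bottlerocket NVIDIA on g5.xlarge."""
    params: Dict[str, Any] = {
        "clusterName": cluster_name,
        "nodegroupName": nodegroup_name,
        "subnets": subnets,
        "nodeRole": node_role_arn,
        "amiType": "BOTTLEROCKET_x86_64_NVIDIA",
        "instanceTypes": ["g5.xlarge"],
        # One node is assumed to be enough to reproduce the problem.
        "scalingConfig": {"minSize": 1, "maxSize": 1, "desiredSize": 1},
    }
    # Without releaseVersion, EKS picks the latest AMI release for this AMI type.
    if release_version is not None:
        params["releaseVersion"] = release_version
    return params


def addon_params(cluster_name: str, addon_version: Optional[str] = None) -> Dict[str, Any]:
    """Keyword arguments for create_addon for the CloudWatch observability add-on."""
    params: Dict[str, Any] = {"clusterName": cluster_name, "addonName": ADDON_NAME}
    # Optionally pin a version (e.g. one listed by describe_addon_versions);
    # if omitted, EKS selects the version itself.
    if addon_version is not None:
        params["addonVersion"] = addon_version
    return params


def reproduce(eks: Any, cluster_name: str, subnets: List[str], node_role_arn: str) -> None:
    """Create the GPU node group, wait until it is ACTIVE, then enable the add-on."""
    ng = nodegroup_params(cluster_name, "gpu-bottlerocket", subnets, node_role_arn)
    eks.create_nodegroup(**ng)
    eks.get_waiter("nodegroup_active").wait(
        clusterName=cluster_name, nodegroupName=ng["nodegroupName"]
    )
    eks.create_addon(**addon_params(cluster_name))
```

With real credentials it would be called as `reproduce(boto3.client("eks", region_name="us-east-2"), ...)`, where the region is taken from the ECR image URL and the remaining arguments are your own cluster values.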
I suspect a version incompatibility between DCGM and the NVIDIA driver (installed on the nodes via k8s-device-plugin).
What should I do to make the DCGM exporter work?
I appreciate your help.
I'm just creating a node group with `ami_type = "BOTTLEROCKET_x86_64_NVIDIA"` specified in the Terraform config. It picks up the latest version of the image family at initial creation.
Currently I have `release_version = "1.19.5-64049ba8"` (`imageId=ami-0f3f964e4f939bbd0`).
But I don't really care about the version. If you can successfully run the DCGM exporter as part of the amazon-cloudwatch-observability EKS add-on on a g5.xlarge with any Bottlerocket image, that will be enough for me: I will then consider the issue my own problem and debug it myself.