
Migrate k8s-infra-prow-builds to cgroups v2 #5276

Open
BenTheElder opened this issue May 15, 2023 · 14 comments
Assignees
BenTheElder
Labels
  • area/infra Infrastructure management, infrastructure design, code in infra/
  • priority/backlog Higher priority than priority/awaiting-more-evidence.
  • sig/k8s-infra Categorizes an issue or PR as relevant to SIG K8s Infra.
  • sig/testing Categorizes an issue or PR as relevant to SIG Testing.

Comments

@BenTheElder
Member

We'll probably want to coordinate with test-infra-oncall to do this for the existing default cluster as well.

https://cloud.google.com/kubernetes-engine/docs/how-to/node-system-config#cgroup-mode-options

cgroups v2 is the future; eventually, when Kubernetes etc. stop supporting v1, we will need this to run things like KIND and local-up-cluster.sh.
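
For reference, per the linked doc the cgroup mode is set through a node system configuration file. A rough sketch of what that could look like when we get to it (the pool/cluster/region names below are placeholders, not our actual config):

cat > node-system-config.yaml <<'EOF'
# assumption: linuxConfig.cgroupMode selects the node's cgroup hierarchy
# (CGROUP_MODE_V1 / CGROUP_MODE_V2), per the GKE node system config doc
linuxConfig:
  cgroupMode: CGROUP_MODE_V2
EOF

# placeholder pool/cluster/region; the real values live in our infra config
gcloud container node-pools create pool-cgroupv2 \
  --cluster=prow-build \
  --region=us-central1 \
  --system-config-from-file=node-system-config.yaml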

@BenTheElder
Member Author

We should also check what the EKS cluster is doing.

@bobbypage
Member

/cc

@BenTheElder
Member Author

Quick hacky test:

kubectl run --rm -it --privileged --env DOCKER_IN_DOCKER_ENABLED=true --image=gcr.io/k8s-staging-test-infra/krte:v20230421-ec4335b54b-master test -- sh -c 'curl -Lo ./kind "https://kind.sigs.k8s.io/dl/v0.18.0/kind-$(uname)-amd64" && chmod +x ./kind && ./kind create cluster; ./kind delete cluster'
If you don't see a command prompt, try pressing enter.
wrapper.sh] [SETUP] Done setting up Docker in Docker.
fatal: not a git repository (or any of the parent directories): .git
================================================================================
wrapper.sh] [TEST] Running Test Command: `sh -c curl -Lo ./kind "https://kind.sigs.k8s.io/dl/v0.18.0/kind-$(uname)-amd64" && chmod +x ./kind && ./kind create cluster; ./kind delete cluster` ...
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    97  100    97    0     0    487      0 --:--:-- --:--:-- --:--:--   487
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100 6808k  100 6808k    0     0  10.8M      0 --:--:-- --:--:-- --:--:-- 10.8M
Creating cluster "kind" ...
 ✓ Ensuring node image (kindest/node:v1.26.3) 🖼 
 ✓ Preparing nodes 📦  
 ✓ Writing configuration 📜 
 ✓ Starting control-plane 🕹️ 
 ✓ Installing CNI 🔌 
 ✓ Installing StorageClass 💾 
Set kubectl context to "kind-kind"
You can now use your cluster with:

kubectl cluster-info --context kind-kind

Have a question, bug, or feature request? Let us know! https://kind.sigs.k8s.io/#community 🙂
Deleting cluster "kind" ...
Deleted nodes: ["kind-control-plane"]
wrapper.sh] [TEST] Test Command exit code: 0
wrapper.sh] [CLEANUP] Cleaning up after Docker in Docker ...
Stopping Docker: docker.
wrapper.sh] [CLEANUP] Done cleaning up after Docker in Docker.
================================================================================
wrapper.sh] Exiting 0
Session ended, resume using 'kubectl attach test -c test -i -t' command when the pod is running
pod "test" deleted

KIND at the latest release is working fine with our e2e docker-in-docker on a cgroups v2 GKE host (v1.26 node pool in a new test cluster; double-checked that v2 is detected).
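
For anyone re-running this elsewhere, a minimal sketch of one way to confirm which mode a host is in, from any pod scheduled onto the pool in question (busybox is enough; in practice a nodeSelector would be needed to pin it to the new pool):

kubectl run cgroup-check --rm -it --restart=Never --image=busybox -- sh -c 'mount | grep cgroup'
# a cgroups v2 host shows a single "cgroup2 on /sys/fs/cgroup type cgroup2" mount;
# a v1/hybrid host shows a tmpfs plus one mount per controller instead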

@xmudrii
Member

xmudrii commented Jun 4, 2023

/cc
for EKS cluster

@ameukam
Member

ameukam commented Sep 19, 2023

IIUC, adding a new node pool with GKE 1.26+ should give us cgroupv2 by default. prow-build got upgraded to 1.26. Maybe add a new node pool?
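
A quick way to see whether a pool already has an explicit cgroup mode configured (names are placeholders; an empty result would mean nothing is set and the GKE-version default applies):

gcloud container node-pools describe pool5 \
  --cluster=prow-build \
  --region=us-central1 \
  --format='value(config.linuxNodeConfig.cgroupMode)'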

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 28, 2024
@xmudrii
Member

xmudrii commented Jan 28, 2024

Still relevant kubernetes/test-infra#31572
/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 28, 2024
@ameukam
Member

ameukam commented Jan 29, 2024

/sig k8s-infra
/sig testing
/priority backlog
/area infra

@k8s-ci-robot k8s-ci-robot added sig/k8s-infra Categorizes an issue or PR as relevant to SIG K8s Infra. sig/testing Categorizes an issue or PR as relevant to SIG Testing. priority/backlog Higher priority than priority/awaiting-more-evidence. area/infra Infrastructure management, infrastructure design, code in infra/ labels Jan 29, 2024
@ameukam
Member

ameukam commented Jan 29, 2024

/assign @BenTheElder

@xmudrii
Member

xmudrii commented Jan 29, 2024

It appears that the EKS Prow build cluster is already using cgroups v2. Here are some references:

I did some checks as well:

$ mount | grep cgroup
cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,seclabel)
$ stat -f -c %T /sys/fs/cgroup/
cgroup2fs

@BenTheElder @ameukam Is there anything else that we should check?

@ameukam
Member

ameukam commented Jan 29, 2024

I don't think there is anything else to check. Let's wait on Ben's input.

@BenTheElder
Member Author

KIND jobs will need to be migrated to cover both, and we'll need to figure out how/when we want to flip this bit on our assorted GKE clusters. Not sure on that yet; I've been mostly focused on getting all the infra running under community accounts at the moment, but we should really be discussing this with kubernetes/enhancements#4572.

@BenTheElder
Member Author

I think this is just going to work, but we have to consider that hack/local-up-cluster.sh and KIND-style coverage for any projects using them will suddenly be v2 and not v1 anymore.

In the KIND project we have some GitHub Actions aimed at ensuring we get coverage for both (... because this is a kernel / boot time option, and on prow we'd have to spin up external VMs ... we may have to move that to prow anyhow, as the future of macOS with virt support on GitHub Actions seems unclear).

... other projects / CI jobs have implicitly been testing on v1.
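
One cheap mitigation sketch (illustrative only, not something wrapper.sh does today): have jobs log which mode they actually ran under, so we can see at a glance what coverage flips when the node pools change:

# illustrative: record the cgroup mode this run is exercising
if [ "$(stat -f -c %T /sys/fs/cgroup/ 2>/dev/null)" = "cgroup2fs" ]; then
  echo "INFO: running on cgroups v2"
else
  echo "INFO: running on cgroups v1 (legacy/hybrid)"
fi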

@BenTheElder
Member Author

We should probably at least send an FYI to dev@kubernetes.io when we flip the switch
