Rootless Docker in Docker documentation does not work #3475

Open
4 tasks done
dillon-cullinan opened this issue Apr 26, 2024 · 6 comments
Labels
bug Something isn't working gha-runner-scale-set Related to the gha-runner-scale-set mode needs triage Requires review from the maintainers

Comments


dillon-cullinan commented Apr 26, 2024

Controller Version

0.9.0

Deployment Method

Helm

Checks

  • This isn't a question or user support case (For Q&A and community support, go to Discussions).
  • I've read the Changelog before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes

To Reproduce

1. Follow the official docs: https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/deploying-runner-scale-sets-with-actions-runner-controller#example-running-dind-rootless
2. Deploy a Runner Scale Set using the `dind-rootless` values.yaml
3. The `dind` container's command fails
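
For anyone reproducing step 2, here is a minimal sketch of the Helm invocation. The release name `my-runner-set` and the file name `values.yaml` are placeholders, not taken from this issue, and passing chart version 0.9.0 assumes the chart version tracks the controller version reported above. The helper only prints the command so it can be reviewed before running.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not run) a Helm command for step 2.
# Release name, namespace, and values file are supplied by the caller.
arc_install_cmd() {
  release="$1"; namespace="$2"; values="$3"; version="$4"
  printf 'helm install %s --namespace %s --create-namespace --version %s -f %s %s\n' \
    "$release" "$namespace" "$version" "$values" \
    "oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set"
}

# Placeholder values for illustration only.
arc_install_cmd my-runner-set gha-runner-scale-set-controller values.yaml 0.9.0
```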

Describe the bug

The documented rootless dind example does not work, and the equivalent functionality that existed in RunnerDeployment was removed, which breaks an existing working solution.

Describe the expected behavior

The dind container should exit cleanly, allowing Docker to be used from the runner container.

Additional Context

---
runnerScaleSetName: <redacted>
githubConfigUrl: <redacted>
githubConfigSecret: <redacted>
maxRunners: 16
minRunners: 0
metadata:
  name: <redacted>
  namespace: gha-runner-scale-set-controller
template:
  metadata:
    annotations:
      cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
  spec:
    nodeSelector:
      cloud.google.com/gke-nodepool: gpu-single
      kubernetes.io/arch: amd64
      kubernetes.io/os: linux
    volumes:
    - name: tmpdir
      emptyDir: {}
    - name: work
      emptyDir: {}
    - name: dind-externals
      emptyDir: {}
    - name: dind-sock
      emptyDir: {}
    - name: dind-etc
      emptyDir: {}
    - name: dind-home
      emptyDir: {}
    initContainers:
    - name: init-dind-externals
      image: ghcr.io/actions/actions-runner:latest
      command: ["cp", "-r", "-v", "/home/runner/externals/.", "/home/runner/tmpDir/"]
      volumeMounts:
        - name: dind-externals
          mountPath: /home/runner/tmpDir
    - name: init-dind-rootless
      image: docker:dind-rootless
      command:
        - sh
        - -c
        - |
          set -x
          cp -a /etc/. /dind-etc/
          echo 'runner:x:1001:1001:runner:/home/runner:/bin/ash' >> /dind-etc/passwd
          echo 'runner:x:1001:' >> /dind-etc/group
          echo 'runner:100000:65536' >> /dind-etc/subgid
          echo 'runner:100000:65536' >>  /dind-etc/subuid
          chmod 755 /dind-etc;
          chmod u=rwx,g=rx+s,o=rx /dind-home
          chown 1001:1001 /dind-home
      securityContext:
        runAsUser: 0
      volumeMounts:
        - mountPath: /dind-etc
          name: dind-etc
        - mountPath: /dind-home
          name: dind-home
    containers:
    - name: runner
      image: ghcr.io/actions/actions-runner:latest
      command: ["/home/runner/run.sh"]
      env:
        - name: DOCKER_HOST
          value: unix:///var/run/docker.sock
      volumeMounts:
      - mountPath: /tmp
        name: tmpdir
      - name: work
        mountPath: /home/runner/_work
      - name: dind-sock
        mountPath: /var/run
      resources:
        requests:
          cpu: "2000m"
          memory: "20Gi"
          ephemeral-storage: "24Gi"
        limits:
          cpu: "3000m"
          memory: "24Gi"
          nvidia.com/gpu: 1
    - name: dind
      image: docker:dind-rootless
      args:
        - dockerd
        - --host=unix:///var/run/docker.sock
      securityContext:
        privileged: true
        runAsUser: 1001
        runAsGroup: 1001
      volumeMounts:
        - name: work
          mountPath: /home/runner/_work
        - name: dind-sock
          mountPath: /var/run
        - name: dind-externals
          mountPath: /home/runner/externals
        - name: dind-etc
          mountPath: /etc
        - name: dind-home
          mountPath: /home/runner

Controller Logs

Not applicable as the Pod is running and shutting down as expected.

Runner Pod Logs

https://gist.github.com/dillon-cullinan/82cabc257b19c8f0c172dc0b6808cf59

dillon-cullinan commented May 3, 2024

To get this working, a couple of issues had to be fixed. (I originally reported the /bin/ash shell in the chart provided in the docs as a typo for bash; as pointed out in a later comment, /bin/ash is correct for this image.)

First, the latest dind-rootless image has a few issues. Rolling the Docker image back to docker:24.0.6-dind-rootless solves some of them.

Second, the docs assume the Docker socket location by passing --host=unix:///var/run/docker.sock. After I removed this argument and let the daemon choose its own socket, it picked one based on the UID: unix:///run/user/1001/docker.sock

With these two changes, it works. Here is the working PodSpec template:

template:
  spec:
    volumes:
    - name: tmpdir
      emptyDir: {}
    - name: work
      emptyDir: {}
    - name: dind-externals
      emptyDir: {}
    - name: dind-sock
      emptyDir: {}
    - name: dind-etc
      emptyDir: {}
    - name: dind-home
      emptyDir: {}
    initContainers:
    - name: init-dind-externals
      image: ghcr.io/actions/actions-runner:latest
      command: ["cp", "-r", "-v", "/home/runner/externals/.", "/home/runner/tmpDir/"]
      volumeMounts:
        - name: dind-externals
          mountPath: /home/runner/tmpDir
    - name: init-dind-rootless
      image: docker:24.0.6-dind-rootless
      command:
        - sh
        - -c
        - |
          set -x
          cp -a /etc/. /dind-etc/
          echo 'runner:x:1001:1001:runner:/home/runner:/bin/ash' >> /dind-etc/passwd
          echo 'runner:x:1001:' >> /dind-etc/group
          echo 'runner:100000:65536' >> /dind-etc/subgid
          echo 'runner:100000:65536' >>  /dind-etc/subuid
          chmod 755 /dind-etc;
          chmod u=rwx,g=rx+s,o=rx /dind-home
          chown 1001:1001 /dind-home
      securityContext:
        runAsUser: 0
      volumeMounts:
        - mountPath: /dind-etc
          name: dind-etc
        - mountPath: /dind-home
          name: dind-home
    containers:
    - name: runner
      image: ghcr.io/actions/actions-runner:latest
      command: ["/home/runner/run.sh"]
      env:
        - name: DOCKER_HOST
          value: unix:///run/user/1001/docker.sock
      volumeMounts:
      - mountPath: /tmp
        name: tmpdir
      - name: work
        mountPath: /home/runner/_work
      - name: dind-sock
        mountPath: /var/run
    - name: dind
      image: docker:24.0.6-dind-rootless
      args:
        - dockerd
      securityContext:
        privileged: true
        runAsUser: 1001
        runAsGroup: 1001
      volumeMounts:
        - name: work
          mountPath: /home/runner/_work
        - name: dind-sock
          mountPath: /var/run
        - name: dind-externals
          mountPath: /home/runner/externals
        - name: dind-etc
          mountPath: /etc
        - name: dind-home
          mountPath: /home/runner
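
The UID-based socket location described above can be sketched as a small helper. This assumes the rootless daemon's runtime directory is /run/user/<uid>, which matches what was observed here but may differ in other setups; the function name `rootless_docker_host` is hypothetical.

```shell
#!/bin/sh
# Build the DOCKER_HOST value that matches the socket path reported above
# for a rootless daemon running as the given UID.
# Assumption: the runtime directory is /run/user/<uid>.
rootless_docker_host() {
  uid="$1"
  printf 'unix:///run/user/%s/docker.sock\n' "$uid"
}

# UID 1001 is the runner user in the template above.
rootless_docker_host 1001
```

The printed value is what the runner container's DOCKER_HOST is set to in the template; if runAsUser changes, both need to change together.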


mmckane commented May 3, 2024

Are you on GKE COS nodes? I was able to get things started by building an Ubuntu node pool and pinning my containers there.

edit:
To add more detail: I get the following error on COS-based node images in GKE, regardless of whether I use docker:24.0.6-dind-rootless or docker:dind-rootless.

Error Message:

time="2024-05-03T22:57:33.920775537Z" level=info msg="unable to detect if iptables supports xlock: 'iptables --wait -L -n': `iptables v1.8.10 (nf_tables): Could not fetch rule set generation id: Invalid argument`" error="exit status 4"
time="2024-05-03T22:57:33.947346735Z" level=info msg="stopping event stream following graceful shutdown" error="<nil>" module=libcontainerd namespace=moby
time="2024-05-03T22:57:33.947888475Z" level=info msg="stopping healthcheck following graceful shutdown" module=libcontainerd
time="2024-05-03T22:57:33.947924935Z" level=info msg="stopping event stream following graceful shutdown" error="context canceled" module=libcontainerd namespace=plugins.moby
failed to start daemon: Error initializing network controller: error obtaining controller instance: failed to register "bridge" driver: failed to create NAT chain DOCKER: iptables failed: iptables -t nat -N DOCKER: iptables v1.8.10 (nf_tables): Could not fetch rule set generation id: Invalid argument
 (exit status 4)
[rootlesskit:child ] error: command [docker-init -- dockerd --host=unix:///socket/docker.sock] exited: exit status 1
[rootlesskit:parent] error: child exited: exit status 1

With the GKE Ubuntu-based node image, both image tags seem to start fine.
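
To check quickly whether a failing dind container is hitting this same nf_tables error, a log filter along these lines may help. It is only a sketch: the function name `has_nft_ruleset_error` is made up, and the match string is copied from the error output above.

```shell
#!/bin/sh
# Read dockerd log text on stdin; succeed if it contains the nf_tables
# failure quoted in the error output above.
has_nft_ruleset_error() {
  grep -q 'Could not fetch rule set generation id'
}

# Demo with one line taken from the log above.
sample='failed to create NAT chain DOCKER: iptables failed: iptables -t nat -N DOCKER: iptables v1.8.10 (nf_tables): Could not fetch rule set generation id: Invalid argument'
if printf '%s\n' "$sample" | has_nft_ruleset_error; then
  echo "nf_tables error found"
fi
```

Against a live pod, the input would come from something like `kubectl logs <pod-name> -c dind`, where the pod name is whatever your runner pod is called.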


mmckane commented May 3, 2024

@dillon-cullinan I also don't believe that echo 'runner:x:1001:1001:runner:/home/runner:/bin/ash' >> /dind-etc/passwd is a typo for bash. I believe this image does not have bash installed, so it should be /bin/ash. You can see that the login shells in the unmodified file in the dind-rootless container are /bin/ash:

docker run -it --rm --entrypoint /bin/sh docker:dind-rootless
/ $ cat /etc/passwd
root:x:0:0:root:/root:/bin/ash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
mail:x:8:12:mail:/var/mail:/sbin/nologin
news:x:9:13:news:/usr/lib/news:/sbin/nologin
uucp:x:10:14:uucp:/var/spool/uucppublic:/sbin/nologin
operator:x:11:0:operator:/root:/sbin/nologin
man:x:13:15:man:/usr/man:/sbin/nologin
postmaster:x:14:12:postmaster:/var/mail:/sbin/nologin
cron:x:16:16:cron:/var/spool/cron:/sbin/nologin
ftp:x:21:21::/var/lib/ftp:/sbin/nologin
sshd:x:22:22:sshd:/dev/null:/sbin/nologin
at:x:25:25:at:/var/spool/cron/atjobs:/sbin/nologin
squid:x:31:31:Squid:/var/cache/squid:/sbin/nologin
xfs:x:33:33:X Font Server:/etc/X11/fs:/sbin/nologin
games:x:35:35:games:/usr/games:/sbin/nologin
cyrus:x:85:12::/usr/cyrus:/sbin/nologin
vpopmail:x:89:89::/var/vpopmail:/sbin/nologin
ntp:x:123:123:NTP:/var/empty:/sbin/nologin
smmsp:x:209:209:smmsp:/var/spool/mqueue:/sbin/nologin
guest:x:405:100:guest:/dev/null:/sbin/nologin
nobody:x:65534:65534:nobody:/:/sbin/nologin
dockremap:x:100:101:Linux User,,,:/home/dockremap:/sbin/nologin
rootless:x:1000:1000:Rootless:/home/rootless:/bin/ash
/ $ ls -la /bin/ash
lrwxrwxrwx    1 root     root            12 Jan 26 17:53 /bin/ash -> /bin/busybox
/ $ 
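
The same check can be scripted rather than read by eye. This is a minimal sketch; `login_shell_of` is a hypothetical helper, and the demo file only contains two passwd lines (one from the listing above, one from the init script).

```shell
#!/bin/sh
# Print the login shell (7th colon-separated field) of a user from a
# passwd-format file.
login_shell_of() {
  user="$1"; passwd_file="$2"
  awk -F: -v u="$user" '$1 == u { print $7 }' "$passwd_file"
}

# Demo against a temporary passwd-format file.
tmp="$(mktemp)"
printf '%s\n' \
  'rootless:x:1000:1000:Rootless:/home/rootless:/bin/ash' \
  'runner:x:1001:1001:runner:/home/runner:/bin/ash' > "$tmp"
login_shell_of runner "$tmp"
rm -f "$tmp"
```

Inside the init container, the second argument would be /dind-etc/passwd.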

The socket problem is definitely an issue; I fought with it last week. I ended up putting my socket in a volume and sharing it at /var/run/docker.sock. This was mostly out of caution: I saw issue #2519, where bad things happened if the socket wasn't at /var/run/docker.sock on the runner container side, and I wasn't sure whether that had been fully fixed.
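
Whichever path is used, it can help to confirm from the runner container that DOCKER_HOST really points at a socket file. The sketch below is one way to do that; both function names are made up for illustration and nothing here is specific to ARC.

```shell
#!/bin/sh
# Strip the unix:// scheme from a DOCKER_HOST value and print the socket path.
# Returns non-zero for values that are not unix sockets (for example tcp://).
docker_host_socket_path() {
  case "$1" in
    unix://*) printf '%s\n' "${1#unix://}" ;;
    *) return 1 ;;
  esac
}

# Succeed only if the DOCKER_HOST-style value in $1 points at an existing
# socket file.
docker_socket_present() {
  path="$(docker_host_socket_path "$1")" || return 1
  [ -S "$path" ]
}

# Prints /var/run/docker.sock
docker_host_socket_path 'unix:///var/run/docker.sock'
```

In the runner container, `docker_socket_present "$DOCKER_HOST"` would then report whether the shared volume actually exposes the daemon's socket at the expected path.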

@dillon-cullinan
Author

Are you on GKE COS nodes? I was able to get things started by building an Ubuntu node pool and pinning my containers there.

Yes, we are using GKE COS and we have it working right now. It's interesting that you are running into issues despite the changes. We are using GKE version 1.28.7-gke.1026000, in case this matters.


dillon-cullinan commented May 6, 2024

@dillon-cullinan I also don't believe that echo 'runner:x:1001:1001:runner:/home/runner:/bin/ash' >> /dind-etc/passwd is a typo of bash I believe this is an image without bash installed and it should be /bin/ash. You can see the unmodified file in the dind-rootless container are all /bin/ash

[...]

Thank you for the correction, I've edited my previous comment.

On RunnerDeployments the setup is much easier, from what I've experienced. The PodSpec has a value you set: dockerdWithinRunnerContainer: true.

For our containers we basically pulled bits and pieces from this Dockerfile: https://github.com/actions/actions-runner-controller/blob/master/runner/actions-runner-dind-rootless.ubuntu-20.04.dockerfile

We added the relevant lines from that Dockerfile to our custom image and it worked. You can probably just use this image as a base if it fits your use case.

Snippet of the runner container values:

        command:
          - bash
          - -c
          - "mkdir -p /home/runner/.docker/docker /home/runner/.local/share && ln -s /home/runner/.docker/docker /home/runner/.local/share/docker && /bin/bash /usr/bin/entrypoint-dind-rootless.sh"
        securityContext:
          privileged: true   

With the dockerdWithinRunnerContainer value set and the proper image, it all works with a single container inside the pod: no dind container, no init containers. Much cleaner in general.
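
One possible refinement of the directory and symlink step in the runner command above is to make it repeatable, for example if the container restarts with the same home directory. This is a sketch under that assumption, not what we run; `link_docker_data` is a hypothetical name, and the entrypoint script would still be called afterwards as in the snippet.

```shell
#!/bin/sh
# Create the Docker data directory under the given home directory and link it
# from .local/share/docker, mirroring the runner command above.
# -f and -n replace an existing link, so the step can be repeated.
link_docker_data() {
  base="$1"
  mkdir -p "$base/.docker/docker" "$base/.local/share" &&
    ln -sfn "$base/.docker/docker" "$base/.local/share/docker"
}

# Demo in a throwaway directory instead of /home/runner.
demo_home="$(mktemp -d)"
link_docker_data "$demo_home"
link_docker_data "$demo_home"   # second call replaces the existing link
readlink "$demo_home/.local/share/docker"
rm -rf "$demo_home"
```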


mmckane commented May 8, 2024

We are currently on 1.26 because many developers won't move off deprecated API versions for a few things. I will see if we can get to 1.28 and try again.
