Working non-trivial example for gha-runner-scale-set #2809

davhdavh · 2023-08-14T12:52:45Z

What would you like added?

I would really like there to be a working example that can do more than just hello world.

Why is this needed?

Because very little works beyond hello world in any of the container modes (default, dind, kubernetes), as the examples below show.

Additional context

Example Docker:

name: demo
on:
  workflow_dispatch:
jobs:
  demo:
    runs-on: demo
    steps:
    - name: Set up Docker Buildx
      uses: docker/setup-buildx-action@v2

dind: fails because of "ERROR: could not create a builder instance with TLS data loaded from environment."

kubernetes: fails because of "/usr/bin/tar: ../../../.docker: Cannot mkdir: Permission denied"

Example Cache volume:

name: demo
on:
  workflow_dispatch:
jobs:
  first-run:
    runs-on: demo
    steps:
    - run: echo 'demo' > $RUNNER_TOOL_CACHE/demo

  second-run:
    runs-on: demo
    needs: [first-run]
    steps:
    - run: cat $RUNNER_TOOL_CACHE/demo

default: doesn't output the expected value

Add pvc resource + extra config:

  template:
    spec:
      securityContext:
        fsGroup: 123
      containers:
      - name: runner
        image: 'ghcr.io/actions/actions-runner:latest'
        command: ["/home/runner/run.sh"]
        volumeMounts:
        - name: cache
          mountPath: /home/runner/_work/_tool
      nodeSelector:
        kubernetes.io/os: linux
      volumes:
      - name: cache
        persistentVolumeClaim:
          claimName: github-cache

And yes, you also need to specify the image and command, or things break.
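The "pvc resource" referenced by claimName in the template above could look roughly like the following. This is a minimal sketch, not taken from the original post: the ReadWriteMany access mode, the 10Gi size, and the omitted storageClassName are all assumptions that depend on your cluster. Only the name github-cache comes from the template.

```yaml
# Hypothetical PVC backing the tool cache volume.
# Everything except the name is an assumption for illustration.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: github-cache   # must match claimName in the runner template
spec:
  accessModes:
    - ReadWriteMany    # assumption: lets runner pods on different nodes share the cache
  resources:
    requests:
      storage: 10Gi    # assumption: size it to fit your toolchains
```

The claim has to live in the same namespace as the runner pods, since a pod can only reference PVCs in its own namespace.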

dind: Works

kubernetes: $RUNNER_TOOL_CACHE points to a different path inside the running container. It fails with mountPath set to either the path above or the changed path.

Example windows:

default: Doesn't work with the Windows Dockerfile example. It cannot start and fails with: AutoscalingRunnerSet.actions.github.com "gha-windows" is invalid: [spec.template.spec.containers[0].volumeMounts: Invalid value: "null":

dind: Doesn't work with the Windows Dockerfile example. It cannot start because the chart neither runs the dind containers on Linux nor provides a Windows image for them.

kubernetes: Doesn't work with the Windows Dockerfile example. It can start, but fails on the "initialize containers" step with "Container operations are only supported on Linux runners".

Other issues

Everything is marked with deletion protection, even though everything is ephemeral.
The Helm chart uses lookup, so it requires a workaround for ArgoCD.


github-actions · 2023-08-14T12:53:25Z

Hello! Thank you for filing an issue.

The maintainers will triage your issue shortly.

In the meantime, please take a look at the troubleshooting guide for bug reports.

If this is a feature request, please review our contribution guidelines.

hamishforbes · 2023-08-15T01:53:07Z

buildx can be fixed by creating your own context as in:
docker/setup-buildx-action#105
mumoshu/actions-runner-controller-ci@e91c8c0

this is more of a quirk of buildx and the docker dind image than anything specifically related to ARC
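The workaround in the linked issue amounts to creating a docker context first and pointing buildx at it. A minimal sketch, assuming the dind container mode and a context called `builders` (an arbitrary name chosen here for illustration):

```yaml
name: demo
on:
  workflow_dispatch:
jobs:
  demo:
    runs-on: demo
    steps:
    # Create a named docker context from the current docker configuration
    - name: Set up Docker context for Buildx
      run: docker context create builders
    # Point buildx at that context instead of letting it load TLS data from the environment
    - name: Set up Docker Buildx
      uses: docker/setup-buildx-action@v2
      with:
        endpoint: builders
```

This only changes the workflow, so it has to be repeated in every workflow that uses buildx.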

davhdavh · 2023-08-15T04:42:24Z

Yes, thank you.

That's why I filed it as an enhancement request for a full example, rather than as a set of bugs.

nikola-jokic · 2023-08-15T13:20:10Z

Hey @davhdavh,

This section in the GitHub docs provides information on more advanced configuration options, including proxy, dind, and kubernetes mode.

davhdavh · 2023-08-18T13:06:47Z

Other weird things:
The image used for the runner is ghcr.io/actions/actions-runner, which uses /home/runner/, has docker as gid 123, and puts the tool cache at /home/runner/_work/_tool.
The images in this repo (actions-runner-controller/actions-runner-controller/actions-runner) use /runnertmp/ and put the tool cache at /opt/hostedtoolcache. The gid is a mess: actions-runner and dind use gid 121 for 22.04 and 1001 for 20.04, and dind-rootless uses something else.
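Since these values differ between images, one way to check what a given runner image actually uses is to print them from a job. A small sketch, assuming a scale set named demo as in the examples above (the workflow and job names are made up):

```yaml
name: runner-info
on:
  workflow_dispatch:
jobs:
  info:
    runs-on: demo
    steps:
    # Print the uid, gid and supplementary groups the job runs as
    - run: id
    # Print where the runner expects the tool cache and the numeric owner of that directory
    - run: |
        echo "RUNNER_TOOL_CACHE=$RUNNER_TOOL_CACHE"
        ls -ldn "$RUNNER_TOOL_CACHE" || echo "tool cache directory does not exist yet"
```

The numeric gid printed here is the value that fsGroup or the dockerd --group argument would need to match.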

dm3ch · 2023-08-23T01:01:53Z

Isn't it possible to work around this issue once for all cases by migrating from the TLS TCP docker daemon connection to a file socket in an emptyDir volume, as was done in #2324?

dm3ch · 2023-08-23T01:52:06Z

FYI: I tried working around this issue by using a docker file socket instead of TCP, and it fixed the problem described in this issue without any changes to the workflows themselves.

Template I used:

    # Commented to disable config override
    # containerMode:
    #   type: "dind"
    template:
      spec:
        initContainers:
        - args:
          - -r
          - -v
          - /home/runner/externals/.
          - /home/runner/tmpDir/
          command:
          - cp
          image: ghcr.io/actions/actions-runner:latest
          name: init-dind-externals
          volumeMounts:
          - mountPath: /home/runner/tmpDir
            name: dind-externals
        containers:
          - name: runner
            image: ghcr.io/actions/actions-runner:latest
            command:
              - /home/runner/run.sh
            resources:
              requests:
                cpu: 4
                memory: 8Gi
            env:
              - name: DOCKER_HOST
                value: unix:///run/docker/docker.sock
            volumeMounts:
              - name: work
                mountPath: /home/runner/_work
              - name: docker-sock
                mountPath: /run/docker
                readOnly: true
          - name: dind
            image: docker:dind
            args:
              - dockerd
              - --host=unix:///run/docker/docker.sock
              - --group=$(DOCKER_GROUP_GID)
            env:
              - name: DOCKER_GROUP_GID
                value: "123"
            securityContext:
              privileged: true
            volumeMounts:
              - mountPath: /home/runner/_work
                name: work
              - mountPath: /run/docker
                name: docker-sock
              - mountPath: /home/runner/externals
                name: dind-externals
        volumes:
          - name: work
            emptyDir: {}
          - name: docker-sock
            emptyDir: {}
          - name: dind-externals
            emptyDir: {}

So I think it may be worth reimplementing the fix from #2324 (a fix on the controller side) for the controller of the new mode.

davhdavh · 2023-08-23T02:35:58Z

I gave up totally on gha-runner-scale-set for dind or kubernetes.
Instead, I set up buildkitd and edited the action slightly.

- name: Set up Docker Buildx
  uses: docker/setup-buildx-action@4c0219f9ac95b02789c1075625400b2acbff50b1 # v2
  with:
    driver: remote
    endpoint: tcp://blabla:12345
    ...
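For readers who want to try the same route, an in-cluster buildkitd could be deployed roughly as follows. This is a sketch under stated assumptions, not the setup used in the post: the resource name buildkitd, port 1234, the moby/buildkit:latest image tag, and the lack of TLS are all choices made here for illustration.

```yaml
# Hypothetical in-cluster buildkitd; names, port and image tag are assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: buildkitd
spec:
  replicas: 1
  selector:
    matchLabels:
      app: buildkitd
  template:
    metadata:
      labels:
        app: buildkitd
    spec:
      containers:
      - name: buildkitd
        image: moby/buildkit:latest
        args:
        - --addr
        - tcp://0.0.0.0:1234   # plain TCP without TLS: only for a trusted cluster network
        securityContext:
          privileged: true     # the non-rootless image needs a privileged container
        ports:
        - containerPort: 1234
---
# Service so that runner pods can reach buildkitd by a stable name
apiVersion: v1
kind: Service
metadata:
  name: buildkitd
spec:
  selector:
    app: buildkitd
  ports:
  - port: 1234
    targetPort: 1234
```

With a Service like this, the endpoint passed to the remote driver would be a cluster address such as tcp://buildkitd:1234 when the runners are in the same namespace.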

I also had to manually figure out how to set up the tool cache, since that is not included in any of the Helm charts.

And the tool cache is not used for the 'setup job' (the first step), so it spends 30 sec downloading the actions EVERY SINGLE TIME.

I also gave up on the pre-built images, and built my own.

I also had to give up on using a real ephemeral runner for Windows builds: it took 5+ min to check out code each time due to IO speed. Instead, _work is a hostPath and I am limited to 1 runner per Windows host (which is acceptable in my case). (This is not a fault of gha/arc, just Windows being Windows.)

I also had to give up on using the upload-artifact action; it is just too horribly slow. 12+ min for something that took Azure DevOps 12 seconds is insane.

And I still have a problem with it taking 30-60 seconds for a runner to start a job from the time of commit, even with runners idling: 15-30 sec from when the GitHub UI sees there is a job until it picks a runner, and 15-30 seconds for that runner to actually start on the job.
Plus of course the 30 seconds mentioned above... So the minimum time for a job to complete is almost 2 min on average.


davhdavh added the enhancement and needs triage labels Aug 14, 2023

nikola-jokic closed this as completed Aug 18, 2023

dm3ch added a commit to dm3ch/actions-runner-controller that referenced this issue Aug 23, 2023

Fix actions#2809 : replace mTLS with unix socket

ab0502e

dm3ch mentioned this issue Aug 23, 2023

Fix #2809 : replace TCP docker daemon connection with unix socket #2833

Merged

Link- added a commit that referenced this issue Sep 22, 2023

Fix #2809 : replace TLS dockerd connection with unix socket (#2833)

16666e1

Co-authored-by: Bassem Dghaidi <568794+Link-@users.noreply.github.com>
