
Working non-trivial example for gha-runner-scale-set #2809

Closed
davhdavh opened this issue Aug 14, 2023 · 8 comments · Fixed by #2833
Labels
enhancement New feature or request needs triage Requires review from the maintainers

Comments

@davhdavh

What would you like added?

I would really like there to be a working example that can do more than just hello world.

Why is this needed?

Because nothing really works beyond hello world.

Additional context

Example Docker:

name: demo
on:
  workflow_dispatch:
jobs:
  demo:
    runs-on: demo
    steps:
    - name: Set up Docker Buildx
      uses: docker/setup-buildx-action@v2

dind: fails with "ERROR: could not create a builder instance with TLS data loaded from environment."

kubernetes: fails with "/usr/bin/tar: ../../../.docker: Cannot mkdir: Permission denied"

Example Cache volume:

name: demo
on:
  workflow_dispatch:
jobs:
  first-run:
    runs-on: demo
    steps:
    - run: echo 'demo' > $RUNNER_TOOL_CACHE/demo

  second-run:
    runs-on: demo
    needs: [first-run]
    steps:
    - run: cat $RUNNER_TOOL_CACHE/demo

default: second-run doesn't output the expected value.

Adding a PVC resource plus extra config:

  template:
    spec:
      securityContext:
        fsGroup: 123
      containers:
      - name: runner
        image: 'ghcr.io/actions/actions-runner:latest'
        command: ["/home/runner/run.sh"]
        volumeMounts:
        - name: cache
          mountPath: /home/runner/_work/_tool
      nodeSelector:
        kubernetes.io/os: linux
      volumes:
      - name: cache
        persistentVolumeClaim:
          claimName: github-cache

And yes, you need the image and command as well, or things break.

dind: Works

kubernetes: $RUNNER_TOOL_CACHE is a different path in the running container. It fails with mountPath set to either the path above or the changed path.
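The template above mounts a claim named github-cache, but the PVC itself is not shown. A minimal sketch of one, assuming the cluster has a storage class that supports ReadWriteMany (needed if several runner pods mount the cache at the same time). The size is a placeholder, and the claim has to live in the same namespace as the runner pods:

```yaml
# Hypothetical PVC backing the "github-cache" volume referenced in the runner template.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: github-cache            # must match claimName in the runner template
spec:
  accessModes:
    - ReadWriteMany             # assumption: several runner pods share the cache concurrently
  resources:
    requests:
      storage: 10Gi             # placeholder size
  # storageClassName: <rwx-capable-class>   # assumption: set this if the default class is not RWX
```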

Example windows:

default: Doesn't work with the Windows Dockerfile example. It cannot start and fails with AutoscalingRunnerSet.actions.github.com "gha-windows" is invalid: [spec.template.spec.containers[0].volumeMounts: Invalid value: "null":

dind: Doesn't work with the Windows Dockerfile example. It cannot start, because the dind containers would have to run on Linux or have a Windows image, and neither is possible.

kubernetes: Doesn't work with the Windows Dockerfile example. It can start, but fails on the "initialize containers" step with "Container operations are only supported on Linux runners".

Other issues

  • Everything is marked with deletion protection, even though everything is ephemeral.
  • The Helm chart uses lookup, so it requires a workaround for ArgoCD.
@davhdavh davhdavh added enhancement New feature or request needs triage Requires review from the maintainers labels Aug 14, 2023
@github-actions
Contributor

Hello! Thank you for filing an issue.

The maintainers will triage your issue shortly.

In the meantime, please take a look at the troubleshooting guide for bug reports.

If this is a feature request, please review our contribution guidelines.

@hamishforbes
Contributor

hamishforbes commented Aug 15, 2023

buildx can be fixed by creating your own context as in:
docker/setup-buildx-action#105
mumoshu/actions-runner-controller-ci@e91c8c0

This is more of a quirk of buildx and the docker dind image than anything specific to ARC.
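For reference, a minimal sketch of that workaround applied to the demo workflow from the issue, assuming dind mode where DOCKER_HOST and the TLS variables are already set in the runner environment. The context name builders is arbitrary; endpoint is the setup-buildx-action input that accepts a Docker context name:

```yaml
name: demo
on:
  workflow_dispatch:
jobs:
  demo:
    runs-on: demo
    steps:
    # Create a named Docker context from the current one (which picks up DOCKER_HOST/TLS settings)
    - name: Set up Docker context for Buildx
      run: docker context create builders
    # Point buildx at that context instead of having it load TLS data from the environment
    - name: Set up Docker Buildx
      uses: docker/setup-buildx-action@v2
      with:
        endpoint: builders
```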

@davhdavh
Author

Yes, thank you.

That's why I filed it as an enhancement request for a full example, rather than as a set of bugs.

@nikola-jokic
Member

Hey @davhdavh,

This section of the GitHub docs covers more advanced configuration options, including proxy, dind, and Kubernetes mode.

@davhdavh
Author

Other weird things...
The image used for the runner is ghcr.io/actions/actions-runner, which uses /home/runner/; the docker group is gid 123 and the tool cache is /home/runner/_work/_tool.
The images in this repo (actions-runner-controller/actions-runner-controller/actions-runner) use /runnertmp/, and the tool cache is /opt/hostedtoolcache. The gid is a mess: actions-runner and dind use gid 121 on 22.04 and 1001 on 20.04, and dind-rootless uses something else.
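Since the paths and gids differ between images, a throwaway job like the one below is a quick way to check what a given runner image actually uses. A sketch only: demo is the scale-set name from the examples above, and the getent line assumes the image may or may not define a group named docker:

```yaml
name: inspect-runner
on:
  workflow_dispatch:
jobs:
  inspect:
    runs-on: demo
    steps:
    - name: Show runner paths and groups
      run: |
        echo "HOME=$HOME"
        echo "RUNNER_TOOL_CACHE=$RUNNER_TOOL_CACHE"
        echo "GITHUB_WORKSPACE=$GITHUB_WORKSPACE"
        # uid, gid and supplementary groups of the user the runner runs as
        id
        # gid of the docker group, if the image defines one
        getent group docker || echo "no docker group in this image"
```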

@dm3ch
Contributor

dm3ch commented Aug 23, 2023

Isn't it possible to work around this issue once and for all by migrating from a TLS TCP Docker daemon connection to a socket file in an emptyDir volume, as was done in #2324?

@dm3ch
Contributor

dm3ch commented Aug 23, 2023

FYI: I tried to work around this issue by using a Docker socket file instead of TCP, and it solved the problem described in this issue without any changes to the workflows themselves.

Template I used:

    # Commented to disable config override
    # containerMode:
    #   type: "dind"
    template:
      spec:
        initContainers:
        - args:
          - -r
          - -v
          - /home/runner/externals/.
          - /home/runner/tmpDir/
          command:
          - cp
          image: ghcr.io/actions/actions-runner:latest
          name: init-dind-externals
          volumeMounts:
          - mountPath: /home/runner/tmpDir
            name: dind-externals
        containers:
          - name: runner
            image: ghcr.io/actions/actions-runner:latest
            command:
              - /home/runner/run.sh
            resources:
              requests:
                cpu: 4
                memory: 8Gi
            env:
              - name: DOCKER_HOST
                value: unix:///run/docker/docker.sock
            volumeMounts:
              - name: work
                mountPath: /home/runner/_work
              - name: docker-sock
                mountPath: /run/docker
                readOnly: true
          - name: dind
            image: docker:dind
            args:
              - dockerd
              - --host=unix:///run/docker/docker.sock
              - --group=$(DOCKER_GROUP_GID)
            env:
              - name: DOCKER_GROUP_GID
                value: "123"
            securityContext:
              privileged: true
            volumeMounts:
              - mountPath: /home/runner/_work
                name: work
              - mountPath: /run/docker
                name: docker-sock
              - mountPath: /home/runner/externals
                name: dind-externals
        volumes:
          - name: work
            emptyDir: {}
          - name: docker-sock
            emptyDir: {}
          - name: dind-externals
            emptyDir: {}

So I think it may be worth reimplementing the fix done in #2324 (on the controller side) for the controller of the new mode.

@davhdavh
Author

I gave up entirely on gha-runner-scale-set for dind or kubernetes mode.
Instead, I set up buildkitd and edited the action slightly:

- name: Set up Docker Buildx
  uses: docker/setup-buildx-action@4c0219f9ac95b02789c1075625400b2acbff50b1 # v2
  with:
    driver: remote
    endpoint: tcp://blabla:12345
    ...

I also had to figure out manually how to set up the tool cache, since it is not included in any of the Helm charts.

And the tool cache is not used for the 'setup job' (the first step), so it spends 30 seconds downloading the actions EVERY SINGLE TIME.

I also gave up on the pre-built images, and built my own.

I also had to give up on using a real ephemeral runner for Windows builds; it took 5+ minutes to check out the code each time due to IO speed. Instead, _work is a hostPath and I am limited to 1 runner per Windows host (which is acceptable in my case). (This is not a fault of GHA/ARC, just Windows being Windows.)

I also had to give up on the upload-artifact action; it is just horribly slow. 12+ minutes for something that took Azure DevOps 12 seconds is insane.

And I still have a problem with it taking 30-60 seconds for a runner to start a job from the time of commit, even with runners idling: 15-30 seconds from when the GitHub UI sees there is a job until it picks a runner, and 15-30 seconds for that runner to actually start on the job.
Plus, of course, the 30 seconds mentioned above... So the minimum time for a job to complete is almost 2 minutes on average.

dm3ch added a commit to dm3ch/actions-runner-controller that referenced this issue Aug 23, 2023
Link- added a commit that referenced this issue Sep 22, 2023
Co-authored-by: Bassem Dghaidi <568794+Link-@users.noreply.github.com>

4 participants