Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrate the CI workflows to the new runners on EKS cluster #33371

Closed

Conversation

hussein-awala
Copy link
Member

This PR updates the labels used in runs-on to migrate the workflows to the new GHA runners running on Airflow EKS cluster, and managed by an Actions Runner Controller (ARC).

In the new cluster, we can choose between 3 different sizes for the self hosted runners (the instance type could be change):

  • small -> t4g-medium (2 cpus, 4Gb of ram) -> runs-on: ${{ fromJson(needs.build-info.outputs.runs-on-small) }}
  • medium -> t4g-xlarge (4 cpus, 16Gb of ram) -> runs-on: ${{ fromJson(needs.build-info.outputs.runs-on-medium) }}
  • large -> t4g-2xlarge (8 cpus, 32 Gb of ram) -> runs-on: ${{ fromJson(needs.build-info.outputs.runs-on-large) }}

And we can add new runner sets with different characteristics in the future, for ex create a specific docker image which contains some pre-installed packages and use it in specific jobs, or maybe create a new runner image from Airflow CI image and update it via the CI after updating the breeze or Airflow dependencies. (to be discussed later).

Also, this PR removes all the steps which setup python using actions/setup-python@v4, because of a bug in the action (actions/setup-python#705). As a workaround, I pre-installed python in the runner docker image, we can revert this change after fixing the bug, or keep it based on the performance of the runner.


^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

@@ -115,6 +115,15 @@ jobs:
helm-test-packages: ${{ steps.selective-checks.outputs.helm-test-packages }}
debug-resources: ${{ steps.selective-checks.outputs.debug-resources }}
runs-on: ${{ steps.selective-checks.outputs.runs-on }}
runs-on-small: >-
${{ steps.selective-checks.outputs.runs-on == 'self-hosted'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have bad experiences with conditions embedded using the expreession language. It's prone to errors. Would not it be better to generate runs-on-small/medium/large as direct output of selective checks?

That has the potential that we can also base it on labels later when we connect ARC - then we could for example apply an "ARC" label to PR to test it on the ARC setup.

This has the nice property that selective checks are also very well covered by unit tests so we can easily test any logic we will apply there.

Copy link
Member

@potiuk potiuk Aug 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: Maybe the output should be named "selective-checks.outputs.small-runner", "selective-checks.outputs.medium-runner" etc.

That will be a nice "abstraction layer" (selective checks based on labels will choose what the "medium" runner really is.

That will also read better:

runs-on: "${{ fromJson(needs.build-info.outputs.medium-runner) }}"

@@ -25,12 +25,6 @@ outputs:
runs:
using: "composite"
steps:
- name: "Setup python"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am afraid this will not work if we want to merge it and would like to freely switch between the "public", "AWS" and "ARC" runners.

I would prefer to have the case where we have a way to easily switch to choose which runners we are using on a PR-by PR basis (i.e. for example apply label to make specific PR to run on ARC - similalry as we do with "use public runners") - at least for a few week transition period.

I think what can be done is that this action can be conditional based on (yes, you guessed it) selective checks output. I think when you are calling a composite action, you are able to pass it a parameter from the calling action, so we could pass "use-arc" or simlar selective check output and based on that we could either run the setup python action or not.

@github-actions
Copy link

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale Stale PRs per the .github/workflows/stale.yml policy file label Sep 29, 2023
@github-actions github-actions bot closed this Oct 4, 2023
@potiuk potiuk reopened this Oct 8, 2023
@potiuk
Copy link
Member

potiuk commented Oct 8, 2023

We might want to come back to this change. It was stale for a while but probably worth keeping it until we get to changing our CI env

@hussein-awala
Copy link
Member Author

We might want to come back to this change. It was stale for a while but probably worth keeping it until we get to changing our CI env

Recently I did a similar migration in a personal project, I will come back to this one in two weeks with a much better version.

@github-actions github-actions bot removed the stale Stale PRs per the .github/workflows/stale.yml policy file label Oct 9, 2023
Copy link

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale Stale PRs per the .github/workflows/stale.yml policy file label Dec 28, 2023
@github-actions github-actions bot closed this Jan 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:dev-tools stale Stale PRs per the .github/workflows/stale.yml policy file
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants