Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update local tests and CI tests to use az login token or managed identity, not service principal #4003

Merged
merged 8 commits into from
May 16, 2024

Conversation

matthchr
Copy link
Member

@matthchr matthchr commented May 8, 2024

  • Update Taskfile to use az cli token rather than SP
    • Update az login step to use an interactive login, rather than an SP.
      Interactive login is good for 12 hours.
    • Remove SP-based kind cluster helpers and replace usages with
      workload-identity based.
    • Refactor make-mi-fic.sh to Python. This is required to manage the
      more limited permissions that an interactive token has with repsect
      to calling 'az role assignment list'.
  • HTTP Recording tests now use CLI or ServicePrincipal credentials
    • CLI credentials are preferred over ServicePrincipal.
  • Add unit tests with mocked azidentity calls
    • This replaces actual live testing of the various
      authentication modes. We trust that, if we call the azidentity SDK with
      the correct parameters, it does the correct thing.
  • Fix issue with credential format documentation
    • We documented the AUTH_MODE: "workloadidentity" for workload identity,
      but in actuality the code doesn't require this be set for WI on any
      secret but the global secret.
  • Remove use of MULTITENANT secret
    • Now using a dynamically created UserAssignedIdentity instead of a
      hardcoded static multitenant secret.
    • There are two advantages here:
      1. Less setup to run a test on a new subscription because fewer inputs
        to the CI job.
      2. No static secrets that can possibly be leaked or compromised.
  • Use MSI for CI instead of service principal
    • Updated cipool BICEP to provision a managed identity and give it the
      required roles.
    • Updated Taskfile to support MSI-based az login when running in the
      context of a GitHub action.
    • Removed all reference to AZURE_CLIENT_ID and AZURE_CLIENT_SECRET from
      GitHub workflows.
  • Update documentation to be more clear about authentication mode
    • Be explicit about which mode we're using in the various installation
      instructions, and provide links to the detailed authentication
      documentation so that users are more easily exposed to other
      authentication options (formats and scopes).

Closes #3930

If applicable:

  • this PR contains documentation
  • this PR contains tests
  • this PR contains YAML Samples

Taskfile.yml Show resolved Hide resolved
Taskfile.yml Show resolved Hide resolved
Taskfile.yml Outdated Show resolved Hide resolved
Taskfile.yml Show resolved Hide resolved
Taskfile.yml Show resolved Hide resolved
@matthchr matthchr force-pushed the feature/ci-refactoring-azure branch from a039421 to 14f3a17 Compare May 14, 2024 18:45
@@ -802,6 +823,8 @@ tasks:
- az-login
cmds:
- "az group delete --name {{.RESOURCE_GROUP_NAME}} -y"
- "{{.SCRIPTS_ROOT}}/delete-role-assignment.sh -d {{.AKS_WORKLOAD_IDENTITY_PATH}}"
- "rm -rf {{.AKS_WORKLOAD_IDENTITY_PATH}}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need -rf here? The combination can be dangerous if AKS_WORKLOAD_IDENTITY_PATH happens to get an odd value.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately yes I think we do need -rf here. -r because there's 2 levels of subfolder to remove and -f because otherwise it'll prompt which doesn't work in unattended scenarios.

I did some digging though and realized that now rm's default is:

  --preserve-root[=all]  do not remove '/' (default);
                         with 'all', reject any command line argument
                         on a separate device from its parent

Which means actually, you can't blat your whole system with this anymore so it's much safer than it used to be. You can still accidentally delete directories you didn't mean to, but in this case I think we're pretty safe because we've hardcoded it to a specific variable .AKS_WORKLOAD_IDENTITY_PATH and this path won't ever be unset unless we delete that variable.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and if the variable is just deleted, we'll get this:

rm: missing operand
Try 'rm --help' for more information.

@matthchr matthchr added this pull request to the merge queue May 15, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks May 15, 2024
@matthchr matthchr added this pull request to the merge queue May 15, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks May 15, 2024
@matthchr matthchr added this pull request to the merge queue May 15, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks May 16, 2024
 - Update az login step to use an interactive login, rather than an SP.
   Interactive login is good for 12 hours.
 - Remove SP-based kind cluster helpers and replace usages with
   workload-identity based.
 - Refactor make-mi-fic.sh to Python. This is required to manage the
   more limited permissions that an interactive token has with repsect
   to calling 'az role assignment list'.
Tested to ensure that az login allows tests to be re-recorded
The idea is that this will replace actual live testing of the various
authentication modes. We trust that, if we call the azidentity SDK with
the correct parameters, it does the correct thing.
We documented the AUTH_MODE: "workloadidentity" for workload identity,
but in actuality the code doesn't require this be set for WI on any
secret but the global secret.
Now using a dynamically created UserAssignedIdentity instead of a
hardcoded static multitenant secret.

There are two advantages here:
1. Less setup to run a test on a new subscription because fewer inputs
   to the CI job.
2. No static secrets that can possibly be leaked or compromised.
- Updated cipool BICEP to provision a managed identity and give it the
  required roles.
- Updated Taskfile to support MSI-based az login when running in the
  context of a GitHub action.
- Removed all reference to AZURE_CLIENT_ID and AZURE_CLIENT_SECRET from
  GitHub workflows.
Be explicit about which mode we're using in the various installation
instructions, and provide links to the detailed authentication
documentation so that users are more easily exposed to other
authentication options (formats and scopes).

Also update devcontainer raw cmdline to map azcli token into container
to reduce number of required az logins
@matthchr matthchr force-pushed the feature/ci-refactoring-azure branch from 14f3a17 to fa04f79 Compare May 16, 2024 00:52
@matthchr matthchr added this pull request to the merge queue May 16, 2024
Merged via the queue into main with commit c2e052e May 16, 2024
8 checks passed
@matthchr matthchr deleted the feature/ci-refactoring-azure branch May 16, 2024 02:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Ready for Release
Development

Successfully merging this pull request may close these issues.

Use Federated Identity Credentials rather than SP for GitHub Actions
3 participants