AKS Policy Controller Digest & Authentication Error #1391

Closed
ejohn20 opened this issue Apr 29, 2024 · 2 comments
Labels
bug Something isn't working

Comments


ejohn20 commented Apr 29, 2024

Description

Hey folks, I'm struggling to get this working in an Azure Kubernetes Service cluster. Here's what I've done so far; I'd love any pointers on what I've done wrong, where to look for troubleshooting, or which repos I could PR a patch to.

  1. The image is built and signed in a GitLab CI pipeline using a cosign key stored in Vault. All goes well:
cosign: A tool for Container Signing, Verification and Storage in an OCI registry.
GitVersion:    v2.2.0
GitCommit:     546f1c5b91ef58d6b034a402d0211d980184a0e5
GitTreeState:  clean
BuildDate:     2023-08-31T18:52:52Z
GoVersion:     go1.21.0
Compiler:      gc
Platform:      linux/amd64

tlog entry created with index: 89560215
Pushing signature to: dminfraacrrmz5nifl.azurecr.io/dm/api
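
The signing step itself is roughly the sketch below; the exact key reference depends on how the key comes out of Vault (cosign's hashivault:// KMS URI is one option, a key file plus COSIGN_PASSWORD is another), and the key name here is a placeholder:

$ cosign sign --yes --key hashivault://cosign-key "${IMAGE_NAME}@${MANIFEST_DIGEST}"
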
  2. Manually verifying the image signature works fine:
$ echo ${IMAGE_NAME}@${MANIFEST_DIGEST}
dminfraacrrmz5nifl.azurecr.io/dm/api:v47@sha256:d211818dca7fa35514bcbc9a280bbbb70a1e136912d2aea9723d946219172865
$ cosign verify --key cosign.pub "${IMAGE_NAME}@${MANIFEST_DIGEST}"

Verification for dminfraacrrmz5nifl.azurecr.io/dm/api@sha256:d211818dca7fa35514bcbc9a280bbbb70a1e136912d2aea9723d946219172865 --
The following checks were performed on each of these signatures:
  - The cosign claims were validated
  - Existence of the claims in the transparency log was verified offline
  - The signatures were verified against the specified public key
  3. Next, we set up the policy controller in AKS using Helm:
$ helm upgrade --install policy-controller sigstore/policy-controller --version 0.6.8 \
  --namespace cosign-system --create-namespace --wait --timeout "5m31s" \
  --set-json webhook.configData="{\"no-match-policy\": \"warn\"}" \
  --set webhook.serviceAccount.name="policy-controller" \
  --set-json webhook.serviceAccount.annotations="{\"azure.workload.identity/client-id\": \"${COSIGN_SERVICE_PRINCIPAL_CLIENT_ID}\", \"azure.workload.identity/tenant-id\": \"${ARM_TENANT_ID}\"}" \
  --set-json webhook.customLabels="{\"azure.workload.identity/use\": \"'true'\"}"

The custom label and annotations map the policy controller pod to an Azure service principal with AcrPull permissions. We can see that the service account is created with the annotations.

$ k describe sa -n cosign-system policy-controller

Name:                policy-controller
Namespace:           cosign-system
...
Annotations:         azure.workload.identity/client-id: 1111111111
                     azure.workload.identity/tenant-id: 22222222
                     meta.helm.sh/release-name: policy-controller
                     meta.helm.sh/release-namespace: cosign-system

The pod is created correctly with the environment variables and volume mount (the injected token volume is sketched after the output below):

$ k describe pod -n cosign-system policy-controller-webhook-79dc89496f-h4cqs

Name:             policy-controller-webhook-79dc89496f-h4cqs
Namespace:        cosign-system
Priority:         0
Service Account:  policy-controller
...
Labels:           app.kubernetes.io/instance=policy-controller
                  app.kubernetes.io/managed-by=Helm
                  app.kubernetes.io/name=policy-controller
                  app.kubernetes.io/version=0.8.2
                  azure.workload.identity/use=true
...
policy-controller-webhook:
    Container ID:   containerd://f48df9f9fdfab6b824643e9d1466938ee5f9833dfa6cdd74e6bb58df8c268e1c
    Image:          ghcr.io/sigstore/policy-controller/policy-controller@sha256:f291fce5b9c1a69ba54990eda7e0fe4114043b1afefb0f4ee3e6f84ec9ef1605
    Environment:
      SYSTEM_NAMESPACE:            cosign-system (v1:metadata.namespace)
      CONFIG_LOGGING_NAME:         policy-controller-webhook-logging
      CONFIG_OBSERVABILITY_NAME:   policy-controller-webhook-observability
      METRICS_DOMAIN:              sigstore.dev/policy
      WEBHOOK_NAME:                webhook
      HOME:                        /home/nonroot
      AZURE_CLIENT_ID:             111111111
      AZURE_TENANT_ID:             22222222
      AZURE_FEDERATED_TOKEN_FILE:  /var/run/secrets/azure/tokens/azure-identity-token
      AZURE_AUTHORITY_HOST:        https://login.microsoftonline.com/
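
(The token volume itself is trimmed from the output above; per the Azure Workload Identity docs, the mutating webhook injects a projected volume into the pod spec roughly like this, paraphrased rather than copied from my cluster:)

volumes:
  - name: azure-identity-token
    projected:
      sources:
        - serviceAccountToken:
            audience: api://AzureADTokenExchange
            expirationSeconds: 3600
            path: azure-identity-token
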
  4. Applying the following policy to the cluster:
$ kubectl apply -f ./assets/policy/cosign-cluster-image-policy.yaml

apiVersion: policy.sigstore.dev/v1beta1
kind: ClusterImagePolicy
metadata:
  name: trust-signed-dm-images
spec:
  mode: warn
  images:
    - glob: "dminfra*.azurecr.io/dm/**"
  authorities:
    - key:
        data: |
          -----BEGIN PUBLIC KEY-----
          MFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAEp192D+udFjb3PkOCFKHGrASDeoaZ
          Fhi60FBb+UOqlK6iiUynQ7I81LWEkcu9jU5fnbwdxTIDCSA0NOySdoPtsQ==
          -----END PUBLIC KEY-----
        hashAlgorithm: sha256
  5. Creating a namespace and applying the restriction:
$ kubectl create namespace sig-test
$ kubectl label ns sig-test policy.sigstore.dev/include=true
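
A quick sanity check that both took effect (plain kubectl; I'm including these for completeness rather than from my terminal history):

$ kubectl get clusterimagepolicy trust-signed-dm-images
$ kubectl get ns sig-test --show-labels
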

Failures

Running the following command fails in AKS, but the same command DOES work in an EKS cluster. How are the two set-ups different? Is this first failure really just the same authentication failure that shows up in the second command below?

$ echo ${IMAGE_NAME}
dminfraacrrmz5nifl.azurecr.io/dm/api:v47
$ k run -n sig-test dm-web-test --image ${IMAGE_NAME}

Error from server (BadRequest): admission webhook "policy.sigstore.dev" denied the request: validation failed: invalid value: dminfraacrrmz5nifl.azurecr.io/dm/api:v47 must be an image digest: spec.containers[0].image

Updating the image reference to include the digest also fails, this time with an UNAUTHORIZED error, even though I'd expect the workload identity environment variables to enable authentication to the private registry.

$ echo ${IMAGE_NAME}@${MANIFEST_DIGEST}
dminfraacrrmz5nifl.azurecr.io/dm/api:v47@sha256:d211818dca7fa35514bcbc9a280bbbb70a1e136912d2aea9723d946219172865
$ k run -n sig-test dm-web-test --image ${IMAGE_NAME}@${MANIFEST_DIGEST}

Warning: failed policy: trust-signed-dm-images: spec.containers[0].image
Warning: dminfraacrrmz5nifl.azurecr.io/dm/api:v47@sha256:d211818dca7fa35514bcbc9a280bbbb70a1e136912d2aea9723d946219172865 signature key validation failed for authority authority-0 for dminfraacrrmz5nifl.azurecr.io/dm/api@sha256:d211818dca7fa35514bcbc9a280bbbb70a1e136912d2aea9723d946219172865: GET https://dminfraacrrmz5nifl.azurecr.io/oauth2/token?scope=repository%3Adm%2Fapi%3Apull&service=dminfraacrrmz5nifl.azurecr.io: UNAUTHORIZED: authentication required, visit https://aka.ms/acr/authorization for more information.
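
One thing I still want to rule out on the workload identity path is the federation itself. Assuming the identity is an app registration with a federated credential (the az commands below are illustrative, not taken from my terminal), the credential's issuer and subject have to match the cluster's OIDC issuer and the webhook's service account:

$ az aks show --resource-group MY-AKS-RG --name MY-AKS-CLUSTER --query "oidcIssuerProfile.issuerUrl" -o tsv
$ az ad app federated-credential list --id "${COSIGN_SERVICE_PRINCIPAL_CLIENT_ID}"
# expected subject: system:serviceaccount:cosign-system:policy-controller
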
ejohn20 added the bug label on Apr 29, 2024

vaikas (Collaborator) commented Apr 30, 2024

I've not run this on Azure, so I can't say for sure. But I'd try to rule out auth by creating a simple test image and requiring signing. An easy way to test would be to create a simple image that doesn't require auth, maybe using something like ttl.sh as described here (in step 3):
https://www.chainguard.dev/unchained/policy-controller-101

That should tell you whether it's related to auth. Of course, if you have another way to run a container that doesn't require auth, I'd try that.
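
A rough, untested sketch of what I mean; the image, tools, and names below are my own picks, and you'd also need an authority in the policy whose glob matches ttl.sh/**:

$ IMAGE=ttl.sh/$(uuidgen):1h
$ crane copy cgr.dev/chainguard/static:latest ${IMAGE}    # any small public image works
$ cosign sign --yes --key cosign.key ${IMAGE}
$ k run -n sig-test ttl-test --image ${IMAGE}@$(crane digest ${IMAGE})
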


ejohn20 commented May 23, 2024

The ACR auth library upgrade is needed, as the ACR credential helper library is very dated and uses end-of-life libraries from Microsoft (#1424). I also never got this working using workload identity, so I think the upgrade will help with that.

However, I was able to get this working using the managed identity on the worker nodes and granting the kubelet identity permission to the ACR repository:

resource "azurerm_kubernetes_cluster" "aks" {
  ...
  identity {
    type = "SystemAssigned"
  }
  ...
}

resource "azurerm_role_assignment" "acr" {
  principal_id  = azurerm_kubernetes_cluster.aks.kubelet_identity[0].object_id
  role_definition_name  = "AcrPull"
  scope  = azurerm_container_registry.acr.id
  skip_service_principal_aad_check = true
}

Then, installing the policy controller with the webhook's AZURE_CLIENT_ID set to the kubelet identity's client ID:

KUBELET_CLIENT_ID=$(az aks show --resource-group MY-AKS-RG --name "MY-AKS-CLUSTER" --only-show-errors | jq -r '.identityProfile.kubeletidentity.clientId')

helm upgrade --install policy-controller sigstore/policy-controller --version 0.6.3 \
  --namespace cosign-system --create-namespace --wait --timeout "5m31s" \
  --set-json webhook.configData='{"no-match-policy": "warn"}' \
  --set webhook.env.AZURE_CLIENT_ID=$KUBELET_CLIENT_ID
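
With that in place, the digest-pinned run from the failure section above is admitted (same command as before, repeated here only to show what I re-ran):

$ k run -n sig-test dm-web-test --image ${IMAGE_NAME}@${MANIFEST_DIGEST}
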

ejohn20 closed this as completed on May 23, 2024