(Cluster)Issuer with vault auth and serviceAccountRef is not accepted by cluster due to audience #6150

Open
Garagoth opened this issue Jun 14, 2023 · 48 comments · May be fixed by cert-manager/website#1397

Comments

@Garagoth

Garagoth commented Jun 14, 2023

I am running Kubernetes 1.25, cert-manager 1.12.1, and an external Vault 1.12, and I am trying to use the new feature from cert-manager 1.12: Kubernetes auth in Vault without a reviewer token.
I followed the documentation on how to set this up (https://cert-manager.io/docs/configuration/vault/#secretless-authentication-with-a-service-account).

I have a ClusterIssuer named vault-issuer and a service account named cert-manager-vault.
I created all the RoleBindings just as in the documentation.
I also bound the auth-delegator ClusterRole to cert-manager-vault so that it works without a reviewer token configured in Vault (https://developer.hashicorp.com/vault/docs/auth/kubernetes#use-the-vault-client-s-jwt-as-the-reviewer-jwt).

I am getting the following error:

ClusterIssuer: Error initializing issuer: while requesting a Vault token using the Kubernetes auth: error calling Vault server: Error making API request. Code: 403. Errors: * permission denied

So I captured the request to Vault and decoded the token. It looks fine, except perhaps for the "aud" field, but even that is fine according to the docs.
It looks like Vault is trying to call back to the Kubernetes API, so let's see:

kube-api: [authentication.go:63] "Unable to authenticate the request" err="[invalid bearer token, token audiences ["vault://vault-issuer"] is invalid for the target audiences ["https://kubernetes.default.svc.cluster.my.domain"]]"

And here I am stuck. How can I fix this?
By the way, I have the same setup working with the external-secrets project (https://external-secrets.io/v0.8.3/provider/hashicorp-vault/#kubernetes-authentication), but there the JWT "aud" field is set to "https://kubernetes.default.svc.cluster.my.domain/", so kube-api accepts and validates it.

I think both Vault's audience validation and the API server's validation could be satisfied if the JWT carried two audiences:

The "aud" claim in a JWT can be an array, so both would fit.
The kube-api docs state that only one of the token's audiences must match the cluster audience.
I am not sure how Vault validates this, but I suspect that a single match from the list is sufficient there as well.
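
To make the matching rule concrete, here is a minimal Python sketch of the "at least one audience must match" check. The helper name `accepts` is hypothetical, the audience values are copied from the kube-api error above, and the assumption that Vault also accepts a single match is exactly the open question in this comment, not a confirmed behaviour.

```python
def accepts(token_audiences, accepted_audiences):
    """Return True when at least one token audience is in the accepted list."""
    return any(aud in accepted_audiences for aud in token_audiences)


# Audience the API server accepts, taken from the kube-api error message.
API_AUDIENCES = ["https://kubernetes.default.svc.cluster.my.domain"]
# Assumption: the Vault role is bound to the audience cert-manager calculates.
VAULT_ROLE_AUDIENCES = ["vault://vault-issuer"]

# Token as issued today: only the calculated audience.
current_token = ["vault://vault-issuer"]
# Proposed token: calculated audience plus the API server audience.
proposed_token = [
    "vault://vault-issuer",
    "https://kubernetes.default.svc.cluster.my.domain",
]

print(accepts(current_token, API_AUDIENCES))          # False, as in the error
print(accepts(proposed_token, API_AUDIENCES))         # True
print(accepts(proposed_token, VAULT_ROLE_AUDIENCES))  # True under the assumption
```

With a two-audience token both checks pass, which is the behaviour the proposal relies on.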

@maelvls would this work?

My vault auth config:

    disable_iss_validation:        True
    disable_local_ca_jwt:        False
    issuer:
    kubernetes_ca_cert:
        -----BEGIN CERTIFICATE-----
        (...)
        -----END CERTIFICATE-----
    kubernetes_host:        https://kubernetes.default.svc.cluster.my.domain
    pem_keys: []

Cluster issuer config:

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: vault-issuer
  namespace: cert-manager
spec:
  vault:
    server: http://active.vault.service.my.domain:8200
    caBundle: 
    path: pki_int/sign/platform.role
    auth:
      # https://cert-manager.io/docs/configuration/vault/#secretless-authentication-with-a-service-account
      kubernetes:
        role: "k8s-cert-manager-role"
        mountPath: "/v1/auth/k8s-sd"
        serviceAccountRef:
          name: "cert-manager-vault"
@inteon
Member

inteon commented Jun 14, 2023

@maelvls

@inteon
Member

inteon commented Jun 15, 2023

@Garagoth could you provide some information explaining why you cannot use https://developer.hashicorp.com/vault/docs/auth/kubernetes#use-local-service-account-token-as-the-reviewer-jwt?

@Garagoth
Author

Because my Vault is NOT running as a pod in my cluster; it is an external installation.

@duizabojul

Same issue here. I'm running Vault in one cluster and trying to configure an issuer in another cluster. It works if I run a fork of cert-manager in which I add https://kubernetes.default.svc to the audience array.

@Karandash8

I was fighting with this for the last few days. You can set an additional audience on your kube-apiserver, something like
--api-audiences=https://kubernetes.default.svc.cluster.my.domain,vault://vault-issuer
and then it works. It is super inflexible (forget about wildcards etc., only exact matches work), but at least it works.

@Garagoth
Author

Garagoth commented Jul 3, 2023

Yes, you can set API audiences, but adding kube-api command-line parameters and restarting the API server every time someone adds a new Issuer is, as @Karandash8 wrote, "super inflexible".

@megakid

megakid commented Jul 18, 2023

I am also struggling with this. Is there a workaround other than customizing my kube-api command?

In the meantime I've reverted to the long-lived token approach without specifying an audience on the Vault side, which works fine.

@tetofonta

In my opinion, the solution is for cert-manager to add one more audience that stays the same in every token, so that it can be configured in the API audiences as in this answer.
For example, cert-manager could use two audiences, such as vault://<namespace>/<issuer> and cert-manager.io/jwt, where the second one is listed in --api-audiences and does not change across issuers.

I don't know whether this is possible. Of course this is only my opinion and just an idea, but I think it may solve some problems. I'd like to hear some security considerations about this. When I have some time, I'll start preparing a PR.

@inteon inteon added this to the 1.13 milestone Aug 16, 2023
@inteon
Member

inteon commented Aug 25, 2023

I'm moving this issue to the 1.14 milestone; we haven't been able to spend enough time investigating this.

@inteon inteon modified the milestones: 1.13, 1.14 Aug 25, 2023
@maelvls
Member

maelvls commented Sep 25, 2023

It appears that this issue arises from the fact that I haven't accounted for one of the ways to configure the field token_reviewer_jwt when configuring Vault's Kubernetes Auth. This option is the one used when configuring Vault:

vault write auth/kubernetes/config \
    token_reviewer_jwt="<your reviewer service account JWT>" \
    kubernetes_host=https://192.168.99.100:<your TCP port or leave it blank for 443> \
    kubernetes_ca_cert=@ca.crt

There are three scenarios with regards to the use of token_reviewer_jwt:

  1. ✅ In-Cluster Vault with no token_reviewer_jwt: In this situation, Vault picks up the pod's service account token, which comes with the API server's audience. Vault can use that token to authenticate with the Kubernetes API server when making a TokenReview request.

  2. ✅ Out-of-Cluster Vault with token_reviewer_jwt: In this situation, the token is manually created using the old service account token mechanism. Unlike the newer "bound" service account tokens, old service account tokens never expire. Depending on how your Kubernetes API server is set up, the audience of that token should be something like https://kubernetes.default.svc.cluster.local. This way, Vault can authenticate with the API server when making a TokenReview request.

  3. ❌ Out-of-Cluster Vault with no token_reviewer_jwt: ("secretless") In this case, the token meant to be reviewed in the TokenReview call is also used to authenticate the request. This is where things go wrong: since cert-manager creates a service account token with a calculated audience (e.g., vault://default/issuer-1), Vault fails to authenticate with the API server. I have reproduced the issue, instructions are available in https://hackmd.io/@maelvls/S1tYFmegp.
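
The three scenarios above can be sketched as a small decision function. This is a simplified Python model, not Vault's actual logic: the function names and return labels are hypothetical, and it ignores settings such as disable_local_ca_jwt and any extra audiences configured on the API server.

```python
def reviewer_credential(vault_in_cluster: bool, token_reviewer_jwt_set: bool) -> str:
    """Which token Vault presents to the API server for the TokenReview call."""
    if token_reviewer_jwt_set:
        # Scenario 2: a manually created token passed as token_reviewer_jwt.
        return "token_reviewer_jwt"
    if vault_in_cluster:
        # Scenario 1: the Vault pod's own service account token.
        return "pod_service_account_token"
    # Scenario 3: the client JWT under review doubles as the credential.
    return "client_jwt"


def token_review_authenticates(vault_in_cluster: bool, token_reviewer_jwt_set: bool) -> bool:
    """True when the credential carries an audience the API server accepts.

    Assumption: only the client JWT carries cert-manager's calculated
    audience (vault://...), which the API server rejects.
    """
    return reviewer_credential(vault_in_cluster, token_reviewer_jwt_set) != "client_jwt"


print(token_review_authenticates(True, False))   # scenario 1
print(token_review_authenticates(False, True))   # scenario 2
print(token_review_authenticates(False, False))  # scenario 3
```

Only the third call returns False, matching the ❌ scenario.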

This is an oversight on my part; I didn't consider scenario (3) when I implemented #5502. I'm not sure how I would have made it work, though, since allowing the user to choose a custom audience, such as https://kubernetes.default.svc.cluster.local, would expose cert-manager to risks A and B. After a discussion with @SpectralHiss, I realize that I may have been overly conservative in the risk assessment: the Issuer object might not be the attack vector I thought it would be.

For users of cert-manager 1.12 and 1.13, I would recommend defaulting to scenario (2) by generating an old "static" Kubernetes token and passing it to token_reviewer_jwt. It's not ideal because it defeats the purpose of not having any statically generated tokens, which isn't great from a security perspective, but it's the best option for now.

Let's explore how we can address this in a future version of cert-manager.

@Garagoth
Author

It should be possible to create a token with two audiences: the one you create now (vault://...) and an additional one that satisfies the review API (the same one needed in scenario 2).
Such a token would then be validated in two places: in Vault (service account + namespace + audience 1) and in kube-api (audience 2 plus the service account's RBAC role bindings).
That seems pretty tight in my opinion. Your thoughts? Is it feasible?

@Ruchira-R

Ruchira-R commented Oct 17, 2023

I am currently trying to make this work (using secretless SAs), but I am running into this issue:

Error from server (BadRequest): error when creating "STDIN": ClusterIssuer in version "v1" cannot be handled as a ClusterIssuer: strict decoding error: unknown field "spec.vault.auth.kubernetes.serviceAccountRef"
The cert-manager version is: 1.13.0
Vault version is: 0.23.0
Kubernetes version: 1.24

My use case is out-of-cluster Vault with no token_reviewer_jwt.
If I can't use this, can it be done with the OIDC/JWT auth methods? I think we would still require a static token in that case too, no? Are those methods supported?

@SpectralHiss
Contributor

SpectralHiss commented Oct 26, 2023

@Ruchira-R

I think simply using the JWT method wouldn't require a static token, and would hence be a good and simple workaround. I'm curious to see whether the community agrees.

The other option would be to keep using the Kubernetes auth method but create a service account with TokenReview permissions (perhaps by binding it to the system:auth-delegator ClusterRole) for each target cluster, and generate a token for it without requiring a Kubernetes Secret to be created.
This can be done through the Kubernetes API or with kubectl:
kubectl create token <service-account-name>
and then feeding that token as token_reviewer_jwt in the Kubernetes auth setup in the external Vault.

After a discussion at the daily community standup, we agreed that, given these workarounds, we will not prioritise solving the pattern where the service account referenced by "serviceAccountRef" is also used for the TokenReview self-review API operation, at least not for the next release.
We will document all the options more clearly on the website, as it's a confusing topic!

@lknite

lknite commented Dec 21, 2023

Perhaps a note could be added to the 'secretless' section of the documentation to indicate the scenario in which it does not currently work, along with the recommended technique until it's ready. -- Help save folks hours of work and typing.

And/or a more verbose error message that directs folks in the right direction than the one that currently exists:

Error initializing issuer: while requesting a Vault token using the Kubernetes auth: error calling Vault server:
Error making API request. URL: POST http://vault.vault.svc:8200/v1/auth/kubernetes/login Code: 403.
Errors: * permission denied

@denniskniep

denniskniep commented Dec 27, 2023

Another option might be to add a further way of setting the auth token, using the built-in k8s mechanism "service account token projection" (see also here). The token(s) would then be read from a specified path.
It is currently not possible to define multiple audiences, so I created a ticket here.

@SpectralHiss
Contributor

Hey @lknite, sorry this was not clear. We would appreciate any comments on this PR, which documents the various scenarios and the right way to authenticate to Vault:
https://github.com/cert-manager/website/pull/1397/files
You can see the relevant preview page here: https://deploy-preview-1397--cert-manager-website.netlify.app/docs/configuration/vault/

@andrey-dubnik
Contributor

andrey-dubnik commented Jan 24, 2024

It would help if audiences could be provided as part of the Kubernetes auth spec. For example, the HashiCorp Vault operator allows supplying this information.

@kchervonets

@SpectralHiss Hi, it is not clear to me from your doc what the bound_audiences parameter has to be. Right now I'm getting an audience mismatch error. I tried https://kubernetes.default.svc and the JWT issuer URL, with no luck. Any advice?

@SpectralHiss
Contributor

Hi @kchervonets, from what I can see you're configuring the Vault JWT auth method's audience check with issuer URLs instead of audiences.
By default, if your namespace is sandbox and your issuer is called vault-issuer, the audience defaults to:
vault://sandbox/vault-issuer
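
As a rough illustration of that default, here is a small Python sketch. The function name `default_audience` is hypothetical; the namespaced form follows the example just given, and the cluster-scoped form (no namespace segment) is inferred from the vault://vault-issuer audience in the kube-api error at the top of this issue rather than from the cert-manager source.

```python
from typing import Optional


def default_audience(issuer_name: str, namespace: Optional[str] = None) -> str:
    """Build the audience cert-manager appears to calculate by default.

    namespace=None models a ClusterIssuer (assumption based on the
    error message quoted in this thread).
    """
    if namespace is None:
        return f"vault://{issuer_name}"
    return f"vault://{namespace}/{issuer_name}"


print(default_audience("vault-issuer", namespace="sandbox"))
print(default_audience("vault-issuer"))
```

The value printed for your Issuer is what the audience setting on the Vault role needs to match.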

Be aware that the fact that this is "static" is the reason why we can't use the TokenReview/Kubernetes method, but we're improving it in the next release: cf. #6718

Try that and let us know if you still see the same error.

@tman5

tman5 commented May 1, 2024

I'd like to report this issue with cert-manager as well, with external Vault clusters using passwordless authentication. I currently think the issue is in the JWT generated by cert-manager and presented to Vault: it shows the internal cluster URL,
"iss": "https://kubernetes.default.svc.cluster.local", and not the external API URL. The Kubernetes auth method in Vault is only configured for the external URL of the cluster.

@javierguzman

Hello all,

Is there a way to make this work then? I am facing the same issue as @Garagoth

I am using Hashi Vault Cloud solution and external-secrets auth is working ok. I have created a gist with my Terraform code:

https://gist.github.com/javierguzman/a51f131e526e7e91dfa0bbe4221c9f45

I have tried several combinations and options, but I have not managed to make this work.

I would really appreciate any hint about this.

Thank you in advance

@andrey-dubnik
Contributor

andrey-dubnik commented May 10, 2024

The v1.15 alpha version contains an enhancement that allows you to provide additional audiences for the service account auth. It works for us.

documentation

@javierguzman

javierguzman commented May 13, 2024

Hello @andrey-dubnik

Thank you very much for your quick response. So does the audiences field go under serviceAccountRef, or outside it? The link to the docs shows it underneath, but I have taken a look at the code and it seems to be outside serviceAccountRef.

In any case, it has not worked. Is there anything else I need to set up? I'm setting the role's audience to vault://service_account_namespace/service_account_name, and the Kubernetes auth in Vault is configured like this:

resource "vault_kubernetes_auth_backend_config" "connect_sa_with_vault" {
  backend                = vault_auth_backend.kubernetes.path
  kubernetes_host        = data.aws_eks_cluster.eks_cluster.endpoint
  kubernetes_ca_cert     = kubernetes_secret_v1.issuer_sa_secret.data["ca.crt"]
  issuer = "https://oidc.eks.my-region.amazonaws.com/id/randomID"
}

My cluster is running on EKS (AWS), I got the oidc link running:
kubectl get --raw /.well-known/openid-configuration | jq .issuer -r

I have not set the token_reviewer_jwt as per the docs. Am I missing something else?

Again thank you for your help, I really appreciate it as I am blocked with this.

@andrey-dubnik
Contributor

andrey-dubnik commented May 13, 2024

@javierguzman

So does audiences field go under serviceAccountRef

It goes under serviceAccountRef; another PR moved it there because it made more sense.

I guess you are also setting up a role with aud vault://service_account_namespace/service_account_name in the following way:

resource "vault_kubernetes_auth_backend_role" "example" {
 ...
  audience                         = "vault://service_account_namespace/service_account_name"
}

If so, you can add the aud you expect in the role to the audiences of the SA, which should be enough for the aud to be added to the issued token and accepted by Vault as trusted.

apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: vault-issuer
  namespace: sandbox
spec:
  vault:
    path: pki_int/sign/example-dot-com
    server: https://vault.local
    auth:
      kubernetes:
        role: my-app-1
        mountPath: /v1/auth/kubernetes
        serviceAccountRef:
          name: vault-issuer
          audiences: [vault://service_account_namespace/service_account_name]

@javierguzman

@andrey-dubnik Thank you again for the help. I have tried several combinations, including the one you mentioned, with no success. I have placed my latest code here: https://gist.github.com/javierguzman/6a97f0d5d19424292e32ec815cfab7dd

That way you can see everything I do. The controller always throws the same error:

I0513 13:30:42.338574       1 sync.go:62] "Error initializing issuer: while requesting a Vault token using the Kubernetes auth: error calling Vault server: Error making API request.\n\nURL: POST https://public-vault-blabla:8200/v1/auth/kubernetes-pki/login\nCode: 403. Errors:\n\n* permission denied" logger="cert-manager.controller.issuers" resource_name="playground-issuer" resource_namespace="playground-issuer" resource_kind="Issuer" resource_version="v1"
E0513 13:30:42.338871       1 controller.go:167] "re-queuing item due to error processing" err=<
	while requesting a Vault token using the Kubernetes auth: error calling Vault server: Error making API request.

	URL: POST https://public-vault-blabla.hashicorp.cloud:8200/v1/auth/kubernetes-pki/login
	Code: 403. Errors:

	* permission denied
 > logger="cert-manager.controller.issuers" key="playground-issuer/playground-issuer"

I have also enabled CloudWatch (EKS observability) in the cluster, and I can see an error that might be related, though I'm not 100% sure:

E0513 12:53:08.024581 11 authentication.go:73] "Unable to authenticate the request" err="[invalid bearer token, unknown]"

Interestingly, I use Kubernetes auth with external secrets operator and it works.

@andrey-dubnik
Contributor

I can see that audiences in resource "vault_kubernetes_auth_backend_role" "sa_issuer_role" is set to "vault://${kubernetes_namespace_v1.issuer_namespace.metadata[0].name}/${kubernetes_service_account_v1.issuer_sa.metadata[0].name}" - ref

and in your Issuer it is set to audiences: ["vault://${var.environment}-issuer/${var.environment}-issuer-sa"] - ref

These values should match; otherwise token validation won't pass.

Also, one important point: your Kubernetes OIDC endpoint should be publicly accessible by the Vault server, because the Vault auth backend calls it while validating the token to make sure it is valid and has not been revoked.

@javierguzman

Even though the audiences look different, they end up being the same once filled in by Terraform. Also, I tried writing exactly the same value in both places, but it didn't make any difference.

About the OIDC endpoint: I ran a ping command from my local machine, and indeed it says "Unknown host", so I guess my endpoint is not public. Could you point me to how to make it public, please? I do not see any option in the Identity Provider from AWS.

Also, it's weird, but I gave tokenSecretRef a try instead of the service account, and it didn't work either.

As I mentioned before, it works with external-secrets, so I do not really understand the difference in Vault auth between external-secrets and cert-manager.

Again thank you as always.

@andrey-dubnik
Contributor

Whether the endpoint is reachable really depends on your security setup for EKS and on whether external connections are allowed. I'm not an EKS expert, but this reference suggests that access to a cluster endpoint may be cut off from the internet or CIDR-restricted. Vault should be able to reach the k8s API endpoint.

Try removing the issuer from the config. We don't use one when registering the backend; I'm not sure whether this makes a difference.

Another thing to try is increasing the log level of cert-manager. It will print the token in the log, so you can inspect the token's claims and make sure it has all the additional claims that Vault expects.

@javierguzman

Thanks @andrey-dubnik; I used the issuer because I saw a blog post from @maelvls where he used it, but for external-secrets I do not have it. Maybe it's something needed when your Vault is within the cluster, which is not my case.

I have increased the log level, but it doesn't give many more clues. Still the same error:

I0514 13:36:43.722669       1 sync.go:62] "Error initializing issuer: while requesting a Vault token using the Kubernetes auth: error calling Vault server: Error making API request.\n\nURL: POST https://mycloudvault:8200/v1/auth/kubernetes-pki/login\nCode: 403. Errors:\n\n* permission denied" logger="cert-manager.controller.issuers" resource_name="playground-issuer" resource_namespace="playground-issuer" resource_kind="Issuer" resource_version="v1"
E0514 13:36:43.722721       1 controller.go:167] "re-queuing item due to error processing" err=<
	while requesting a Vault token using the Kubernetes auth: error calling Vault server: Error making API request.

	URL: POST https://mycloudvault:8200/v1/auth/kubernetes-pki/login
	Code: 403. Errors:

	* permission denied
 > logger="cert-manager.controller.issuers" key="playground-issuer/playground-issuer"

However, I have realised something that might be the problem. In external-secrets I do this:

        vault:
          server: ${var.vault_address}
          path: "secret"
          namespace: "admin/${var.environment}"
          version: "v2"
          auth:
              kubernetes

While in the cert-manager issuer I do:

vault:
        server: ${var.vault_address}
        path: "${vault_mount.pki_engine.path}/sign/${local.raw_domain}"
        auth:
          kubernetes

So essentially, there is no way to specify the Vault namespace to cert-manager. Could that be the issue? Why is there no namespace field?

Do you have namespaces as well @andrey-dubnik ?

@andrey-dubnik
Contributor

Do you have namespaces as well @andrey-dubnik ?

As far as I know, namespaces are a Vault Enterprise feature; we use OSS, which does not have them. That may explain the access issue as well: if you are using namespaces, then the request should have the X-Vault-Namespace header set, as it influences the final Vault access path.

According to the Vault docs, you can also construct the path without using the header by adding the namespace to the path, e.g. <namespace_name>/secret/foo.
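
To illustrate the path-prefix variant for the login endpoint, here is a small Python sketch. The function name `login_url`, the server address, the mount name, and the namespace are all hypothetical placeholders; the sketch only shows how the namespace segment slots in after /v1.

```python
from typing import Optional


def login_url(server: str, auth_mount: str, vault_namespace: Optional[str] = None) -> str:
    """Build the Kubernetes auth login URL, optionally prefixed by a namespace."""
    parts = ["v1"]
    if vault_namespace:
        # The namespace goes between /v1 and the rest of the path.
        parts.append(vault_namespace.strip("/"))
    parts += ["auth", auth_mount.strip("/"), "login"]
    return server.rstrip("/") + "/" + "/".join(parts)


# Hypothetical values for illustration only.
print(login_url("https://vault.example.com:8200", "kubernetes"))
print(login_url("https://vault.example.com:8200", "kubernetes", "admin/dev"))
```

In a cert-manager Issuer, the equivalent is to include the namespace in mountPath, since there is no separate field for it.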

Regarding the debug part, what I do is raise the level from 2 to 12 for the cert-manager deployment (I think in reality the range is 0-5; I don't remember exactly why I set it to 12, but it definitely spat the token data into the logs after that):

spec:
      containers:
      - args:
        - --v=12
        - --cluster-resource-namespace=$(POD_NAMESPACE)
        - --leader-election-namespace=cert-manager
        - --acme-http01-solver-image=quay.io/jetstack/cert-manager-acmesolver:v1.15.0-alpha.0
        - --issuer-ambient-credentials=true
        - --max-concurrent-challenges=60

@javierguzman

Thanks as always @andrey-dubnik!! I just came here to say it worked: I saw in the docs about adding the namespace to the URL, as you mentioned as well. That's why I was getting the auth problem all the time. I'm not even sure the audience part is needed, but as far as I am aware it is better for security, so I will leave it like that.

One last question though. I thought using the service account was the way to avoid a static token, i.e. a Secret token. However, when I attach a secret to the service account:

resource "kubernetes_secret_v1" "issuer_sa_secret" {
  metadata {
    annotations = {
      "kubernetes.io/service-account.name" = kubernetes_service_account_v1.issuer_sa.metadata.0.name
      "kubernetes.io/service-account.namespace" = kubernetes_namespace_v1.issuer_namespace.metadata[0].name
    }
    name      = "${kubernetes_service_account_v1.issuer_sa.metadata[0].name}-secret"
    namespace = kubernetes_namespace_v1.issuer_namespace.metadata[0].name
  }
  type                           = "kubernetes.io/service-account-token"
  wait_for_service_account_token = true
}

resource "vault_kubernetes_auth_backend_config" "connect_sa_with_vault" {
  backend                = vault_auth_backend.kubernetes.path
  kubernetes_host        = var.cluster_endpoint
  token_reviewer_jwt     = kubernetes_secret_v1.issuer_sa_secret.data["token"]
  kubernetes_ca_cert     = kubernetes_secret_v1.issuer_sa_secret.data["ca.crt"]
}

And in cert-manager I do this:

spec:
      vault:
        server: ${var.vault_address}
        path: "${vault_mount.pki_engine.path}/sign/${local.raw_domain}"
        auth:
          kubernetes:
            mountPath: ${"/v1/${local.vault_namespace}/auth/${vault_auth_backend.kubernetes.path}"}
            role: ${vault_kubernetes_auth_backend_role.sa_issuer_role.role_name}
            serviceAccountRef:
              name: ${kubernetes_service_account_v1.issuer_sa.metadata[0].name}

We are essentially using a static token, aren't we? Or is cert-manager "refreshing" the token in the cluster and in the Vault backend from time to time?

@andrey-dubnik
Contributor

Glad it worked!

According to the cert-manager documentation, you don't need to create tokens for the SA, as it uses secretless auth with a token refreshed every 10 minutes. Your Issuer configuration doesn't mention a token, so it should use secretless auth; essentially, you should not need to create a token secret for the SA for the communication to work.

@javierguzman

I have been trying without success; I get a 403 during login. What values do you use for configuring Vault? Specifically, token_reviewer_jwt and kubernetes_ca_cert.

When I made it work with a static token, I did the following in order to use the secret's token and cert:

  token_reviewer_jwt     = kubernetes_secret_v1.issuer_sa_secret.data["token"]
  kubernetes_ca_cert     = kubernetes_secret_v1.issuer_sa_secret.data["ca.crt"]

However, now that has to come from the SA, so I have removed token_reviewer_jwt. kubernetes_ca_cert seems to be mandatory, so I use the cluster's CA, though I have also tried the SA's CA without success.

@andrey-dubnik
Contributor

We configure Vault programmatically, but it should be similar in TF terms:

resource "vault_auth_backend" "kubernetes" {
  type = "kubernetes"
}

resource "vault_kubernetes_auth_backend_config" "example" {
  backend              = vault_auth_backend.kubernetes.path
  kubernetes_host      = "http://example.com:443"
  kubernetes_ca_cert   = "-----BEGIN CERTIFICATE-----\nexample\n-----END CERTIFICATE-----"
  disable_local_ca_jwt = true
}

@javierguzman

Thanks as always! That's weird, because I am using the same config and it isn't working for me. I get the following error with logLevel set to 6:

I0520 09:33:43.585044       1 setup.go:130] playground-issuer-kubernetes-pki: Failed to initialize Vault client: while requesting a Vault token using the Kubernetes auth: error calling Vault server: Error making API request.

URL: POST https://public-address:8200/v1/admin/playground/auth/kubernetes-pki/login
Code: 403. Errors:

* permission denied
I0520 09:33:43.585187       1 sync.go:62] "Error initializing issuer: while requesting a Vault token using the Kubernetes auth: error calling Vault server: Error making API request.\n\nURL: POST https://public-address:8200/v1/admin/playground/auth/kubernetes-pki/login\nCode: 403. Errors:\n\n* permission denied" logger="cert-manager.controller.issuers" resource_name="playground-issuer-kubernetes-pki" resource_namespace="playground-issuer" resource_kind="Issuer" resource_version="v1"
E0520 09:33:43.585285       1 controller.go:167] "re-queuing item due to error processing" err=<
	while requesting a Vault token using the Kubernetes auth: error calling Vault server: Error making API request.

	URL: POST https://public-address:8200/v1/admin/playground/auth/kubernetes-pki/login
	Code: 403. Errors:

	* permission denied
 > logger="cert-manager.controller.issuers" key="playground-issuer/playground-issuer-kubernetes-pki"

I have also managed to check the logs from the cloud version of Vault. There isn't much, but one line is interesting:

2024-05-20T08:58:35.819Z [DEBUG] auth.kubernetes.auth_kubernetes_fec85866: login unauthorized: err="lookup failed: service account unauthorized; this could mean it has been deleted or recreated with a new token

So, could that mean that cert-manager refreshes the service account with a new token, but Vault somehow does not accept it?

@andrey-dubnik
Contributor

Try setting the log level to 12. cert-manager should then print the auth token in its logs for you to examine; without examining the token it is hard to tell what is happening.

When Vault receives the login request, it calls back to the Kubernetes cluster API to validate the token; that may be the part that is failing.

@andrey-dubnik
Contributor

Also check that you have applied the auth-delegator role and disabled the local CA JWT (disable_local_ca_jwt). When short-lived tokens are used, Vault has to call the Kubernetes API using the client's token, and that requires the auth-delegator role.

https://developer.hashicorp.com/vault/docs/auth/kubernetes#use-the-vault-client-s-jwt-as-the-reviewer-jwt
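
The callback described above can be sketched roughly as follows. This is a minimal illustration, not Vault's actual code: the helper name `build_token_review_request` and the returned dictionary layout are hypothetical, and nothing is sent over the network. It assumes the reviewer posts a standard Kubernetes TokenReview object and, when no separate reviewer JWT is configured, authenticates that call with the client's own JWT. Under that assumption, the API server has to accept the client's token as a bearer token before it even looks at the review, which may be why the token's audiences matter to the API server as well as to Vault.

```python
import json

# Standard Kubernetes TokenReview endpoint.
TOKEN_REVIEW_PATH = "/apis/authentication.k8s.io/v1/tokenreviews"


def build_token_review_request(kubernetes_host, client_jwt, reviewer_jwt=None):
    """Describe the TokenReview call a reviewer would make (hypothetical helper).

    If no reviewer_jwt is given, the client's JWT doubles as the bearer token,
    which mirrors the "use the client's JWT as the reviewer JWT" setup.
    """
    bearer = reviewer_jwt if reviewer_jwt else client_jwt
    body = {
        "apiVersion": "authentication.k8s.io/v1",
        "kind": "TokenReview",
        "spec": {"token": client_jwt},
    }
    return {
        "method": "POST",
        "url": kubernetes_host.rstrip("/") + TOKEN_REVIEW_PATH,
        "headers": {
            "Authorization": "Bearer " + bearer,
            "Content-Type": "application/json",
        },
        "body": json.dumps(body),
    }
```

Printing the result of `build_token_review_request("https://<k8s api host>", token)` shows which token ends up in the Authorization header, which is the one the API server authenticates first.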

@javierguzman

Thanks @andrey-dubnik! Setting log level to 12 really works well.

In the logs I see a POST request to https://172.20.0.1:443/api/v1/namespaces/playground-issuer/serviceaccounts/playground-issuer-sa/token . One of the fields in the response contains the token, which decodes to:

{
  "aud": [
    "vault://playground-issuer/playground-issuer-sa",
    "vault://playground-issuer/playground-issuer"
  ],
  "exp": 1716213067,
  "iat": 1716212467,
  "iss": "cluster-issuer-from-command",
  "kubernetes.io": {
    "namespace": "playground-issuer",
    "serviceaccount": {
      "name": "playground-issuer-sa",
      "uid": "15dd173b-45c2-4277-bd9d-3366635117de"
    }
  },
  "nbf": 1716212467,
  "sub": "system:serviceaccount:playground-issuer:playground-issuer-sa"
}

The iss seems to be the same as when I run:
kubectl get --raw /.well-known/openid-configuration | jq -r .issuer

The only thing a bit off is that there are two audiences, one is the service account and the other one is the issuer.
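
For anyone who wants to inspect such a token without pasting it into a web tool, here is a minimal Python sketch of the decoding step. The function names `decode_jwt_payload` and `audiences` are made up for this example, and the code only reads the payload; it does not verify the signature, so it is suitable for debugging only.

```python
import base64
import json


def decode_jwt_payload(token):
    """Return the claims of a JWT as a dict, without verifying the signature."""
    payload_b64 = token.split(".")[1]
    # JWT segments are base64url without padding; restore the padding first.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def audiences(claims):
    """Normalize the aud claim, which may be a single string or a list."""
    aud = claims.get("aud", [])
    return [aud] if isinstance(aud, str) else list(aud)
```

Calling `audiences(decode_jwt_payload(token))` on the token from the cert-manager logs should list the same two `vault://` entries shown in the decoded output.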

And yes, I bound the system:auth-delegator cluster role to the service account and granted cert-manager create on serviceaccounts/token:

resource "kubernetes_role_v1" "issuer_role" {
  metadata {
    name      = "${kubernetes_service_account_v1.issuer_sa.metadata[0].name}-role"
    namespace = kubernetes_namespace_v1.issuer_namespace.metadata[0].name
  }

  rule {
    api_groups     = [""]
    resources      = ["serviceaccounts/token"]
    resource_names = [kubernetes_service_account_v1.issuer_sa.metadata[0].name]
    verbs          = ["create"]
  }
}

resource "kubernetes_role_binding_v1" "issuer_sa_rb" {
  metadata {
    name      = "${kubernetes_service_account_v1.issuer_sa.metadata[0].name}-rb"
    namespace = kubernetes_namespace_v1.issuer_namespace.metadata[0].name
  }

  role_ref {
    api_group = "rbac.authorization.k8s.io"
    kind      = "Role"
    name      = kubernetes_role_v1.issuer_role.metadata[0].name
  }

  subject {
    kind      = "ServiceAccount"
    name      = helm_release.cert_manager.metadata[0].name
    namespace = kubernetes_namespace_v1.cert_manager_namespace.metadata[0].name
  }
}

resource "kubernetes_cluster_role_binding_v1" "issuer_sa_crb" {
  metadata {
    name = "${kubernetes_service_account_v1.issuer_sa.metadata[0].name}-crb"
  }

  role_ref {
    api_group = "rbac.authorization.k8s.io"
    kind      = "ClusterRole"
    name      = "system:auth-delegator"
  }

  subject {
    kind      = "ServiceAccount"
    name      = kubernetes_service_account_v1.issuer_sa.metadata[0].name
    namespace = kubernetes_namespace_v1.issuer_namespace.metadata[0].name
  }
}

@andrey-dubnik
Contributor

If at least one item in the SA token's aud list matches the audience configured in the Vault auth role, and the SA coordinates from sub (name and namespace) match the role as well, the login should pass, provided that Vault can successfully call your k8s cluster admin endpoint to validate the token and local CA validation is disabled (disable_local_ca_jwt = true).
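
As a rough model of the role checks described here (not Vault's implementation, and leaving out the token review call entirely), the matching could be sketched like this. The function name `role_accepts` and the plain-dict shape of `role` are assumptions for the example; the field names follow the output of a Vault role read.

```python
def role_accepts(claims, role):
    """Check a decoded SA token against a Vault-style role (illustrative only)."""
    aud = claims.get("aud", [])
    token_audiences = [aud] if isinstance(aud, str) else list(aud)

    k8s = claims.get("kubernetes.io", {})
    namespace = k8s.get("namespace")
    sa_name = k8s.get("serviceaccount", {}).get("name")

    # An empty role audience is treated here as "no audience restriction".
    role_audience = role.get("audience")
    audience_ok = not role_audience or role_audience in token_audiences

    names = role.get("bound_service_account_names", [])
    namespaces = role.get("bound_service_account_namespaces", [])
    name_ok = "*" in names or sa_name in names
    namespace_ok = "*" in namespaces or namespace in namespaces

    return audience_ok and name_ok and namespace_ok
```

With the token and role values posted in this thread, this check returns True, which suggests the role itself is not what rejects the login.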

@javierguzman

My role is:

Key                                 Value
---                                 -----
alias_name_source                   serviceaccount_uid
audience                            vault://playground-issuer/playground-issuer-sa
bound_service_account_names         [playground-issuer-sa]
bound_service_account_namespaces    [playground-issuer]
token_bound_cidrs                   []
token_explicit_max_ttl              0s
token_max_ttl                       1m
token_no_default_policy             false
token_num_uses                      0
token_period                        0s
token_policies                      [default pki-policy]
token_ttl                           55s
token_type                          default

And I have the local ca validation disabled:

resource "vault_kubernetes_auth_backend_config" "connect_sa_with_vault" {
  backend                = vault_auth_backend.kubernetes.path
  kubernetes_host        = var.kubernetes_host
  kubernetes_ca_cert     = var.kubernetes_ca
  disable_local_ca_jwt   = true
}

So I guess it's time to open a ticket with HashiCorp, as everything looks correct to me. If I find the solution, I will post it here for future readers.

@andrey-dubnik
Contributor

andrey-dubnik commented May 21, 2024

@javierguzman can you try adding the k8s API host to the list of audiences you specify in the cert-manager serviceAccountRef section, on top of what it already has? The k8s API may expect the aud in the token to match the admin API host when Vault does the callback.

e.g. https://<actual k8s api host>

@javierguzman

Hello @andrey-dubnik, I think I got it working, as the Issuer now shows status: True and type: Ready.

The bottom line was indeed to add more entries to the aud. I tried the k8s API host, but that didn't work. Fortunately, I took another look at the EKS logs and saw one complaining about auth and mentioning an audience related to https://kubernetes.default.svc, so I added that as an audience, and that seems to have made it work:

serviceAccountRef:
  name: ${kubernetes_service_account_v1.issuer_sa.metadata[0].name}
  audiences: ["https://kubernetes.default.svc", "vault://${kubernetes_namespace_v1.issuer_namespace.metadata[0].name}/${kubernetes_service_account_v1.issuer_sa.metadata[0].name}"]
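
One way to read this result, assuming the API server only authenticates bearer tokens that carry at least one of its own audiences (as its "Unable to authenticate the request" log message suggests), is that the token has to satisfy two parties at once. The sketch below is a hypothetical checker, not part of cert-manager or Vault; the name `audience_problems` is made up, and the list of API server audiences has to be supplied by hand, for example from the API server log or from a default token's aud claim.

```python
def audience_problems(token_audiences, vault_role_audience, api_audiences):
    """List reasons why a token's audiences might be rejected (illustrative)."""
    problems = []
    # Vault side: the role's configured audience should appear in the token.
    if vault_role_audience not in token_audiences:
        problems.append("Vault role audience missing from token")
    # Kubernetes side: at least one API server audience should appear too,
    # because the token is also used as the bearer token for the review call.
    if not set(token_audiences) & set(api_audiences):
        problems.append("no API server audience in token")
    return problems
```

For the values in this thread, a token with only the two `vault://` audiences yields the second problem, and adding "https://kubernetes.default.svc" yields an empty list.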

So now it's working but I'm confused and have some last questions:

  1. This post from @maelvls, https://hackmd.io/@maelvls/S1tYFmegp, says that to find the audience you should run
    kubectl get --raw /.well-known/openid-configuration | jq .issuer -r

But for me this does not return "https://kubernetes.default.svc"; it returns "https://oidc.eks.my-region.amazonaws.com/id/someID". By any chance, do you understand what's going on?

  2. What are the actual steps here? I think they are as follows, but I'm not sure whether they are correct or whether I'm missing some:
    1. cert-manager generates a short-lived token every 10 minutes and attaches it to the specified SA.
    2. The Issuer needs to communicate with Vault, so Vault has to be configured to accept communication from the cluster; this is done with the Vault role.
    3. Now that Vault accepts requests from the cluster, the SA can send tokens. The tokens have an audience field, which I believe indicates which component is allowed to use the token.
    4. Vault receives the token and performs a token review against the cluster; if everything is OK, communication is established.

Are these steps correct or am I missing something?

  3. Lastly, do you have any recommendations for learning these advanced topics? I am OK with basic k8s, but I really struggle with things like this or advanced networking, and the official docs from cert-manager and HashiCorp don't seem to be enough.

Again thank you for your help, without you I'm not sure I would have achieved this.

@andrey-dubnik
Contributor

@javierguzman what is the output of the command kubectl get --raw /api | jq ? I would like to check whether its output must be in the audience in a case like this.

@javierguzman

It's like this:

{
  "kind": "APIVersions",
  "versions": [
    "v1"
  ],
  "serverAddressByClientCIDRs": [
    {
      "clientCIDR": "0.0.0.0/0",
      "serverAddress": "ip-172-blabla-.my-aws-region.compute.internal:443"
    }
  ]
}

@andrey-dubnik
Contributor

andrey-dubnik commented May 22, 2024

Did you try adding https://ip-172-blabla-.my-aws-region.compute.internal (without the port) to the cert-manager aud list instead of https://kubernetes.default.svc?

@javierguzman

I have just tried it and it doesn't work that way. It only works if I use "https://kubernetes.default.svc".

@andrey-dubnik
Contributor

@javierguzman I'm not sure there is a direct way of figuring out what the cluster's default aud is going to be... EKS seems to use kubernetes.default.svc, whereas AKS matches the API server's public host.

The way to figure out the aud without accessing the admin controller is as follows:

Create service account

kubectl create serviceaccount test-sa

Create decode function for the SA token

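function jwt_decode(){
    # JWT segments are base64url-encoded, so map "-" and "_" back to "+" and "/" before decoding
    jq -R 'split(".") | .[1] | gsub("-"; "+") | gsub("_"; "/") | @base64d | fromjson' <<< "$1"
}

Extract token aud

jwt_decode "$(kubectl create token test-sa)" | jq -r '.aud'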

Does the output contain multiple aud entries in addition to https://kubernetes.default.svc (if that one is present at all)?

@javierguzman

Hello @andrey-dubnik,

So indeed the audience of that token is:

 "aud": [
    "https://kubernetes.default.svc"
  ]

So I guess the default audience depends on the kube-apiserver setup: EKS uses kubernetes.default.svc, AKS uses the public host, and for others it would be the output of kubectl get --raw /.well-known/openid-configuration | jq .issuer -r
