documentation for workload identity with Vault and federation #20097

Open
benvanstaveren opened this issue Mar 8, 2024 · 7 comments
Labels
stage/accepted Confirmed, and intend to work on. No timeline commitment though. theme/docs Documentation issues and enhancements theme/workload-identity

Comments

@benvanstaveren

Proposal

The documentation for workload identity needs a lot of work; at the moment a bunch of key items (at least, what I consider key items) are missing, which makes it very hard to determine what impact, if any, a future upgrade will have. Most notably lacking is an explanation of how to set up workload identity across multiple federated clusters. Given that the Vault JWT auth endpoint requires you to point it back at the JWKS URL in Nomad, it seems to imply each cluster needs its own endpoint. This would make life incredibly complicated: if cluster A goes tits up and you reschedule everything on cluster B, you'll need all the roles required on cluster A defined on the auth endpoint for cluster B as well.

As someone with 4 clusters federated together, this is kind of a deal breaker at the moment.

Also lacking are decent pointers and/or info on how to migrate from the current Vault integration to the new workload identity workflow; there is some info available, but it's all sort of scattered.

Use-cases

Making my life easier and letting me decide whether or not we're going to pin ourselves to Nomad 1.8.

@Lord-Y

Lord-Y commented Mar 8, 2024

@benvanstaveren We are on 2 regions and 2 Nomad clusters. The 2nd is declared as a secondary cluster. We created:

  • 1 jwt path per region
  • 1 role per region
  • 1 policy per region
  • 1 jwks url per region
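A minimal sketch of the per-region layout listed above, written as Python that only builds the Vault API request bodies (it does not talk to Vault). It assumes the Vault JWT auth endpoints `auth/<mount>/config` and `auth/<mount>/role/<name>`, parameter names from the Vault JWT auth method documentation, and Nomad serving its JWKS at `/.well-known/jwks.json`. The region names, Nomad addresses, the `jwt-nomad-` mount prefix, the `nomad-workloads` role/policy name and the claim mappings are all hypothetical. Deriving every mount's role from one shared definition at least keeps the roles identical, though Vault still ends up with one mount and one copy of the role per region.

```python
import copy
import json

# Hypothetical regions and Nomad server addresses; adjust to your own topology.
REGIONS = {
    "eu-west": "https://nomad.eu-west.example.com:4646",
    "us-east": "https://nomad.us-east.example.com:4646",
}

# One role definition shared by every per-region mount so the copies cannot drift.
# Parameter names follow the Vault JWT auth role API; the values are assumptions.
SHARED_ROLE = {
    "role_type": "jwt",
    "bound_audiences": ["vault.io"],
    "user_claim": "/nomad_job_id",
    "user_claim_json_pointer": True,
    "claim_mappings": {
        "nomad_namespace": "nomad_namespace",
        "nomad_job_id": "nomad_job_id",
        "nomad_task": "nomad_task",
    },
    "token_type": "service",
    "token_policies": ["nomad-workloads"],
    "token_period": "30m",
}


def vault_requests(regions, role_name="nomad-workloads"):
    """Return (path, body) pairs: one JWT auth mount plus one role per Nomad region."""
    requests = []
    for region, nomad_addr in sorted(regions.items()):
        mount = f"auth/jwt-nomad-{region}"
        requests.append((f"{mount}/config", {
            # Each region has its own keyring, so each mount points at that region's JWKS.
            "jwks_url": f"{nomad_addr}/.well-known/jwks.json",
            "jwt_supported_algs": ["RS256", "EdDSA"],
            "default_role": role_name,
        }))
        # Deep copy so editing one payload never changes the shared definition.
        requests.append((f"{mount}/role/{role_name}", copy.deepcopy(SHARED_ROLE)))
    return requests


if __name__ == "__main__":
    for path, body in vault_requests(REGIONS):
        print(path)
        print(json.dumps(body, indent=2))
```

Each printed path/body pair corresponds to one write against Vault; with 4 regions that is still 4 mounts and 4 roles, just generated from one place.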

Our setup is working, but there are still issues with Nomad and workload identity: thousands of 403 errors from Vault at around the 30-minute mark, with my TTL set to 1 hour. I made 2 rollbacks this week (4 in total). I hope to open new issues next week. You'd better wait.

@benvanstaveren
Author

@Lord-Y um, yeah, that's my point. I don't want to create 4 identical roles on 4 different auth endpoints (we have 1 vault cluster, 4 nomad clusters) - that's asking for something to be forgotten or otherwise overlooked and then we get to play the happy fun debug time to figure out why things aren't working.

I'm not concerned about 403 errors on Vault right now; I'm more concerned about clarification in the documentation that will decide whether or not we keep Nomad at all, or fork it at 1.8 and keep our own version, or (god forbid) switch to k8s. I mean, with the level of indirection workload identity seems to require, we may as well...

@tgross tgross added theme/docs Documentation issues and enhancements and removed type/enhancement labels Mar 8, 2024
@tgross tgross added this to Needs Triage in Nomad - Community Issues Triage via automation Mar 8, 2024
@tgross tgross changed the title Documentation enhancement for workload identity needed documentation for workload identity with Vault and federation Mar 8, 2024
@tgross
Member

tgross commented Mar 8, 2024

@Lord-Y let's keep any bug reports in a separate issue, please.

Hi @benvanstaveren! I've re-titled this issue to focus on the area that seems most directly in contention here.

> Most notably lacking is an explanation of how to set up workload identity across multiple federated clusters. Given that the Vault JWT auth endpoint requires you to point it back at the JWKS URL in Nomad, it seems to imply each cluster needs its own endpoint. This would make life incredibly complicated: if cluster A goes tits up and you reschedule everything on cluster B, you'll need all the roles required on cluster A defined on the auth endpoint for cluster B as well.

Agreed that we're definitely lacking in guidance here. We'll make sure we get that resolved.

In the meantime: the Nomad keyring is replicated only within a region, so Workload Identities only apply within a single Nomad region. The way to allow multiple Nomad regions to use a single Vault cluster is to configure the public keys in the Vault JWT Auth Method via jwt_validation_pubkeys.

You'll note that, unfortunately, this currently doesn't stay up to date automatically the way the JWKS endpoint does. We're looking to resolve that in #19669 (cc @schmichael), and that will likely be a blocker to our deprecating the old Vault token-based workflow, so that folks like you with federated clusters have an ergonomic way to operate it.
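Until keys can be synced automatically, the jwt_validation_pubkeys route means translating each region's JWKS document into PEM public keys yourself. A minimal, stdlib-only sketch, assuming the JWKS documents have already been fetched and parsed (for example from each region's `/.well-known/jwks.json`) and that the signing keys are RSA JWKs with `n` and `e` members; other key types are simply skipped. The function names and region keys are hypothetical, and the result is the body one might write to a single shared mount's `auth/<mount>/config`. As far as I know, Vault accepts only one of `jwks_url`, `oidc_discovery_url` or `jwt_validation_pubkeys` per mount, so this replaces the JWKS URL rather than supplementing it.

```python
import base64

# DER AlgorithmIdentifier { rsaEncryption (1.2.840.113549.1.1.1), NULL }.
_RSA_ALGORITHM_ID = bytes.fromhex("300d06092a864886f70d0101010500")


def _der_length(n: int) -> bytes:
    """DER length: short form below 128, long form (0x80 | byte count) otherwise."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def _der_tlv(tag: int, content: bytes) -> bytes:
    """Wrap content in a DER tag-length-value triple."""
    return bytes([tag]) + _der_length(len(content)) + content


def _der_integer(value: int) -> bytes:
    # bit_length // 8 + 1 bytes leaves room for a leading 0x00 when the top bit
    # is set, so a non-negative INTEGER is never read back as negative.
    return _der_tlv(0x02, value.to_bytes(value.bit_length() // 8 + 1, "big"))


def _b64url_to_int(value: str) -> int:
    # JWK members are base64url without padding; restore it before decoding.
    padded = value + "=" * (-len(value) % 4)
    return int.from_bytes(base64.urlsafe_b64decode(padded), "big")


def rsa_jwk_to_pem(jwk: dict) -> str:
    """Turn one RSA JWK (members n and e) into a PEM SubjectPublicKeyInfo block."""
    rsa_public_key = _der_tlv(
        0x30,
        _der_integer(_b64url_to_int(jwk["n"])) + _der_integer(_b64url_to_int(jwk["e"])),
    )
    # BIT STRING content starts with the count of unused bits, which is 0 here.
    spki = _der_tlv(0x30, _RSA_ALGORITHM_ID + _der_tlv(0x03, b"\x00" + rsa_public_key))
    b64 = base64.b64encode(spki).decode("ascii")
    lines = [b64[i:i + 64] for i in range(0, len(b64), 64)]
    return "-----BEGIN PUBLIC KEY-----\n" + "\n".join(lines) + "\n-----END PUBLIC KEY-----\n"


def merged_jwt_config(jwks_by_region: dict) -> dict:
    """Build one JWT auth config body that trusts the RSA keys of every region."""
    pems = []
    for region in sorted(jwks_by_region):
        for jwk in jwks_by_region[region].get("keys", []):
            if jwk.get("kty") == "RSA":  # this sketch skips non-RSA keys
                pems.append(rsa_jwk_to_pem(jwk))
    return {"jwt_validation_pubkeys": pems, "jwt_supported_algs": ["RS256"]}
```

With one mount trusting every region's keys, the roles only need to be defined once. The catch is exactly the one described above: the list is only correct until the next key rotation in any region.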

@benvanstaveren
Author

@tgross wouldn't it be easier to replicate the keyring? I'm not entirely up on the internals of it all, but I have a vague recollection that clusters do replicate things to each other on the ACL side (hence the authoritative_region setting); wouldn't it be possible to piggyback on that mechanism? That way the behaviour would at least be similar to other ACL-related things, or that's how it feels to me :) Let me know, and I can make that a separate issue/feature request or something?

@tgross
Member

tgross commented Mar 8, 2024

The key metadata is in Raft and so could easily use that same mechanism, but the cryptographic material is intentionally not because we shipped the initial implementation without #14852. Once that's done, it'd at least be a possibility.

@benvanstaveren
Author

> The key metadata is in Raft and so could easily use that same mechanism, but the cryptographic material is intentionally not because we shipped the initial implementation without #14852. Once that's done, it'd at least be a possibility.

Maybe I'll create an issue for it referring back to that issue and this one, to keep it visible, because with regard to #19669 it seems to me that it would still require either Vault being able to have the signing keys pushed to it automatically, or an external tool to sync up the new keys when they're made available. The former is nicer than the latter, because ideally there is no external tooling or reliance on someone remembering "oh we need to update X because..." :)
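The external-tool option boils down to a periodic diff between the keys Vault currently trusts and the keys every region currently publishes. A minimal sketch of just that planning step, assuming both sides have already been converted to PEM strings (the configured list read back from the mount's config, the published ones derived from each region's JWKS); `SyncPlan` and `plan_pubkey_sync` are hypothetical names, and fetching from Nomad and writing to Vault are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncPlan:
    to_add: tuple   # published by some region but not yet trusted by Vault
    to_remove: tuple  # trusted by Vault but no longer published by any region
    desired: tuple  # full list to write back as jwt_validation_pubkeys

    @property
    def changed(self) -> bool:
        return bool(self.to_add or self.to_remove)


def plan_pubkey_sync(configured, pems_by_region):
    """Compare Vault's configured PEMs with the union of every region's PEMs."""
    # Strip surrounding whitespace so trailing newlines don't cause spurious diffs.
    current = {pem.strip() for pem in configured}
    desired = {pem.strip() for pems in pems_by_region.values() for pem in pems}
    return SyncPlan(
        to_add=tuple(sorted(desired - current)),
        to_remove=tuple(sorted(current - desired)),
        desired=tuple(sorted(desired)),
    )
```

A keyring rotation in any region shows up as a non-empty `to_add`, and a key is only dropped once no region publishes it any more, so identities signed with a key that is still published keep validating.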

@benvanstaveren
Author

Okay so, I created a new issue (#20123) as a proposal for the keyring replication. I'll leave it up to the powers that be to rename/organise/de-duplicate some of this stuff :)

@jrasell jrasell moved this from Needs Triage to Needs Roadmapping in Nomad - Community Issues Triage Apr 5, 2024
@jrasell jrasell added the stage/accepted Confirmed, and intend to work on. No timeline commitment though. label Apr 5, 2024
Projects
Development

No branches or pull requests

4 participants