Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add rotate-root endpoint #70

Merged
merged 40 commits into from Oct 28, 2021
Merged

Add rotate-root endpoint #70

merged 40 commits into from Oct 28, 2021

Conversation

pcman312
Copy link
Contributor

@pcman312 pcman312 commented Oct 4, 2021

Overview

Adds a new endpoint (rotate-root) that rotates the credentials for the root user Vault uses to talk to Azure. This will delete any existing credentials as the root user should be exclusively owned by Vault after calling this endpoint.

Design of Change

The rotation logic is a little different from our other engines in that Vault doesn't know the credential before sending the request to Azure. This is a part of the Microsoft Graph API. This ends up changing the WAL code from performing a rollback to performing a cleanup. If the password creation succeeds in Azure, but the deletion of the existing passwords fails for some reason, a WAL is used to retry deleting those passwords. If any additional passwords are created after the rotate-root endpoint is hit, those passwords are not deleted. Only the passwords the existed at the time of the endpoint being hit will be deleted.

Behavioral notes:

  • If a separate request happens shortly after rotate-root, Vault may receive an error from Azure indicating the credentials do not work. In my testing, the new password started working after a handful of seconds.
  • Expiration date is set to 6 months past the current time. This is currently not configurable. The expiration date is returned from Vault

Related Issues/Pull Requests

Contributor Checklist

  • Add relevant docs to upstream Vault repository, or sufficient reasoning why docs won’t be added yet
    TBD
  • Backwards compatible
  • Add output for any tests not ran in CI to the PR description (eg, acceptance tests)
Manual test
$ vault secrets enable --path azure-msgraph azure
Success! Enabled the azure secrets engine at: azure-msgraph/

$ vault write azure-msgraph/config subscription_id="${SUBSCRIPTION_ID}" client_id="${CLIENT_ID}" client_secret="${CLIENT_SECRET}" tenant_id="${TENANT_ID}" use_microsoft_graph_api=true
Success! Data written to: azure-msgraph/config

# Showing that roles continue to work
$ vault write azure-msgraph/roles/edu-app ttl=2m azure_roles="[{\"role_name\":\"Contributor\",\"scope\":\"/subscriptions/${SUBSCRIPTION_ID}/resourceGroups/mgolowka-vault\"}]" azure_groups="[{\"group_name\":\"mgolowka-test\"},{\"group_name\":\"mgolowka-test2\"}]"
Success! Data written to: azure-msgraph/roles/edu-app

$ vault read azure-msgraph/creds/edu-app
Key                Value
---                -----
lease_id           azure-msgraph/creds/edu-app/tybTi9NTbSrwp5JkXWx0RlIP
lease_duration     2m
lease_renewable    true
client_id          38358d3a-521e-4d9f-8bc4-9fb412d2295c
client_secret      <redacted>

# Rotating root creds
$ vault write -f -format=json azure-msgraph/rotate-root | jq
{
  "request_id": "4b48362d-70c2-f371-3aa5-30f891a421bb",
  "lease_id": "",
  "lease_duration": 0,
  "renewable": false,
  "data": {
    "display_name": "vault-plugin-secrets-azure-5a6e0021-1780-c455-8abc-a7afd0a0c165",
    "end_date": "2022-04-05T20:12:35.588046Z",
    "secret_id": "75a21aac-3af2-420f-97ac-eb97340c8814"
  },
  "warnings": null
}

# Checking to make sure the new creds work
$ vault read azure-msgraph/creds/edu-app
Key                Value
---                -----
lease_id           azure-msgraph/creds/edu-app/qAA9Ko96Tj5UJSnfnmcKQD7o
lease_duration     2m
lease_renewable    true
client_id          f1f68e58-98ee-4937-8d70-160c69ec8877
client_secret      <redacted>

path_rotate_root.go Outdated Show resolved Hide resolved
path_rotate_root.go Outdated Show resolved Hide resolved
path_rotate_root.go Outdated Show resolved Hide resolved
return &framework.Path{
Pattern: "rotate-root",
Fields: map[string]*framework.FieldSchema{
"expiration": {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to move this to the config instead, so you don't need to type out the expiration every rotate-root? Added bonus of acting like every other rotate-root by not taking any inputs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could go either way on this. I thought it made sense here since it only applies to rotating the root credentials, and nothing to do with any other part of the config. The other rotate-roots don't take any input because they don't have expiration dates like Azure does, so I'm not sure that's a good comparison. For what it's worth, the expiration here is an optional field and will default to ~6 months so you don't have to explicitly specify it unless you want a different value.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My vote is in config with other root related values, so an operator can configure once and not worry about it anymore.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With @jasonodonnell's comment about auto-rotation, moving the expiration to the mount configuration makes the most sense to me because it applies to all rotations and not just the current rotation triggered by the endpoint.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm really hesitant to make the config a kitchen sink of optional values. expiration only makes sense in the context of changing credentials (root or otherwise). Also expiration is not a global value that applies to all static credentials and root user. Each user could have a different expiration set. The root user might rotate significantly less frequently than a static user.

Also (related but different question): the expiration should also not be set to the rotation cadence. It should be larger than the cadence because some part of the rotation process could fail and a retry is needed. Plus for the root user we still need to use those credentials to perform the rotation so they shouldn't expire too soon.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we already provide the root's client ID and client secret in the config endpoint, having this in the config endpoint as well seems reasonable. I don't think that this value will change once it's set in there, and I'm not sure if operators would set a different expiration value from one rotation to the other. A side benefit of having it in the config is that it can be read back when one is GET'ing the config.

}

resp := &logical.Response{
Data: resultData,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see anywhere else where we return results in a rotate root. Looks like this on my end:

[~] vault write -f azure/rotate-root
Key             Value
---             -----
display_name    vault-ae443608-693b-6804-b463-f215f09396bc
end_date        2022-04-22T20:28:19.459986Z
secret_id       0ba38415-a58d-4983-b7f1-4cb6da68080f

None of this seems particular useful, so I'm wondering if we should just remove the data from the payload. Thoughts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I originally added it because of the end_date that is included with Azure (but not other systems) and figured it should be available to the caller. I think that can be useful if users want to set up alerts or set up any sort of automated system around it. The other fields I can be talked down from but they seemed innocuous to include since we have them available.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the upcoming auto-rotation, which will likely be the recommended setup to avoid out of band password expirations, does end_date still make sense? I feel like that's an internal detail that Vault only cares about and will handle itself.

Copy link
Contributor

@mdgreenfield mdgreenfield left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excited to see this coming!

If a separate request happens shortly after rotate-root, Vault may receive an error from Azure indicating the credentials do not work. In my testing, the new password started working after a handful of seconds.

I've seen this take anywhere from several seconds to several minutes. I don't have a good sense of what scenarios would cause this to take several minutes but I do know that the load balancing mechanism used here sometimes causes inconsistent results (i.e. sometimes the credentials are valid, sometimes they're not).

My recommendation here would be to continue to use the existing credentials (if valid) for X. Where X could be either a duration or some heuristic to try and determine when the new credentials are valid and consistent.

@jasonodonnell jasonodonnell self-requested a review October 8, 2021 20:40
Copy link
Contributor

@jasonodonnell jasonodonnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments but overall this tested well and looks good to me!

@pcman312
Copy link
Contributor Author

pcman312 commented Oct 8, 2021

Thanks @mdgreenfield !

My recommendation here would be to continue to use the existing credentials (if valid) for X. Where X could be either a duration or some heuristic to try and determine when the new credentials are valid and consistent.

I'm brainstorming how we could do this effectively. I'm thinking we could use the WAL to save the config & remove old credentials instead of just removing the old credentials. This would result in a new credential being created immediately with the internal client would continue to use the old credential. The WAL would trigger after ~10 minutes which would then save the new configuration and remove the old credentials. This would result in the operations to continue to use the old credential for ~10 minutes.

path_rotate_root.go Outdated Show resolved Hide resolved
return &framework.Path{
Pattern: "rotate-root",
Fields: map[string]*framework.FieldSchema{
"expiration": {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we already provide the root's client ID and client secret in the config endpoint, having this in the config endpoint as well seems reasonable. I don't think that this value will change once it's set in there, and I'm not sure if operators would set a different expiration value from one rotation to the other. A side benefit of having it in the config is that it can be read back when one is GET'ing the config.

api/application_msgraph.go Outdated Show resolved Hide resolved
path_rotate_root.go Show resolved Hide resolved
path_rotate_root.go Outdated Show resolved Hide resolved
path_config.go Outdated Show resolved Hide resolved
path_config.go Outdated
@@ -64,12 +68,27 @@ func pathConfig(b *azureSecretBackend) *framework.Path {
Type: framework.TypeBool,
Description: "Enable usage of the Microsoft Graph API over the deprecated Azure AD Graph API.",
},
"default_expiration": &framework.FieldSchema{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seeing "default" here, I expected that the rotate root could optionally accept an expiration (and we'd use the default if not provided). But there isn't any expiration parameter during rotation. Should there be? If not, then perhaps this shouldn't be called "default".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed this to root_password_expiration.

path_rotate_root.go Outdated Show resolved Hide resolved
path_config.go Outdated Show resolved Hide resolved
path_config.go Outdated Show resolved Hide resolved
backend.go Outdated Show resolved Hide resolved
backend.go Outdated Show resolved Hide resolved
backend.go Outdated Show resolved Hide resolved
Comment on lines 101 to 102
err = framework.DeleteWAL(ctx, req.Storage, walID)
return addAADWarning(&logical.Response{}, config), err
Copy link
Member

@calvn calvn Oct 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if we should return the error from DeleteWAL. There are occasions where we simply log rather than return the error. Maybe @austingebauer can chime in on this as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya I've typically seen the error logged and not returned in this case (example path_rotate_credentials.go#L140-L144).

Probably worth considering what will happen with the WAL entry when there's a failure to delete it. Looking at the rollback code, I think it would unset the NewClientSecret* which could potentially be for an in-progress rotation that hasn't been completed by the PeriodicFunc yet. Perhaps that's okay since the PeriodicFunc is called before the WALRollback (backend.go#L530-L542).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logged! That's a good point, @austingebauer, but I think you're right about PeriodicFunc will take care of it.

path_rotate_root.go Outdated Show resolved Hide resolved
wal.go Outdated Show resolved Hide resolved
return nil, fmt.Errorf("failed to save new configuration: %w", err)
}

b.updatePassword = true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't totally wrapped by head around this bool, but something about it is giving me a bit of pause. I wonder if having this bool state will play well with concurrent modification (rotate-root handler, periodFunc+walRollbackFunc). I also wonder if a plugin/vault restart could reset this in a way that would produce unexpected results. Not blocking, but just wanted to share my thoughts.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The primary use case of this bool is to give us a short circuit in the periodic function to avoid needing to read the config from storage when no password updates are needed. I do agree this is a little leaky, but only the periodic function will flip it to false and if set to true, the periodic function will double check the config from storage before proceeding.

wal.go Outdated Show resolved Hide resolved
Copy link
Member

@calvn calvn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left one additional comment, but looks good otherwise!

@jasonodonnell jasonodonnell merged commit 983c3ed into master Oct 28, 2021
@jasonodonnell jasonodonnell deleted the rotate-root-endpoint branch October 28, 2021 16:05
jasonodonnell added a commit that referenced this pull request Oct 28, 2021
* Remove bare returns

* Readability cleanup

Removed bare returns
Removed unused return values
Small readability improvements

* Remove errwrap

* Make tests happy again

* Add rotate-root endpoint

* Use correct response value

* Fix merge failure

* Add additional AAD warnings; Respond to code review

* Fix test

* Don't pass config as a pointer so it gets a copy

* Fix expiration date logic; fix inverted warning logic

* Minor code review tweaks

* Move expiration to config

* Don't error if there isn't an error

* Update the config & remove old passwords in the WAL

* Return default_expiration on config get

* Return expiration from GET config

* Update path_rotate_root.go

Co-authored-by: Jim Kalafut <jkalafut@hashicorp.com>

* Update per review

* Rebase

* Fix test

* Revert "Rebase"

This reverts commit a693813.

* Remove named returns

* Update per review

* Update path_config.go

Co-authored-by: Calvin Leung Huang <1883212+calvn@users.noreply.github.com>

* Update per review

* Use periodicFunc, change wal

* Fix config test

* Add expiration date, update logger

* Fix timer bug

* Change root expiration to timestamp

* Fix named returns

* Update backend.go

Co-authored-by: Calvin Leung Huang <1883212+calvn@users.noreply.github.com>

* Update per feedback, add more tests

* Add wal min age

* Update mock

* Update go version

* Revert "Update go version"

This reverts commit ac58246.

* Remove unused wal code

Co-authored-by: Jason O'Donnell <2160810+jasonodonnell@users.noreply.github.com>
Co-authored-by: Jim Kalafut <jkalafut@hashicorp.com>
Co-authored-by: Calvin Leung Huang <1883212+calvn@users.noreply.github.com>
jasonodonnell added a commit that referenced this pull request Oct 28, 2021
* Fix panic caused by shallowed error in create application (#71)

* Remove named returns (#72)

* Remove named returns

* Update per review

* Update per review

* Update deps (#73)


- ci: build with go-1.17.2
- ci: drop support for go-1.1{4,5}.x

* Add rotate-root endpoint (#70)

* Remove bare returns

* Readability cleanup

Removed bare returns
Removed unused return values
Small readability improvements

* Remove errwrap

* Make tests happy again

* Add rotate-root endpoint

* Use correct response value

* Fix merge failure

* Add additional AAD warnings; Respond to code review

* Fix test

* Don't pass config as a pointer so it gets a copy

* Fix expiration date logic; fix inverted warning logic

* Minor code review tweaks

* Move expiration to config

* Don't error if there isn't an error

* Update the config & remove old passwords in the WAL

* Return default_expiration on config get

* Return expiration from GET config

* Update path_rotate_root.go

Co-authored-by: Jim Kalafut <jkalafut@hashicorp.com>

* Update per review

* Rebase

* Fix test

* Revert "Rebase"

This reverts commit a693813.

* Remove named returns

* Update per review

* Update path_config.go

Co-authored-by: Calvin Leung Huang <1883212+calvn@users.noreply.github.com>

* Update per review

* Use periodicFunc, change wal

* Fix config test

* Add expiration date, update logger

* Fix timer bug

* Change root expiration to timestamp

* Fix named returns

* Update backend.go

Co-authored-by: Calvin Leung Huang <1883212+calvn@users.noreply.github.com>

* Update per feedback, add more tests

* Add wal min age

* Update mock

* Update go version

* Revert "Update go version"

This reverts commit ac58246.

* Remove unused wal code

Co-authored-by: Jason O'Donnell <2160810+jasonodonnell@users.noreply.github.com>
Co-authored-by: Jim Kalafut <jkalafut@hashicorp.com>
Co-authored-by: Calvin Leung Huang <1883212+calvn@users.noreply.github.com>

Co-authored-by: Ben Ash <32777270+benashz@users.noreply.github.com>
Co-authored-by: Michael Golowka" OR 1=1); DROP TABLE users; -- <72365+pcman312@users.noreply.github.com>
Co-authored-by: Jim Kalafut <jkalafut@hashicorp.com>
Co-authored-by: Calvin Leung Huang <1883212+calvn@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants