Connect Vault provider accesses privileged endpoints during rotation #13090

Closed
bazaah opened this issue May 16, 2022 · 12 comments
Labels
theme/certificates Related to creating, distributing, and rotating certificates in Consul theme/consul-vault Relating to Consul & Vault interactions thinking More time is needed to research by the Consul Contributors

Comments

@bazaah

bazaah commented May 16, 2022


Overview of the Issue

I'm using Consul Connect with a Vault-backed CA provider. Specifically, the Vault-managed PKI paths flow, in which Consul is given read access to the root PKI engine (and full access to its intermediate PKI engine).

While reviewing an incident report involving Connect proxies suddenly ceasing communication, I noticed x509 errors suggesting that the local Envoy proxy did not trust its upstream's certificate. This led me to review the Consul logs, where I noticed a permission error, during a Connect CA rotation event, for an endpoint I didn't recognize.

Sure enough, this endpoint does appear to be required during rotation as per: https://github.com/hashicorp/consul/blob/v1.11.5/agent/connect/ca/provider_vault.go#L650-L653

However, this endpoint:

  1. Is not documented in the sample policy for this provider flow (link)
  2. Requires sudo privileges in Vault to access (link), which seems to void the use case of providing read only access to the root PKI engine

I'm not sure how to categorize this report, as on the one hand it could be a doc issue, but on the other, requiring access to a sudo protected endpoint seems to violate the use case of this particular flow.


I'm also pretty sure that Consul doesn't correctly handle rolling back to an old working CA, as I had to force another 2 rotations (Consul Provider -> Vault Provider) to fix the envoy x509 errors. That said, I have no hard evidence of this, so I'll leave it out of this report's scope.

Consul info for both Client and Server

Client info
agent:
        check_monitors = 0
        check_ttls = 0
        checks = 0
        services = 0
build:
        prerelease =
        revision = d8983fc9
        version = 1.11.5
consul:
        acl = disabled
        known_servers = <redacted>
        server = false
runtime:
        arch = amd64
        cpu_count = 8
        goroutines = 82
        max_procs = 8
        os = linux
        version = go1.17.9
serf_lan:
        coordinate_resets = 0
        encrypted = false
        event_queue = 0
        event_time = 8
        failed = 0
        health_score = 0
        intent_queue = 0
        left = 0
        member_time = 39606
        members = <redacted>
        query_queue = 0
        query_time = 1
Server info
agent:
        check_monitors = 0
        check_ttls = 1
        checks = 1
        services = 1
build:
        prerelease =
        revision = d8983fc9
        version = 1.11.5
consul:
        acl = disabled
        bootstrap = false
        known_datacenters = <redacted>
        leader = true
        leader_addr = <redacted>
        server = true
raft:
        applied_index = 22646234
        commit_index = 22646234
        fsm_pending = 0
        last_contact = 0
        last_log_index = 22646234
        last_log_term = 11
        last_snapshot_index = 22644744
        last_snapshot_term = 11
        latest_configuration = [<redacted>]
        latest_configuration_index = 0
        num_peers = <redacted>
        protocol_version = 3
        protocol_version_max = 3
        protocol_version_min = 0
        snapshot_version_max = 1
        snapshot_version_min = 0
        state = Leader
        term = 11
runtime:
        arch = amd64
        cpu_count = 2
        goroutines = 634
        max_procs = 2
        os = linux
        version = go1.17.9
serf_lan:
        coordinate_resets = 0
        encrypted = false
        event_queue = 0
        event_time = 8
        failed = 0
        health_score = 0
        intent_queue = 0
        left = 0
        member_time = 39606
        members = <redacted>
        query_queue = 0
        query_time = 1
serf_wan:
        coordinate_resets = 0
        encrypted = false
        event_queue = 0
        event_time = 1
        failed = 0
        health_score = 0
        intent_queue = 0
        left = 0
        member_time = 31
        members = <redacted>
        query_queue = 0
        query_time = 1

Operating system and Environment details


uname -s -r -v -m -p -i -o

Linux 5.13.0-1021-aws #23~20.04.2-Ubuntu SMP Thu Mar 31 11:36:15 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

cat /etc/os-release

NAME="Ubuntu"
VERSION="20.04.4 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.4 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal

Log Fragments


Error setting CA configuration: Unexpected response code: 500 (rpc error making call: error having Vault cross-sign cert: Error making API request.

URL: PUT https://<redacted>/v1/<redacted>/root/sign-self-issued
Code: 403. Errors:

* 1 error occurred:
        * permission denied)
@jkirschner-hashicorp jkirschner-hashicorp added theme/consul-vault Relating to Consul & Vault interactions thinking More time is needed to research by the Consul Contributors labels May 16, 2022
@Amier3
Contributor

Amier3 commented May 26, 2022

Hey @bazaah

Thanks for bringing this to our attention. I'm not too familiar with the vault managed PKI paths, but from what you just showed I can see how this behavior can be confusing. I'll ask around and get back to you with a better explanation of what might be happening here, just wanted to let you know that we're looking into this 😄

@jkirschner-hashicorp jkirschner-hashicorp added the theme/certificates Related to creating, distributing, and rotating certificates in Consul label Nov 8, 2022
@jkirschner-hashicorp
Contributor

Hi @bazaah,

For context, when you received the initial root CA rotation error (403 on the call to https://<redacted>/v1/<redacted>/root/sign-self-issued), what triggered this attempted rotation? Did you do anything to trigger a root CA change (e.g., change of CA provider)? My understanding is that Consul does not attempt to renew/rotate the root CA based on approach to root expiration (which is something we should make really clear in the docs).

However, this endpoint:

  1. Is not documented in the sample policy for this provider flow (link)
  2. Requires sudo privileges in Vault to access (link), which seems to void the use case of providing read only access to the root PKI engine

On item 1, I agree that if the documentation is missing a Vault permission needed for root rotation to work, we need to correct the documentation. The context questions above will help me better understand what actually triggered the attempted root rotation.

On item 2, Consul would just need sudo+update privileges on that single endpoint. Do you still feel that voids the use case of read-only access to the root PKI engine? We'd need to investigate more to understand whether there are any alternatives that would still enable some form of root rotation.
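To make the "sudo+update on that single endpoint" idea concrete, here is a minimal Go sketch that renders a Vault policy stanza for the root/sign-self-issued path under a given root PKI mount. The helper name signSelfIssuedPolicy and the mount name connect_root are assumptions made for illustration; the real mount was redacted in the report, and this does not reproduce the sample policy from the Consul documentation.

```go
package main

import (
	"fmt"
	"strings"
)

// signSelfIssuedPolicy is a hypothetical helper (not part of Consul or Vault).
// It renders a Vault policy stanza granting only the two capabilities named in
// this thread, "sudo" and "update", on <mount>/root/sign-self-issued.
func signSelfIssuedPolicy(rootPKIPath string) string {
	// Normalize the mount so "connect_root", "connect_root/" and
	// "/connect_root/" all produce the same policy path.
	mount := strings.Trim(rootPKIPath, "/")
	return fmt.Sprintf(
		"path \"%s/root/sign-self-issued\" {\n  capabilities = [\"sudo\", \"update\"]\n}\n",
		mount,
	)
}

func main() {
	// "connect_root" is an assumed mount name used only for this example.
	fmt.Print(signSelfIssuedPolicy("connect_root/"))
}
```

Scoping the stanza to one path leaves the rest of the root PKI engine read-only, which is the trade-off being weighed in this comment.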

@bazaah
Author

bazaah commented Nov 16, 2022

Hi @jkirschner-hashicorp,

It definitely wasn't an intentional rotation, though I can see how it might have been accidental. I no longer have access to the cluster (previous job), but IIRC there was a Vault agent responsible for rotating the credential Consul uses, via calls to the HTTP version of consul connect ca set-config .... The only thing that would have changed per request was the token though. Maybe that would be enough to trigger rotation?

On point 2:

I'm not sure. On the one hand, signing intermediates is already a pretty dangerous privilege; however signing arbitrary CA certs does feel like an escalation, though I'm not enough of a PKI whiz to understand if this feeling pans out or not.

From a usability perspective, it would be a lot harder to get signoff on giving out a sudo privilege, but maybe that's just an org problem.

@jkirschner-hashicorp
Contributor

The only thing that would have changed per request was the token though. Maybe that would be enough to trigger rotation?

I'm not an expert in this area of the code, but I'd be surprised if rotation was intended to be triggered by something like changing the Vault token. The only part of the code that seems to call the CrossSignCA function is reached only if this if-guard doesn't return early ("if the root didn't change, just update the config and return"). Assuming your recollection is correct, and that the only thing that changed was the token (not a change of provider), that makes me wonder whether the if-guard is acting as we expect.

From a usability perspective, it would be a lot harder to get signoff on giving out a sudo privilege, but maybe that's just an org problem.

Org problems are just as valid as technology problems!

If root rotation is needed, I'm not sure whether there's an alternate (set of) endpoint(s) that could be used in Vault to achieve the same outcome with lesser privileges.

@jkirschner-hashicorp
Contributor

Leaving possible breadcrumbs on the if-guard mentioned in the previous comment:

if root != nil && root.ID == newActiveRoot.ID {

The value of the ID field seems to uniquely map to root's cert:

ID: connect.CalculateCertFingerprint(primaryCert.Raw),

The origin of newActiveRoot, and therefore its ID, is this call:

providerRoot, err := newProvider.GenerateRoot()

The Vault provider's implementation of GenerateRoot is:

func (v *VaultProvider) GenerateRoot() (RootResult, error) {

So if there were something amiss here, it might be within the Vault provider's GenerateRoot function.
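The breadcrumbs above can be sketched as a small, self-contained Go program: the root's ID is derived from the certificate bytes, and the guard returns early only when the stored root and the newly generated root share that ID. The caRoot type and the fingerprint and rootUnchanged functions are hypothetical stand-ins, and the SHA-256 digest is an assumption chosen for illustration; it is not claimed to match what connect.CalculateCertFingerprint actually computes.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// caRoot is a hypothetical stand-in for Consul's stored root record.
type caRoot struct {
	ID string
}

// fingerprint stands in for connect.CalculateCertFingerprint.
// ASSUMPTION: a SHA-256 hex digest is used here purely for illustration.
func fingerprint(rawCert []byte) string {
	sum := sha256.Sum256(rawCert)
	return hex.EncodeToString(sum[:])
}

// rootUnchanged mirrors the quoted guard:
//   root != nil && root.ID == newActiveRoot.ID
// When it returns true, the config is updated without cross-signing.
func rootUnchanged(current *caRoot, candidate caRoot) bool {
	return current != nil && current.ID == candidate.ID
}

func main() {
	stored := &caRoot{ID: fingerprint([]byte("cert-A"))}
	same := caRoot{ID: fingerprint([]byte("cert-A"))}
	different := caRoot{ID: fingerprint([]byte("cert-B"))}

	fmt.Println(rootUnchanged(stored, same))      // identical bytes, early return
	fmt.Println(rootUnchanged(stored, different)) // different bytes, rotation path
	fmt.Println(rootUnchanged(nil, same))         // no stored root yet
}
```

Under this model, any difference in the bytes returned for the root certificate yields a different ID and sends execution down the rotation path, even if the operator changed nothing on purpose.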

@jkirschner-hashicorp
Contributor

@bazaah:

Do you happen to know what Vault version you may have been using at the time you first saw this? Had Vault's version been changed recently? (I realize this was months back and you may not know/remember.)

I'm wondering if the rotation of the Vault token used by Consul's Connect CA was previously working, then stopped working, after a Consul or Vault version change. That might help narrow down what may have happened.

@bazaah
Author

bazaah commented Nov 16, 2022

It was v1.10.x, I think x was 1, but can't remember. I had upgraded both Consul and Vault clusters at the beginning of April, and both clusters had been running happily since about mid August of the previous year.

@jkirschner-hashicorp
Contributor

Leaving more possible breadcrumbs that could be related, assuming the problem started after an upgrade in ~April 2022 to Vault 1.10.x and Consul ~1.11.5.

Around that time, some changes were made on the Consul and Vault side to support having an external CA as the trusted CA when using the Vault CA provider. In other words, the "RootPKIPath" could then actually contain an intermediate CA rather than a true root CA.

The Consul change was made in Consul 1.11.4. The related Vault change was made in Vault 1.10.0, affecting endpoints used by the GenerateRoot function.

Those changes could be entirely unrelated to the observed behavior, but their interaction with the code path that leads to the CrossSignCA call, together with their timing, suggests they may be a lead worth exploring.

@jkirschner-hashicorp
Contributor

Following up on:

It definitely wasn't an intentional rotation, though I can see how it might have been accidental. I no longer have access to the cluster (previous job), but IIRC there was a Vault agent responsible for rotating the credential Consul uses, via calls to the HTTP version of consul connect ca set-config .... The only thing that would have changed per request was the token though.

We recently tested what happens when consul connect ca set-config is called and the only change is the Vault token. That did not trigger a root CA rotation.

What could trigger a root CA rotation without changing the provider (Vault) is changing the RootPKIPath. How likely is it that the automation could have modified RootPKIPath?
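For automation that calls set-config, the two triggers named in this thread (a provider change or a RootPKIPath change) can be turned into a pre-flight check. The following is a minimal Go sketch under that assumption only: caConfig and expectsRootRotation are hypothetical names, not Consul types, and Consul's real decision is made by comparing root certificate IDs rather than config fields.

```go
package main

import "fmt"

// caConfig is a hypothetical, trimmed-down view of the fields that
// automation sends with "consul connect ca set-config".
type caConfig struct {
	Provider    string
	Token       string
	RootPKIPath string
}

// expectsRootRotation reports whether a config change is one of the two
// kinds this thread identifies as triggering a root CA rotation:
// a different provider or a different RootPKIPath.
// A token-only change is expected NOT to rotate the root.
func expectsRootRotation(old, next caConfig) bool {
	return old.Provider != next.Provider || old.RootPKIPath != next.RootPKIPath
}

func main() {
	// All values below are placeholders for illustration.
	current := caConfig{Provider: "vault", Token: "token-1", RootPKIPath: "connect_root"}

	tokenOnly := current
	tokenOnly.Token = "token-2"

	newPath := current
	newPath.RootPKIPath = "connect_root_v2"

	fmt.Println(expectsRootRotation(current, tokenOnly))
	fmt.Println(expectsRootRotation(current, newPath))
}
```

A check like this only states the expectation. A rotation that happens when it returns false is the kind of unexpected behavior this issue describes.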

@jkirschner-hashicorp
Contributor

Updated documentation

Based on the original concerns:

  • Is not documented in the sample policy for this provider flow (link)
  • Requires sudo privileges in Vault to access (link), which seems to void the use case of providing read only access to the root PKI engine

We've updated the documentation on the Vault CA provider page to explicitly state:

  • That the elevated root/sign-self-issued permission should only be granted temporarily as needed
  • How to grant those elevated permissions temporarily
  • When the elevated permissions are needed (which CA configuration changes require them)

The documentation cross-references that guidance from both the connect ca set-config CLI docs and the corresponding HTTP API docs.


Requesting Feedback

@bazaah : Do you feel like these documentation changes address your original concerns?

(I realize there's also the open question of why this root CA rotation process was triggered in your environment in the first place, per my previous comment.)

@bazaah
Author

bazaah commented Dec 6, 2022

Following up on:

It definitely wasn't an intentional rotation, though I can see how it might have been accidental. I no longer have access to the cluster (previous job), but IIRC there was a Vault agent responsible for rotating the credential Consul uses, via calls to the HTTP version of consul connect ca set-config .... The only thing that would have changed per request was the token though.

We recently tested what happens when consul connect ca set-config is called and the only change is the Vault token. That did not trigger a root CA rotation.

What could trigger a root CA rotation without changing the provider (Vault) is changing the RootPKIPath. How likely is it that the automation could have modified RootPKIPath?

No, the PKI path never changed, either root or intermediate.

@bazaah : Do you feel like these documentation changes address your original concerns?

(I realize there's also the open question of why this root CA rotation process was triggered in your environment in the first place, per my previous comment.)

Yes, they do. I'm happy to close this now: if I had known the linked information at the time, I wouldn't have encountered this issue.


The mystery of the original rotation still remains, but I can't provide a real bug report for it, or an MVP, so it's not fair to keep this open for that.

@jkirschner-hashicorp
Contributor

There still does remain the mystery of the original rotation, but I can't provide a real bug report for it, or an MVP, so it's not fair to keep this open for that.

I'll close the issue for now, but please re-open if you re-experience an unexpected root rotation (triggered by something other than a provider or RootPKIPath change)!
