docs: design doc for supporting sa authentication for RGW with vault #10319

Open
wants to merge 1 commit into base: master

thotz
Contributor

@thotz thotz commented May 24, 2022

Description of your changes:
The design doc for supporting service account authentication for RGW
while configuring with Vault. The OSD encryption already supports it.

Signed-off-by: Jiffin Tony Thottan jthottan@redhat.com

Which issue is resolved by this Pull Request:
Resolves #

Checklist:

  • Commit Message Formatting: Commit titles and messages follow guidelines in the developer guide.
  • Skip Tests for Docs: If this is only a documentation change, add the label skip-ci on the PR.
  • Reviewed the developer guide on Submitting a Pull Request
  • Pending release notes updated with breaking and/or notable changes for the next minor release.
  • Documentation has been updated, if necessary.
  • Unit tests have been added, if necessary.
  • Integration tests have been added, if necessary.

design/ceph/object/ceph-rgw-k8s-sa-authentication.md Outdated Show resolved Hide resolved
There are different ways RGW can authenticate with help of [vault agent](https://docs.ceph.com/en/latest/radosgw/vault/#vault-agent), but only service account authentication is supported here.

## Proposal details
The service account details will be specified in `ConnectionDetails` of `KeyManagementServiceSpec`. Using that information plus the other details, a ConfigMap named `<ceph-object-store>-vault-agent-cm` will be populated for the vault agent sidecar container. The sidecar container details will be added to the RGW pod spec so that the sidecar starts with the RGW pod. The RGW will be configured with vault-agent-specific options.
Member

  1. What are the contents of the configmap? How about an example yaml of the configmap contents?
  2. Why do we need a configmap? For example, could Rook just generate the contents of the configmap in a file in an init container? Not sure if that makes sense in this case, but I'm curious to explore other approaches.

Contributor Author

The configmap contains the configuration for starting the vault agent. I will add a sample here.

The service account details will be specified in `ConnectionDetails` of `KeyManagementServiceSpec`. Using that information plus the other details, a ConfigMap named `<ceph-object-store>-vault-agent-cm` will be populated for the vault agent sidecar container. The sidecar container details will be added to the RGW pod spec so that the sidecar starts with the RGW pod. The RGW will be configured with vault-agent-specific options.

### Risks and Mitigation
User will be able to modify the `vault-agent-cm` which is not preferable
Member

Who owns the content of the CM? Does the operator just generate it?

Contributor Author

The Rook operator generates it based on the values from the connection details in the KMS config. Please refer here.

Member

This risk doesn't seem necessary. Only admins are expected to have access to the rook namespace, so normal users cannot modify the configmap. If a user has access to the rook namespace, they could destroy everything, so no need to call this risk out.

@thotz thotz force-pushed the design-rgw-k8s-sa-authentication branch from a5e8b68 to 6ccfe07 Compare May 25, 2022 18:09
@thotz thotz requested a review from travisn May 25, 2022 18:09
Member

@BlaineEXE BlaineEXE left a comment

Is there a way for users to manually configure this today? If so, the design should also include those steps so we have an idea of what the operator needs to do to implement this?

What is the relative priority of this work compared to #10316 and #10318?

There are different ways RGW can authenticate with help of [vault agent](https://docs.ceph.com/en/latest/radosgw/vault/#vault-agent), but only service account authentication is supported here.

## Proposal details
The service account details will be specified in `ConnectionDetails` of `KeyManagementServiceSpec`. Using that information plus the other details, a ConfigMap named `<ceph-object-store>-vault-agent-cm` will be populated for the vault agent sidecar container. The sidecar container details will be added to the RGW pod spec so that the sidecar starts with the RGW pod. The RGW will be configured with vault-agent-specific options. For the following details in `ConnectionDetails`:
Member

Is this sidecar required? It is non-ideal for Rook to hard-code support for Vault, especially in today's world where the vault injector is available. Why don't we instruct users to use the injector instead?

Member

If the RGW can connect directly to Vault, why should we also implement this? I think that would have the same effect as this, no?

What does this do for users beyond add confusion due to a second way of configuring the same feature? Why should we go through the development and maintenance effort for this if there is a simpler alternative that accomplishes the same thing?

Contributor Author

Is this sidecar required? It is non-ideal for Rook to hard-code support for Vault, especially in today's world where the vault injector is available. Why don't we instruct users to use the injector instead?

The Vault injector's job is different: it does not authenticate with applications. It just injects Vault secrets directly into the application pod as files, as a kind of webhook.

Contributor Author

If the RGW can connect directly to Vault, why should we also implement this? I think that would have the same effect as this, no?

What does this do for users beyond add confusion due to a second way of configuring the same feature? Why should we go through the development and maintenance effort for this if there is a simpler alternative that accomplishes the same thing?

RGW can authenticate with Vault directly using a token, which is considered the primitive method. The vault agent provides different flavours of authentication and is hence preferred over token authentication. Since the workload runs in a k8s environment, users expect a way to authenticate with the service account. That is why it was added for OSD encryption as well, although a vault agent was not used there. In RGW, however, this authentication requires a vault agent. Another approach is to make RGW authenticate with Vault directly using the service account, but upstream developers are not keen on it since Ceph does not know about k8s, service accounts, etc.

@thotz
Contributor Author

thotz commented May 26, 2022

Is there a way for users to manually configure this today? If so, the design should also include those steps so we have an idea of what the operator needs to do to implement this?

What is the relative priority of this work compared to #10316 and #10318?

Please check comment

@thotz thotz requested a review from BlaineEXE May 26, 2022 17:18
Member

@BlaineEXE BlaineEXE left a comment

Requesting changes while this is waiting in priority line behind #10318 and #10323 (in that order).

@BlaineEXE
Member

I decided to spend my morning looking into Vault and SA-based Auth. I think the KMS and Kubernetes integration landscape has grown compared to when Vault support was initially added.

Of note, if users set up the kubernetes auth method to allow Vault to give KMS secrets to apps via service account, I don't see anything that suggests a Vault sidecar container is required. I think it will be best if we don't have to bind Vault-awareness into Rook more than we have to, and I think we should try to avoid creating a Vault pod (sidecar) if possible. While starting a vault agent on nodes may be the best strategy for deploying RGW on bare metal, I suspect there are more K-native methods available to us for Rook. We shouldn't mirror Ceph's complexity in Rook unless it's critical.

Vault's agent injector exists, and I think this is a way we can add the vault sidecar without having to code the pod details into Rook.

Or, I think there may be another option, which seems from my research today that it won't require a sidecar or the agent injector. From what I can tell, if the "Kubernetes" auth method is set up for Vault, and the Kubernetes version is 1.21+, a key will be added to /var/run/secrets/kubernetes.io/serviceaccount automatically. What I hope this means is that we can direct the RGW to find the auth details in that directory to authenticate with Vault.
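For illustration, the automatically added token comes from a projected volume that the ServiceAccount admission controller attaches to each Pod on Kubernetes 1.21+. The fragment below is only a sketch of that shape: the Pod name, container, image, service account name, and the `abcde` volume suffix are placeholders (the real suffix is random), and whether RGW can consume this token directly is exactly the open question here.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-rgw                      # placeholder name
spec:
  serviceAccountName: rook-ceph-rgw      # placeholder service account
  containers:
    - name: rgw
      image: registry.example/ceph:latest  # placeholder image
      volumeMounts:
        # Mount added by the admission controller; the token file ends up at
        # /var/run/secrets/kubernetes.io/serviceaccount/token
        - name: kube-api-access-abcde
          mountPath: /var/run/secrets/kubernetes.io/serviceaccount
          readOnly: true
  volumes:
    - name: kube-api-access-abcde        # real suffix is random
      projected:
        defaultMode: 420
        sources:
          # Short-lived, automatically rotated service account token (a JWT)
          - serviceAccountToken:
              expirationSeconds: 3607
              path: token
          # Cluster CA bundle
          - configMap:
              name: kube-root-ca.crt
              items:
                - key: ca.crt
                  path: ca.crt
          # Namespace of the Pod
          - downwardAPI:
              items:
                - path: namespace
                  fieldRef:
                    apiVersion: v1
                    fieldPath: metadata.namespace
```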

These are the chief resources I found that break down the "Kubernetes" auth method:

@thotz
Contributor Author

thotz commented Jun 13, 2022

I decided to spend my morning looking into Vault and SA-based Auth. I think the KMS and Kubernetes integration landscape has grown compared to when Vault support was initially added.

Of note, if users set up the kubernetes auth method to allow Vault to give KMS secrets to apps via service account, I don't see anything that suggests a Vault sidecar container is required.

For service account authentication, a vault sidecar is not a must. The issue here is that RGW cannot authenticate with Vault directly using the service account, but it can do so with the help of a vault agent. Hence the vault agent is added as a sidecar to the RGW pod, so RGW authenticates with Vault via the vault agent using the service account.

I think it will be best if we don't have to bind Vault-awareness into Rook more than we have to, and I think we should try to avoid creating a Vault pod (sidecar) if possible. While starting a vault agent on nodes may be the best strategy for deploying RGW on bare metal, I suspect there are more K-native methods available to us for Rook. We shouldn't mirror Ceph's complexity in Rook unless it's critical.

Vault's agent injector exists, and I think this is a way we can add the vault sidecar without having to code the pod details into Rook.

Or, I think there may be another option, which seems from my research today that it won't require a sidecar or the agent injector.

From what I can tell, if the "Kubernetes" auth method is set up for Vault, and the Kubernetes version is 1.21+, a key will be added to /var/run/secrets/kubernetes.io/serviceaccount automatically. What I hope this means is that we can direct the RGW to find the auth details in that directory to authenticate with Vault.

Yes, that's correct, and I even tried to add that support to the RGW codebase in ceph/ceph#37868, which makes RGW understand k8s service account tokens. From a design point of view, I felt the right approach is the vault agent. In the end, the service account token is a JWT, which is already supported by RGW with the vault agent.

These are the chief resources I found that break down the "Kubernetes" auth method:

* https://www.vaultproject.io/docs/auth/kubernetes#kubernetes-1-21

* https://kubernetes.io/docs/reference/access-authn-authz/service-accounts-admin/#serviceaccount-admission-controller

@thotz thotz requested a review from BlaineEXE June 13, 2022 07:39
use_auto_auth_token = true
}
listener "tcp" {
address = "127.0.0.1:8100"
Member

Does a different port ever need to be configured? For example, I see port 8200 used above on line 25.

Member

?

There are different ways RGW can authenticate with help of [vault agent](https://docs.ceph.com/en/latest/radosgw/vault/#vault-agent), but only service account authentication is supported here.

## Proposal details
The service account details will be specified in `ConnectionDetails` of `KeyManagementServiceSpec`. Using that information plus the other details, a ConfigMap named `<ceph-object-store>-vault-agent-cm` will be populated for the vault agent sidecar container. The sidecar container details will be added to the RGW pod spec so that the sidecar starts with the RGW pod. The RGW will be configured with vault-agent-specific options. For the following details in `ConnectionDetails`:
Member

  • What image would the sidecar use?
  • What are recommended resource requests/limits?
  • How about adding a link to the vault agent docs for this?

User will be able to modify the `vault-agent-cm` which is not preferable

## Config Commands
The user can bring up the `vault agent` themselves as a separate deployment configured with the service account authentication details, then set the following for RGW via the toolbox pod:
Member

If the vault agent can be started as a separate deployment, is only one agent needed for the whole cluster? If so, should we anyway create a vault agent deployment instead of an rgw sidecar? Or why should it be a sidecar for rgw?

Member

@BlaineEXE BlaineEXE Jul 1, 2022

Good question!

Also a potential consideration: can we expect users to run the agent themselves so they can have as much control over its configuration as they want/need?

Member

@BlaineEXE BlaineEXE Jul 1, 2022

I have done a fair bit more research on this. I believe that Vault's model that the vault agent runs as a sidecar is merely a requirement because it assumes the agent injects secrets into a shared pod volume. If RGW is talking to vault agent directly, there is no need to run the agent as a sidecar, because there is no need to have the agent inject anything into pod shared directories.

This means we can have (and capture in the design doc) the decision to either use a sidecar or to use a standalone vault agent that is shared by all RGWs.

Using sidecars will naturally increase our resource footprint, a negative. However, the RGW can contact the sidecar on localhost within the Pod.

Using a standalone agent will decrease resources, especially for large numbers of RGWs. However, RGWs will likely be run on many nodes, and a shared agent will add additional host-to-host latency to each S3 call IIUC.


Regarding the agent injector, while it is intended to be used to inject specific secrets into Pods, I believe it has the flexibility to configure the agent as a listener as well by modifying the ConfigMap as documented here: https://www.vaultproject.io/docs/platform/k8s/injector/examples#configmap-example

We can set the listener config in the ConfigMap to have the agent sidecar automatically injected and listening on a port expected by the RGW.

In my opinion, a better design may be to allow users the flexibility to choose whether they want to run a standalone vault agent or whether they want sidecars. It would be easy to add an option to the CephObjectStore security spec that defines the address of the vault agent: spec.security.vaultAgentHostname.

If they want sidecars, then they can...

  1. set spec.security.vaultAgentHostname to localhost:<port>
  2. configure the agent listener to localhost:<port> in the configmap
  3. add the below annotations on the CephObjectStore.
      annotations:
        vault.hashicorp.com/agent-inject: 'true'
        vault.hashicorp.com/agent-configmap: 'my-configmap'

If they want to use a standalone vault agent, then they merely need to set spec.security.vaultAgentHostname to the Service that provides the vault agent.
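To make the two options concrete, here is a rough sketch of what the CephObjectStore might look like. `vaultAgentHostname` is the hypothetical field suggested above and does not exist in the CRD; the store name, namespace, port `6543`, and the standalone Service name are placeholders, and placing the injector annotations under `gateway.annotations` is an assumption about how they would reach the RGW pods. Required pool and gateway settings are omitted.

```yaml
apiVersion: ceph.rook.io/v1
kind: CephObjectStore
metadata:
  name: my-store                 # placeholder
  namespace: rook-ceph           # placeholder
spec:
  gateway:
    # Assumption: annotations set here are applied to the RGW pods so the
    # agent injector webhook can act on them.
    annotations:
      vault.hashicorp.com/agent-inject: 'true'
      vault.hashicorp.com/agent-configmap: 'my-configmap'
  security:
    # Hypothetical field from this proposal (sidecar case): the agent
    # listens on localhost inside the RGW pod.
    vaultAgentHostname: localhost:6543
    # Standalone case (hypothetical Service name); the annotations above
    # would then be dropped:
    # vaultAgentHostname: vault-agent.rook-ceph.svc:8100
```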

Contributor Author

@thotz thotz Jul 5, 2022

@BlaineEXE @travisn: A standalone vault agent is also an option. The user needs to set it up independently, as mentioned, but I am not sure about any performance impact; for Ceph it is always local to the RGW server. We can add an option either in the CephObjectStore or in the connection details string map. For RGW it is just an endpoint: it can be the Vault server directly, a local vault agent, or a separate deployment.
After checking the vault injector examples, I still can't figure out the listener mode mentioned above. The vault injector can mount the vault-agent config map into the pod; apart from that, I don't see much of a use case for the vault injector. Am I missing something?

Member

From the vault docs, listener specifies the address on which the vault agent listens. I believe this means we can (using the configmap) tell the agent that is injected to listen to RGW requests in the pod at something like localhost:6543. We can then configure vaultAgentHostname to be localhost:6543.
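As a rough illustration of the listener idea, the agent ConfigMap might look like the sketch below. The `config.hcl` key follows the agent injector's ConfigMap convention as I understand it. The name `my-configmap` and the role `rook-ceph` are reused from elsewhere in this thread, while the namespace, the Vault server address, and the `auth/kubernetes` mount path are placeholders that would need to match the actual deployment.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-configmap
  namespace: rook-ceph           # placeholder namespace
data:
  config.hcl: |
    # Assumed Vault server address; replace with the real one.
    vault {
      address = "http://vault.default.svc:8200"
    }

    # Log in to Vault with the pod's service account token.
    auto_auth {
      method "kubernetes" {
        mount_path = "auth/kubernetes"
        config = {
          role = "rook-ceph"
        }
      }
    }

    # Attach the auto-auth token to requests proxied through the agent.
    cache {
      use_auto_auth_token = true
    }

    # Address the RGW would talk to inside the pod.
    listener "tcp" {
      address     = "127.0.0.1:6543"
      tls_disable = true
    }
```

With something like this in place, RGW would only need to know the agent's local address, which is what `vaultAgentHostname` is meant to carry.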

Contributor Author

@BlaineEXE: Yes, I understand the listener with vault-agent; sorry, I was confused about how it relates to the vault-injector. PR #9872 has a similar configuration or approach.

@travisn travisn added this to In progress in v1.10 via automation Jul 12, 2022
@github-actions

github-actions bot commented Aug 5, 2022

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in two weeks if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale Labeled by the stale bot label Aug 5, 2022
@thotz thotz added keepalive and removed stale Labeled by the stale bot labels Aug 9, 2022
@travisn travisn removed this from In progress in v1.10 Oct 18, 2022
@travisn travisn added this to In progress in v1.11 via automation Oct 18, 2022
@travisn travisn removed this from In progress in v1.11 Oct 18, 2022
@travisn
Member

travisn commented Apr 4, 2023

@thotz What's the status of this PR?

@thotz
Contributor Author

thotz commented Aug 9, 2023

Closing this PR since I am not planning to work on it and there is very little interest for the feature

@thotz thotz closed this Aug 9, 2023
@parth-gr
Member

parth-gr commented Aug 9, 2023

I think we could get in whatever we have now, or open an issue tracking this so anyone else can pick it up.

@thotz
Contributor Author

thotz commented Aug 10, 2023

If someone is interested in picking it up, we already have the design doc, proposed PR, tracker issue, etc. @travisn do you have a way to track features like this even if the PR is closed, like a new label or something?

@travisn
Member

travisn commented Aug 10, 2023

If someone is interested in picking it up, we already have the design doc, proposed PR, tracker issue, etc. @travisn do you have a way to track features like this even if the PR is closed, like a new label or something?

A github issue should be opened if we still need to track the feature request.

@thotz thotz reopened this Mar 12, 2024
@thotz thotz force-pushed the design-rgw-k8s-sa-authentication branch from 6ccfe07 to 69d80b4 Compare March 12, 2024 14:53
@thotz
Contributor Author

thotz commented Mar 12, 2024

@BlaineEXE @travisn please review the latest version of this PR

Member

@travisn travisn left a comment

@thotz Can you provide more context on reopening this old design doc? Was there a request again for this feature?

@@ -0,0 +1,143 @@
---
Service authentication with vault for RGW
target-version: release-1.14
Member

You're planning on implementing it immediately for the release soon?

Contributor Author

Yes, that's the plan. The code changes for this implementation are pretty minimal. I will push the PR along with the design PR.

use_auto_auth_token = true
}
listener "tcp" {
address = "127.0.0.1:8100"
Member

?

VAULT_AUTH_KUBERNETES_ROLE: rook-ceph

---
kind: ConfigMap
Member

Does the user create this configmap, or does Rook generate it? We need to be clear about what the admin needs to create, and what Rook will generate.

Contributor Author

IMO, since there are a lot of options, I prefer that the user configures the cm rather than the Rook operator. We can provide a sample (minimal) configuration as above.

The design doc for supporting service account authentication for RGW
while configuring with Vault. The OSD encryption already supports it.

Signed-off-by: Jiffin Tony Thottan <thottanjiffin@gmail.com>
@thotz thotz force-pushed the design-rgw-k8s-sa-authentication branch from c0e4582 to 8444479 Compare March 15, 2024 16:42
@thotz thotz requested a review from travisn March 15, 2024 16:42
5 participants