Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] Add build-in TLS support #3368

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from
Draft
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
137 changes: 137 additions & 0 deletions rfcs/0000-build-in-tls-support/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,137 @@
# RFC-NNNN Built-in TLS support for Source and Notification Controllers

**Status:** provisional

**Creation date:** 2022-12-05

**Last update:** 2022-12-05

## Summary

Create built-in TLS support for both Source and Notification controllers,
so that in-cluster cross-component communications are always encrypted.

## Motivation

Due to internal company policies, Data Protection laws and regulations,
Flux users may be required to enforce encryption of data whilst in-transit.
As of `v0.37.0`, there is no built-in way to provide TLS for Flux' endpoints.

The existing endpoints being:

1. Source Controller: artifacts storage, used by Kustomize and HelmRelease
controllers.
2. Notification Controller: events receivers, used by all other Flux components
to submit events.
3. Notification Controller: webhooks for external services to contact Flux.

The webhooks endpoing would be typically used for external use only, and in
that use case the recommended approach is to enable TLS at ingress.
However, this proposal covers the first two endpoints to allow always encrypted
in-cluster communications.

### Goals

- Add TLS support for Notification and Source controllers.
- Establish a migration process when enabling TLS on existing clusters.
- Define best practices on enabling the feature.

### Non-Goals

- Change how CA trust is established across Flux containers.
- Define TLS rotation mechanisms.

## Proposal
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would the proposed architecture be compatible with SPIFFE/SPIRE? I can envisage a scenario where I would like to manage identity using SPIFFE SVIDs delivered via SPIRE rather than via Kubernetes TLS secrets. In practice I think this would mean adding a unix domain socket flag to the controllers with an accompanying listener to receive SVID bundles from the SPIRE agent.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great question @phoban01. I see the built-in TLS support as orthogonal to the support of SPIFFE/SPIRE. The idea here is that users would be able to provision Flux with secure comms without requiring any additional components.

On that point, some service meshes should be able to implement that (mTLS based on SPIFFE/SPIRE) transparently on top of Flux, without any changes required to the source code. However, if there are good use cases for built-in support, that should be a RFC on its own right.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spire support is nice, but will require for the Spire itself to be managed outside Flux. I think using native PKI is a good way of avoiding this as mentioned in the Alternatives section.


Controllers hosting web servers will introduce a new flag to configure the
[TLS secret]: `--tls-secret`. The Secret type must be `kubernetes.io/tls`.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why take dependency on a specific K8s secret hosting approach directly in the components? Can you please use a file reference instead. Then it will be easier to support other secret hosting approaches such as cloud key vaults.
For your information in our enterprise environments hosting any kind of secrets as K8s secrets is prohibited.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How are you using Flux then? All our APIs rely on Kubernetes Secrets, you can’t connect to Git/OCI/Helm repositories unless you store the auth in a secret.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hosting any kind of secrets as K8s secrets is prohibited

So you're using Secrets Store CSI Driver and mounting them directly into containers? I see why this might be a reasonable policy but as Stefan said, Flux doesn't work without Secrets at the moment and likely never will. Otherwise you'd have to mount all the Secrets any user might ever need into the Flux controller containers.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@makkes - correct, we use Secret CSI drivers, or you can create a wrapper/starter script to launch the binaries and this starter script pulls the secrets from key vaults and places them on tempfs volumes.

@stefanprodan - based on the documentation I see that Flux can use cloud identities to access artifact storages. We can still use K8s secrets as a config store for not secret material - such as public keys, application IDs, etc. But we cannot use K8s secrets for storing secret material like private keys.
Most Kubernetes components consume secrets as file references - then if someone wants to use K8s secrets to store secrets they can always mount them as files in the volume.

Consuming secret material directly from K8s secret resources is a gap that must be fixed for Flux to be used in different deployments that rely on consuming secrets as files.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consuming secrets as files breaks Flux multi-tenancy. We rely on Kubernetes RBAC to isolate tenant, reading files from the container FS is not subject to RBAC while reading secrets from the API is. Have you considered enabling encryption at rest for etcd?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be confused here. But wouldn't the TLS k8s secret also mounted as a file in the pod. The certwatcher project suggested allows file based certs only.
https://github.com/kubernetes-sigs/controller-runtime/blob/a784ee78d04bdf216006adfbc313328a0aded88a/pkg/certwatcher/certwatcher.go#L36

Thus not sure how giving an option for a TLS on file-based key inside the pod can be less secure as even with secrets we need to do the same.

I understand going default with secret to not add any additional dependencies for the default secure path. But if the customer is willing to do the work to use a Secret CSI drivers to mount a file directly, shouldn't that be allowed.

Copy link
Member

@makkes makkes Jun 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Secrets aren't mounted into the Flux Pods but read into memory through the Kubernetes API at runtime.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could add a flag for pointing to a cert on disk, then it's up to the Flux admin to figure out the mounts and so on. With this flag, we'll read the cert at startup only, the Flux admin can bump some pod annotation in Git to restart the controllers to reload the cert. This flag e.g. --tls-file should be mutually exclusive with --tls-secret, if both are set, the controllers will panic and crash loop.

Copy link

@anagg929 anagg929 Jun 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks , this should resolve the cases where admin want to mount the cert on their own.
A bit of query, why not use use the certwatcher on the --tls-file also to avoid the restart ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use use the certwatcher on the --tls-file also to avoid the restart ?

Ah I haven't looked into that, if it supports it, then sure.


Invalid secret types, or content, will cause the controller to fail to start.

### Source Controller

When TLS is enabled, the File Storage endpoint will serve both HTTP and HTTPS.
Existing source objects previously reconciled won't be patched to a new
`.status.artifact.url`, but all new reconciliations will cause that field to
start with `https://`.

The HTTP endpoint will only serve a permanent redirect (301) to the HTTPS endpoint,
enabling a smooth transition when users enable this feature in existing clusters.

The HTTPS endpoint will be served on port `9443` by default, which will be
configurable via new flag `--storage-https-addr` or environment variable
`STORAGE_HTTPS_ADDR`. The existing `source-controller` service will bind port
`443` to `9443`.

### Notification Controller

When TLS is enabled, the HTTP endpoints will be disabled.

#### Events endpoint

The events HTTPS endpoint will be served on port `9443` by default, which will be
configurable via new flag `--events-https-addr`. The existing `notification-controller`
service will bind port `443` to `9443`.

All flux components will need to be started with the (existing) flag below, for the
TLS only endpoint to be used:
`--events-addr=https://notification-controller.flux-system.svc.cluster.local./`

### User Stories

<!--
Optional if existing discussions and/or issues are linked in the motivation section.
-->

### Alternatives

#### Delegate in-trasit data encryption to CNI or Service Mesh

Instead of built-in support, it was considered the option of delegating this to
external components that support the enablement of end-to-end encryption via TLS
or mTLS. One of the issues with this approach is that the data would still come
out of Flux unencrypted, and therefore privileged co-located components in the
same worker node could be privy to the exchanged plain-text.

Another point considered for not pursuing this alternative was that, being one of
the first workloads inside a cluster, Flux should be self-reliant in assuring the
confidentiality and integrity of its communications.

#### Auto-detection of TLS events endpoint

Instead of having to change the `--event-addr` on each Flux component, an auto
detection mechanism could be implemented to prefer the TLS endpoint whenever the
notification controller's service has one working.

The challenge with such approach is that the security of cross component
communication would be non deterministic from a client perspective, and would depend
on the Notification controller having an active TLS endpoint at the time in which each
Flux component started. The current approach requires additional configuration,
but fails safe.

## Design Details

### TLS Versions and Ciphers Supported

For reduced complexity and increased security, only TLS 1.3 is supported.

### Detect TLS Certificate changes

Controllers will watch the TLS secret for changes, once detected it will trigger a
pod restart to refresh the certificates being used.
Comment on lines +120 to +121
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use https://github.com/kubernetes-sigs/controller-runtime/blob/master/pkg/certwatcher/certwatcher.go and reload the cert without a pod restart. Especially for source-controller, a restart is very expensive and should be avoided.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea. This would cause all mid-flight requests to be dropped, which is a minor impact compared to rebuilding the storage. Besides, the mid-fight requests would be automatically retried within seconds anyway.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, definitely that will be a considerable saving over the expensive restarts.


### Expand TLS support for the webhooks endpoint

The TLS support could be expanded to the Notification controller webhooks endpoint
to enable secure in-cluster communications.
Comment on lines +125 to +126
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main issue I see with supporting this is the hybrid scenario in which we need to support in-cluster and external (via ingress). In such scenarios, the ingress would need to trust the CA used by Flux, which can be problematic from a security stand-point.


## Implementation History

<!--
Major milestones in the lifecycle of the RFC such as:
- The first Flux release where an initial version of the RFC was available.
- The version of Flux where the RFC graduated to general availability.
- The version of Flux where the RFC was retired or superseded.
-->

[TLS secret]: https://kubernetes.io/docs/concepts/configuration/secret/#tls-secrets