-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[5756 Identity Management Overhaul] Rewrite sentry to authenticate clients before signing #6171
Conversation
6dcf061
to
bb43bcf
Compare
Codecov Report
@@ Coverage Diff @@
## master #6171 +/- ##
==========================================
- Coverage 65.02% 64.52% -0.51%
==========================================
Files 216 226 +10
Lines 19705 20082 +377
==========================================
+ Hits 12814 12957 +143
- Misses 5853 6069 +216
- Partials 1038 1056 +18
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is my first round of review, although there are still a few files missing.
One question is on why we are still using both a root CA cert and and intermediate CA cert. Can't we remove one from the chain, and just use the root CA?
Also, since this is a major rewrite and reviewing this PR is not super easy (because the diff isn't much helpful), would you be available for a review during a call?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
first set of comments.
panic(err) | ||
} | ||
if err := corev1.AddToScheme(scheme); err != nil { | ||
panic(err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are there panics during init? what would trigger this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An error here would signal that something is fundamentally broken in our APIs, or those imported. We should catch these.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is still run time panic, this is not going to be during dev alone right? Or is it expected that this will help catch issues during dev?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct, this will catch issuing during dev. If this errors, the only thing we can ever do it terminate the program with an error anyway.
Thanks for the review @ItalyPaleAle!
Could do in future, but since this is what we are doing already today we should keep doing that until we are in a future version, past needing to rely on any old PKI. Remember that all components are currently consuming the issuing certificate and private key and using that for serving and clients. Changing up the chain structure will add to the backwards/forwards compat complexity. Ideally, we should not care what PKI setup Dapr is using so long as it follows correct PKI.
Yes, definitely makes sense to do this! 🙂 |
@JoshVanL what will be the effect of these sentry changes wrt the CLI renew certificate command? |
The problem with the current approach is that because we throw away the root key, we can not renew the root CA certs. 10 years is a long time (and is probably too long), and yet that's how we got the Y2K and Y2K38 problems :) Since we're making so drastic changes to Sentry, I think this would be the perfect time to remove this choice, which IMHO doesn't even make much sense (I get protecting the root CA key, but if the intermediate CA can sign certificates, this makes no difference at the end of the day). |
@ItalyPaleAle Completely agree- lets do this in a followup PR. I would like to see a better option set for sentry so the user can tune these things (with more sane defaults). This is also where we can introduce external CAs. |
} | ||
|
||
// SentryID returns the SPIFFE ID of the sentry server. | ||
func SentryID(sentryTrustDomain, sentryNamespace string) (spiffeid.ID, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just like Spiffe Id for Sentry, do we need different Spiffe Ids for other control plane services as well, for Operator, Injector and Placement; as they may be also considered different workloads?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
app: dapr-sentry | ||
{{- range $key, $value := .Values.global.k8sLabels }} | ||
{{ $key }}: {{ tpl $value $ }} | ||
{{- end }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we also need to enable encryption or provide an option for that? https://kubernetes.io/docs/tasks/administer-cluster/encrypt-data/#configuration-and-determining-whether-encryption-at-rest-is-already-enabled
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
EncryptionConfiguration
is a Kubernetes API Server configuration option which as a Kubernetes application have no control over (i.e. not a Dapr concern). This option would be configured by the cluster administrators or the cloud provider for managed Kubernetes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanation. It helps.
Just for my understanding: So, when we say that Trust Anchor can be accessed from ConfigMap using volume mount, so can any non-Dapr process inside Pod be also able to access this volume and use this Trust Anchor to get itself Authenticated from Sentry?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This trust anchor is used by clients (operator, placement, injector) to verify the server (sentry) connection. We use a ConfigMap here because:
- we don't want any component, apart from Sentry, to be mounting the intermediate certificate and private key,
- ConfigMap is a better resource type in this scenario because the TrustAnchor is not secret data,
- In future when we implement other CA signing sources (such as cert-manager), the intermediate Secret may never exist, so using dedicated ConfigMap for the TrustAnchor means that will always exist regardless of the signing/intermediate CA provider.
Daprd gets this TrustAnchor from the injector, set as a environment variable, which itself gets it from this ConfigMap. There may be an argument for implementing a controller to distribute the TrustAnchor to all (dapr enabled) namespaces as well so that daprd
s are able to update their trust store without being restarted (environment variables cannot be updated in place, whereas file mounts can).
We likely want to move to using ClusterTrustBundles
once all supported Kubernetes versions serve this API.
Sure thing, more than happy to make this change! |
6b56f62
to
891614b
Compare
This pull request has been automatically marked as stale because it has not had activity in the last 60 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions! |
9f43e56
to
48afbae
Compare
2fbc131
to
d6e8a33
Compare
@JoshVanL Good news :)
|
/test |
/test-version-skew |
prevent panic Signed-off-by: joshvanl <me@joshvanl.dev>
Signed-off-by: joshvanl <me@joshvanl.dev>
configuration crd Signed-off-by: joshvanl <me@joshvanl.dev>
Signed-off-by: joshvanl <me@joshvanl.dev>
011eecd
to
2fae31e
Compare
/ok-to-test |
Dapr E2E testCommit ref: 2fae31e ✅ Build succeeded for linux/amd64
✅ Infrastructure deployed
✅ Build succeeded for windows/amd64
✅ Tests succeeded on linux/amd64
✅ Tests succeeded on windows/amd64
|
/test-sdk-all |
Dapr SDK Python testCommit ref: 2fae31e ✅ Python SDK tests passed |
Dapr SDK Java testCommit ref: 2fae31e ❌ Java SDK tests failedPlease check the logs for details on the error. |
Dapr SDK Go testCommit ref: 2fae31e ✅ Go SDK tests passed |
Dapr SDK JS testCommit ref: 2fae31e ❌ JS SDK tests failedPlease check the logs for details on the error. |
/test-sdk-all |
Dapr SDK Go testCommit ref: c4e440c ✅ Go SDK tests passed |
Dapr SDK Python testCommit ref: c4e440c ✅ Python SDK tests passed |
Dapr SDK Java testCommit ref: c4e440c ❌ Java SDK tests failedPlease check the logs for details on the error. |
Dapr SDK JS testCommit ref: c4e440c ✅ JS SDK tests passed |
This PR is breaking service invocation and tracing ITs in Java SDK. |
@artursouza those Java failing tests look unrelated to this PR. Here is a similar failure which was executed from the SDK test schedule https://github.com/dapr/dapr/actions/runs/5686772228 |
@JoshVanL This will be a release blocker. |
resp, err := c.SignCertificate(context.Background(), | ||
&sentryv1pb.SignCertificateRequest{ | ||
CertificateSigningRequest: certPem, | ||
CertificateSigningRequest: csrPem, | ||
Id: getSentryIdentifier(id), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This intentionally not just the app ID? (Instead, being SENTRY_LOCAL_IDENTITY which can be > 64 characters)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @a-shoemaker, indeed it has been done to ensure backwards compatibility with a 1.11 sentry server. The intention is to use the app ID in 1.13.
// is wrong- dapr identities are based on daprd namespace + _app ID_. | ||
// Remove this allowance in v1.12. | ||
if req.Namespace+":"+pod.Spec.ServiceAccountName == req.Id { | ||
req.Id = expID |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But you validated the req.ID is < 64 characters before this even though it wasn't the app ID. Which subsequently seems to make 1.12 unusable if you have any namespaces + app ID's of much length at ll.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @a-shoemaker, indeed this looks like a bug in 1.12 which needs to be fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@a-shoemaker I have opened up an issue here to track.
Part of #5756
This is the first in a chain or PRs to complete this issue. The following PRs will implement the security changes for each of the other components.
This rewrites the sentry service to now authenticate clients before signing for their identity in a backwards compatible way.
It introduces a new
pkg/security
package. This package is designed to manage the components SPIFFE ID (fetched from sentry), and expose connection options for authentication and authorizing servers.The sentry flag
--token-audience
has now been deprecated. The audience of the kubernetes token used by clients should always be the SPIFFE ID of sentry (not implemented in this PR). For backwards compatibility, sentry will still acceptdapr.io/sentry
which should be remove inv1.13
.Sentry will ensure clients request their app ID. Sentry will still accept clients sending a CSR request with the ID set to
$namespace:$sa
, however this exception should be removed in1.13
.Sentry will still send the trust anchors in the sign response, but should be removed in
1.13
as trust distribution should happen out of band of signing.Sentry will continue to include
cluster.local
as a DNS SAN on its serving cert, since prev1.12
client authorize on this. This should be removed inv1.13
Sentry will also continue to include
cluster.local
as DNS SAN on the issuer certificate since this is used by the control plane components as serving certificates. Also to be removed inv1.13
.global.mtls.controlPlaneTrustDomain
is a new top level helm chart to set the control plane trust domain. Currently only read by sentry.The root CA is now created by sentry (no longer helm) and written to the
dapr-trust-bundle
secret. It also write the trust anchors to the config mapdapr-trust-bundle
. The control plane components will read the trust anchors from this ConfigMap as a volume mount.Updates sentry validators:
mtls.tokenValidators
option to the "daprsentry" Configuration resource to configure additional validators. This is an array that contains that contains:mtls.tokenValidators[*].name
: name of the validator.mtls.tokenValidators[*].options
: optional dictionary of key-values that are options that can be passed to validator.jwks
validator that validates .source
is the location of a JWKS that contains the public keys to validate the keys. This can be one of:sub
must be the SPIFFE ID of the app, for examplespiffe://localhost/ns/default/myapp
aud
must contain the SPIFFE ID of Sentry, for examplespiffe://localhost/ns/dapr-system/dapr-sentry
daprd
can load a token to send to Sentry that is signed with an arbitrary key. These are defined using:DAPR_SENTRY_TOKEN_FILE
DAPR_
for security reasons, since those are not accessible in the localenv secret store)