New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support dynamic reloading of TLS files for raft and http protocols #1554
Comments
I was thinking about this one and I think a simpler implementation would be to replace population of the tls.Config.Certificates field with GetCertificate and GetClientCertificate callbacks, which point to a function that reads the cert/key from disk, caching it for a period of time (30 seconds or so). This means the cert loading happens in-band of the connection, and this should work equally well for both egress and ingress TLS connections. I'll do up a PR. |
What is the standard approach, isn't responding to |
All manner of things are harder test when systems do async things like expiring caches. For example, what if one of the refreshes from disk fails? How do we inform the user? Continually log every 30 seconds? It gets messy. |
It tends to be a mix between dynamic reloading and SIGHUP. I see a couple options here:
In both cases though replacing tls.Config.Certificates with the callbacks would be needed. Let me know what your preference is. |
A goroutine that os.Stat polls the file every second or so would be acceptable. That way I can control testing more precisely -- touch the file. Ideally the poll period is configurable, at the unit test level (it would not need to be user-configurable). That way we can set the poll period to, say, 50ms, to make testing faster. If you do decide to add this functionality, I would suggest writing a generic io.Stat-based poller, for any file, which takes some sort of generic callback or supplies a channel. Then we unit-test that, and only then integrate it with the Certs stuff. Roughly the same interface as the fsnotify module. I would prefer not to drag in a whole new module for something that seems easy enough to implement just using the standard library. You could put it in |
Ack. My instinct right now is a callback, since with a channel something ought to be on the receiving end of that continuously, but Though of course the supplied callback could do that too. I'll start with the callback.
Using the kernel for file notifications is much faster and more efficient, but, yeah, when you're polling a single digit number of files per second it's not going to make any difference. Context switching is a rounding error, and VFS cache will prevent any increase in physical disk I/O.
Putting the code that implements the callbacks for tls.Config which uses the polling file monitor makes sense. (AFAICT the monitor would be setup from The node cert/key gets passed to these functions multiple times, so either we live with polling the same file multiple times (from different goroutines), or maintain some sort of global state, probably in the form of a singleton of a struct holding the file monitor state. There's nothing TLS-specific about such a file monitor though. And in fact it would make sense to use it for #1553 if you consider it worth implementing. It's more of a common utility function/struct. Would you still want to put it in rtls/file if it's used to watch the credentials file? I can sketch out something in a PR and then you can decide if you want to tweak it, gut it, or toss it. |
True, if we have a use for it already, maybe it's a top-level module. Or maybe we pull in some other module, but I prefer to pull high-quality, well-tested code only if I can help it. Yes, I would be interested in a PR, let's see what you have in mind. |
Certificate rotations are a standard operational requirement. Today, certificate rotations require a rolling restart of the cluster.
It's increasingly common for certificates to be issued through automated means with relatively short validity periods to mitigate certificate compromises (because revocation in PKI is notoriously broken). For example, in Kubernetes, one can use cert-manager to manage generation and rotation of certificates. An operator may choose to use, for example, Let's Encrypt for the the client-facing HTTP port, which currently issues certificates with 90 day lifetimes (and has said they may consider ratcheting down the lifetime in future). Inter-node certificates could be generated using, say, GCP CAS or AWS ACM PCA, or even cert-manager's native CA issuer.
In all these cases, the certificate rotation is fully automated, and updated PEM files are automatically swapped into the running rqlite pods (typically by means of a mounted Kubernetes secret).
This FR is a small quality-of-life enhancement to offload certificate management to outside orchestration such as cert-manager, for both the HTTP API and internode communication.
As with #1553, this could be done using fsnotify or polling mtime every few seconds, and should follow symlinks.
The text was updated successfully, but these errors were encountered: