[Enhancement]: hot reload Kafka on changes in brokerCertChainAndKey instead of a rolling update #9994

Open
vpedosyuk opened this issue Apr 18, 2024 · 5 comments

Comments

@vpedosyuk

Related problem

From the docs:

When the certificate or key in the brokerCertChainAndKey secret is updated, the operator will automatically detect it in the next reconciliation and trigger a rolling update of the Kafka brokers to reload the certificate.

In an environment where Kafka broker restarts are highly undesirable, it is hard to keep external TLS certificates short-lived (e.g. 24 hours with a third-party PKI), because every certificate change causes a Kafka restart and usually some downtime.

In general, it'd be great to have as few reasons for a broker restart as possible.

Suggested solution

When a Kubernetes secret referenced in brokerCertChainAndKey changes, the Strimzi operator should dynamically replace the old certificates with the new ones without restarting the brokers.
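
As a rough illustration of the change-detection step, a controller could fingerprint the certificate and key and compare the result with the fingerprint recorded when the broker last loaded them. This is only a sketch, not Strimzi code: the `fingerprint` and `needs_reload` helpers and the `tls.crt` / `tls.key` key names are assumptions made for the example.

```python
import hashlib


def fingerprint(secret_data: dict, cert_key: str = "tls.crt", key_key: str = "tls.key") -> str:
    """Return a SHA-256 fingerprint over the certificate chain and private key.

    `secret_data` is a hypothetical mapping of Secret keys to raw bytes.
    """
    digest = hashlib.sha256()
    for name in (cert_key, key_key):
        value = secret_data[name]
        # Length-prefix each field so different splits of the same bytes cannot collide.
        digest.update(len(value).to_bytes(8, "big"))
        digest.update(value)
    return digest.hexdigest()


def needs_reload(last_loaded_fingerprint: str, secret_data: dict) -> bool:
    """True when the Secret content differs from what the broker last loaded."""
    return fingerprint(secret_data) != last_loaded_fingerprint


old = {"tls.crt": b"cert-v1", "tls.key": b"key-v1"}
renewed = {"tls.crt": b"cert-v2", "tls.key": b"key-v1"}
loaded = fingerprint(old)
print(needs_reload(loaded, old))      # unchanged Secret
print(needs_reload(loaded, renewed))  # renewed certificate
```

Detecting the change is the easy part; what happens next (a hot reload or a rolling update) is the actual subject of this request.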

Alternatives

A proper HA configuration might reduce the impact of such restarts, but it is not always possible.

Additional context

It seems like Kafka itself supports hot-swapping of certificates.

@scholzj
Member

scholzj commented Apr 18, 2024

Isn't this already tracked in some other issue? In any case, it should be kept in mind that:

  • Improved support for reloading certificates without any major limitations such as DN changes was added only in Kafka 3.7.0. So it is not easy to implement this while supporting Kafka 3.6.x.
  • Updating the configuration is only one part of the problem as you also need to be able to prepare / load the certificates on the fly.
  • In reality, the benefits will also be limited, as there are parts where TLS certificates might not be reloadable even with Kafka 3.7.0 (for example, because they are part of a plugin configuration). So in many setups, rolling updates will still be needed.
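
The first and third points can be sketched as a small decision function. This is a hypothetical illustration, not operator code: `CertInfo`, `can_hot_reload`, and the `used_by_plugin` flag are invented names, and the rule for brokers older than 3.7.0 (subject DN and SANs must stay the same) is a simplified reading of the limitation mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CertInfo:
    """Identity fields of a certificate that matter for this sketch."""
    subject_dn: str
    sans: frozenset


def parse_version(version: str) -> tuple:
    """Turn '3.6.1' into (3, 6, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def can_hot_reload(kafka_version: str, old: CertInfo, new: CertInfo,
                   used_by_plugin: bool = False) -> bool:
    """Decide between a dynamic reload (True) and a rolling update (False)."""
    # Assumption: certificates referenced from plugin configuration are not reloadable.
    if used_by_plugin:
        return False
    # Assumption: from 3.7.0 on, identity changes no longer block a reload.
    if parse_version(kafka_version) >= (3, 7, 0):
        return True
    # Assumption: older brokers only accept a renewal with an unchanged identity.
    return old.subject_dn == new.subject_dn and old.sans == new.sans


current = CertInfo("CN=kafka.example.com", frozenset({"kafka.example.com"}))
renewed = CertInfo("CN=kafka.example.com", frozenset({"kafka.example.com"}))
renamed = CertInfo("CN=kafka.example.org", frozenset({"kafka.example.org"}))

print(can_hot_reload("3.6.1", current, renewed))  # same identity, only expiry differs
print(can_hot_reload("3.6.1", current, renamed))  # identity changed on an older broker
print(can_hot_reload("3.7.0", current, renamed))  # identity changed on 3.7.0
```

Any branch that returns `False` still needs the rolling-update path, which is why a conditional fast path alone does not remove the complexity.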

I do not want to make it sound like this is not worth the effort. I am just pointing out that this is not as simple as it might sound and has some obstacles. (I actually wrote KIP-978 in Kafka for exactly this purpose; it just takes a long time to bubble through.)

@vpedosyuk
Author

@scholzj yes, I've seen your KIP, thanks. In our case the SAN and DN remain unchanged; the only thing that changes is the expiration time, which I believe is a common case for certificate renewal.

@scholzj
Member

scholzj commented Apr 18, 2024

The problem is that unless certificates can be reloaded dynamically in every case, it is basically not feasible because of the complexity. That is why the KIP is important: it should allow dynamic reloading to be used all the time (for the Kafka parts at least).

@vpedosyuk
Author

Understood. Anyway, thank you for your efforts!

P.S. I couldn't find a similar issue reported here, so I created one.

@scholzj
Member

scholzj commented Apr 18, 2024

Discussed on the community call on 18.4.2024: Should be kept and implemented. A proposal will be needed.
