Scaling of KRaft controller-only nodes currently doesn't seem to work. The situations and issues are described below:
Scale-up
The scale-up currently seems to work like this:

1. The controller node pool is scaled up.
2. The new controller nodes are started first, with the new voter configuration.
3. Next, the old controller nodes are rolled with the new voter configuration.
4. As the old controller nodes are rolled, the controller leadership shifts, and it typically seems to end up on one of the new nodes.

At that point, all broker nodes seem to fail with the following error, as they are not yet rolled and do not know the new leader controller:
```
2023-12-04 14:16:50,669 ERROR Encountered fatal fault: Unexpected error in raft IO thread (org.apache.kafka.server.fault.ProcessTerminatingFaultHandler) [kafka-2001-raft-io-thread]
java.lang.IllegalStateException: Cannot transition to Follower with leaderId=4 and epoch=48 since it is not one of the voters [0, 1, 2]
        at org.apache.kafka.raft.QuorumState.transitionToFollower(QuorumState.java:382)
        at org.apache.kafka.raft.KafkaRaftClient.transitionToFollower(KafkaRaftClient.java:522)
        at org.apache.kafka.raft.KafkaRaftClient.maybeTransition(KafkaRaftClient.java:1575)
        at org.apache.kafka.raft.KafkaRaftClient.maybeHandleCommonResponse(KafkaRaftClient.java:1532)
        at org.apache.kafka.raft.KafkaRaftClient.handleFetchResponse(KafkaRaftClient.java:1113)
        at org.apache.kafka.raft.KafkaRaftClient.handleResponse(KafkaRaftClient.java:1609)
        at org.apache.kafka.raft.KafkaRaftClient.handleInboundMessage(KafkaRaftClient.java:1735)
        at org.apache.kafka.raft.KafkaRaftClient.poll(KafkaRaftClient.java:2310)
        at kafka.raft.KafkaRaftManager$RaftIoThread.doWork(RaftManager.scala:64)
        at org.apache.kafka.server.util.ShutdownableThread.run(ShutdownableThread.java:130)
```
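The root cause is that each broker's voter set is static, parsed from `controller.quorum.voters` at startup. Below is a minimal sketch of that membership check (illustrative only, not Kafka's actual implementation; the config value and node ids are examples):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class VoterCheckSketch {
    public static void main(String[] args) {
        // Example controller.quorum.voters value as the not-yet-rolled brokers still see it
        String votersConfig = "0@controller-0:9090,1@controller-1:9090,2@controller-2:9090";
        List<Integer> voters = Arrays.stream(votersConfig.split(","))
                .map(v -> Integer.parseInt(v.substring(0, v.indexOf('@'))))
                .collect(Collectors.toList());

        int leaderId = 4; // a freshly added controller won the leader election
        if (!voters.contains(leaderId)) {
            // Analogous to the IllegalStateException thrown in QuorumState.transitionToFollower
            throw new IllegalStateException("Cannot transition to Follower with leaderId="
                    + leaderId + " since it is not one of the voters " + voters);
        }
    }
}
```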
After rolling all the controllers, the operator rolls the brokers to introduce the new voter configuration and gets them working properly again. This seemed to work for me several times, for scaling from 3->5 and from 3->4->5.
I guess we could improve this by adding the new nodes only after rolling the brokers, e.g.:

1. Do the rolling => roll the controllers to expect more voters and roll the brokers to expect more voters.
2. Add the new nodes.

If that is done in steps without breaking the quorum, would it work without causing the error in the broker nodes?
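A minimal sketch of that ordering, assuming hypothetical helper methods (`rollControllerWithVoters`, `rollBrokersWithVoters`, and `startNewController` are illustrative names, not Strimzi APIs); whether every step really keeps the quorum available is exactly the open question:

```java
import java.util.List;

public class ScaleUpOrderingSketch {

    static void scaleUp(List<Integer> currentVoters, List<Integer> targetVoters) {
        // Phase 1: roll the existing nodes one at a time so that everything
        // expects the enlarged voter set before any new node exists
        for (int nodeId : currentVoters) {
            rollControllerWithVoters(nodeId, targetVoters);
        }
        rollBrokersWithVoters(targetVoters); // brokers now accept a leader among the new ids

        // Phase 2: only now create the new controller nodes
        for (int nodeId : targetVoters) {
            if (!currentVoters.contains(nodeId)) {
                startNewController(nodeId, targetVoters);
            }
        }
    }

    // Stubs so the sketch compiles; the real logic would live in the operator's reconciliation
    static void rollControllerWithVoters(int nodeId, List<Integer> voters) {
        System.out.printf("roll controller %d with voters %s%n", nodeId, voters);
    }

    static void rollBrokersWithVoters(List<Integer> voters) {
        System.out.printf("roll brokers with voters %s%n", voters);
    }

    static void startNewController(int nodeId, List<Integer> voters) {
        System.out.printf("start new controller %d with voters %s%n", nodeId, voters);
    }

    public static void main(String[] args) {
        scaleUp(List.of(0, 1, 2), List.of(0, 1, 2, 3, 4)); // 3 -> 5
    }
}
```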
The scaling up of mixed nodes seems to go similarly. But because the brokers are controllers as well, they do not fail with this error as they do with dedicated controller nodes.
Scale-down
Scale-down currently works like this (in theory only):

1. Remove the controllers to be scaled down.
2. Roll the remaining controllers to reconfigure the voters.
3. Roll the brokers to reconfigure the voters.
Scale-down from 5->4 nodes seems to work fine by following the steps above, as it does not break the quorum. However, scale-down from 4->3 nodes will break the quorum:

1. You remove the 4th controller.
2. You need to roll the remaining 3 controllers.
3. But they are still configured with 4 voters, so the quorum needs 3 nodes online; as we have only 3 nodes left, we cannot shut down any of them to roll it and update the voters to 3.
As a result, it gets stuck with the following error:
```
2023-12-04 17:13:06 WARN KafkaQuorumCheck:98 - Reconciliation #1609(timer) Kafka(myproject/my-cluster): No valid lastCaughtUpTimestamp is found for controller 3
```
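The arithmetic behind the stuck state, as a quick illustration (Raft needs a strict majority of the configured voters online):

```java
public class QuorumMajority {
    // Strict majority of n configured voters
    static int majority(int n) {
        return n / 2 + 1;
    }

    public static void main(String[] args) {
        // 5 voters, 4 nodes left: majority is 3, so one of the 4 can still be rolled -> 5->4 works
        System.out.println("majority(5) = " + majority(5)); // 3
        // 4 voters, 3 nodes left: majority is 3, so none of the 3 can be taken down -> 4->3 gets stuck
        System.out.println("majority(4) = " + majority(4)); // 3
        System.out.println("majority(3) = " + majority(3)); // 2
    }
}
```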
Scale-down of mixed nodes seems to go similarly. It looks like there is a race condition: sometimes the deleted node is still seen as in-sync and the first node is rolled. But it then gets stuck on the next node, as in that case controller 3 again has no valid timestamp.
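For context, the `lastCaughtUpTimestamp` that the check looks at comes from describing the metadata quorum. A minimal sketch of reading it with the Kafka Admin API (available since KIP-836; the bootstrap address is a placeholder):

```java
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.QuorumInfo;

public class DescribeQuorumSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "my-cluster-kafka-bootstrap:9092");
        try (Admin admin = Admin.create(props)) {
            QuorumInfo quorum = admin.describeMetadataQuorum().quorumInfo().get();
            for (QuorumInfo.ReplicaState voter : quorum.voters()) {
                // lastCaughtUpTimestamp is an OptionalLong and can be absent, which is
                // what the "No valid lastCaughtUpTimestamp" warning above reflects
                System.out.printf("controller %d: lastCaughtUpTimestamp=%s%n",
                        voter.replicaId(), voter.lastCaughtUpTimestamp());
            }
        }
    }
}
```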
Would removing the controllers only at the end help? Can the `KafkaRoller` force this despite breaking the quorum when there is a scale-down?
Next steps
Implementing these things could be quite complicated (even assuming the delayed creation / removal of the scaled nodes really works, as I did not test it). Is it required for the GA of KRaft in Strimzi? Should we wait for KIP-853 to be implemented, which might allow us to change the controller quorum dynamically?
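For reference, KIP-853 proposes AddRaftVoter/RemoveRaftVoter RPCs so the voter set could be changed at runtime instead of via rolling restarts. A purely hypothetical sketch of what that could look like on the Admin interface (not available at the time of writing; names are taken from the KIP and may differ in the final implementation):

```java
// Hypothetical: KIP-853 was not implemented when this issue was written,
// so these Admin calls sketch the KIP's proposal, not a working API.
import java.util.Properties;
import java.util.Set;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.RaftVoterEndpoint;
import org.apache.kafka.common.Uuid;

public class Kip853Sketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "my-cluster-kafka-bootstrap:9092");
        try (Admin admin = Admin.create(props)) {
            Uuid directoryId = Uuid.randomUuid(); // placeholder; really the node's metadata log directory id
            // Scale-up: add node 4 as a voter once it is running as an observer
            admin.addRaftVoter(4, directoryId,
                    Set.of(new RaftVoterEndpoint("CONTROLLER", "controller-4", 9090))).all().get();
            // Scale-down: remove the voter before deleting its node
            admin.removeRaftVoter(4, directoryId).all().get();
        }
    }
}
```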
I agree, we should wait for a proper fix in Kafka upstream. Making it work with workarounds and/or manual intervention is nothing more than a hack, and it loses all the automation that the operator should provide for an operation like scaling up/down.
> Is it required for the GA of KRaft in Strimzi?
Controllers are replacing the ZooKeeper nodes. I was wondering how many users need to scale a ZooKeeper ensemble up or down today; I don't think that many do. Of course, right now you can scale ZooKeeper if needed. Without KIP-853 you could not scale controllers properly, so there would be a gap in feature parity. If by GA we want to provide feature parity, we should wait; otherwise I am fine with going GA without it.