-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Bug]: TopicOperator removes throttled.replicas topic configuration of throttled reassignment/rebalance #9972
Comments
Hi @khartld, this is an interesting use case. Thanks for raising. We recently added an environment variable that allows to specify which topic configurations are reconciled (ALL by default), everything else is ignored, including deletions. This is applied to all topics. Example: spec:
entityOperator:
template:
topicOperatorContainer:
env:
- name: STRIMZI_ALTERABLE_TOPIC_CONFIG
value: "retention.ms,segment.bytes" I haven't tried yet, but the above configuration may work to avoid the issue. |
Discussed on the community call on 18.4.: Let's keep it for the next community call where hopefully the UTO SMEs will be present. It would be interesting to understand if the suggestion above works. But if it interferes with the |
I was able to reproduce the issue: Reconciliation #9(upsert) KafkaTopic(test/rebalance-test): Config changes [AlterConfigOp{opType=DELETE, configEntry=ConfigEntry(name=leader.replication.throttled.replicas, value=null, source=UNKNOWN, isSensitive=false, isReadOnly=false, synonyms=[], type=UNKNOWN, documentation=null)}, AlterConfigOp{opType=DELETE, configEntry=ConfigEntry(name=follower.replication.throttled.replicas, value=null, source=UNKNOWN, isSensitive=false, isReadOnly=false, synonyms=[], type=UNKNOWN, documentation=null)}] Then, I applied my suggestion to your example: echo -e "apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
name: my-cluster
spec:
kafka:
version: 3.7.0
replicas: 2
listeners:
- name: plain
port: 9092
type: internal
tls: false
- name: tls
port: 9093
type: internal
tls: true
config:
offsets.topic.replication.factor: 1
transaction.state.log.replication.factor: 1
transaction.state.log.min.isr: 1
default.replication.factor: 1
min.insync.replicas: 1
inter.broker.protocol.version: "3.7"
storage:
type: jbod
volumes:
- id: 0
type: persistent-claim
size: 100Gi
deleteClaim: false
zookeeper:
replicas: 1
storage:
type: persistent-claim
size: 100Gi
deleteClaim: false
entityOperator:
template:
topicOperatorContainer:
env:
- name: STRIMZI_ALTERABLE_TOPIC_CONFIG
value: \"retention.ms,segment.bytes\"
topicOperator:
logging:
type: inline
loggers:
rootLogger.level: INFO
logger.top.name: io.strimzi.operator.topic
logger.top.level: DEBUG
logger.toc.name: io.strimzi.operator.topic.TopicOperator
logger.toc.level: TRACE
logger.clients.level: DEBUG
userOperator: {}
cruiseControl: {}
---
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
name: rebalance-test
labels:
strimzi.io/cluster: my-cluster
spec:
topicName: rebalance-test
partitions: 100" | kubectl create -f - As expected, I don't see any DELETE alter config operation in the TO logs, and looks like those configurations are adjusted dynamically (which also includes removal while the rebalance is still in progress). ...
Topic: rebalance-test TopicId: S-tX0dPmSG-5QyVoFATtVA PartitionCount: 100 ReplicationFactor: 1 Configs: leader.replication.throttled.replicas=17:0,17:1,25:0,25:1,min.insync.replicas=1,follower.replication.throttled.replicas=17:0,17:1,25:0,25:1,message.format.version=3.0-IV1
---
Topic: rebalance-test TopicId: S-tX0dPmSG-5QyVoFATtVA PartitionCount: 100 ReplicationFactor: 1 Configs: min.insync.replicas=1,message.format.version=3.0-IV1
---
Topic: rebalance-test TopicId: S-tX0dPmSG-5QyVoFATtVA PartitionCount: 100 ReplicationFactor: 1 Configs: leader.replication.throttled.replicas=73:0,73:1,min.insync.replicas=1,follower.replication.throttled.replicas=73:0,73:1,message.format.version=3.0-IV1
---
Topic: rebalance-test TopicId: S-tX0dPmSG-5QyVoFATtVA PartitionCount: 100 ReplicationFactor: 1 Configs: leader.replication.throttled.replicas=13:0,13:1,45:0,45:1,min.insync.replicas=1,follower.replication.throttled.replicas=13:0,13:1,45:0,45:1,message.format.version=3.0-IV1
---
Topic: rebalance-test TopicId: S-tX0dPmSG-5QyVoFATtVA PartitionCount: 100 ReplicationFactor: 1 Configs: leader.replication.throttled.replicas=47:0,47:1,81:0,81:1,min.insync.replicas=1,follower.replication.throttled.replicas=47:0,47:1,81:0,81:1,message.format.version=3.0-IV1
... So this works, but I'm now wondering if and how we should make this behavior automatic when CC integration is enabled. Give me some time to re-read these KIPs. |
But this would mean I have to whitelist all potential configurations that I want to allow to be set in KafkaTopics right? An additional blacklist option could potentially be good enough. |
Here the CO and TO reconciliations are a bit different. The former only reverts configuration changes that are different from
I think a way forward here could be to change the STRIMZI_ALTERABLE_TOPIC_CONFIG="ALL_EXCEPT:leader.replication.throttled.replicas,follower.replication.throttled.replicas" I expect this configuration to be automatically set by the Topic Operator when CC integration is enabled and the environment variable is set to default (ALL). @tombentley @scholzj wdyt? |
I think that we will need to have some way to blacklist some options like this. That said, I'm not sure what you suggested is the best format. Maybe a brand new option would be better and cleaner? |
I'm fine with introducing a new env var. Having something like |
@khartld are you planning to work on this? Otherwise I can do it. |
Maybe this should go first through the triage to make sure everyone agrees on what to do. |
Ok, let's wait then. |
I'd be happy to implement it given a specification :)
|
Sure, let's wait the next community meeting (May 2nd) to find an agreement on specs.
That's a good question. My preference would be to log and update the status with a warning. I don't see a use case where one would need to use both. The default value would be |
To make the |
Yes, but only if CC integration is enabled. |
Triaged on the community call on 2.5.2024: We should implement the following to address this:
@khartld Do you still plan to look into this? |
Yes, I will have a look at implementing it. Thanks for the specs! |
@khartld hi, what's the state of this fix? No rush, but let me know if you need any help. |
Hi @fvaleri, I attached the MR with the fix. I'm currently working on getting the docker build to work for me to test it locally |
Bug Description
When running
kafka-reassign-partitions.sh
with the--throttle
flag or running aKafkaRebalance
withreplicationThrottle
, Kafka will set these configurations on the topics that are currently moved:follower.replication.throttled.replicas: <currently moving replicas>, leader.replication.throttled.replicas: <currently moving replicas>
(KIP-542).The topic operator then picks up the configuration discrepancy and removes the configurations, which leads to unthrottled replica movements.
I would like to create a PR for this if you agree with the expected behavior.
Steps to reproduce
DEBUG
logging for topic operatorKafkaTopic
KafkaRebalance
in which you remove a broker and setreplicationThrottle
(see Additional Context)KafkaRebalance
DEBUG [LoopRunnable-0] BatchingTopicController:987 - Reconciliation #7(upsert) KafkaTopic(kafka/rebalance-test): Config changes [AlterConfigOp{opType=DELETE, configEntry=ConfigEntry(name=follower.replication.throttled.replicas, value=null, source=UNKNOWN, isSensitive=false, isReadOnly=false, synonyms=[], type=UNKNOWN, documentation=null)}, AlterConfigOp{opType=DELETE, configEntry=ConfigEntry(name=leader.replication.throttled.replicas, value=null, source=UNKNOWN, isSensitive=false, isReadOnly=false, synonyms=[], type=UNKNOWN, documentation=null)}]
Expected behavior
I guess the preferred behavior would be to ignore the topic configurations
follower.replication.throttled.replicas
andleader.replication.throttled.replicas
if they are set on the broker, but not explicitly specified in theKafkaTopic
resource spec.If they are explicitly defined, the topic operator might be expected to overwrite them again.
Strimzi version
0.39.0, 0.40.0
Kubernetes version
v1.27.3
Installation method
Helm chart
Infrastructure
Kind, Bare-metal
Configuration files and logs
Additional context
The text was updated successfully, but these errors were encountered: