
[Bug]: TopicOperator removes throttled.replicas topic configuration of throttled reassignment/rebalance #9972

Open
khartld opened this issue Apr 16, 2024 · 18 comments

khartld commented Apr 16, 2024

Bug Description

When running kafka-reassign-partitions.sh with the --throttle flag or running a KafkaRebalance with replicationThrottle, Kafka will set these configurations on the topics that are currently moved: follower.replication.throttled.replicas: <currently moving replicas>, leader.replication.throttled.replicas: <currently moving replicas> (KIP-542).

The topic operator then picks up the configuration discrepancy and removes the configurations, which leads to unthrottled replica movements.
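The behavior described above can be sketched as follows. This is illustrative pseudologic only, not the actual BatchingTopicController code; the function name and dict shapes are assumptions. It shows why a purely declarative diff turns Kafka-managed throttle configs into DELETE operations:

```python
# Minimal sketch (not Strimzi code) of how a declarative reconciler ends up
# deleting broker-side configs that are absent from the KafkaTopic spec.

def compute_config_ops(spec_config: dict, broker_config: dict) -> list:
    """Diff desired (spec) vs actual (broker) topic config."""
    ops = []
    for key, value in spec_config.items():
        if broker_config.get(key) != value:
            ops.append(("SET", key, value))
    for key in broker_config:
        if key not in spec_config:
            # Anything Kafka set dynamically (e.g. the throttled.replicas
            # configs during a rebalance) looks like drift and gets deleted.
            ops.append(("DELETE", key, None))
    return ops

ops = compute_config_ops(
    spec_config={"min.insync.replicas": "1"},
    broker_config={
        "min.insync.replicas": "1",
        "leader.replication.throttled.replicas": "17:0,17:1",
        "follower.replication.throttled.replicas": "17:0,17:1",
    },
)
```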

I would like to create a PR for this if you agree with the expected behavior.

Steps to reproduce

  1. Create a cluster with 2 brokers and Cruise Control, and enable DEBUG logging for the topic operator
  2. Create a test KafkaTopic
  3. Create a KafkaRebalance that removes a broker and sets replicationThrottle (see Additional context)
  4. Approve the KafkaRebalance
  5. Wait until Cruise Control starts the rebalance
  6. Observe in topic operator logs:
    DEBUG [LoopRunnable-0] BatchingTopicController:987 - Reconciliation #7(upsert) KafkaTopic(kafka/rebalance-test): Config changes [AlterConfigOp{opType=DELETE, configEntry=ConfigEntry(name=follower.replication.throttled.replicas, value=null, source=UNKNOWN, isSensitive=false, isReadOnly=false, synonyms=[], type=UNKNOWN, documentation=null)}, AlterConfigOp{opType=DELETE, configEntry=ConfigEntry(name=leader.replication.throttled.replicas, value=null, source=UNKNOWN, isSensitive=false, isReadOnly=false, synonyms=[], type=UNKNOWN, documentation=null)}]
  7. You can also check the topic configuration in Kafka to confirm that the throttle settings have been removed

Expected behavior

I guess the preferred behavior would be to ignore the topic configurations follower.replication.throttled.replicas and leader.replication.throttled.replicas if they are set on the broker, but not explicitly specified in the KafkaTopic resource spec.

If they are explicitly defined, the topic operator might be expected to overwrite them again.
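The expected behavior could be sketched like this (illustrative only, not Strimzi code; `THROTTLE_CONFIGS` and the function shape are assumptions): skip deletion for the two Kafka-managed throttle configs unless the KafkaTopic spec pins them, in which case they are still enforced.

```python
# Sketch of the proposed behavior: leave Kafka-managed throttle configs
# alone unless the KafkaTopic spec explicitly specifies them.

THROTTLE_CONFIGS = {
    "leader.replication.throttled.replicas",
    "follower.replication.throttled.replicas",
}

def compute_config_ops(spec_config: dict, broker_config: dict) -> list:
    ops = []
    for key, value in spec_config.items():
        if broker_config.get(key) != value:
            # Explicitly specified values (throttle configs included)
            # are still enforced.
            ops.append(("SET", key, value))
    for key in broker_config:
        if key in spec_config or key in THROTTLE_CONFIGS:
            # Skip: either managed via the spec, or set dynamically by
            # Kafka during a throttled reassignment.
            continue
        ops.append(("DELETE", key, None))
    return ops
```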

Strimzi version

0.39.0, 0.40.0

Kubernetes version

v1.27.3

Installation method

Helm chart

Infrastructure

Kind, Bare-metal

Configuration files and logs

apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    version: 3.7.0
    replicas: 2
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
      - name: tls
        port: 9093
        type: internal
        tls: true
    config:
      offsets.topic.replication.factor: 1
      transaction.state.log.replication.factor: 1
      transaction.state.log.min.isr: 1
      default.replication.factor: 1
      min.insync.replicas: 1
      inter.broker.protocol.version: "3.7"
    storage:
      type: jbod
      volumes:
      - id: 0
        type: persistent-claim
        size: 100Gi
        deleteClaim: false
  zookeeper:
    replicas: 1
    storage:
      type: persistent-claim
      size: 100Gi
      deleteClaim: false
  entityOperator:
    topicOperator:
      logging:
        type: inline
        loggers:
          rootLogger.level: INFO
          logger.top.name: io.strimzi.operator.topic
          logger.top.level: DEBUG
          logger.toc.name: io.strimzi.operator.topic.TopicOperator
          logger.toc.level: TRACE
          logger.clients.level: DEBUG
    userOperator: {}
  cruiseControl: {}
---
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: rebalance-test
  labels:
    strimzi.io/cluster: my-cluster
spec:
  topicName: rebalance-test
  partitions: 100

Additional context

apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaRebalance
metadata:
  name: rebalance-test
  labels:
    strimzi.io/cluster: my-cluster
spec:
  mode: remove-brokers
  brokers: [1]
  replicationThrottle: 10
fvaleri (Contributor) commented Apr 18, 2024

Hi @khartld, this is an interesting use case. Thanks for raising.

We recently added an environment variable that allows you to specify which topic configurations are reconciled (ALL by default); everything else is ignored, including deletions. This applies to all topics.

Example:

spec:
  entityOperator:
    template:
      topicOperatorContainer:
        env:
          - name: STRIMZI_ALTERABLE_TOPIC_CONFIG
            value: "retention.ms,segment.bytes"

I haven't tried it yet, but the above configuration may work around the issue.

scholzj (Member) commented Apr 18, 2024

Discussed on the community call on 18.4.: Let's keep it for the next community call where hopefully the UTO SMEs will be present. It would be interesting to understand if the suggestion above works. But if it interferes with the KafkaRebalance resource, we might also need some default solution.

fvaleri (Contributor) commented Apr 19, 2024

I was able to reproduce the issue:

Reconciliation #9(upsert) KafkaTopic(test/rebalance-test): Config changes [AlterConfigOp{opType=DELETE, configEntry=ConfigEntry(name=leader.replication.throttled.replicas, value=null, source=UNKNOWN, isSensitive=false, isReadOnly=false, synonyms=[], type=UNKNOWN, documentation=null)}, AlterConfigOp{opType=DELETE, configEntry=ConfigEntry(name=follower.replication.throttled.replicas, value=null, source=UNKNOWN, isSensitive=false, isReadOnly=false, synonyms=[], type=UNKNOWN, documentation=null)}]

Then, I applied my suggestion to your example:

echo -e "apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka         
metadata:
  name: my-cluster
spec:
  kafka:                          
    version: 3.7.0
    replicas: 2
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
      - name: tls
        port: 9093
        type: internal
        tls: true
    config:
      offsets.topic.replication.factor: 1
      transaction.state.log.replication.factor: 1
      transaction.state.log.min.isr: 1
      default.replication.factor: 1
      min.insync.replicas: 1
      inter.broker.protocol.version: "3.7"
    storage:
      type: jbod
      volumes:
      - id: 0
        type: persistent-claim
        size: 100Gi
        deleteClaim: false
  zookeeper:
    replicas: 1
    storage:
      type: persistent-claim
      size: 100Gi
      deleteClaim: false
  entityOperator:    
    template:
      topicOperatorContainer:
        env:
          - name: STRIMZI_ALTERABLE_TOPIC_CONFIG
            value: \"retention.ms,segment.bytes\"
    topicOperator:
      logging:
        type: inline
        loggers:
          rootLogger.level: INFO
          logger.top.name: io.strimzi.operator.topic
          logger.top.level: DEBUG
          logger.toc.name: io.strimzi.operator.topic.TopicOperator
          logger.toc.level: TRACE
          logger.clients.level: DEBUG
    userOperator: {}
  cruiseControl: {}
---
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: rebalance-test
  labels:
    strimzi.io/cluster: my-cluster
spec:
  topicName: rebalance-test
  partitions: 100" | kubectl create -f -

As expected, I don't see any DELETE alter config operation in the TO logs, and it looks like those configurations are adjusted dynamically (which also includes removal while the rebalance is still in progress).

...
Topic: rebalance-test	TopicId: S-tX0dPmSG-5QyVoFATtVA	PartitionCount: 100	ReplicationFactor: 1	Configs: leader.replication.throttled.replicas=17:0,17:1,25:0,25:1,min.insync.replicas=1,follower.replication.throttled.replicas=17:0,17:1,25:0,25:1,message.format.version=3.0-IV1
---
Topic: rebalance-test	TopicId: S-tX0dPmSG-5QyVoFATtVA	PartitionCount: 100	ReplicationFactor: 1	Configs: min.insync.replicas=1,message.format.version=3.0-IV1
---
Topic: rebalance-test	TopicId: S-tX0dPmSG-5QyVoFATtVA	PartitionCount: 100	ReplicationFactor: 1	Configs: leader.replication.throttled.replicas=73:0,73:1,min.insync.replicas=1,follower.replication.throttled.replicas=73:0,73:1,message.format.version=3.0-IV1
---
Topic: rebalance-test	TopicId: S-tX0dPmSG-5QyVoFATtVA	PartitionCount: 100	ReplicationFactor: 1	Configs: leader.replication.throttled.replicas=13:0,13:1,45:0,45:1,min.insync.replicas=1,follower.replication.throttled.replicas=13:0,13:1,45:0,45:1,message.format.version=3.0-IV1
---
Topic: rebalance-test	TopicId: S-tX0dPmSG-5QyVoFATtVA	PartitionCount: 100	ReplicationFactor: 1	Configs: leader.replication.throttled.replicas=47:0,47:1,81:0,81:1,min.insync.replicas=1,follower.replication.throttled.replicas=47:0,47:1,81:0,81:1,message.format.version=3.0-IV1
...

So this works, but I'm now wondering if and how we should make this behavior automatic when CC integration is enabled. Give me some time to re-read these KIPs.

khartld (Author) commented Apr 19, 2024

But this would mean I have to whitelist all potential configurations that I want to allow to be set in KafkaTopics, right? An additional blacklist option could potentially be good enough.

fvaleri (Contributor) commented Apr 19, 2024

Here the CO and TO reconciliations are a bit different. The former only reverts configuration changes that differ from .spec.kafka.config. The latter also reverts configuration changes that are not included in .spec.config at all.

But this would mean I have to whitelist all potential configurations that I want to allow to be set in KafkaTopics right? An additional blacklist option could potentially be good enough.

I think a way forward here could be to change STRIMZI_ALTERABLE_TOPIC_CONFIG to allow all configurations except a specific subset, without introducing a new environment variable. For example:

STRIMZI_ALTERABLE_TOPIC_CONFIG="ALL_EXCEPT:leader.replication.throttled.replicas,follower.replication.throttled.replicas"

I expect this configuration to be automatically set by the Topic Operator when CC integration is enabled and the environment variable is set to default (ALL).
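The proposed syntax could be parsed along these lines (a hedged sketch; `parse_alterable` is a hypothetical name, not a real Strimzi function, and the exact ALL_EXCEPT format is only a proposal at this point):

```python
# Hypothetical parsing of the proposed ALL_EXCEPT syntax for
# STRIMZI_ALTERABLE_TOPIC_CONFIG (illustrative sketch only).

def parse_alterable(value: str):
    """Return a predicate: may this topic config key be altered/reverted?"""
    value = value.strip()
    if value == "ALL":
        return lambda key: True
    if value.startswith("ALL_EXCEPT:"):
        excluded = {k.strip() for k in value[len("ALL_EXCEPT:"):].split(",")}
        return lambda key: key not in excluded
    # Otherwise: an explicit allow-list, as today.
    allowed = {k.strip() for k in value.split(",")}
    return lambda key: key in allowed
```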

@tombentley @scholzj wdyt?

scholzj (Member) commented Apr 19, 2024

I think that we will need to have some way to blacklist some options like this. That said, I'm not sure what you suggested is the best format. Maybe a brand new option would be better and cleaner?

fvaleri (Contributor) commented Apr 23, 2024

I'm fine with introducing a new env var. Having something like STRIMZI_IGNORED_TOPIC_CONFIG could help resolve this and similar race conditions. We already have a method called skipNonAlterableConfigs which could be refactored to include this logic. If the user adds one of the ignored configurations, the status should be updated with a warning.
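The ignore-with-warning idea might look roughly like this (a sketch under assumptions; the real skipNonAlterableConfigs lives in the Java Topic Operator and this function name and shape are hypothetical):

```python
# Sketch: drop ignored keys from the desired config, collecting a
# warning for each so the KafkaTopic status can surface them.

def skip_ignored_configs(spec_config: dict, ignored: set):
    kept, warnings = {}, []
    for key, value in spec_config.items():
        if key in ignored:
            warnings.append(
                f"Configuration '{key}' is in STRIMZI_IGNORED_TOPIC_CONFIG "
                "and will not be reconciled"
            )
            continue
        kept[key] = value
    return kept, warnings
```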

fvaleri (Contributor) commented Apr 23, 2024

@khartld are you planning to work on this? Otherwise I can do it.

scholzj (Member) commented Apr 23, 2024

Maybe this should go first through the triage to make sure everyone agrees on what to do.

fvaleri (Contributor) commented Apr 23, 2024

Ok, let's wait then.

khartld (Author) commented Apr 24, 2024

@khartld are you planning to work on this? Otherwise I can do it.

I'd be happy to implement it given a specification :)
Another behavior question I would have is: what happens if both env variables are set?

  • Throw an error
  • Only apply ALTERABLE_TOPIC_CONFIGs which can be further restricted by IGNORED_TOPIC_CONFIGs
  • Apply all configs except IGNORED_TOPIC_CONFIGs which can be reverted by ALTERABLE_TOPIC_CONFIGs

fvaleri (Contributor) commented Apr 24, 2024

I'd be happy to implement it given a specification :)

Sure, let's wait for the next community meeting (May 2nd) to agree on the specs.

Another behavior question I would have is: what happens if both env variables are set?

That's a good question. My preference would be to log and update the status with a warning. I don't see a use case where one would need to use both.

The default value would be IGNORED_TOPIC_CONFIG = "NONE", the reverse of ALTERABLE_TOPIC_CONFIG = "ALL".

khartld (Author) commented Apr 29, 2024

The default value would be IGNORED_TOPIC_CONFIG = "NONE", the reverse of ALTERABLE_TOPIC_CONFIG = "ALL".

To make the replicationThrottle from KafkaRebalance work by default, IGNORED_TOPIC_CONFIG would need to be follower.replication.throttled.replicas,leader.replication.throttled.replicas

fvaleri (Contributor) commented Apr 29, 2024

To make the replicationThrottle from KafkaRebalance work by default, IGNORED_TOPIC_CONFIG would need to be follower.replication.throttled.replicas,leader.replication.throttled.replicas

Yes, but only if CC integration is enabled.

https://github.com/strimzi/strimzi-kafka-operator/blob/main/topic-operator/src/main/java/io/strimzi/operator/topic/TopicOperatorConfig.java#L137

scholzj (Member) commented May 2, 2024

Triaged on the community call on 2.5.2024: We should implement the following to address this:

  • Add a new variable named something like UNALTERABLE_TOPIC_CONFIG
  • The configurations listed in this variable will be ignored when diffing the topics (with a warning)
  • The default should be NONE
  • In the Cluster Operator, UNALTERABLE_TOPIC_CONFIG will be set automatically to ignore the replication throttle options if CC is enabled. It can also be set directly by users when using standalone CC.
  • If both UNALTERABLE_TOPIC_CONFIG and ALTERABLE_TOPIC_CONFIG are set, a warning will be raised
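The triage decisions above could be sketched as follows (hypothetical names and shapes; the real implementation is Java inside the Topic Operator, and the exact parsing rules were not decided here):

```python
# Sketch of the agreed semantics: an allow-list (default ALL), a new
# deny-list (default NONE), and a warning if both are customized.

def build_config_filter(alterable: str = "ALL", unalterable: str = "NONE"):
    """Return (predicate, warnings) deciding which configs are reconciled."""
    warnings = []
    if alterable != "ALL" and unalterable != "NONE":
        warnings.append(
            "Both ALTERABLE_TOPIC_CONFIG and UNALTERABLE_TOPIC_CONFIG are set"
        )
    blocked = set() if unalterable == "NONE" else {
        k.strip() for k in unalterable.split(",")
    }
    allowed = None if alterable == "ALL" else {
        k.strip() for k in alterable.split(",")
    }

    def may_alter(key: str) -> bool:
        # The deny-list wins; otherwise fall back to the allow-list.
        if key in blocked:
            return False
        return allowed is None or key in allowed

    return may_alter, warnings
```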

@khartld Do you still plan to look into this?

khartld (Author) commented May 6, 2024

Yes, I will have a look at implementing it. Thanks for the specs!

fvaleri (Contributor) commented May 30, 2024

@khartld hi, what's the state of this fix? No rush, but let me know if you need any help.

kahartma commented
Hi @fvaleri, I attached the MR with the fix. I'm currently working on getting the Docker build to work for me so I can test it locally.
