Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sync with upstream #1

Open
wants to merge 6,137 commits into
base: trunk
Choose a base branch
from
Open

Sync with upstream #1

wants to merge 6,137 commits into from

Conversation

janelletavares
Copy link

More detailed description of your change,
if necessary. The PR title and PR message become
the squashed commit message, so use a separate
comment to ping reviewers.

Summary of testing strategy (including rationale)
for the feature or bug fix. Unit and/or integration
tests are expected for any behaviour change and
system tests should be considered for larger changes.

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

KevinZTW and others added 30 commits May 11, 2024 23:38
…etion (#15902)

Write events create and add a TimerTask to schedule the timeout operation. The issue is that we pile up the number of timer tasks which are essentially no-ops if replication was successful. They stay in memory for 15 seconds (default write timeout) and as the rate of write increases, the impact on memory usage increases.

Instead, cancel the corresponding write timeout task when the write event is committed to the log. This also applies to complete transaction events.

Reviewers: David Jacot <djacot@confluent.io>
…ater (#15882)

Uses the new remove operation of the state updater that returns
a future to handle task assignment.

Reviewer: Lucas Brutschy <lbrutschy@confluent.io>
This patch introduces the SubscriptionType to the group state and passes it along to the partition assignor. A group is "homogeneous" when all the members are subscribed to the same topics; or it is "heterogeneous" otherwise. This mainly helps the uniform assignor because it does not have to re-compute this information to determine which algorithm to use.

trunk:
Benchmark                                       (assignmentType)  (assignorType)  (isRackAware)  (memberCount)  (partitionsToMemberRatio)  (subscriptionModel)  (topicCount)  Mode  Cnt    Score    Error  Units
ServerSideAssignorBenchmark.doAssignment             INCREMENTAL           RANGE          false            100                         10          HOMOGENEOUS           100  avgt    5    0.136 ±  0.001  ms/op
ServerSideAssignorBenchmark.doAssignment             INCREMENTAL           RANGE          false            100                         10          HOMOGENEOUS          1000  avgt    5    0.198 ±  0.002  ms/op
ServerSideAssignorBenchmark.doAssignment             INCREMENTAL           RANGE          false           1000                         10          HOMOGENEOUS           100  avgt    5    1.767 ±  0.138  ms/op
ServerSideAssignorBenchmark.doAssignment             INCREMENTAL           RANGE          false           1000                         10          HOMOGENEOUS          1000  avgt    5    1.540 ±  0.020  ms/op
ServerSideAssignorBenchmark.doAssignment             INCREMENTAL           RANGE          false          10000                         10          HOMOGENEOUS           100  avgt    5   32.419 ±  7.173  ms/op
ServerSideAssignorBenchmark.doAssignment             INCREMENTAL           RANGE          false          10000                         10          HOMOGENEOUS          1000  avgt    5   26.731 ±  1.985  ms/op
ServerSideAssignorBenchmark.doAssignment             INCREMENTAL         UNIFORM          false            100                         10          HOMOGENEOUS           100  avgt    5    0.242 ±  0.006  ms/op
ServerSideAssignorBenchmark.doAssignment             INCREMENTAL         UNIFORM          false            100                         10          HOMOGENEOUS          1000  avgt    5    1.002 ±  0.006  ms/op
ServerSideAssignorBenchmark.doAssignment             INCREMENTAL         UNIFORM          false           1000                         10          HOMOGENEOUS           100  avgt    5    2.544 ±  0.168  ms/op
ServerSideAssignorBenchmark.doAssignment             INCREMENTAL         UNIFORM          false           1000                         10          HOMOGENEOUS          1000  avgt    5   10.749 ±  0.207  ms/op
ServerSideAssignorBenchmark.doAssignment             INCREMENTAL         UNIFORM          false          10000                         10          HOMOGENEOUS           100  avgt    5   26.832 ±  0.154  ms/op
ServerSideAssignorBenchmark.doAssignment             INCREMENTAL         UNIFORM          false          10000                         10          HOMOGENEOUS          1000  avgt    5  106.209 ±  0.301  ms/op
JMH benchmarks done

patch:
Benchmark                                       (assignmentType)  (assignorType)  (isRackAware)  (memberCount)  (partitionsToMemberRatio)  (subscriptionType)  (topicCount)  Mode  Cnt   Score   Error  Units
ServerSideAssignorBenchmark.doAssignment             INCREMENTAL           RANGE          false            100                         10         HOMOGENEOUS           100  avgt    5   0.131 ± 0.001  ms/op
ServerSideAssignorBenchmark.doAssignment             INCREMENTAL           RANGE          false            100                         10         HOMOGENEOUS          1000  avgt    5   0.185 ± 0.004  ms/op
ServerSideAssignorBenchmark.doAssignment             INCREMENTAL           RANGE          false           1000                         10         HOMOGENEOUS           100  avgt    5   1.943 ± 0.091  ms/op
ServerSideAssignorBenchmark.doAssignment             INCREMENTAL           RANGE          false           1000                         10         HOMOGENEOUS          1000  avgt    5   1.450 ± 0.139  ms/op
ServerSideAssignorBenchmark.doAssignment             INCREMENTAL           RANGE          false          10000                         10         HOMOGENEOUS           100  avgt    5  30.803 ± 2.644  ms/op
ServerSideAssignorBenchmark.doAssignment             INCREMENTAL           RANGE          false          10000                         10         HOMOGENEOUS          1000  avgt    5  24.251 ± 1.230  ms/op
ServerSideAssignorBenchmark.doAssignment             INCREMENTAL         UNIFORM          false            100                         10         HOMOGENEOUS           100  avgt    5   0.155 ± 0.004  ms/op
ServerSideAssignorBenchmark.doAssignment             INCREMENTAL         UNIFORM          false            100                         10         HOMOGENEOUS          1000  avgt    5   0.235 ± 0.010  ms/op
ServerSideAssignorBenchmark.doAssignment             INCREMENTAL         UNIFORM          false           1000                         10         HOMOGENEOUS           100  avgt    5   1.602 ± 0.046  ms/op
ServerSideAssignorBenchmark.doAssignment             INCREMENTAL         UNIFORM          false           1000                         10         HOMOGENEOUS          1000  avgt    5   1.901 ± 0.174  ms/op
ServerSideAssignorBenchmark.doAssignment             INCREMENTAL         UNIFORM          false          10000                         10         HOMOGENEOUS           100  avgt    5  16.098 ± 1.905  ms/op
ServerSideAssignorBenchmark.doAssignment             INCREMENTAL         UNIFORM          false          10000                         10         HOMOGENEOUS          1000  avgt    5  17.681 ± 0.174  ms/op
JMH benchmarks done

Reviewers: David Jacot <djacot@confluent.io>
)

Reviewers: Satish Duggana <satishd@apache.org>, Luke Chen <showuon@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
)

Uses the new remove operation of the state updater that returns
a future to shutdown the task manager.

Reviewer: Lucas Brutschy <lbrutschy@confluent.io>
This patch deprecates `offsets.commit.required.acks` in Apache Kafka 3.8 as described in KIP-1041: https://cwiki.apache.org/confluence/x/9YobEg.

Reviewers: Justine Olshan <jolshan@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
…#15727)

This patch adds integration tests for consumerGroupDescribe API.

Reviewers: David Jacot <djacot@confluent.io>
Improve consumer log for expired poll timer, by showing how much time was the max.poll.interval.ms exceeded. This should be helpful in guiding the user to tune that config on the common case of long-running processing causing the consumer to leave the group. Inspired by other clients that log such information on the same situation.

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>, Matthias Sax <mjsax@apache.org>, 
Andrew Schofield <andrew_schofield@live.com>, Kirk True <kirk@kirktrue.pro>
…a port (#15923)

Reviewers: Gaurav Narula <gaurav_narula2@apple.com>, Chia-Ping Tsai <chia7712@gmail.com>
Follow the pattern used by the other modules.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
Reviewers: Greg Harris <gharris1727@gmail.com>
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>,  Lianet Magrans <lianetmr@gmail.com>
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
The tests testRealProducerConfigWithSyncSendShouldNotThrowException and testRealProducerConfigWithSyncSendAndNotIgnoringExceptionsShouldThrowException create real producer instances, which are leaked when the test exits.

Instead, each test should be followed by a cleanup operation where the registered appender is removed and closed.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
Validate that a control batch in the batch accumulator has at least one control record.

Reviewers: Jun Rao <junrao@apache.org>, Chia-Ping Tsai <chia7712@apache.org>
Reviewers: José Armando García Sancio <jsancio@apache.org>
Reviewers: David Jacot <djacot@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
…ducerFencedException (#15919)

KStreams is able to handle the ProducerFenced (among other errors) cleanly. It does this by closing the task dirty and triggering a rebalance amongst the worker threads to rejoin the group. The producer is also recreated. Due to how streams works (writing to and reading from various topics), the application is able to figure out the last thing the fenced producer completed and continue from there. InvalidPidMappingException should be treated the same way.

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>, Justine Olshan <jolshan@confluent.io>
…15921)

The heartbeat api to the consumer group with classic protocol members schedules the session timeout. At present, there's no way to get the classic member session timeout in heartbeat to consumer group.

This patch stores the session timeout into the ClassicMemberMetadata in ConsumerGroupMemberMetadataValue and update it when it's provided in the join request.

Reviewers: David Jacot <djacot@confluent.io>
)

This PR fixes two kinds of bugs in the new(ish) rack-aware part of the sticky assignment algorithm:

First, when reassigning "owned partitions" to their previous owners, we now have to take the rack placement into account and might not immediately assign a previously-owned partition to its old consumer during this phase. There is a small chance this partition will be assigned to its previous owner during a later stage of the assignment, but if it's not then by definition it has been "revoked" and must be removed from the assignment during the adjustment phase of the CooperativeStickyAssignor according to the cooperative protocol. We need to make sure any partitions removed in this way end up in the "partitionsTransferringOwnership".

Second, the sticky algorithm works in part by keeping track of how many consumers are still "unfilled" when they are at the "minQuota", meaning we may need to assign one more partition to get to the expected number of consumers at the "maxQuota". During the rack-aware round-robin assignment phase, we were not properly clearing the set of unfilled & minQuota consumers once we reached the expected number of "maxQuota" consumers (since by definition that means no more minQuota consumers need to or can be given any more partitions since that would bump them up to maxQuota and exceed the expected count). This bug would result in the entire assignment being failed due to a correctness check at the end which verifies that the "unfilled members" set is empty before returning the assignment. An IllegalStateException would be thrown, failing the rebalancing and sending the group into an endless rebalancing loop until/unless it was lucky enough to produce a new assignment that didn't hit this bug

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
…amsState (#15920)

This PR implements read-only container classes for ApplicationState and KafkaStreamsState, and initializes those within StreamsPartitionAssignor#assign.

New internal methods were also added to the ClientState to easily pass this data through to the KafkaStreamsState.

One test was added to check the lag sorting within the implementation of KafkaStreamsState, which is the counterpart to the test that existed for the ClientState class.

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
This PR changes KafkaStreamsAssignment from an interface to a container class, and implements said class.

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
Reviewers:  Jeff Kim <kimkb2011@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
…erTest (#15885)

Some tests in TopicBasedRemoteLogMetadataManagerTest flake because waitUntilConsumerCatchesUp may break early before consumer manager has caught up with all the events.

This PR adds an expected offsets for leader/follower metadataOffset partitions and ensures we wait for the offset to be at least equal to the argument to avoid flakyness.

Reviewers: Satish Duggana <satishd@apache.org>, Kamal Chandraprakash <kamal.chandraprakash@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
sjhajharia and others added 30 commits June 5, 2024 22:36
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
…schemas without inner schemas (#16161)

Signed-off-by: Greg Harris <greg.harris@aiven.io>
Reviewers: Chris Egerton <chrise@aiven.io>
Fixed the calculation of the store name list based on the subtopology being accessed.

Also added a new test to make sure this new functionality works as intended.

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
Reviewers: Matthias J. Sax <matthias@confluent.io>, Bruno Cadonna <bruno@confluent.io>, Anna Sophie Blee-Goldman <ableegoldman@apache.org>
FK left-join was changed via KIP-962. This PR updates the docs accordingly.

Reviewers: Ayoub Omari <ayoubomari1@outlook.fr>, Matthias J. Sax <matthias@confluent.io>
…5993)

A patch for KAFKA-15046 got rid of fsync on LeaderEpochFileCache#truncateFromStart/End for performance reason, but it turned out this could cause corrupted leader-epoch checkpoint file on ungraceful OS shutdown, i.e. OS shuts down in the middle when kernel is writing dirty pages back to the device.

To address this problem, this PR makes below changes: (1) Revert LeaderEpochCheckpoint#write to always fsync
(2) truncateFromStart/End now call LeaderEpochCheckpoint#write asynchronously on scheduler thread
(3) UnifiedLog#maybeCreateLeaderEpochCache now loads epoch entries from checkpoint file only when current cache is absent

Reviewers: Jun Rao <junrao@gmail.com>
…is in the voter set (#16079)

1. Changing log message from error to info - We may expect the HW calculation to give us a smaller result than the current HW in the case of quorum reconfiguration. We will continue to not allow the HW to actually decrease.
2. Logic for finding the updated LeaderEndOffset for updateReplicaState is changed as well. We do not assume the leader is in the voter set and check the observer states as well.
3. updateLocalState now accepts an additional "lastVoterSet" param which allows us to update the leader state with the last known voters. any nodes in this set but not in voterStates will be added to voterStates and removed from observerStates, any nodes not in this set but in voterStates will be removed from voterStates and added to observerStates

Reviewers: Luke Chen <showuon@gmail.com>, José Armando García Sancio <jsancio@apache.org>
…6214)

Remove usage of the partition.assignment.strategy config in the new consumer. This config is deprecated with the new consumer protocol, so the AsyncKafkaConsumer should not use or validate the property.

Reviewers: Lucas Brutschy <lbrutschy@confluent.io>
…ions (#16187)

This PR is part of KAFKA-16448 which aims to bring a ProcessingExceptionHandler to Kafka Streams in order to deal with exceptions that occur during processing.

This PR brings ProcessingExceptionHandler interface and default implementations.

Co-authored-by: Dabz <d.gasparina@gmail.com>
Co-authored-by: sebastienviale <sebastien.viale@michelin.com>

Reviewer: Bruno Cadonna <cadonna@apache.org>
Reviewers: Mickael Maison <mickael.maison@gmail.com>
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
- Use SystemTime instead of MockTime when time is not mocked
- Use static assertions to reduce the line length
- Fold the lines if it exceeds the limit
- rename tp0 to tpId0 when it refers to TopicIdPartition

Reviewers: Kuan-Po (Cooper) Tseng <brandboat@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
Reviewers: Greg Harris <gharris1727@gmail.com>, Kamal Chandraprakash <kamal.chandraprakash@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
…module (#16198)

This patch moves the `PartitionAssignor` interface and all the related classes to a newly created `group-coordinator/api` module, following the pattern used by the storage and tools modules.

Reviewers: Ritika Reddy <rreddy@confluent.io>, Jeff Kim <jeff.kim@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
A task does not know anything about a produce error thrown
by a different task. That might lead to a InvalidTxnStateException
when a task attempts to do a transactional operation on a producer
that failed due to a different task.

This commit stores the produce exception in the streams producer
on completion of a send instead of the record collector since the
record collector is on task level whereas the stream producer
is on stream thread level. Since all tasks use the same streams
producer the error should be correctly propagated across tasks
of the same stream thread.

For EOS alpha, this commit does not change anything because
each task uses its own producer. The send error is still
on task level but so is also the transaction.

Reviewers: Matthias J. Sax <matthias@confluent.io>
This PR updates all of the streams task assignment code to use the new AssignmentConfigs public class.

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
…eTopics (#16217)

Reviewers: Andrew Schofield <aschofield@confluent.io>, Lianet Magrans <lianetmr@gmail.com>, David Arthur <mumrah@gmail.com>
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
…o NetworkClientDelegateTest (#16234)

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
…-provided timeout (#16031)

Improve consistency and correctness for user-provided timeouts at the Consumer network request layer, per the Java client Consumer timeouts design (https://cwiki.apache.org/confluence/display/KAFKA/Java+client+Consumer+timeouts). While the changes introduced in KAFKA-15974 enforce timeouts at the Consumer's event layer, this change enforces timeouts at the network request layer.

The changes mostly fit into the following areas:

1. Create shared code and idioms so timeout handling logic is consistent across current and future RequestManager implementations
2. Use deadlineMs instead of expirationMs, expirationTimeoutMs, retryExpirationTimeMs, timeoutMs, etc.
3. Update "preemptive pruning" to remove expired requests that have had at least one attempt

Reviewers: Lianet Magrans <lianetmr@gmail.com>, Bruno Cadonna <cadonna@apache.org>
Support for multiple log directories in KRaft exists from
MetataVersion 3.7-IV2.

When migrating a ZK broker to KRaft, we already check that
the IBP is high enough before allowing the broker to startup.

With KIP-584 and KIP-778, Brokers in KRaft mode do not require
the IBP configuration - the configuration is deprecated.
In KRaft mode inter.broker.protocol.version defaults to
MetadataVersion.MINIMUM_KRAFT_VERSION (IBP_3_0_IV1).

Instead KRaft brokers discover the MetadataVersion by reading
the "metadata.version" FeatureLevelRecord from the cluster metadata.

This change adds a new configuration validation step upon discovering
the "metadata.version" from the cluster metadata.

Reviewers: Mickael Maison <mickael.maison@gmail.com>
…t log4j dependency (#12148)


Reviewers: Mickael Maison <mickael.maison@gmail.com>
…ion unchanged (#15869)

This PR includes changes for AsyncKafkaConsumer to avoid evaluating the subscription regex on every poll if metadata hasn't changed. The metadataVersionSnapshot was introduced to identify whether metadata has changed or not, if yes then the current subscription regex will be evaluated.

This is the same mechanism used by the LegacyConsumer.

Reviewers: Lianet Magrans <lianetmr@gmail.com>, Matthias J. Sax <matthias@confluent.io>
Reviewers: Matthias J. Sax <matthias@confluent.io>
…can documentation (#16182)

Reviewers: Matthias J. Sax <matthias@confluent.io>
Reviewers: Matthias J. Sax <matthias@confluent.io>
Reviewers: Justine Olshan <jolshan@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
The test runs forever. We disable the test temporarily to unblock CI

Reviewers: Luke Chen <showuon@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment