forked from apache/kafka
-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sync with upstream #1
Open
janelletavares
wants to merge
6,137
commits into
meroxa:trunk
Choose a base branch
from
apache:trunk
base: trunk
Could not load branches
Branch not found: {{ refName }}
Could not load tags
Nothing to show
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
+920,967
−197,987
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
…5884) Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
…15897) Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
…etion (#15902) Write events create and add a TimerTask to schedule the timeout operation. The issue is that we pile up the number of timer tasks which are essentially no-ops if replication was successful. They stay in memory for 15 seconds (default write timeout) and as the rate of write increases, the impact on memory usage increases. Instead, cancel the corresponding write timeout task when the write event is committed to the log. This also applies to complete transaction events. Reviewers: David Jacot <djacot@confluent.io>
…ater (#15882) Uses the new remove operation of the state updater that returns a future to handle task assignment. Reviewer: Lucas Brutschy <lbrutschy@confluent.io>
This patch introduces the SubscriptionType to the group state and passes it along to the partition assignor. A group is "homogeneous" when all the members are subscribed to the same topics; or it is "heterogeneous" otherwise. This mainly helps the uniform assignor because it does not have to re-compute this information to determine which algorithm to use. trunk: Benchmark (assignmentType) (assignorType) (isRackAware) (memberCount) (partitionsToMemberRatio) (subscriptionModel) (topicCount) Mode Cnt Score Error Units ServerSideAssignorBenchmark.doAssignment INCREMENTAL RANGE false 100 10 HOMOGENEOUS 100 avgt 5 0.136 ± 0.001 ms/op ServerSideAssignorBenchmark.doAssignment INCREMENTAL RANGE false 100 10 HOMOGENEOUS 1000 avgt 5 0.198 ± 0.002 ms/op ServerSideAssignorBenchmark.doAssignment INCREMENTAL RANGE false 1000 10 HOMOGENEOUS 100 avgt 5 1.767 ± 0.138 ms/op ServerSideAssignorBenchmark.doAssignment INCREMENTAL RANGE false 1000 10 HOMOGENEOUS 1000 avgt 5 1.540 ± 0.020 ms/op ServerSideAssignorBenchmark.doAssignment INCREMENTAL RANGE false 10000 10 HOMOGENEOUS 100 avgt 5 32.419 ± 7.173 ms/op ServerSideAssignorBenchmark.doAssignment INCREMENTAL RANGE false 10000 10 HOMOGENEOUS 1000 avgt 5 26.731 ± 1.985 ms/op ServerSideAssignorBenchmark.doAssignment INCREMENTAL UNIFORM false 100 10 HOMOGENEOUS 100 avgt 5 0.242 ± 0.006 ms/op ServerSideAssignorBenchmark.doAssignment INCREMENTAL UNIFORM false 100 10 HOMOGENEOUS 1000 avgt 5 1.002 ± 0.006 ms/op ServerSideAssignorBenchmark.doAssignment INCREMENTAL UNIFORM false 1000 10 HOMOGENEOUS 100 avgt 5 2.544 ± 0.168 ms/op ServerSideAssignorBenchmark.doAssignment INCREMENTAL UNIFORM false 1000 10 HOMOGENEOUS 1000 avgt 5 10.749 ± 0.207 ms/op ServerSideAssignorBenchmark.doAssignment INCREMENTAL UNIFORM false 10000 10 HOMOGENEOUS 100 avgt 5 26.832 ± 0.154 ms/op ServerSideAssignorBenchmark.doAssignment INCREMENTAL UNIFORM false 10000 10 HOMOGENEOUS 1000 avgt 5 106.209 ± 0.301 ms/op JMH benchmarks done patch: Benchmark (assignmentType) (assignorType) (isRackAware) (memberCount) (partitionsToMemberRatio) (subscriptionType) (topicCount) Mode Cnt Score Error Units ServerSideAssignorBenchmark.doAssignment INCREMENTAL RANGE false 100 10 HOMOGENEOUS 100 avgt 5 0.131 ± 0.001 ms/op ServerSideAssignorBenchmark.doAssignment INCREMENTAL RANGE false 100 10 HOMOGENEOUS 1000 avgt 5 0.185 ± 0.004 ms/op ServerSideAssignorBenchmark.doAssignment INCREMENTAL RANGE false 1000 10 HOMOGENEOUS 100 avgt 5 1.943 ± 0.091 ms/op ServerSideAssignorBenchmark.doAssignment INCREMENTAL RANGE false 1000 10 HOMOGENEOUS 1000 avgt 5 1.450 ± 0.139 ms/op ServerSideAssignorBenchmark.doAssignment INCREMENTAL RANGE false 10000 10 HOMOGENEOUS 100 avgt 5 30.803 ± 2.644 ms/op ServerSideAssignorBenchmark.doAssignment INCREMENTAL RANGE false 10000 10 HOMOGENEOUS 1000 avgt 5 24.251 ± 1.230 ms/op ServerSideAssignorBenchmark.doAssignment INCREMENTAL UNIFORM false 100 10 HOMOGENEOUS 100 avgt 5 0.155 ± 0.004 ms/op ServerSideAssignorBenchmark.doAssignment INCREMENTAL UNIFORM false 100 10 HOMOGENEOUS 1000 avgt 5 0.235 ± 0.010 ms/op ServerSideAssignorBenchmark.doAssignment INCREMENTAL UNIFORM false 1000 10 HOMOGENEOUS 100 avgt 5 1.602 ± 0.046 ms/op ServerSideAssignorBenchmark.doAssignment INCREMENTAL UNIFORM false 1000 10 HOMOGENEOUS 1000 avgt 5 1.901 ± 0.174 ms/op ServerSideAssignorBenchmark.doAssignment INCREMENTAL UNIFORM false 10000 10 HOMOGENEOUS 100 avgt 5 16.098 ± 1.905 ms/op ServerSideAssignorBenchmark.doAssignment INCREMENTAL UNIFORM false 10000 10 HOMOGENEOUS 1000 avgt 5 17.681 ± 0.174 ms/op JMH benchmarks done Reviewers: David Jacot <djacot@confluent.io>
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
This patch deprecates `offsets.commit.required.acks` in Apache Kafka 3.8 as described in KIP-1041: https://cwiki.apache.org/confluence/x/9YobEg. Reviewers: Justine Olshan <jolshan@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
…#15727) This patch adds integration tests for consumerGroupDescribe API. Reviewers: David Jacot <djacot@confluent.io>
Improve consumer log for expired poll timer, by showing how much time was the max.poll.interval.ms exceeded. This should be helpful in guiding the user to tune that config on the common case of long-running processing causing the consumer to leave the group. Inspired by other clients that log such information on the same situation. Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>, Matthias Sax <mjsax@apache.org>, Andrew Schofield <andrew_schofield@live.com>, Kirk True <kirk@kirktrue.pro>
…a port (#15923) Reviewers: Gaurav Narula <gaurav_narula2@apple.com>, Chia-Ping Tsai <chia7712@gmail.com>
Follow the pattern used by the other modules. Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
#15903) Reviewers: David Jacot <djacot@confluent.io>
Reviewers: Greg Harris <gharris1727@gmail.com>
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Lianet Magrans <lianetmr@gmail.com>
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
The tests testRealProducerConfigWithSyncSendShouldNotThrowException and testRealProducerConfigWithSyncSendAndNotIgnoringExceptionsShouldThrowException create real producer instances, which are leaked when the test exits. Instead, each test should be followed by a cleanup operation where the registered appender is removed and closed. Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
Validate that a control batch in the batch accumulator has at least one control record. Reviewers: Jun Rao <junrao@apache.org>, Chia-Ping Tsai <chia7712@apache.org>
Reviewers: José Armando García Sancio <jsancio@apache.org>
Reviewers: David Jacot <djacot@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
…ducerFencedException (#15919) KStreams is able to handle the ProducerFenced (among other errors) cleanly. It does this by closing the task dirty and triggering a rebalance amongst the worker threads to rejoin the group. The producer is also recreated. Due to how streams works (writing to and reading from various topics), the application is able to figure out the last thing the fenced producer completed and continue from there. InvalidPidMappingException should be treated the same way. Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>, Justine Olshan <jolshan@confluent.io>
…15921) The heartbeat api to the consumer group with classic protocol members schedules the session timeout. At present, there's no way to get the classic member session timeout in heartbeat to consumer group. This patch stores the session timeout into the ClassicMemberMetadata in ConsumerGroupMemberMetadataValue and update it when it's provided in the join request. Reviewers: David Jacot <djacot@confluent.io>
) This PR fixes two kinds of bugs in the new(ish) rack-aware part of the sticky assignment algorithm: First, when reassigning "owned partitions" to their previous owners, we now have to take the rack placement into account and might not immediately assign a previously-owned partition to its old consumer during this phase. There is a small chance this partition will be assigned to its previous owner during a later stage of the assignment, but if it's not then by definition it has been "revoked" and must be removed from the assignment during the adjustment phase of the CooperativeStickyAssignor according to the cooperative protocol. We need to make sure any partitions removed in this way end up in the "partitionsTransferringOwnership". Second, the sticky algorithm works in part by keeping track of how many consumers are still "unfilled" when they are at the "minQuota", meaning we may need to assign one more partition to get to the expected number of consumers at the "maxQuota". During the rack-aware round-robin assignment phase, we were not properly clearing the set of unfilled & minQuota consumers once we reached the expected number of "maxQuota" consumers (since by definition that means no more minQuota consumers need to or can be given any more partitions since that would bump them up to maxQuota and exceed the expected count). This bug would result in the entire assignment being failed due to a correctness check at the end which verifies that the "unfilled members" set is empty before returning the assignment. An IllegalStateException would be thrown, failing the rebalancing and sending the group into an endless rebalancing loop until/unless it was lucky enough to produce a new assignment that didn't hit this bug Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
…amsState (#15920) This PR implements read-only container classes for ApplicationState and KafkaStreamsState, and initializes those within StreamsPartitionAssignor#assign. New internal methods were also added to the ClientState to easily pass this data through to the KafkaStreamsState. One test was added to check the lag sorting within the implementation of KafkaStreamsState, which is the counterpart to the test that existed for the ClientState class. Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
This PR changes KafkaStreamsAssignment from an interface to a container class, and implements said class. Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
Reviewers: Jeff Kim <kimkb2011@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
…erTest (#15885) Some tests in TopicBasedRemoteLogMetadataManagerTest flake because waitUntilConsumerCatchesUp may break early before consumer manager has caught up with all the events. This PR adds an expected offsets for leader/follower metadataOffset partitions and ensures we wait for the offset to be at least equal to the argument to avoid flakyness. Reviewers: Satish Duggana <satishd@apache.org>, Kamal Chandraprakash <kamal.chandraprakash@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
…schemas without inner schemas (#16161) Signed-off-by: Greg Harris <greg.harris@aiven.io> Reviewers: Chris Egerton <chrise@aiven.io>
Fixed the calculation of the store name list based on the subtopology being accessed. Also added a new test to make sure this new functionality works as intended. Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
Reviewers: Matthias J. Sax <matthias@confluent.io>, Bruno Cadonna <bruno@confluent.io>, Anna Sophie Blee-Goldman <ableegoldman@apache.org>
FK left-join was changed via KIP-962. This PR updates the docs accordingly. Reviewers: Ayoub Omari <ayoubomari1@outlook.fr>, Matthias J. Sax <matthias@confluent.io>
…5993) A patch for KAFKA-15046 got rid of fsync on LeaderEpochFileCache#truncateFromStart/End for performance reason, but it turned out this could cause corrupted leader-epoch checkpoint file on ungraceful OS shutdown, i.e. OS shuts down in the middle when kernel is writing dirty pages back to the device. To address this problem, this PR makes below changes: (1) Revert LeaderEpochCheckpoint#write to always fsync (2) truncateFromStart/End now call LeaderEpochCheckpoint#write asynchronously on scheduler thread (3) UnifiedLog#maybeCreateLeaderEpochCache now loads epoch entries from checkpoint file only when current cache is absent Reviewers: Jun Rao <junrao@gmail.com>
…is in the voter set (#16079) 1. Changing log message from error to info - We may expect the HW calculation to give us a smaller result than the current HW in the case of quorum reconfiguration. We will continue to not allow the HW to actually decrease. 2. Logic for finding the updated LeaderEndOffset for updateReplicaState is changed as well. We do not assume the leader is in the voter set and check the observer states as well. 3. updateLocalState now accepts an additional "lastVoterSet" param which allows us to update the leader state with the last known voters. any nodes in this set but not in voterStates will be added to voterStates and removed from observerStates, any nodes not in this set but in voterStates will be removed from voterStates and added to observerStates Reviewers: Luke Chen <showuon@gmail.com>, José Armando García Sancio <jsancio@apache.org>
…6214) Remove usage of the partition.assignment.strategy config in the new consumer. This config is deprecated with the new consumer protocol, so the AsyncKafkaConsumer should not use or validate the property. Reviewers: Lucas Brutschy <lbrutschy@confluent.io>
…ions (#16187) This PR is part of KAFKA-16448 which aims to bring a ProcessingExceptionHandler to Kafka Streams in order to deal with exceptions that occur during processing. This PR brings ProcessingExceptionHandler interface and default implementations. Co-authored-by: Dabz <d.gasparina@gmail.com> Co-authored-by: sebastienviale <sebastien.viale@michelin.com> Reviewer: Bruno Cadonna <cadonna@apache.org>
Reviewers: Mickael Maison <mickael.maison@gmail.com>
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
- Use SystemTime instead of MockTime when time is not mocked - Use static assertions to reduce the line length - Fold the lines if it exceeds the limit - rename tp0 to tpId0 when it refers to TopicIdPartition Reviewers: Kuan-Po (Cooper) Tseng <brandboat@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
Reviewers: Greg Harris <gharris1727@gmail.com>, Kamal Chandraprakash <kamal.chandraprakash@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
…module (#16198) This patch moves the `PartitionAssignor` interface and all the related classes to a newly created `group-coordinator/api` module, following the pattern used by the storage and tools modules. Reviewers: Ritika Reddy <rreddy@confluent.io>, Jeff Kim <jeff.kim@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
A task does not know anything about a produce error thrown by a different task. That might lead to a InvalidTxnStateException when a task attempts to do a transactional operation on a producer that failed due to a different task. This commit stores the produce exception in the streams producer on completion of a send instead of the record collector since the record collector is on task level whereas the stream producer is on stream thread level. Since all tasks use the same streams producer the error should be correctly propagated across tasks of the same stream thread. For EOS alpha, this commit does not change anything because each task uses its own producer. The send error is still on task level but so is also the transaction. Reviewers: Matthias J. Sax <matthias@confluent.io>
This PR updates all of the streams task assignment code to use the new AssignmentConfigs public class. Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
…eTopics (#16217) Reviewers: Andrew Schofield <aschofield@confluent.io>, Lianet Magrans <lianetmr@gmail.com>, David Arthur <mumrah@gmail.com>
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
…o NetworkClientDelegateTest (#16234) Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
…-provided timeout (#16031) Improve consistency and correctness for user-provided timeouts at the Consumer network request layer, per the Java client Consumer timeouts design (https://cwiki.apache.org/confluence/display/KAFKA/Java+client+Consumer+timeouts). While the changes introduced in KAFKA-15974 enforce timeouts at the Consumer's event layer, this change enforces timeouts at the network request layer. The changes mostly fit into the following areas: 1. Create shared code and idioms so timeout handling logic is consistent across current and future RequestManager implementations 2. Use deadlineMs instead of expirationMs, expirationTimeoutMs, retryExpirationTimeMs, timeoutMs, etc. 3. Update "preemptive pruning" to remove expired requests that have had at least one attempt Reviewers: Lianet Magrans <lianetmr@gmail.com>, Bruno Cadonna <cadonna@apache.org>
Support for multiple log directories in KRaft exists from MetataVersion 3.7-IV2. When migrating a ZK broker to KRaft, we already check that the IBP is high enough before allowing the broker to startup. With KIP-584 and KIP-778, Brokers in KRaft mode do not require the IBP configuration - the configuration is deprecated. In KRaft mode inter.broker.protocol.version defaults to MetadataVersion.MINIMUM_KRAFT_VERSION (IBP_3_0_IV1). Instead KRaft brokers discover the MetadataVersion by reading the "metadata.version" FeatureLevelRecord from the cluster metadata. This change adds a new configuration validation step upon discovering the "metadata.version" from the cluster metadata. Reviewers: Mickael Maison <mickael.maison@gmail.com>
…t log4j dependency (#12148) Reviewers: Mickael Maison <mickael.maison@gmail.com>
…ion unchanged (#15869) This PR includes changes for AsyncKafkaConsumer to avoid evaluating the subscription regex on every poll if metadata hasn't changed. The metadataVersionSnapshot was introduced to identify whether metadata has changed or not, if yes then the current subscription regex will be evaluated. This is the same mechanism used by the LegacyConsumer. Reviewers: Lianet Magrans <lianetmr@gmail.com>, Matthias J. Sax <matthias@confluent.io>
Reviewers: Matthias J. Sax <matthias@confluent.io>
…can documentation (#16182) Reviewers: Matthias J. Sax <matthias@confluent.io>
Reviewers: Matthias J. Sax <matthias@confluent.io>
Reviewers: Justine Olshan <jolshan@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
The test runs forever. We disable the test temporarily to unblock CI Reviewers: Luke Chen <showuon@gmail.com>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
More detailed description of your change,
if necessary. The PR title and PR message become
the squashed commit message, so use a separate
comment to ping reviewers.
Summary of testing strategy (including rationale)
for the feature or bug fix. Unit and/or integration
tests are expected for any behaviour change and
system tests should be considered for larger changes.
Committer Checklist (excluded from commit message)