-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create DataLink for Kafka [HZ-1985] #23886
Create DataLink for Kafka [HZ-1985] #23886
Conversation
For details on KafkaDataLink see the Javadoc.
* The behavior depends on the job's processing guarantee: | ||
* <ul> | ||
* <li><em>EXACTLY_ONCE:</em> the sink will use Kafka transactions to | ||
* commit the messages. This brings some overhead on the broker side, | ||
* slight throughput reduction (we don't send messages between snapshot | ||
* phases) and, most importantly, increases the latency of the messages | ||
* because they are only visible to consumers after they are committed. | ||
* <p> | ||
* When using transactions pay attention to your {@code | ||
* transaction.timeout.ms} config property. It limits the entire | ||
* duration of the transaction since it is begun, not just inactivity | ||
* timeout. It must not be smaller than your snapshot interval, | ||
* otherwise the Kafka broker will roll the transaction back before Jet | ||
* is done with it. Also it should be large enough so that Jet has time | ||
* to restart after a failure: a member can crash just before it's | ||
* about to commit, and Jet will attempt to commit the transaction | ||
* after the restart, but the transaction must be still waiting in the | ||
* broker. The default in Kafka 2.4 is 1 minute. | ||
* | ||
* <li><em>AT_LEAST_ONCE:</em> messages are committed immediately, the | ||
* sink ensure that all async operations are done at 1st snapshot | ||
* phase. This ensures that each message is written if the job fails, | ||
* but might be written again after the job restarts. | ||
* </ul> | ||
* | ||
* If you want to avoid the overhead of transactions, you can reduce the | ||
* guarantee just for the sink by calling {@link | ||
* Builder#exactlyOnce(boolean) exactlyOnce(false)} on the returned builder. | ||
* <p> | ||
* IO failures are generally handled by Kafka producer and do not cause the | ||
* processor to fail. Refer to Kafka documentation for details. | ||
* <p> | ||
* Default local parallelism for this processor is 1. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we could reference this javadoc to avoid duplicated description
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have a mix of approaches here, I kept what was the case for the Kafka source/sinks which repeats the javadoc.
if (properties != null) { | ||
metaSupplier = writeKafkaP(properties, topic, extractKeyFn1, extractValueFn1, exactlyOnce); | ||
} else { | ||
assert dataLinkRef != null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
asserts are disabled by default, couldn't we throw IllegalStateException here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you look at the constructors of the builder one is always non-null and the other null, I actually added these asserts to make intellij shut up about possible NPE and it's nicer than a suppression
For details on KafkaDataLink see the Javadoc.
Checklist:
Team:
,Type:
,Source:
,Module:
) and Milestone setAdd to Release Notes
orNot Release Notes content
set@Nonnull/@Nullable
annotations@since
tags in Javadoc