Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Complete join semantics #143

Open
je-ik opened this issue Jun 29, 2017 · 0 comments
Open

Complete join semantics #143

je-ik opened this issue Jun 29, 2017 · 0 comments

Comments

@je-ik
Copy link
Contributor

je-ik commented Jun 29, 2017

In contrast to purely batch join, in stream processing the variety of join types is much broader. The main differences are:

  • streams have time ordering (as opposed to batch, where all elements have the same time)
  • streams can be windowed (as opposed to batch, where all elements belong to the same window)

This implies more options for joins besides the "classical" - left, right, inner, full outer:

  • join can either apply windowing on both streams or can cache left or right stream locally (windowed, leftCached, rightCached)
  • because of the time ordering the join can be performed in fashion of cartesian product of left and right elements having the same key, or only defined element from each set, yielding leftFirst, leftLast, leftAll, rightFirst, rightLast, rightAll

Of course, some combinations don't make sense, so we have to make the settings applicable only on meaningful combinations, which are:

  • {left, right, inner, full}, windowed, {left, right}{First, Last, All}
  • left, rightCached, right{First, Last}
  • right, leftCached, left{First, Last}
  • {inner, fullOuter}, leftCached, left{First, Last}
  • {inner, fullOuter}, rightCached, right{First, Last}

The client code could look like this for classical windowed left join taking last element in each window

  Join.of(left, right)
    .using(...)
    .left()
    .windowed(/* windowing comes here */)
    .leftLast()
    .rightLast()
    .output();

The following code will function as equivallent to joining KStream with KTable in Kafka streams:

  Join.of(left, right)
    .using(...)
    .left()
    .rightCached()
    .rightLast()
    .output()
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant