Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: add regression test for upsert sources with keys that move partitions #27092

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

guswynn
Copy link
Contributor

@guswynn guswynn commented May 14, 2024

Closes #26965

cc @morsapaes

We only want to commit to the fact that if a keys partition only increases, then we do the right thing.

Motivation

  • tests/verification

Checklist

@guswynn guswynn requested review from petrosagg and a team May 14, 2024 22:51
@guswynn guswynn changed the title storage: add regression for upsert sources with keys that move partitions storage: add regression test for upsert sources with keys that move partitions May 14, 2024
@petrosagg
Copy link
Contributor

We only want to commit to the fact that if a keys partition only increases, then we do the right thing.

I don't think we want to commit to anything (and I don't think we can even if we wanted to).

If the topic looks like this:

part0: (key1, value1)
part1: (key1, value2)

There is no guarantee that after reclocking the stream will look like this:

(key1, t1, value1)
(key1, t1, value2)

And not like this

(key1, t1, value2)
(key1, t2, value1)

In the first case key1 ends up having the value value2 and in the second case value1

Copy link
Contributor

@petrosagg petrosagg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a detail that I want to verify is done correctly. Upsert should compare the tuple (into_time, from_time) of an incoming update with the stored (into_time, from_time) tuple and not just the from_time. Is this what we store and use from the upsert state?

In other words, the behavior you have documented in the test as only happening in a restart must also happen in the case where the two updates from two different partitions happen to be reclocked to different timestamps. In that case the one with the larger mz timestamp must be the one that wins, no matter what its partition is. Is this how it works?

@guswynn
Copy link
Contributor Author

guswynn commented May 20, 2024

@petrosagg We currently sort by FromTime only, except during restarts; We could switch to both, that would resolve your point on #26965, while still supporting usecases that have only-increasing partitions for keys, right?

@petrosagg
Copy link
Contributor

that would resolve your point on #26965,

yes

while still supporting usecases that have only-increasing partitions for keys, right?

If "supporting" means that when a key moves partitions we will always keep the value of the highest partition then no, we can't support that because we can't guarantee that the two updates will land in the same mz timestamp. It's only when they land in the same timestamp that we can promise to keep the highest partition, otherwise the one with the highest mztimestamp will win, and that might be the lower partition.

@petrosagg
Copy link
Contributor

Our correctness promise is correctness property #2, i.e that from_time1 <= from_time2 => mz_time1 <= mz_time2. Two kafka messages from two different partitions are not comparable so there is no guarantee we can offer

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

storage: commit to some ordering between partitions in UPSERT
2 participants