Add test case for concurrent expected last subject sequence #4319
Conversation
Signed-off-by: Byron Ruth <byron@nats.io>
Signed-off-by: Derek Collison <derek@nats.io>
Need to have @neilalexander or @wallyqs take a look and approve. I can handle merging, etc.
It's hard for me to parse the git history as an outsider, so if I may ask: was this once working and then broken by a regression, or may this bug have existed for longer than that? Are there implications for deployed production systems that may have relied on concurrency control here?
I tested back to 2.9.15, but I suspect this always existed. In my testing, the probability of this behavior occurring depended on how close together the two concurrent requests were. In the specific test on my laptop (for reference), I observed that anything above 2ms between requests would not exhibit the behavior. In a production cluster with more latency among nodes, the window may be in the 10s or 100s of milliseconds. The issue was that the expected-sequence check was not part of the Raft consensus for the write; it was only a pre-check. As a result, if concurrent requests were all "in flight" performing the same pre-flight check, they could all succeed.
The implication is that the intended behavior is now corrected, rather than a "last writer wins" situation 😄 This should not have any performance impact, since the logic for the check has only been moved to the Raft layer; a concurrent request (with the header) will now be rejected correctly at the point the "decision to accept" occurs.
What I meant with the implications question was: could existing production systems that relied on sequence-protected concurrent write guarantees now have data corruption or inconsistent state? As you describe it, the latency window for unpredictable winning writes could evidently be in the 10s or 100s of milliseconds. So perhaps issue an advisory for folks who may not watch the issues and changelogs very carefully? In other words, had I not tested this carefully and instead relied on the stated guarantee, I might have had KV write-integrity issues in a production system, had I gone to production with my code prior to observing this issue.
Yes, absolutely; awareness will be raised for this issue. In terms of impact, it depends on whether the message being published resulted from a deterministic operation among the concurrent actors. The implications will be highlighted in a less terse way in a blog post, so folks can determine the impact it may have for them.
Resolves: #4320