fix: clustered stream recovery, ensure PurgeEx is executed instead of full purge #4197
Conversation
server/jetstream_cluster.go

```diff
@@ -155,7 +155,6 @@ type consumerAssignment struct {
 type streamPurge struct {
 	Client *ClientInfo `json:"client,omitempty"`
 	Stream string      `json:"stream"`
-	LastSeq uint64     `json:"last_seq"`
```
Was only used in the isRecovering + Compact block below, so I have cleaned it up.
If we have a purge and then add some new messages, replaying the purge on a restart will remove more than it should. I understand the fix you are proposing, but I think we still need to protect against this condition, which is what I was trying to do IIRC.
Yeah, I have been thinking about this as well. This does fix all messages being purged, but there will still be some inconsistencies in certain cases. For example when replaying a purge with
Maybe keep the code that remembers LastSeq and the conditional on isRecovering(), and instead of calling Compact we call the extended version of purge with the last sequence.
Was just thinking that as well, will try it out! 👍
Have been tinkering quite a bit, and found myself almost going into a refactor, so I'm taking a small step back to confirm I'm going in the right direction. See also the commit I made just now: c40a6a3. In a nutshell:
This works in the cases of purging on

However, where it becomes a bit harder (and where I stopped going into a refactor to ask for feedback) is how to handle purging on

What could be next steps here?
… being replayed. (#4212)

This is an extension to the excellent work by @MauriceVanVeen and his original PR #4197 to fully resolve all use cases.

Signed-off-by: Derek Collison <derek@nats.io>

Resolves #4196
Awesome work! Thanks, fixed in #4212
Resolves #4196
Changes proposed in this pull request:
I believe this is caused by the following code:
Which was introduced in this PR: #1886
This was fine at the time, since at that point only full stream purges were supported. So this shortcut would skip the purge if it was identified that the purge had already happened.

However, since the introduction of more advanced purges in #2296, purges can target the whole stream, specific subjects, sequences, a number of messages to keep, etc.
But this shortcut would always perform a full stream purge on recovery, even if only a "partial" subject-based purge had been requested.

As a result, any JetStream stream (/KV/OBJ) with partial purges would be fully purged upon restart of a node, leading to inconsistent states across nodes, which can be observed when switching leaders.
This PR removes this shortcut and relies on the default purge-handling behaviour instead; a test is added as well.