Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FIXED] Make sure to process extended purge operations correctly when being replayed. #4212

Merged
merged 1 commit into from Jun 4, 2023

Conversation

derekcollison
Copy link
Member

This is an extension to the excellent work by @MauriceVanVeen and his original PR #4197 to fully resolve for all use cases.

Signed-off-by: Derek Collison derek@nats.io

Resolves #4196

…eplayed on a restart.

Signed-off-by: Derek Collison <derek@nats.io>
@derekcollison derekcollison requested a review from a team as a code owner June 4, 2023 00:58
Copy link
Member

@wallyqs wallyqs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@derekcollison derekcollison merged commit e1f8064 into main Jun 4, 2023
2 checks passed
@derekcollison derekcollison deleted the fix-4196 branch June 4, 2023 01:12
@MauriceVanVeen
Copy link
Contributor

Awesome! 🎉

I believe there might be a bug with both sequence and keep being set, though. Have opened a PR with a potential fix, and an explanation about the issue.

derekcollison added a commit that referenced this pull request Jun 4, 2023
PR #4212 fixed the issue I
reported in #4196.

However, I believe there might be a bug when both `sequence` and `keep`
are set during recovery.
In the `PurgeEx` the following check is done (for both `filestore.go`
and `memstore.go`):
```go
	if sequence > 1 && keep > 0 {
		return 0, ErrPurgeArgMismatch
	}
```

The `TestJetStreamClusterPurgeExReplayAfterRestart` also triggers this
case, meaning that during the test this error is returned but it
succeeds because the purge was already performed. Is this intended
behaviour?

To elaborate a bit more, I believe the following happens:
- when running the purge normally it will properly run the `keep` (since
it's not combined with `sequence` yet)
- when replaying the purge though, the `sequence` is added to the
`keep`, which errors out in the above if

Which means that during normal operation all will be well, but purges
with `keep` will be ignored upon replaying.

I'm proposing to remove the `sequence > 1 && keep > 0` check and
subsequent error. Which, for reference, was introduced in
#3121.
Hoping this ensures that during recovery, purges that haven't executed
yet will still be executed.

An alternative approach, which wouldn't remove the error: not allow
combining `sequence` and `keep` normally and only allowing it during
recovery. Which would preserve the current behaviour, and correctly
apply `sequence+keep` during recovery still. However, not sure if it's
possible to know if we're in "recovery mode" from within `PurgeEx`.

Resolves #4196
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Clustered stream recovery, full stream purge is executed instead of PurgeEx
3 participants