Raft snapshot- validation for corruption before restore #13234

Closed
sabarisankarj opened this issue Nov 22, 2021 · 2 comments

Comments

@sabarisankarj

Is your feature request related to a problem? Please describe.
How can a Raft snapshot be validated before it is used for a restore, so that a corrupted snapshot is never restored?

Describe the solution you'd like
A command such as vault operator raft snapshot validate raft.snap that checks a Raft snapshot for corruption.

Describe alternatives you've considered
Restore the last known working Raft snapshot (n); if that fails, try n-1, and so on, checking each time whether the cluster is stable.
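
The fallback described above amounts to a loop over snapshots from newest to oldest. A minimal sketch, for illustration only: `try_restore` is a hypothetical callback standing in for whatever restore-and-check procedure is in use, and is not part of Vault.

```python
from typing import Callable, Iterable, Optional


def first_restorable(
    snapshots_newest_first: Iterable[str],
    try_restore: Callable[[str], bool],
) -> Optional[str]:
    """Return the first snapshot for which try_restore succeeds, else None.

    try_restore is a hypothetical callback: it should restore the given
    snapshot and report whether the cluster came up in a usable state.
    """
    for path in snapshots_newest_first:
        if try_restore(path):
            return path
    return None
```

Each failed attempt costs a full restore cycle, which is why this approach is slow when several recent snapshots are bad.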

Explain any additional use-cases
When the Vault cluster is broken and snapshots are taken during the failed state, multiple corrupt snapshots are created, containing either less data or the wrong leader node IP. With the current process, restoring service takes longer than it should.

Additional context
Not applicable

@ncabatoff
Collaborator

It sounds like you might have fallen victim to an issue we recently discovered whereby the snapshot API might return an incomplete snapshot without reporting an error. We've addressed that in the API code with #12388, which might obviate the need for the validation command you're looking for.

My concern with implementing a validation command is that I don't want to give a false sense of security: we can look for some kinds of breakage, but other things can't be validated except by sending the snapshot to the server, and still other things simply can't be validated except by actually doing the restore and trying to use it. I'd rather people validated their snapshots by spinning up a throwaway server, restoring the snapshot, unsealing the throwaway server and doing some smoke testing (e.g. writing a value, mounting a secrets engine).
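
The throwaway-server check described above could be scripted roughly as follows. This is a sketch under several assumptions, not an official procedure: the `vault` CLI is on the PATH, `VAULT_ADDR` points at a disposable server that is already initialized and unsealed, `VAULT_TOKEN` is a token that is valid for the restored data, and the mount name `smoke-test` is made up for illustration. Unsealing after the restore is left to the operator, since it needs the original cluster's keys.

```python
import subprocess
from typing import Callable, List, Sequence


def restore_command(snapshot_path: str) -> List[str]:
    # -force is assumed to be needed here because the snapshot comes from a
    # different cluster than the throwaway server it is restored into.
    return ["vault", "operator", "raft", "snapshot", "restore", "-force", snapshot_path]


def smoke_test_commands(mount: str = "smoke-test") -> List[List[str]]:
    # Mount a secrets engine, write a value, read it back, then clean up.
    # The mount name is a placeholder chosen for this example.
    return [
        ["vault", "secrets", "enable", f"-path={mount}", "kv"],
        ["vault", "kv", "put", f"{mount}/check", "value=ok"],
        ["vault", "kv", "get", f"{mount}/check"],
        ["vault", "secrets", "disable", mount],
    ]


def run_all(
    commands: Sequence[Sequence[str]],
    run: Callable = subprocess.run,
) -> bool:
    """Run each command in order; stop and return False on the first failure."""
    for argv in commands:
        if run(argv).returncode != 0:
            return False
    return True
```

A typical sequence would be `run_all([restore_command("raft.snap")])`, then unsealing the throwaway server by hand, then `run_all(smoke_test_commands())`. A passing run shows only that this snapshot restores and that basic operations work; it does not prove the snapshot contains all the expected data.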

@hghaf099
Contributor

hghaf099 commented Feb 8, 2022

@sabarisankarj Based on the provided feedback, I am going to close this issue for now. Please feel free to reopen the issue or open a new one for further discussions.

hghaf099 closed this as completed Feb 8, 2022
4 participants