Is your feature request related to a problem? Please describe.
How can a Raft snapshot be validated before it is used for a restore, to avoid restoring from a corrupted snapshot?
Describe the solution you'd like
A command such as "vault operator raft snapshot validate raft.snap" that checks a Raft snapshot for corruption.
Describe alternatives you've considered
Start by restoring the last known working Raft snapshot (n); if that fails, try n-1, and so on, checking each time whether the cluster is stable.
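The fallback described above could be scripted roughly as follows. This is only a sketch: the function name restore_first_working and the VAULT_BIN override are made up for illustration, and it assumes VAULT_ADDR and VAULT_TOKEN already point at the target cluster. A zero exit status from the restore command only means the restore was accepted, not that the cluster is stable afterwards, so a manual check is still needed.

```shell
#!/bin/sh
# Hypothetical helper: try snapshots in the order given (newest first)
# and stop at the first one that the restore command accepts.
# VAULT_BIN is an illustrative override so the binary can be swapped or stubbed.
VAULT_BIN="${VAULT_BIN:-vault}"

# Usage: restore_first_working newest.snap older.snap ...
restore_first_working() {
  for snap in "$@"; do
    echo "trying $snap" >&2
    # Send Vault's own output to stderr so stdout carries only the result.
    if "$VAULT_BIN" operator raft snapshot restore "$snap" >&2; then
      echo "$snap"   # name of the snapshot that was restored
      return 0
    fi
  done
  echo "no snapshot could be restored" >&2
  return 1
}
```

Calling restore_first_working n.snap n-1.snap n-2.snap prints the name of the first snapshot that restored, or returns non-zero if none did.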
Explain any additional use-cases
When the Vault cluster is broken and snapshots are taken while it is in a failed state, multiple corrupt snapshots are created, either with missing data or with the wrong leader node IP. With the current process, restoring service takes longer than it should.
Additional context
Not applicable
It sounds like you might have fallen victim to an issue we recently discovered, whereby the snapshot API might return an incomplete snapshot without reporting an error. We've addressed that in the API code with #12388, which might obviate the need for the validation command you're looking for.
My concern with implementing a validation command is that I don't want to give a false sense of security: we can look for some kinds of breakage, but other things can't be validated except by sending the snapshot to the server, and still other things simply can't be validated except by actually doing the restore and trying to use it. I'd rather people validated their snapshots by spinning up a throwaway server, restoring the snapshot, unsealing the throwaway server and doing some smoke testing (e.g. writing a value, mounting a secrets engine).
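For anyone who wants to script the smoke-testing part, here is a rough sketch. It assumes the throwaway server has already been restored from the snapshot and unsealed, and that VAULT_ADDR and VAULT_TOKEN point at it. The function name smoke_test, the VAULT_BIN override and the smoke-test mount path are illustrative choices, not anything Vault defines. Passing this check does not prove the snapshot is good; it only catches some obvious breakage.

```shell
#!/bin/sh
# Hypothetical smoke test for a THROWAWAY Vault server only.
# Never point this at a production cluster: it mounts and removes an engine.
VAULT_BIN="${VAULT_BIN:-vault}"          # illustrative override for the binary
SMOKE_PATH="${SMOKE_PATH:-smoke-test}"   # illustrative mount path

smoke_test() {
  # `vault status` exits 0 only when the server is reachable and unsealed.
  "$VAULT_BIN" status >/dev/null || {
    echo "server is sealed or unreachable" >&2
    return 1
  }
  # Mount a secrets engine, write a value, and read it back.
  "$VAULT_BIN" secrets enable -path="$SMOKE_PATH" kv >/dev/null || return 1
  "$VAULT_BIN" kv put "$SMOKE_PATH/probe" value=ok >/dev/null || return 1
  got=$("$VAULT_BIN" kv get -field=value "$SMOKE_PATH/probe") || return 1
  # Clean up the temporary mount; a failure here does not change the result.
  "$VAULT_BIN" secrets disable "$SMOKE_PATH" >/dev/null
  [ "$got" = "ok" ]
}
```

Run smoke_test after the restore and unseal; a non-zero exit status means at least one of the steps failed. Reading a few secrets that are known to exist in the real data is a worthwhile addition to these steps.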
@sabarisankarj Based on the provided feedback, I am going to close this issue for now. Please feel free to reopen the issue or open a new one for further discussions.