Incomplete snapshots returned by snapshot save command #12168
Hi @needle82, it's possible you're timing out, hence the EOF. Have you tried setting a larger client timeout?
Hi @ncabatoff, not yet. The snapshot is pretty small, just 2.2 MB, so I didn't expect a timeout here. If that could be the root cause, what timeout would you propose? Thank you!
The default timeout is 60s. Are your restore attempts failing after almost exactly one minute? If so, this is the likely cause. Try something much bigger, e.g. 30m. I don't expect it to take anywhere near that long, but this should serve to either confirm or deny this as the source of the problem.
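(For reference, the client-side timeout discussed here can be raised via the Vault CLI's standard `VAULT_CLIENT_TIMEOUT` environment variable; the sketch below assumes a Raft backend, and `backup.snap` is a placeholder file name.)

```shell
# Raise the Vault client timeout from the 60s default before retrying.
# VAULT_CLIENT_TIMEOUT accepts values like "30m" (Go duration syntax).
export VAULT_CLIENT_TIMEOUT=30m
echo "client timeout set to $VAULT_CLIENT_TIMEOUT"
# Then retry the restore, e.g.:
#   vault operator raft snapshot restore backup.snap
```

If the restore still fails instantly rather than after the old 60s mark, the timeout was not the cause.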
Yes, I can confirm that both directions complete in a couple of seconds, so there's no chance I'm hitting timeouts here. From my perspective it looks like, as the data grew, some bytes got corrupted while packing, because backups from before the 19th of July (about 1.4 MB in size) can still be restored this way.
Hmm, that's odd. Is there anything in the logs of the server receiving the restore request? If you look inside the snapshot file (it's a gzipped tar file), does it contain all these files?
Hello @ncabatoff, it looks like the archive is damaged:

$ tar tzvf temp_.snap
gzip: stdin: unexpected end of file
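(A quick self-contained way to see what a truncated snapshot looks like to gzip, using only standard tools; the file names below are invented for the demo — against a real snapshot you would just run `gzip -t` on the .snap file.)

```shell
# Build a small gzipped tar, then truncate a copy to simulate an
# incomplete snapshot like the one above.
echo "payload" > file.txt
tar czf good.snap file.txt
head -c 20 good.snap > truncated.snap

# gzip -t checks stream integrity without extracting anything.
gzip -t good.snap && echo "good.snap: OK"
gzip -t truncated.snap 2>/dev/null || echo "truncated.snap: corrupt/incomplete"
```

A snapshot that fails `gzip -t` will produce exactly the "unexpected end of file" symptom when restored.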
Ah, so maybe you need to increase the client timeout when taking the snapshots as well, in addition to when restoring them.
Also note that this:
is an internal operation used to manage the storage on disk. It's not in response to a snapshot API call. |
I can see these three lines only when calling the snapshot command (with the CLI or REST; as far as I understand, both use the same REST calls):
2021-07-26T19:09:01.058Z [INFO] storage.raft: starting snapshot up to: index=107713
My mistake! The log messages can occur in response to a snapshot save request. But they can also occur without an external trigger. |
Back to the issue at hand: the snapshot save command reports success, with no sign of an error on either side?
Yes, exactly. On the CLI side I can see nothing (the exit code is 0); on the server side I can see only the three INFO messages like those above.
I think I can see how this might happen. I don't know why it's happening in your case, but I have a plausible code path that could lead to an error being swallowed. If true, this is a very serious bug, so I'm going to start work on reproducing and fixing it immediately. If you have an audit log, I'd be interested to see what the response audit record looks like. If you don't have an audit log, I'd be much obliged if you set one up temporarily. It might yield some additional clues.
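(Aside from the server-side code path, errors can also get swallowed in the shell scripts that drive nightly snapshots: a pipeline's exit status is that of its last command, so a producer that fails mid-stream can still leave a partial file and a zero exit code. The toy example below — not Vault code — demonstrates the pitfall and the `pipefail` fix.)

```shell
# Toy producer: emits partial output, then fails.
emit_snapshot() { printf 'partial data'; return 1; }

# Without pipefail, the pipeline's status is cat's (0): the failure is
# swallowed and partial.snap looks like a successful snapshot.
emit_snapshot | cat > partial.snap
echo "exit=$?"

# With pipefail, the producer's failure propagates to the pipeline.
set -o pipefail
emit_snapshot | cat > partial2.snap
echo "exit=$?"
```

Any cron job that pipes a snapshot through `gzip`, `tee`, or similar should run with `set -o pipefail` so a mid-stream failure is not mistaken for success.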
Hi @ncabatoff. I think I solved my issue, but it is still a bug. I enabled audit and got the following: So, the autounseal token wasn't auto-renewed (I have no idea why), and I got an error sealing the snapshot. The issue, then, is that Vault exits successfully despite the error while sealing. Also, could you please recommend how to prevent autounseal token expiration? I created the token in the usual way and expected auto-renewal. Thank you for your help!
Yeah, I had a hunch it was a seal-related issue, because the two missing files are the SHA256SUMS file, which is encrypted by the seal, and another file that contains the SHA-256 sum of the SHA256SUMS file.

There's still a bug in the snapshot API: we should never return an incomplete snapshot without an error. I'm keeping this ticket open to address that.

It's debatable whether an unusable seal should provoke Vault to do something. We typically get people asking for the opposite: they don't want failures resulting from transient issues reaching the service backing their seal, and depending on the seal type they might not want any avoidable traffic going to it.

As to your autounseal token expiration: I don't know offhand, but it doesn't sound like a bug, at least not yet. You'll have to do some investigation to see why renewal isn't occurring (maybe the role that generated the token isn't configured to allow token renewal?). If you need help in that area I suggest taking your questions to https://discuss.hashicorp.com/c/vault/30, since we prefer to reserve GitHub issues for bug reports and feature requests.
Thanks a lot for your help! |
Describe the bug
On an OpenShift environment I have Vault 1.7.2 running, and every night I take a snapshot with a REST call. I wanted to restore this snapshot to another environment by POSTing it to snapshot-force, but I keep getting {"errors":["1 error occurred:\n\t* failed to read snapshot file: failed to read or write snapshot data: unexpected EOF\n\n"]} every time. I tried different backups and found that older ones are recoverable. I'm not able to restore snapshots taken manually with the CLI either.
To Reproduce
Create a new StatefulSet with Vault 1.7.2 configured to use Raft. Initialize it with
vault operator init
. Vault becomes unsealed via the autounseal scheme. Set up port forwarding 8200:8200 and call from the local machine:
curl -vvv -L --header "X-Vault-Token: token" -X POST --data-binary @backup.snap http://localhost:8200/v1/sys/storage/raft/snapshot-force
Expected behavior
Vault's existing data is replaced with the data from the snapshot.
Environment:
Vault server configuration file(s):
Additional context
No problems in the logs while taking the snapshot:
2021-07-26T13:50:06.361Z [INFO] storage.raft: starting snapshot up to: index=106948
2021-07-26T13:50:06.362Z [INFO] storage.raft: compacting logs: from=96701 to=96708
2021-07-26T13:50:06.364Z [INFO] storage.raft: snapshot complete up to: index=106948