Mount table corrupted after GCS rate limiting #7455
Comments
Hi folks! Is this still an issue in newer versions of Vault? Please let me know so I can bubble it up accordingly. Thanks!
@hsimon-hashicorp 👋🏻 I'm seeing similar issues in We've been rate limited on the
(I think) this led to lease corruption: GCS started returning 503s, and in our Vault cluster with 3 replicas it all happened at the same time, so the cluster got completely sealed. Since we use auto-unseal, we had to fix it by pointing Vault at a backup GCS backend. By the way, advice on a better way to fix that is appreciated (perhaps deleting the leases?):
Other than that, I could spot several other 503s during other kinds of operations, plus "context deadline exceeded" and "context canceled" errors. We're in a pretty dangerous situation at the moment, so any help is appreciated!
Somewhat related / potential improvement:
Summary
Under heavy load, Vault encounters GCS rate limiting on the mount table object (core/mounts) and occasionally corrupts data. It appears to insert duplicate entries in the table.
Log Snippet
The cluster continues to operate normally after these errors until a leader election needs to take place. At that time, we see the following:
Because of this, no instance can become active and the cluster is unavailable. We’re not aware of any way to recover the storage after this error occurs and have resorted to restoring from backups (luckily this has been in our lab/test environment).
Other Details
vault read sys/mounts does not show a duplicate entry in the table. However, when trying to unseal in a separate cluster, we get the "cannot mount under existing mount" error.
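For anyone trying to confirm the duplicate-entry theory on a restored copy of the storage, here is a minimal sketch of the check I have in mind. It assumes the mount table has already been decoded into a JSON object with an "entries" list whose items carry a "path" field. That layout, the sample data, and the function name find_mount_conflicts are assumptions for illustration and are not taken from Vault's code. It also assumes the error fires when a path equals or sits under another mount path, which I have not verified.

```python
import json
from collections import Counter


def find_mount_conflicts(paths):
    """Return (duplicates, nested) for an iterable of mount paths.

    duplicates: normalized paths that occur more than once.
    nested: (outer, inner) pairs where inner sits under a different outer path.
    """
    # Normalize so every path ends with exactly one trailing slash.
    normalized = [p.strip("/") + "/" for p in paths]
    counts = Counter(normalized)
    duplicates = sorted(p for p, n in counts.items() if n > 1)

    unique = sorted(counts)
    nested = [
        (outer, inner)
        for outer in unique
        for inner in unique
        if inner != outer and inner.startswith(outer)
    ]
    return duplicates, nested


if __name__ == "__main__":
    # Hypothetical decoded mount table; the real stored format may differ.
    sample = json.loads(
        '{"entries": [{"path": "secret/"}, {"path": "pki/"}, {"path": "secret/"}]}'
    )
    dups, nested = find_mount_conflicts(e["path"] for e in sample["entries"])
    print("duplicates:", dups)
    print("nested:", nested)
```

If either list comes back non-empty for the decoded table, that would be consistent with the unseal failure described above, though it would not explain how the duplicate got written in the first place.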