Vault does not recover Postgres Connection #12194
Comments
Updated re investigation 2021/8/11: |
Found this gist and the referenced Azure post; we might give that a try: https://gist.github.com/moraisaugusto/0a5870c8751649708e5a3c6e0f154113 |
Checked the configuration of our Azure Postgres DB, both |
It happened again, at close to the same moment on two totally separate platforms. Would it be possible to enhance the readiness probe in the Helm chart, or to include a config option to crash on losing the connection? |
Thanks @adrianliechti - I'll check with the team to see if we can't get to the bottom of this. |
Hi @adrianliechti - first, this is a "community-supported" backend, meaning that developers in the community created and maintain it. Also, it would be good to see Vault logs and diagnostics from the time of the PostgresDB failure, including information about open connections to PostgresDB. Have you set max_idle_connections on your PostgresDB? If it's a non-zero number, you may be hitting a cap there. Hope this helps! |
@hsimon-hashicorp we really appreciate it! I will double-check the logs & recordings and ask my colleague to gather some more insights at the moment it happens. Otherwise we have to wait quite some time (~ 2 weeks) each time we try something :) FYI: for the moment we created an old-school watchdog...
|
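The watchdog itself is not shown in this thread. As a rough illustration only, one old-school approach is to scan Vault's log output for the error quoted in the issue description and signal that a restart is needed once it repeats. The sketch below assumes the log is piped to standard input; the names `isStorageConnError` and `watchdog` and the threshold of 3 are hypothetical and not part of Vault.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// isStorageConnError reports whether a log line looks like the storage
// connection failure quoted in this issue. The match is a plain substring
// check, not a parser for Vault's log format.
func isStorageConnError(line string) bool {
	return strings.Contains(line, "[ERROR]") &&
		strings.Contains(line, "connection reset by peer")
}

// watchdog counts consecutive matching lines. Any non-matching line
// resets the streak, so a single transient error does not trip it.
type watchdog struct {
	threshold int // hypothetical: matching lines in a row before tripping
	streak    int
}

// observe feeds one log line and returns true once the streak has
// reached the threshold.
func (w *watchdog) observe(line string) bool {
	if isStorageConnError(line) {
		w.streak++
	} else {
		w.streak = 0
	}
	return w.streak >= w.threshold
}

func main() {
	w := &watchdog{threshold: 3}
	sc := bufio.NewScanner(os.Stdin)
	for sc.Scan() {
		if w.observe(sc.Text()) {
			fmt.Fprintln(os.Stderr, "watchdog: repeated storage connection errors, restart needed")
			os.Exit(1) // a wrapper script or supervisor would perform the restart
		}
	}
}
```

The non-zero exit status is only a signal; how the restart is carried out (for example by deleting the pod) is left to whatever wraps this program.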
Wow! Thanks for sharing that, awesome work for the meantime. I'll check back in periodically to see if you've been able to get that data - vault debug in your dev environment might provide some clues too? You can run it against a running instance, but it'll throw some warnings about not being able to bind to the port (because it's already running). Hopefully together we can figure out a cause and a solution! |
Hi @adrianliechti - I just wanted to check in and see if you'd been able to run a vault debug, and if we can get that information. Please let me know how I can be of further assistance. Thanks! |
Thank you so much for coming back to this! We plan to update to the latest Vault version this evening
Unfortunately we had a super busy sprint and limited resources for a deep dive into the issue. But here are some screenshots of the database metrics. Active connections are not exhausted. The incident happened after the second drop from 6 -> 4 and the next short step from 6 -> 7. Failed connections are at 0 and the CPU is very bored. To me it looks like the pg driver might fail to detect closed connections... or is not really removing them from the pool and keeps using them. |
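To illustrate the stale-connection hypothesis above: Go's standard `database/sql` pool lets an application bound how long a pooled connection may live or sit idle, so that a connection silently dropped by the server side is eventually discarded instead of being reused forever. The sketch below is not Vault's code and is not the confirmed fix for this issue; `poolSettings`, `applyPool`, `recorder`, and the durations are hypothetical. A recorder stands in for a real `*sql.DB` so the example runs without a database.

```go
package main

import (
	"database/sql"
	"fmt"
	"time"
)

// poolTuner is the subset of *sql.DB methods used to limit connection age.
type poolTuner interface {
	SetMaxIdleConns(n int)
	SetConnMaxLifetime(d time.Duration)
	SetConnMaxIdleTime(d time.Duration)
}

// Compile-time check that a real *sql.DB satisfies poolTuner.
var _ poolTuner = (*sql.DB)(nil)

// poolSettings groups the limits; the field names are illustrative.
type poolSettings struct {
	MaxIdleConns int
	MaxLifetime  time.Duration
	MaxIdleTime  time.Duration
}

// defaultPool holds assumed values, not recommendations for any setup.
var defaultPool = poolSettings{
	MaxIdleConns: 2,
	MaxLifetime:  30 * time.Minute,
	MaxIdleTime:  5 * time.Minute,
}

// applyPool pushes the settings into the pool so that old or long-idle
// connections are closed and replaced rather than kept indefinitely.
func applyPool(p poolTuner, s poolSettings) {
	p.SetMaxIdleConns(s.MaxIdleConns)
	p.SetConnMaxLifetime(s.MaxLifetime)
	p.SetConnMaxIdleTime(s.MaxIdleTime)
}

// recorder is a stand-in for *sql.DB that only remembers what was set.
type recorder struct{ got poolSettings }

func (r *recorder) SetMaxIdleConns(n int)              { r.got.MaxIdleConns = n }
func (r *recorder) SetConnMaxLifetime(d time.Duration) { r.got.MaxLifetime = d }
func (r *recorder) SetConnMaxIdleTime(d time.Duration) { r.got.MaxIdleTime = d }

func main() {
	r := &recorder{}
	applyPool(r, defaultPool)
	fmt.Printf("applied pool settings: %+v\n", r.got)
}
```

With a real database handle, the same `applyPool` call would take the `*sql.DB` returned by `sql.Open` in place of the recorder.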
Can you try with Vault 1.8.4? We recently upgraded our |
Yes, we will. I think the team was waiting for a new Helm chart, but we will override the image version directly and use it that way. Thanks a ton for all your interest! |
Hi @adrianliechti - wanted to circle back on this and see if you're still having issues. Thanks! |
I worked on this issue together with @adrianliechti and can confirm that we haven't experienced any issues since we installed version 1.8.4 some weeks ago. Thanks a lot for your superb support! |
@matthiasfehr Thanks so much for checking again and getting back to us! :) |
Hi Vault Team,
We run Vault with an Azure Postgres DB (11) and Azure Vault for unsealing. Vault itself runs on a Kubernetes cluster as a single container.
After some days (usually in the same rhythm on our prod and staging platforms), Vault stops working.
2021-07-28T08:29:00.866Z [ERROR] core: writing request counters to barrier: err="failed to save request counters: write tcp 10.240.2.42:58132->xxxxxxx:5432: write: connection reset by peer"
2021-07-28T08:29:30.866Z [ERROR] core: writing request counters to barrier: err="failed to save request counters: write tcp 10.240.2.42:58132->xxxxxxx:5432: write: connection reset by peer"
2021-07-28T08:30:00.866Z [ERROR] core: writing request counters to barrier: err="failed to save request counters: write tcp 10.240.2.42:58132->xxxxxxx:5432: write: connection reset by peer"
2021-07-28T08:30:30.866Z [ERROR] core: writing request counters to barrier: err="failed to save request counters: write tcp 10.240.2.42:58132->xxxxxxx:5432: write: connection reset by peer"
Describe the bug
Vault seems to lose the connection to the DB and does not recover by itself.
A restart of Vault usually helps.
To Reproduce
We were able to reproduce the same behavior using these steps:
Is this a known issue?
If there are workarounds using connection string tweaks, we are happy to implement and test them.