fix: timeout when restarting PostgreSQL and while lifting fencing #4504
❗ By default, the pull request is configured to backport to all release branches.
Hi! This one seems harmless and needed, but I encourage you to take a deep look. After much thinking, I still haven't made up my mind about whether it is better to fix it or to leave it as it is and spend the energy on refactoring to make the code flow clearer.
/test limit=local
@leonardoce, here's the link to the E2E on CNPG workflow run: https://github.com/cloudnative-pg/cloudnative-pg/actions/runs/9091027252
/test tl=4 l=local
@mnencia, here's the link to the E2E on CNPG workflow run: https://github.com/cloudnative-pg/cloudnative-pg/actions/runs/9131874590
The instance manager starts PostgreSQL:

1. when it starts up
2. when configuration changes are being applied (after stopping it)
3. when fencing is lifted.

In the second and third cases, the operation is requested by the embedded cluster reconciler loop and performed without any timeout.

If PostgreSQL won't start up again because of a wrong configuration or missing disk space, the reconciler loop will be stuck waiting for a dead postmaster to be up.

This patch handles this condition by using a combination of the timeout parameters that are already set in the cluster.

Fixes: cloudnative-pg#4501

Signed-off-by: Leonardo Cecchi <leonardo.cecchi@enteprisedb.com>
Signed-off-by: Armando Ruocco <armando.ruocco@enterprisedb.com>
/ok-to-merge E2E green. Only expected unrelated failures.
Backported commit message:

The instance manager starts PostgreSQL:

1. when it starts up
2. when configuration changes are being applied (after stopping it)
3. when fencing is lifted.

In the second and third cases, the operation is requested by the embedded cluster reconciler loop and performed without any timeout.

If PostgreSQL won't start up again because of a wrong configuration or missing disk space, the reconciler loop will be stuck waiting for a dead postmaster to be up.

This patch handles this condition by using a combination of the timeout parameters that are already set in the cluster.

Fixes: #4501

Signed-off-by: Leonardo Cecchi <leonardo.cecchi@enteprisedb.com>
Signed-off-by: Armando Ruocco <armando.ruocco@enterprisedb.com>
Co-authored-by: Leonardo Cecchi <leonardo.cecchi@enteprisedb.com>
Co-authored-by: Armando Ruocco <armando.ruocco@enterprisedb.com>
(cherry picked from commit 09a4d80)
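The instance manager starts PostgreSQL:

1. when it starts up
2. when configuration changes are being applied (after stopping it)
3. when fencing is lifted.

In the second and third cases, the operation is requested by the embedded cluster reconciler loop and performed without any timeout.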
If PostgreSQL won't start up again because of a wrong configuration or missing disk space, the reconciler loop will be stuck waiting for a dead postmaster to be up.
This patch handles this condition by using a combination of the timeout parameters that are already set in the cluster.
Fixes: #4501