[Bug]: Instance manager reconciler loop stuck if PostgreSQL restart fails #4501
Labels: triage (Pending triage)
leonardoce pushed a commit to leonardoce/cloudnative-pg that referenced this issue on May 9, 2024

The instance manager starts PostgreSQL:
1. when it starts up
2. when configuration changes are being applied (after stopping it)
3. when fencing is lifted.

In the second and third cases, the operation is requested by the embedded cluster reconciler loop and performed without any timeout. If PostgreSQL won't start up again because of a wrong configuration or missing disk space, the reconciler loop will be stuck waiting for a dead postmaster to be up.

This patch handles this condition by using a combination of the timeout parameters that are already set in the cluster.

Fixes: cloudnative-pg#4501
Signed-off-by: Leonardo Cecchi <leonardo.cecchi@enteprisedb.com>
armru pushed a commit to leonardoce/cloudnative-pg that referenced this issue on May 13, 2024
leonardoce pushed a commit to leonardoce/cloudnative-pg that referenced this issue on May 15, 2024
mnencia pushed a commit to leonardoce/cloudnative-pg that referenced this issue on May 17, 2024
mnencia pushed a commit to leonardoce/cloudnative-pg that referenced this issue on May 20, 2024
mnencia pushed a commit that referenced this issue on May 20, 2024

The instance manager starts PostgreSQL:
1. when it starts up
2. when configuration changes are being applied (after stopping it)
3. when fencing is lifted.

In the second and third cases, the operation is requested by the embedded cluster reconciler loop and performed without any timeout. If PostgreSQL won't start up again because of a wrong configuration or missing disk space, the reconciler loop will be stuck waiting for a dead postmaster to be up.

This patch handles this condition by using a combination of the timeout parameters that are already set in the cluster.

Fixes: #4501
Signed-off-by: Leonardo Cecchi <leonardo.cecchi@enteprisedb.com>
Signed-off-by: Armando Ruocco <armando.ruocco@enterprisedb.com>
Co-authored-by: Leonardo Cecchi <leonardo.cecchi@enteprisedb.com>
Co-authored-by: Armando Ruocco <armando.ruocco@enterprisedb.com>
cnpg-bot pushed a commit that referenced this issue on May 20, 2024
(cherry picked from commit 09a4d80)
mnencia pushed a commit that referenced this issue on May 20, 2024
(cherry picked from commit 09a4d80)
Is there an existing issue already for this bug?
I have read the troubleshooting guide
I am running a supported version of CloudNativePG
Contact Details
leonardo.cecchi@enterprisedb.com
Version
1.23.0
What version of Kubernetes are you using?
1.30 (unsupported)
What is your Kubernetes environment?
Self-managed: kind (evaluation)
How did you install the operator?
YAML manifest
What happened?
The instance manager reconciler loop can be blocked by a PostgreSQL instance that fails to start up after being shut down.
Cluster resource
No response
Relevant log output
No response
Code of Conduct