[Bug]: Operator restarting due to DetectAvailableArchitectures() #4423

Open
4 tasks done
sxd opened this issue May 2, 2024 · 0 comments

sxd commented May 2, 2024

Is there an existing issue already for this bug?

  • I have searched for an existing issue, and could not find anything. I believe this is a new bug.

I have read the troubleshooting guide

  • I have read the troubleshooting guide and I think this is a new bug.

I am running a supported version of CloudNativePG

  • I have read the troubleshooting guide and I think this is a new bug.

Contact Details

No response

Version

1.23.0

What version of Kubernetes are you using?

1.30 (unsupported)

What is your Kubernetes environment?

Self-managed: kind (evaluation)

How did you install the operator?

YAML manifest

What happened?

This issue seems to be general, but so far I’ve only been able to reproduce it on certain cloud providers (mainly EKS, but also AKS).

utils.DetectAvailableArchitectures() slows down RunController() enough that the 10-second ReadinessProbePeriod is no longer respected, so the kubelet kills the pod and it restarts.

DetectAvailableArchitectures() should compute each architecture’s sha256 hash asynchronously, so it should not block the startup of the manager.
More investigation is needed to understand whether the function is not working properly or whether we are simply hitting the timeout.
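
To make the expected behaviour concrete, here is a minimal sketch of asynchronous hashing in Go. It is not the operator's actual code: `opener`, `fileOpener`, `memOpener`, `pendingHash` and `startHashes` are hypothetical names made up for illustration, and it assumes one binary per architecture. Hashing starts in background goroutines and the caller only blocks when it asks for a specific digest.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"io"
	"os"
	"strings"
)

// opener yields the bytes to hash. Hypothetical abstraction for this sketch,
// so the same code works for files on disk and for in-memory data.
type opener func() (io.ReadCloser, error)

// fileOpener reads a binary from disk (assumed: one binary per architecture).
func fileOpener(path string) opener {
	return func() (io.ReadCloser, error) {
		f, err := os.Open(path)
		if err != nil {
			return nil, err
		}
		return f, nil
	}
}

// memOpener serves in-memory content, handy for trying the sketch without files.
func memOpener(content string) opener {
	return func() (io.ReadCloser, error) {
		return io.NopCloser(strings.NewReader(content)), nil
	}
}

// pendingHash is a digest that may still be computing in the background.
type pendingHash struct {
	done chan struct{} // closed once sum and err are final
	sum  string
	err  error
}

// Wait blocks until the digest is ready and returns it hex-encoded.
func (p *pendingHash) Wait() (string, error) {
	<-p.done
	return p.sum, p.err
}

// startHashes launches one goroutine per architecture and returns immediately,
// so the caller (for example, manager startup) is not blocked by the hashing.
func startHashes(sources map[string]opener) map[string]*pendingHash {
	pending := make(map[string]*pendingHash, len(sources))
	for arch, open := range sources {
		p := &pendingHash{done: make(chan struct{})}
		pending[arch] = p
		go func(p *pendingHash, open opener) {
			defer close(p.done) // runs last, after the reader is closed
			r, err := open()
			if err != nil {
				p.err = err
				return
			}
			defer r.Close()
			h := sha256.New()
			if _, err := io.Copy(h, r); err != nil {
				p.err = err
				return
			}
			p.sum = hex.EncodeToString(h.Sum(nil))
		}(p, open)
	}
	return pending
}
```

With this shape, startup code would call `startHashes` once and carry on, and only the code path that really needs a digest would call `Wait()`. If startup still stalls with a structure like this, the delay is more likely somewhere other than the hashing itself.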

Cluster resource

No response

Relevant log output

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct
@sxd sxd added the bug 🐛 Something isn't working label May 2, 2024
@sxd sxd assigned sxd and NiccoloFei May 2, 2024
@sxd sxd added this to the 1.24.0 milestone May 3, 2024
mnencia pushed a commit that referenced this issue May 17, 2024
This fix is temporary and only for the E2E tests.

This patch must be reverted before the next release,
once the operator has improved the hash calculation
of the binary files (see #4423)

Signed-off-by: Jonathan Gonzalez V <jonathan.gonzalez@enterprisedb.com>
cnpg-bot pushed a commit that referenced this issue May 17, 2024
This fix is temporary and only for the E2E tests.

This patch must be reverted before the next release,
once the operator has improved the hash calculation
of the binary files (see #4423)

Signed-off-by: Jonathan Gonzalez V <jonathan.gonzalez@enterprisedb.com>
(cherry picked from commit 4fa891e)