[Bug]: Operator restarting due to DetectAvailableArchitectures() #4423

sxd · 2024-05-02T12:19:59Z

Is there an existing issue already for this bug?

I have searched for an existing issue, and could not find anything. I believe this is a new bug.

I have read the troubleshooting guide

I have read the troubleshooting guide and I think this is a new bug.

I am running a supported version of CloudNativePG

I have read the troubleshooting guide and I think this is a new bug.

Contact Details

No response

Version

1.23.0

What version of Kubernetes are you using?

1.30 (unsupported)

What is your Kubernetes environment?

Self-managed: kind (evaluation)

How did you install the operator?

YAML manifest

What happened?

This issue seems to be present in general, but so far I’ve only been able to reproduce it on certain cloud providers (mainly EKS, but also AKS).

What’s happening is that utils.DetectAvailableArchitectures() slows down RunController() enough that the 10-second ReadinessProbePeriod is no longer respected, so the pod gets killed by the kubelet and restarts.

DetectAvailableArchitectures() should be calculating each architecture’s sha256 hash asynchronously, so it shouldn’t block the startup of the manager.
This needs more investigation to understand whether the function is not working properly or whether we are just hitting the timeout.

Cluster resource

No response

Relevant log output

No response

Code of Conduct

I agree to follow this project's Code of Conduct

This fix is temporary and only for the E2E tests. This patch must be reverted before the next release, once the operator has improved the hash calculation of the binary files (see #4423).
Signed-off-by: Jonathan Gonzalez V <jonathan.gonzalez@enterprisedb.com>

This fix is temporary and only for the E2E tests. This patch must be reverted before the next release, once the operator has improved the hash calculation of the binary files (see #4423).
Signed-off-by: Jonathan Gonzalez V <jonathan.gonzalez@enterprisedb.com>
(cherry picked from commit 4fa891e)

sxd added the bug 🐛 Something isn't working label May 2, 2024

sxd assigned sxd and NiccoloFei May 2, 2024

sxd added this to the 1.24.0 milestone May 3, 2024

mnencia mentioned this issue May 17, 2024

ci(fix): increase cpu requirements for the operator deployment #4568

Merged
