Race condition when two identical certificate requests are made from different clusters #6229
Comments
We've had this issue with a couple of different DNS providers, and it has been solved for Route53 and CloudDNS so far, see
There is probably some way to achieve the same for Azure DNS. None of the maintainers are very familiar with Azure, and we also don't have Azure infrastructure to test this, so it would be good if an external contributor who uses Azure and can test there could pick this up.
The same problem exists for the Cloudflare DNS provider.
For Azure DNS this can be solved using the optimistic concurrency that their APIs support. I have a working PoC that I never attempted to upstream, as I'm quite clueless about how to properly and consistently test something like this:
We are hitting the same issue with the Cloudflare provider. Would a simpler approach be to allow the TXT record name to be changed to something other than `_acme-challenge`? That way clusterA could use a distinct record name. This would then fix it for all providers. I believe this is hardcoded here, but it could potentially be exposed as a config item (env var, start flag, or ClusterIssuer spec). This would also be easier to add test cases for.
@chr15murray there is a change merged into the main branch (#5884 & #6191); I read that it will be released with the 1.13-alpha1 image. We had the same issue with our clusters and we managed to build the main branch, and it is working.
Thanks @michalg91, will look to get this deployed.
Issues go stale after 90d of inactivity.
Stale issues rot after 30d of inactivity.
/remove-lifecycle rotten
Continues to be an issue for Azure DNS. There's a PR open that should fix the issue: #6351
Issues go stale after 90d of inactivity. |
/remove-lifecycle stale |
Describe the bug:
Seems like we have a race condition when two Kubernetes clusters with similar (near-identical) configuration use DNS01 to request the same certificate.
We have two Kubernetes clusters, one primary and the other secondary for disaster recovery. Both are in different Azure regions and configured almost identically.
Identical `Certificate` resources are created on both clusters within a minute of each other. The first one to get a response from Let's Encrypt deletes the `_acme-challenge` DNS record, and the second cluster's `Certificate` resource is left in a `READY` state of `False` indefinitely.
Expected behaviour:
The second cluster should detect that the `_acme-challenge` DNS record was deleted, and then re-attempt the request with Let's Encrypt.
Steps to reproduce the bug:
See above.
Anything else we need to know?:
Environment details:
Kubernetes version: 1.25.6
ACME solver image: quay.io/jetstack/cert-manager-acmesolver:v1.11.1
cert-manager version: 1.11.1
/kind bug