Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug: Lower Memory Usage - Filtered Secrets Label Caused Unwarranted Renewal of Certificates #6494

Closed
antstacks opened this issue Nov 16, 2023 · 1 comment
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@antstacks
Copy link

antstacks commented Nov 16, 2023

Describe the bug:
Customer recently upgraded their Cert-Manager version to v1.13. As part of that upgrade, their certificates were labeled with 'controller.cert-manager.io/fao' as part of the lowering memory utilization efforts which was promoted to beta in v1.13 (https://cert-manager.io/docs/releases/release-notes/release-notes-1.12/#lower-memory-footprint).

This seemingly caused some of the cluster certificates to be labeled successfully, but some certificates seemed to miss the allocation of the label and were subsequently renewed because of it.

I1115 14:20:35.237545       1 secret_manager.go:98] "cert-manager/certificates-issuing: applying Secret data" key="cert-utils-operator/serving-cert" resource_name="serving-cert" resource_namespace="cert-utils-operator" resource_kind="Certificate" resource_version="v1" secret="webhook-server-cert" message="Secret is missing these Managed Labels: [controller.cert-manager.io/fao]"
I1115 14:20:36.531269       1 trigger_controller.go:215] "cert-manager/certificates-trigger: Certificate must be re-issued" key="test-user/[***redacted cert info***]" reason="DoesNotExist" message="Issuing certificate as Secret does not exist"
I1115 14:20:36.531306       1 conditions.go:203] Setting lastTransitionTime for Certificate "[***redacted cert info***]" condition "Issuing" to 2023-11-15 14:20:36.531300162 +0000 UTC m=+1.705586005
I1115 14:20:36.532243       1 conditions.go:192] Found status change for Certificate "[***redacted cert info***]" condition "Ready": "True" -> "False"; setting lastTransitionTime to 2023-11-15 14:20:36.53223235 +0000 UTC m=+1.706518194

Expected behaviour:
Certificates should be labeled with the filtered secrets label, but renewal should not be triggered.

Steps to reproduce the bug:
Upgrade to v1.13 from a previous version <v1.12, making sure there are existing certificates to be labeled.

Environment details::

  • Kubernetes version: Openshift
  • Cloud-provider/provisioner:
  • cert-manager version:
  • Install method: e.g. helm/static manifests

/kind bug

@jetstack-bot jetstack-bot added the kind/bug Categorizes issue or PR as related to a bug. label Nov 16, 2023
@inteon
Copy link
Member

inteon commented Nov 17, 2023

Thanks for reporting this issue @antstacks.


Root cause analysis:

In v1.13, the SecretsFilteredCaching feature was graduated to Beta (enabled by default): #6298.

The SecretsFilteredCaching feature splits the Secret resource cache into two caches: a "full" cache and a "metadata-only" cache (#5824). Secrets that have the controller.cert-manager.io/fao label are cached in the "full" cache, other Secrets are cached in the "metadata-only" cache.

When the cert-manager controller wants to GET a Secret, it looks if the secret exists in one of the two caches. If no cache contains the secret, the controller thinks the Secret does not exist. However, in case the controller.cert-manager.io/fao label is added or removed from a Secret, the Secret might appear in both caches or none of the caches for a short time (due to the caches being out-of-sync). When the controller checks if the Secret exists during this transition window, we might incorrectly trigger re-issuance because we detected that the Secret does not exist.


This is only a problem in case not all Secrets have the controller.cert-manager.io/fao label yet. If you follow the advised upgrade path (upgrading to v1.12 before upgrading to v1.13), all Secrets should already have the controller.cert-manager.io/fao label. For that reason, we updated the release notes to clearly state that you should upgrade to v1.12 first. We hope this sufficiently helps you with your problem, please let us know if you think we need to take additional measures to fix this problem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Categorizes issue or PR as related to a bug.
Projects
None yet
Development

No branches or pull requests

3 participants