[release-1.13] BUGFIX: fix datarace due to registering and reading from scheme #6832

Merged
merged 1 commit into from
Mar 8, 2024

Conversation

@inteon inteon commented Mar 7, 2024

Manual backport of the fix in #6028.

The following script finds data races:

#!/bin/bash

cd ./cmd/controller/ && \
go run -race . --v=2 \
    --kubeconfig /home/tramlot/.kube/config \
    --cluster-resource-namespace="cert-manager" \
    --leader-election-namespace=kube-system \
    --acme-http01-solver-image=cert-manager-acmesolver-amd64:v1.12.8-9-gc47e9fb626951e \
    --kube-api-qps=9000 \
    --kube-api-burst=9000 \
    --concurrent-workers=200 \
    --feature-gates=AdditionalCertificateOutputFormats=true,ExperimentalCertificateSigningRequestControllers=true,ExperimentalGatewayAPISupport=true,ServerSideApply=true,LiteralCertificateSubject=true,UseCertificateRequestBasicConstraints=true \
    --max-concurrent-challenges=60 \
    --dns01-recursive-nameservers-only=true \
    --dns01-recursive-nameservers=10.0.0.16:53
Before
$ ./test.sh 
I0307 14:48:45.262415  952658 start.go:75] "cert-manager: starting controller" version="canary" git-commit=""
I0307 14:48:45.262558  952658 controller.go:262] "cert-manager/controller/build-context: configured acme dns01 nameservers" nameservers=["10.0.0.16:53"]
I0307 14:48:45.281345  952658 options.go:488] "cert-manager: enabling all experimental certificatesigningrequest controllers"
I0307 14:48:45.281392  952658 options.go:493] "cert-manager: enabling the sig-network Gateway API certificate-shim and HTTP-01 solver"
I0307 14:48:45.281474  952658 controller.go:82] "cert-manager/controller: enabled controllers: [certificaterequests-approver certificaterequests-issuer-acme certificaterequests-issuer-ca certificaterequests-issuer-selfsigned certificaterequests-issuer-vault certificaterequests-issuer-venafi certificates-issuing certificates-key-manager certificates-metrics certificates-readiness certificates-request-manager certificates-revision-manager certificates-trigger certificatesigningrequests-issuer-acme certificatesigningrequests-issuer-ca certificatesigningrequests-issuer-selfsigned certificatesigningrequests-issuer-vault certificatesigningrequests-issuer-venafi challenges clusterissuers gateway-shim ingress-shim issuers orders]"
I0307 14:48:45.282800  952658 controller.go:149] "cert-manager/controller: starting healthz server" address="[::]:9403"
I0307 14:48:45.282940  952658 controller.go:103] "cert-manager/controller: starting metrics server" address="[::]:9402"
I0307 14:48:45.283384  952658 controller.go:156] "cert-manager/controller: starting leader election"
I0307 14:48:45.287039  952658 leaderelection.go:245] attempting to acquire leader lease kube-system/cert-manager-controller...
I0307 14:48:45.294702  952658 leaderelection.go:255] successfully acquired lease kube-system/cert-manager-controller
==================
WARNING: DATA RACE
Write at 0x00c00002bb60 by main goroutine:
  runtime.mapassign()
      /usr/local/go/src/runtime/map.go:579 +0x0
  k8s.io/apimachinery/pkg/runtime.(*Scheme).AddUnversionedTypes()
      /home/tramlot/go/pkg/mod/k8s.io/apimachinery@v0.27.2/pkg/runtime/scheme.go:128 +0x5e8
  k8s.io/apimachinery/pkg/apis/meta/v1.AddToGroupVersion()
      /home/tramlot/go/pkg/mod/k8s.io/apimachinery@v0.27.2/pkg/apis/meta/v1/register.go:75 +0x63e
  github.com/cert-manager/cert-manager/pkg/apis/acme/v1.addKnownTypes()
      /home/tramlot/projects/cert-manager/pkg/apis/acme/v1/register.go:56 +0x2d8
  k8s.io/apimachinery/pkg/runtime.(*SchemeBuilder).AddToScheme()
      /home/tramlot/go/pkg/mod/k8s.io/apimachinery@v0.27.2/pkg/runtime/scheme_builder.go:29 +0xa1
  k8s.io/apimachinery/pkg/runtime.(*SchemeBuilder).AddToScheme-fm()
      <autogenerated>:1 +0x24
  k8s.io/apimachinery/pkg/runtime.(*SchemeBuilder).AddToScheme()
      /home/tramlot/go/pkg/mod/k8s.io/apimachinery@v0.27.2/pkg/runtime/scheme_builder.go:29 +0xa1
  k8s.io/apimachinery/pkg/runtime.(*SchemeBuilder).AddToScheme-fm()
      <autogenerated>:1 +0x24
  github.com/cert-manager/cert-manager/pkg/controller.(*ContextFactory).Build()
      /home/tramlot/projects/cert-manager/pkg/controller/context.go:333 +0x178
  github.com/cert-manager/cert-manager/pkg/controller.(*Builder).Complete()
      /home/tramlot/projects/cert-manager/pkg/controller/builder.go:83 +0xd2
  github.com/cert-manager/cert-manager/pkg/controller/certificate-shim/ingresses.init.0.func1()
      /home/tramlot/projects/cert-manager/pkg/controller/certificate-shim/ingresses/controller.go:150 +0x13e
  github.com/cert-manager/cert-manager/controller-binary/app.Run()
      /home/tramlot/projects/cert-manager/cmd/controller/app/controller.go:213 +0x1659
  github.com/cert-manager/cert-manager/controller-binary/app.CertManagerControllerOptions.RunCertManagerController()
      /home/tramlot/projects/cert-manager/cmd/controller/app/start.go:100 +0x39e
  github.com/cert-manager/cert-manager/controller-binary/app.NewCommandStartCertManagerController.func1()
      /home/tramlot/projects/cert-manager/cmd/controller/app/start.go:76 +0x37d
  github.com/spf13/cobra.(*Command).execute()
      /home/tramlot/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:940 +0xdbb
  github.com/spf13/cobra.(*Command).ExecuteC()
      /home/tramlot/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:1068 +0x617
  github.com/spf13/cobra.(*Command).Execute()
      /home/tramlot/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:992 +0x26
  main.main()
      /home/tramlot/projects/cert-manager/cmd/controller/main.go:34 +0x8a

Previous read at 0x00c00002bb60 by goroutine 76:
  runtime.mapaccess2()
      /usr/local/go/src/runtime/map.go:457 +0x0
  k8s.io/apimachinery/pkg/runtime.(*Scheme).ObjectKinds()
      /home/tramlot/go/pkg/mod/k8s.io/apimachinery@v0.27.2/pkg/runtime/scheme.go:267 +0x318
  k8s.io/apimachinery/pkg/runtime.(*parameterCodec).EncodeParameters()
      /home/tramlot/go/pkg/mod/k8s.io/apimachinery@v0.27.2/pkg/runtime/codec.go:191 +0x92
  k8s.io/client-go/rest.(*Request).SpecificallyVersionedParams()
      /home/tramlot/go/pkg/mod/k8s.io/client-go@v0.27.2/rest/request.go:376 +0xf2
  k8s.io/client-go/rest.(*Request).VersionedParams()
      /home/tramlot/go/pkg/mod/k8s.io/client-go@v0.27.2/rest/request.go:369 +0x28f
  k8s.io/client-go/kubernetes/typed/coordination/v1.(*leases).Update()
      /home/tramlot/go/pkg/mod/k8s.io/client-go@v0.27.2/kubernetes/typed/coordination/v1/lease.go:135 +0x1ed
  k8s.io/client-go/tools/leaderelection/resourcelock.(*LeaseLock).Update()
      /home/tramlot/go/pkg/mod/k8s.io/client-go@v0.27.2/tools/leaderelection/resourcelock/leaselock.go:75 +0x532
  k8s.io/client-go/tools/leaderelection.(*LeaderElector).tryAcquireOrRenew()
      /home/tramlot/go/pkg/mod/k8s.io/client-go@v0.27.2/tools/leaderelection/leaderelection.go:363 +0xa27
  k8s.io/client-go/tools/leaderelection.(*LeaderElector).renew.func1.1()
      /home/tramlot/go/pkg/mod/k8s.io/client-go@v0.27.2/tools/leaderelection/leaderelection.go:270 +0x46
  k8s.io/apimachinery/pkg/util/wait.PollImmediateUntil.ConditionFunc.WithContext.func1()
      /home/tramlot/go/pkg/mod/k8s.io/apimachinery@v0.27.2/pkg/util/wait/wait.go:109 +0x2e
  k8s.io/apimachinery/pkg/util/wait.runConditionWithCrashProtectionWithContext()
      /home/tramlot/go/pkg/mod/k8s.io/apimachinery@v0.27.2/pkg/util/wait/wait.go:154 +0x6f
  k8s.io/apimachinery/pkg/util/wait.poll()
      /home/tramlot/go/pkg/mod/k8s.io/apimachinery@v0.27.2/pkg/util/wait/poll.go:245 +0x50
  k8s.io/apimachinery/pkg/util/wait.PollImmediateUntilWithContext()
      /home/tramlot/go/pkg/mod/k8s.io/apimachinery@v0.27.2/pkg/util/wait/poll.go:200 +0x99
  k8s.io/apimachinery/pkg/util/wait.PollImmediateUntil()
      /home/tramlot/go/pkg/mod/k8s.io/apimachinery@v0.27.2/pkg/util/wait/poll.go:187 +0x75
  k8s.io/client-go/tools/leaderelection.(*LeaderElector).renew.func1()
      /home/tramlot/go/pkg/mod/k8s.io/client-go@v0.27.2/tools/leaderelection/leaderelection.go:269 +0x18f
  k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1()
      /home/tramlot/go/pkg/mod/k8s.io/apimachinery@v0.27.2/pkg/util/wait/backoff.go:226 +0x41
  k8s.io/apimachinery/pkg/util/wait.BackoffUntil()
      /home/tramlot/go/pkg/mod/k8s.io/apimachinery@v0.27.2/pkg/util/wait/backoff.go:227 +0xc4
  k8s.io/apimachinery/pkg/util/wait.JitterUntil()
      /home/tramlot/go/pkg/mod/k8s.io/apimachinery@v0.27.2/pkg/util/wait/backoff.go:204 +0x10a
  k8s.io/apimachinery/pkg/util/wait.Until()
      /home/tramlot/go/pkg/mod/k8s.io/apimachinery@v0.27.2/pkg/util/wait/backoff.go:161 +0x23d
  k8s.io/client-go/tools/leaderelection.(*LeaderElector).renew()
      /home/tramlot/go/pkg/mod/k8s.io/client-go@v0.27.2/tools/leaderelection/leaderelection.go:266 +0x15c
  k8s.io/client-go/tools/leaderelection.(*LeaderElector).Run()
      /home/tramlot/go/pkg/mod/k8s.io/client-go@v0.27.2/tools/leaderelection/leaderelection.go:209 +0x21e
  github.com/cert-manager/cert-manager/controller-binary/app.startLeaderElection()
      /home/tramlot/projects/cert-manager/cmd/controller/app/controller.go:384 +0x4b9
  github.com/cert-manager/cert-manager/controller-binary/app.Run.func6()
      /home/tramlot/projects/cert-manager/cmd/controller/app/controller.go:162 +0x38b
  golang.org/x/sync/errgroup.(*Group).Go.func1()
      /home/tramlot/go/pkg/mod/golang.org/x/sync@v0.3.0/errgroup/errgroup.go:75 +0x91
After
$ ./test.sh 
I0307 15:20:59.135888 1067740 controller.go:263] "cert-manager/controller/build-context: configured acme dns01 nameservers" nameservers=["10.0.0.16:53"]
I0307 15:20:59.164708 1067740 options.go:231] "cert-manager: enabling all experimental certificatesigningrequest controllers"
I0307 15:20:59.164755 1067740 options.go:236] "cert-manager: enabling the sig-network Gateway API certificate-shim and HTTP-01 solver"
I0307 15:20:59.164841 1067740 controller.go:83] "cert-manager/controller: enabled controllers: [certificaterequests-approver certificaterequests-issuer-acme certificaterequests-issuer-ca certificaterequests-issuer-selfsigned certificaterequests-issuer-vault certificaterequests-issuer-venafi certificates-issuing certificates-key-manager certificates-metrics certificates-readiness certificates-request-manager certificates-revision-manager certificates-trigger certificatesigningrequests-issuer-acme certificatesigningrequests-issuer-ca certificatesigningrequests-issuer-selfsigned certificatesigningrequests-issuer-vault certificatesigningrequests-issuer-venafi challenges clusterissuers gateway-shim ingress-shim issuers orders]"
I0307 15:20:59.165499 1067740 controller.go:157] "cert-manager/controller: starting leader election"
I0307 15:20:59.165561 1067740 controller.go:104] "cert-manager/controller: starting metrics server" address="[::]:9402"
I0307 15:20:59.165619 1067740 controller.go:150] "cert-manager/controller: starting healthz server" address="[::]:9403"
I0307 15:20:59.175719 1067740 leaderelection.go:250] attempting to acquire leader lease kube-system/cert-manager-controller...
I0307 15:20:59.181492 1067740 leaderelection.go:260] successfully acquired lease kube-system/cert-manager-controller
I0307 15:20:59.190887 1067740 controller.go:227] "cert-manager/controller: starting controller" controller="clusterissuers"
I0307 15:20:59.199501 1067740 controller.go:227] "cert-manager/controller: starting controller" controller="gateway-shim"
I0307 15:20:59.209154 1067740 controller.go:227] "cert-manager/controller: starting controller" controller="certificates-trigger"
I0307 15:20:59.219522 1067740 controller.go:227] "cert-manager/controller: starting controller" controller="certificatesigningrequests-issuer-venafi"
I0307 15:20:59.229447 1067740 controller.go:227] "cert-manager/controller: starting controller" controller="certificaterequests-issuer-ca"
I0307 15:20:59.240097 1067740 controller.go:227] "cert-manager/controller: starting controller" controller="certificaterequests-issuer-vault"
I0307 15:20:59.251153 1067740 controller.go:227] "cert-manager/controller: starting controller" controller="certificatesigningrequests-issuer-acme"
I0307 15:20:59.260238 1067740 controller.go:227] "cert-manager/controller: starting controller" controller="certificates-issuing"
I0307 15:20:59.270784 1067740 controller.go:227] "cert-manager/controller: starting controller" controller="orders"
I0307 15:20:59.282362 1067740 controller.go:227] "cert-manager/controller: starting controller" controller="certificaterequests-issuer-acme"
I0307 15:20:59.293059 1067740 controller.go:227] "cert-manager/controller: starting controller" controller="certificates-revision-manager"
I0307 15:20:59.303319 1067740 controller.go:227] "cert-manager/controller: starting controller" controller="certificaterequests-issuer-venafi"
I0307 15:20:59.314281 1067740 controller.go:227] "cert-manager/controller: starting controller" controller="certificatesigningrequests-issuer-selfsigned"
I0307 15:20:59.324814 1067740 controller.go:227] "cert-manager/controller: starting controller" controller="certificatesigningrequests-issuer-vault"
I0307 15:20:59.334534 1067740 controller.go:227] "cert-manager/controller: starting controller" controller="issuers"
I0307 15:20:59.344413 1067740 controller.go:227] "cert-manager/controller: starting controller" controller="certificates-metrics"
I0307 15:20:59.353932 1067740 controller.go:227] "cert-manager/controller: starting controller" controller="certificates-readiness"
I0307 15:20:59.363114 1067740 controller.go:227] "cert-manager/controller: starting controller" controller="certificates-request-manager"
I0307 15:20:59.373088 1067740 controller.go:227] "cert-manager/controller: starting controller" controller="challenges"
I0307 15:20:59.383383 1067740 controller.go:227] "cert-manager/controller: starting controller" controller="certificaterequests-issuer-selfsigned"
I0307 15:20:59.393170 1067740 controller.go:227] "cert-manager/controller: starting controller" controller="certificatesigningrequests-issuer-ca"
I0307 15:20:59.403401 1067740 controller.go:227] "cert-manager/controller: starting controller" controller="ingress-shim"
I0307 15:20:59.413665 1067740 controller.go:227] "cert-manager/controller: starting controller" controller="certificaterequests-approver"
I0307 15:20:59.425724 1067740 controller.go:227] "cert-manager/controller: starting controller" controller="certificates-key-manager"
^CI0307 15:24:27.237553 1067740 controller.go:127] "cert-manager/orders: shutting down queue as workqueue signaled shutdown"
I0307 15:24:27.238134 1067740 controller.go:127] "cert-manager/issuers: shutting down queue as workqueue signaled shutdown"
I0307 15:24:27.238241 1067740 controller.go:127] "cert-manager/certificaterequests-issuer-ca: shutting down queue as workqueue signaled shutdown"
I0307 15:24:27.238358 1067740 controller.go:127] "cert-manager/certificates-key-manager: shutting down queue as workqueue signaled shutdown"
I0307 15:24:27.238587 1067740 controller.go:127] "cert-manager/certificatesigningrequests-issuer-selfsigned: shutting down queue as workqueue signaled shutdown"
I0307 15:24:27.238612 1067740 controller.go:127] "cert-manager/challenges: shutting down queue as workqueue signaled shutdown"
I0307 15:24:27.238683 1067740 controller.go:127] "cert-manager/certificaterequests-issuer-acme: shutting down queue as workqueue signaled shutdown"
I0307 15:24:27.238894 1067740 controller.go:127] "cert-manager/certificates-readiness: shutting down queue as workqueue signaled shutdown"
I0307 15:24:27.239725 1067740 controller.go:127] "cert-manager/certificaterequests-issuer-selfsigned: shutting down queue as workqueue signaled shutdown"
I0307 15:24:27.239738 1067740 controller.go:127] "cert-manager/clusterissuers: shutting down queue as workqueue signaled shutdown"
I0307 15:24:27.239790 1067740 controller.go:127] "cert-manager/certificatesigningrequests-issuer-venafi: shutting down queue as workqueue signaled shutdown"
I0307 15:24:27.239806 1067740 controller.go:127] "cert-manager/certificates-request-manager: shutting down queue as workqueue signaled shutdown"
I0307 15:24:27.240374 1067740 controller.go:127] "cert-manager/certificatesigningrequests-issuer-vault: shutting down queue as workqueue signaled shutdown"
I0307 15:24:27.240449 1067740 controller.go:127] "cert-manager/certificatesigningrequests-issuer-acme: shutting down queue as workqueue signaled shutdown"
I0307 15:24:27.240492 1067740 controller.go:127] "cert-manager/certificaterequests-issuer-venafi: shutting down queue as workqueue signaled shutdown"
I0307 15:24:27.240529 1067740 controller.go:127] "cert-manager/certificates-revision-manager: shutting down queue as workqueue signaled shutdown"
I0307 15:24:27.240620 1067740 controller.go:127] "cert-manager/gateway-shim: shutting down queue as workqueue signaled shutdown"
I0307 15:24:27.241010 1067740 controller.go:127] "cert-manager/ingress-shim: shutting down queue as workqueue signaled shutdown"
I0307 15:24:27.241508 1067740 controller.go:127] "cert-manager/certificaterequests-approver: shutting down queue as workqueue signaled shutdown"
I0307 15:24:27.241567 1067740 controller.go:127] "cert-manager/certificatesigningrequests-issuer-ca: shutting down queue as workqueue signaled shutdown"
I0307 15:24:27.242141 1067740 controller.go:127] "cert-manager/certificaterequests-issuer-vault: shutting down queue as workqueue signaled shutdown"
I0307 15:24:27.242256 1067740 controller.go:127] "cert-manager/certificates-trigger: shutting down queue as workqueue signaled shutdown"
I0307 15:24:27.242320 1067740 controller.go:127] "cert-manager/certificates-issuing: shutting down queue as workqueue signaled shutdown"
I0307 15:24:27.243091 1067740 controller.go:127] "cert-manager/certificates-metrics: shutting down queue as workqueue signaled shutdown"
I0307 15:24:27.246821 1067740 controller.go:246] "cert-manager/controller: control loops exited"

We underestimated #6028: we thought it only fixed race conditions in tests. It also fixes race conditions in the released binaries, so it should be backported.
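The trace above shows the main goroutine writing to a scheme's internal map (`AddUnversionedTypes`, reached from `ContextFactory.Build`) while the leader-election goroutine reads the same map (`ObjectKinds`). As a rough illustration of one way to avoid this class of race, the stdlib-only sketch below uses a hypothetical `registry` type as a stand-in for `runtime.Scheme`. The type, its methods, and the registered entries are assumptions made for the example and are not taken from the cert-manager code. The idea is that the shared registry is fully populated before any reader starts, and a component that needs extra types builds its own registry instead of mutating the shared one.

```go
package main

import (
	"fmt"
	"sync"
)

// registry is a hypothetical stand-in for runtime.Scheme:
// a plain map with no internal locking.
type registry struct {
	kinds map[string]string
}

func newRegistry() *registry {
	return &registry{kinds: map[string]string{}}
}

// register writes to the map; it must not run concurrently with lookup.
func (r *registry) register(kind, group string) {
	r.kinds[kind] = group
}

// lookup only reads the map.
func (r *registry) lookup(kind string) (string, bool) {
	g, ok := r.kinds[kind]
	return g, ok
}

// buildRegistry returns a fresh registry with the base types registered.
// The entries are illustrative only.
func buildRegistry() *registry {
	r := newRegistry()
	r.register("Lease", "coordination.k8s.io")
	r.register("Order", "acme.cert-manager.io")
	return r
}

func main() {
	// All writes to the shared registry happen here, before any reader starts.
	shared := buildRegistry()

	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Concurrent map reads are safe as long as nothing writes.
			if _, ok := shared.lookup("Lease"); !ok {
				panic("Lease is not registered")
			}
		}()
	}

	// A component that needs an extra type builds its own registry
	// rather than registering into the shared one while readers run.
	local := buildRegistry()
	local.register("Challenge", "acme.cert-manager.io")

	wg.Wait()

	_, inShared := shared.lookup("Challenge")
	_, inLocal := local.lookup("Challenge")
	fmt.Println(inShared, inLocal) // prints "false true"
}
```

Running this with `go run -race .` should report no race, because the only write that happens after the readers start goes to `local`, which no other goroutine touches.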

Kind

/kind bug

Release Note

BUGFIX: fix race condition due to registering and using global runtime.Scheme variables

@jetstack-bot jetstack-bot added kind/bug Categorizes issue or PR as related to a bug. release-note Denotes a PR that will be considered when it comes time to generate release notes. dco-signoff: yes Indicates that all commits in the pull request have the valid DCO sign-off message. area/acme Indicates a PR directly modifies the ACME Issuer code area/testing Issues relating to testing size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Mar 7, 2024

@wallrj wallrj left a comment

I think there are more changes here than are strictly necessary to fix the race condition reported in #6822.

And I can't tell whether the extra changes might cause unforeseen problems.

@jetstack-bot jetstack-bot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Mar 7, 2024
@inteon inteon changed the title [release-1.13] create ad-hoc schemes instead of sharing global ones [release-1.13] BUGFIX: fix datarace due to registering and reading from scheme Mar 7, 2024
@inteon inteon requested a review from wallrj March 7, 2024 20:14
Signed-off-by: Tim Ramlot <42113979+inteon@users.noreply.github.com>

@wallrj wallrj left a comment

Thanks!

Sorry for the earlier snarky comments, I was in a bad mood.

/lgtm
/approve

@jetstack-bot jetstack-bot added the lgtm Indicates that a PR is ready to be merged. label Mar 8, 2024
@jetstack-bot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: wallrj

@jetstack-bot jetstack-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 8, 2024
@inteon

inteon commented Mar 8, 2024

/retest

@jetstack-bot jetstack-bot merged commit e4fc764 into cert-manager:release-1.13 Mar 8, 2024
6 checks passed
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/acme Indicates a PR directly modifies the ACME Issuer code area/testing Issues relating to testing dco-signoff: yes Indicates that all commits in the pull request have the valid DCO sign-off message. kind/bug Categorizes issue or PR as related to a bug. lgtm Indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
3 participants