New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use a single cache for all dynamic controllers (i.e. XRs and claims) #5651
Conversation
f2f7122
to
ab57920
Compare
CC @sttts - this makes big changes to the realtime compositions implementation. |
c590da9
to
0ba7db9
Compare
Looks like the E2E tests are passing pretty reliably now. The only thing blocking moving this out of draft is writing some more unit tests. |
I am working on implementing watches in provider-kubernetes, and the PR is based on the previous real-time composition implementation, which is being refactored here. While the provider-kubernetes use case is more straightforward (no controller start/stop at runtime), I believe the new structure is more usable with the unified/shared cache then the previous implementation. When I check the functionality introduced here, I have a feeling like I could leverage things like |
@turkenh How would you feel about duplicating them to start with? We could then consolidate later. I'm hopeful we can get the tracking cache and stoppable source functionality added upstream to controller-runtime. If that happens I could imagine a world where:
|
I've moved this out of draft. It should be ready for final review. @sttts Do you want to do another pass? The history is very detailed - I'm leaving it as is to aid with review. I'll clean it up and squash it before merging. |
@turkenh Thanks for approving! I can't proceed until crossplane/crossplane-runtime#689 is also approved. Could you take a look? It's pretty much just deleting code. |
a8da7ab
to
24b03c1
Compare
This just moved the files, unedited, as of the below commit. crossplane/crossplane-runtime@8641eb2 Signed-off-by: Nic Cope <nicc@rk0n.org>
Signed-off-by: Nic Cope <nicc@rk0n.org>
Signed-off-by: Nic Cope <nicc@rk0n.org>
This bumps crossplane-runtime, controller-runtime, and k8s.io dependencies to latest. Per the below PR, the latest crossplane-runtime doesn't have the controller engine anymore. It moved into c/c. crossplane/crossplane-runtime#689 Signed-off-by: Nic Cope <nicc@rk0n.org>
Updating our controller-runtime and Kubernetes dependencies bumped our minimum Go version to v1.22. That in turn enables some new linters, since we no longer need to copy range vars in Go v1.22. Signed-off-by: Nic Cope <nicc@rk0n.org>
Signed-off-by: Nic Cope <nicc@rk0n.org>
It started as a function, but now we pass several arguments that are all fields of the Reconciler. It's only called once, by the Reconciler. Making it a method shortens the function signature, and makes it clear which things change on each reconcile and which are fixed. Signed-off-by: Nic Cope <nicc@rk0n.org>
Signed-off-by: Nic Cope <nicc@rk0n.org>
Signed-off-by: Nic Cope <nicc@rk0n.org>
Crossplane uses a controller engine to dynamically start claim and XR controllers when a new XRD is installed. Before this commit, each controller gets at least one cache. This is because when I built this functionality, you couldn't stop a single informer within a cache (a cache is basically a map of informers by GVK). When realtime composition is enabled, there are even more caches. One per composed resource GVK. A GVK routed cache routes cache lookups to these various delegate caches. Meanwhile, controller-runtime recently made it possible to stop an informer within a cache. It's also been possible to remove an event handler from an informer for some time (since Kubernetes 1.26). kubernetes-sigs/controller-runtime#2285 kubernetes-sigs/controller-runtime#2046 This commit uses a single client, backed by a single cache, across all dynamic controllers (specifically the definition, offered, claim, and XR controllers). Compared to the current implementation, this commit: * Takes fewer global locks when realtime compositions are enabled. Locking is now mostly at the controller scope. * Works with the breaking changes to source.Source introduced in controller-runtime v0.18. :) I think this makes the realtime composition code a little easier to follow by consolodating it into the ControllerEngine, but that's pretty subjective. Signed-off-by: Nic Cope <nicc@rk0n.org>
// RemoveInformer removes an informer entry and stops it if it was running. | ||
// | ||
// Removing an informer marks the informer as inactive. | ||
func (c *InformerTrackingCache) RemoveInformer(ctx context.Context, obj client.Object) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thought: a smoke (unit) test might help to detect when the cache.Cache
interface changes. The overridden methods are must be exhaustive for a given gvk. So there is risk that this breaks and will be hard to debug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another approach that could work would be to stop embedding cache.Cache
. Instead wrap one and implement all of its methods. That way we'll stop satisfying the interface if it changes. Will follow up in another PR.
Description of your changes
Fixes #5338
Fixes #2645
Closes #5463
Closes #5468
Closes #5422
It's possible this will fix #5400, #5151, #5533, and #5228 given that it makes large changes to how realtime compositions work. I'd rather not assume though - we can see if folks can reproduce if/when this PR is merged.
Crossplane uses a "controller engine" to dynamically start claim and XR controllers when a new XRD is installed, and stop them when the XRD is uninstalled.
Before this PR, each controller gets at least one cache. This is because when I built the controller engine, you couldn't stop a single informer within a cache (a cache is basically a map of informers by GVK). Instead, we stop entire caches.
When realtime composition is enabled, there are even more caches. One per composed resource GVK. A GVK routed cache routes cache lookups to these various delegate caches.
Meanwhile, controller-runtime recently made it possible to stop an informer within a cache. It's also been possible to remove an event handler from an informer for some time (since Kubernetes 1.26):
This PR uses a single client, backed by a single cache, across all dynamic controllers (specifically the definition, offered, claim, and XR controllers). Each controller tracks what watches ("sources" in controller-runtime) it has active and stops them when it doesn't need them anymore. The controller engine also garbage collects custom resource informers when their CRDs are deleted.
Compared to the current implementation, this:
I also think this makes the realtime composition code a little easier to follow by consolodating it into the ControllerEngine (and into one repo), but that's pretty subjective.
Here's a questionable Mermaid diagram of how it works:
It's conceptually pretty similar to a typical set of controller-runtime controllers run by a
controller.Manager
, but with the ability to stop watches and informers.Reviewer Note
This PR depends on crossplane/crossplane-runtime#689.
Unfortunately, this required bumping a bunch of dependencies, mostly to get the latest controller-runtime. Those dependency bumps in turn required a bunch of other little fixes (CRD generation issues, new linter warnings, etc).
I also opted to move the ControllerEngine implementation from c/cr to c/c. I think that will make changes like this easier in future.
While this all makes this PR pretty huge, it's all broken out into in separate commits. The last commit in the PR contains the bulk of the changes - reviewers can focus on that.
I'm opening this as draft since it's a big change. I still need to add more tests for the new code (e.g. the new engine implementation).
I have:
make reviewable
to ensure this PR is ready for review.Added or updated e2e tests.Linked a PR or a docs tracking issue to document this change.Addedbackport release-x.y
labels to auto-backport this PR.Need help with this checklist? See the cheat sheet.