📖 Add designs/multi-cluster.md #2746

sttts · 2024-03-31T13:38:14Z

Controller-runtime today allows to write controllers against one cluster only.
Multi-cluster use-cases require the creation of multiple managers and/or cluster
objects. This proposal is about adding native support for multi-cluster use-cases
to controller-runtime.

The proposed changes are prototyped in #2726.

Signed-off-by: Dr. Stefan Schimanski <stefan.schimanski@gmail.com>

k8s-ci-robot · 2024-03-31T13:38:18Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: sttts
Once this PR has been reviewed and has the lgtm label, please assign sbueringer for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

alvaroaleman · 2024-03-31T14:22:31Z

designs/multi-cluster.md

+}
+
+// pkg/handler
+type DeepCopyableEventHandler interface {


The eventhandlers are stateless, why do we need the deepcopy for them?

Looking at the propotype. I think this is because EventHandler then would store the Cluster (it is using that info to set the ClusterName field in the request)

This is gone now in #2726.

Will update the design here.

alvaroaleman · 2024-03-31T14:31:07Z

designs/multi-cluster.md

+	Disengage(context.Context, Cluster) error
+}
+```
+In particular, controllers implement the `AwareRunnable` interface. They react


Rather than changing the controller type directly and requiring all its dependencies to known how to deepcopy themselves, how about having something like a controllerconstructor (name tbd) in between that is filled with a []watchConstructor{source func(Cluster) source.Source, handler func(Cluster) handler.Handler, predicate func(cluster) []predicate.Predicate}?

I think this would require more invasive changes to our public API (the Controller interface)

No, you can call Watch on an existing controller. The idea is to not let the Controller or its dependencies have any knowledge about this but instead have a thing on top of the Controller that is configured with constructors that take a cluster.Cluster and return a source/predicate/handler and then uses those to call Watch when a new cluster appears.

When one disappears, it would cancel the context on the Source.

The idea really is the opposite, I do not want the Controller to know how to extend itself like this, IMHO this is a higher-level abstraction.

Compare #2726 after latest push. I have implemented @alvaroaleman's idea via a MultiClusterController wrapper implementing cluster.AwareRunnable and just calling Watch on the actual controller. All the deepcopy'ing is gone 🎉 Much nicer IMO. @alvaroaleman great intuition!

alvaroaleman · 2024-03-31T14:33:58Z

designs/multi-cluster.md

+// pkg/cluster
+type Provider interface {
+   Get(ctx context.Context, clusterName string, opts ...Option) (Cluster, error)
+   List(ctx context.Context) ([]string, error)


Why return []string here rather than []Cluster?

+1 Would be good for consistency with the Get func

There is a misunderstanding of the interface. The getter is actually the constructor. The life-cycle of the returned clusters is owned by the manager (they are added as runnables). Hence, the List returns names, not clusters. We should rather rename Get to Create or Connect.

alvaroaleman · 2024-03-31T14:37:16Z

designs/multi-cluster.md

+}
+```
+
+The `ctrl.Manager` will use the provider to watch clusters coming and going, and


I'll have to think about if and how this is doable, but ideally the "thing that comes and goes" wouldn't be typed to cluster.Cluster but can be anything, so this mechanism can also be used if folks have sources that are not kube watches

Would this be mostly about a more generic name? (can't think of much that would work, maybe something like scope)

designs/multi-cluster.md

elmiko

i think this is an interesting idea and i could see using it, i just have a question about some of the mechanics.

for context, i am investigating a cluster-api provider for karpenter and it would be nice to have the controllers discriminate between objects in the management cluster and objects in the workload clusters.

elmiko · 2024-04-08T17:29:42Z

designs/multi-cluster.md

+### Examples
+
+- Run a controller-runtime controller against a kubeconfig with arbitrary many contexts, all being reconciled.
+- Run a controller-runtime controller against cluster-managers like kind, Cluster-API, Open-Cluster-Manager or Hypershift.


given the cluster-api example here, is the intention that controllers will be able to reconcile CRDs in clusters that they know about that may only exist in a subset of clusters (e.g. Machine objects in the management cluster but not in the workload cluster) ?

Good point. I think that has to be possible. Otherwise we need all resources that we watch in all clusters

(especially good point because today a controller crashes if a resource doesn't exist)

EDIT: Further down:

For example, it can well be that every cluster has different REST mapping because installed CRDs are different. Without a context, we cannot return the right REST mapper.

Good point. Question is whether one would rather group them in managers such that every manager has a uniform set of clusters.

See my updated PR #2726. You can now opt into provider and/or the default cluster per controller via options:

// EngageWithDefaultCluster indicates whether the controller should engage // with the default cluster of a manager. This defaults to false through the // global controller options of the manager if a cluster provider is set, // and to true otherwise. Here it can be overridden. EngageWithDefaultCluster *bool // EngageWithProvidedClusters indicates whether the controller should engage // with the provided clusters of a manager. This defaults to true through the // global controller options of the manager if a cluster provider is set, // and to false otherwise. Here it can be overridden. EngageWithProviderClusters *bool

There is no logic yet for a controller to decide whether to engage with a provider cluster or not. Now it's with all of them. If the setup is more diverse, we might want such a functionality, e.g. some kind of pre-check: ctrl.WantsToEngage(ctx, cluster) bool`.

i'm still understanding the changes in #2726, but i think what you are saying here makes sense to me and would solve the issue.

some kind of pre-check: ctrl.WantsToEngage(ctx, cluster) bool`.

+1, i think we definitely need some way for the client user to specify when it should check a specific cluster for a resource.

I somehow think it should be the author's and managers responsibility (for now) to group them into groups which are working with the pattern. At this point, we don't know what we don't know. Once this is released, we can gather some feedback on edge cases and take it from there. I suspect the majority of use cases will be still single cluster reconcile loops.

Maybe document this edge case and mark this feature overall as experimental? This way we not committing to full production level stability, and allow to gather more feedback?

designs/multi-cluster.md

Add designs/multi-cluster.md

eb207cb

Signed-off-by: Dr. Stefan Schimanski <stefan.schimanski@gmail.com>

k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Mar 31, 2024

k8s-ci-robot requested review from varshaprasad96 and vincepri March 31, 2024 13:38

k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Mar 31, 2024

alvaroaleman reviewed Mar 31, 2024

View reviewed changes

embik reviewed Apr 4, 2024

View reviewed changes

designs/multi-cluster.md Show resolved Hide resolved

sbueringer mentioned this pull request Apr 4, 2024

Multi Cluster Example / Pattern #2755

Open

elmiko reviewed Apr 8, 2024

View reviewed changes

sbueringer reviewed Apr 11, 2024

View reviewed changes

designs/multi-cluster.md Show resolved Hide resolved

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

📖 Add designs/multi-cluster.md #2746

📖 Add designs/multi-cluster.md #2746

sttts commented Mar 31, 2024 •

edited

k8s-ci-robot commented Mar 31, 2024

alvaroaleman Mar 31, 2024

sbueringer Apr 11, 2024 •

edited

sttts Apr 23, 2024

alvaroaleman Mar 31, 2024

sbueringer Apr 11, 2024

alvaroaleman Apr 11, 2024

sttts Apr 23, 2024

alvaroaleman Mar 31, 2024

sbueringer Apr 10, 2024

sttts Apr 22, 2024

alvaroaleman Mar 31, 2024

sbueringer Apr 11, 2024

elmiko left a comment

elmiko Apr 8, 2024

sbueringer Apr 10, 2024 •

edited

sttts Apr 22, 2024

sttts Apr 23, 2024

elmiko Apr 23, 2024

mjudeikis Apr 28, 2024

📖 Add designs/multi-cluster.md #2746

Are you sure you want to change the base?

📖 Add designs/multi-cluster.md #2746

Conversation

sttts commented Mar 31, 2024 • edited

k8s-ci-robot commented Mar 31, 2024

Choose a reason for hiding this comment

sbueringer Apr 11, 2024 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

elmiko left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sbueringer Apr 10, 2024 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sttts commented Mar 31, 2024 •

edited

sbueringer Apr 11, 2024 •

edited

sbueringer Apr 10, 2024 •

edited