
🏃 Proposal to extract cluster-specifics out of the Manager #1075

Merged
merged 1 commit into kubernetes-sigs:master on Sep 22, 2020

Conversation

alvaroaleman (Member):

A proposal based on #950
/assign @mengqiy @vincepri @estroz

`runnables` are started.


The new `ClusterConnector` interface will look like this:
alvaroaleman (Member Author):

Ideas for a better name are very welcome :)

Contributor:

Or maybe PeerCluster or ClusterPeer? :)

alvaroaleman (Member Author):

Regarding naming: what about just Cluster? It's pretty much what this represents.

Contributor:

Cluster seems fine (it's bugging me a little bit that it's not super clear this type doesn't set up a cluster, like envtest, but rather connects to an existing one, but don't consider that comment blocking)

alvaroaleman (Member Author):

Cluster it is now


```go
if cc, isClusterConnector := runnable.(clusterconnector.ClusterConnector); isClusterConnector {
	m.caches = append(m.caches, cc.GetCache())
}
```
alvaroaleman (Member Author):

This makes the whole thing a bit weird, because we will end up starting both the ClusterConnector's Cache and the ClusterConnector itself separately. The alternative would be to require people to call mgr.Add(clusterConnector.GetCache()), which I would like to avoid because it's unintuitive.

Member:

It's probably fine to do the type assertion here, since this is an internal implementation detail.

> The alternative would be to require people to call mgr.Add(clusterConnector.GetCache()), which I would like to avoid because it's unintuitive.

+1

Contributor:

Can't we just do:

```go
type HasCaches interface {
	GetCache() ...
}

if getter, hasCaches := runnable.(HasCaches); hasCaches {
	...
}
```

(We could also add an additional method to avoid accidentally conflicting.)
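For illustration, a minimal, self-contained sketch of this optional-interface pattern; Cache, clusterRunnable, and fakeCache are stand-ins for this example, not controller-runtime types:

```go
package main

import "fmt"

// Cache stands in for controller-runtime's cache.Cache in this sketch.
type Cache interface {
	WaitForCacheSync() bool
}

// HasCaches is the optional interface from the snippet above.
type HasCaches interface {
	GetCache() Cache
}

type fakeCache struct{}

func (fakeCache) WaitForCacheSync() bool { return true }

// clusterRunnable is a hypothetical runnable that also carries a cache.
type clusterRunnable struct {
	cache Cache
}

func (c clusterRunnable) GetCache() Cache { return c.cache }

func main() {
	var runnable interface{} = clusterRunnable{cache: fakeCache{}}

	// The manager-side check: collect the cache only from runnables that have one.
	if getter, hasCaches := runnable.(HasCaches); hasCaches {
		fmt.Println("cache synced:", getter.GetCache().WaitForCacheSync())
	}
}
```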

alvaroaleman (Member Author):

Updated

```go
	GetWebhookServer() *webhook.Server
}
```

kramerul (Contributor), Jul 27, 2020:

Would it be possible to separate this into three interfaces? From my point of view, this would make the API clearer. In multi-cluster environments, you call MultiClusterManager#Connect once for each connected cluster, in contrast to having one implicit connection plus n-1 explicit connections.

```go
type ClusterConnector interface {
    // same as above
    ...
}

type MultiClusterManager interface {
    // Connect creates a new connection to a cluster: it creates a new
    // ClusterConnector instance and adds it to this manager.
    Connect(config *rest.Config, name string, opts ...Option) ClusterConnector
    // Same as Manager from the proposal, minus the ClusterConnector methods.
    Add(Runnable) error
    Elected() <-chan struct{}
    SetFields(interface{}) error
    AddMetricsExtraHandler(path string, handler http.Handler) error
    AddHealthzCheck(name string, check healthz.Checker) error
    AddReadyzCheck(name string, check healthz.Checker) error
    Start(<-chan struct{}) error
    GetWebhookServer() *webhook.Server
}

type Manager interface {
    ClusterConnector
    MultiClusterManager
}
```

Introducing a factory method Connect on MultiClusterManager would also solve the problem of having to use type assertions (see @alvaroaleman's comment above).

alvaroaleman (Member Author):

@kramerul Generally I like the idea (although I would probably embed the Manager into the MultiClusterManager and not the other way round), but it opens up an interesting question: what config will we use for leader election?

Right now, leader election is the reason it IMHO makes sense to define one primary cluster, which we implicitly do by requiring a config to construct a manager.

Contributor:

@alvaroaleman, I hadn't thought about this aspect. But it's absolutely correct that you need a primary connection for leader election. In this case I would agree to have only one Manager interface.

The Connect method could ease the usage of the API.

Member:

I like the idea of having an explicit interface for MultiCluster manager.

> I would probably embed the Manager into the MultiClusterManager and not the other way round

+1

A possible alternative is:

```go
type ClusterConnector interface {
    // same as above
    ...
}

// For the single-cluster use case only.
type Manager interface {
    // same as above
    ...
}

type MultiClusterManager interface {
    Manager

    // ConnectSecondaryCluster creates a new connection to a secondary
    // cluster: it creates a new ClusterConnector instance and adds it to
    // this manager. Each ClusterConnector must have a unique name.
    ConnectSecondaryCluster(config *rest.Config, name string, opts ...Option) (ClusterConnector, error)
    // GetClusterConnectorFor looks up a ClusterConnector by its unique name.
    GetClusterConnectorFor(name string) (ClusterConnector, error)
}
```

alvaroaleman (Member Author):

I've added it like this:

```go
type MultiClusterManager interface {
    Manager
    // AddCluster adds another named cluster to the MultiClusterManager.
    // The name must be unique. The MultiClusterManager will wait for all
    // clusters' caches to be started before starting anything else.
    AddCluster(config *rest.Config, name string) error
    // GetClusters returns all Clusters this MultiClusterManager knows about.
    // The cluster used to construct the MultiClusterManager is named `primary`
    // and will be used for leader election, if enabled.
    GetClusters() map[string]clusterconnector.ClusterConnector
}
```

In all the multi-cluster controllers I've built so far, the actual controller just wants to watch all clusters but doesn't necessarily know their number or names ahead of time, which is why a method that returns a map of all clusters is IMHO more useful (a sketch of that pattern follows below). If one cluster needs special treatment, that is still possible.
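For illustration, a hedged sketch of that pattern against the interface above; setupAllClusters and its setup callback are hypothetical helpers (and `fmt` is assumed imported), not part of the proposal:

```go
// setupAllClusters wires a controller against every cluster the
// MultiClusterManager knows about, without hard-coding names or counts.
// Only GetClusters() comes from the proposed interface.
func setupAllClusters(mgr MultiClusterManager, setup func(name string, c clusterconnector.ClusterConnector) error) error {
	for name, c := range mgr.GetClusters() {
		if err := setup(name, c); err != nil {
			return fmt.Errorf("failed to set up cluster %q: %w", name, err)
		}
	}
	return nil
}
```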

Contributor:

A number of the patterns I've seen for multicluster end up having clusters come & go over time. Even if we're not going to tackle that now, we should keep that in mind while designing.

I'm also not certain about the whole "primary" / "secondary" cluster thing. Perhaps model this as a "leader election" cluster plus "functional" clusters (one of which may be the leader-election cluster)?

alvaroaleman (Member Author):

> A number of the patterns I've seen for multicluster end up having clusters come & go over time. Even if we're not going to tackle that now, we should keep that in mind while designing.

IMHO this proposal does nothing to prevent that, but it also doesn't need to do anything yet to simplify it. Once we want such functionality, we would probably add a RemoveCluster to the interface.

> I'm also not certain about the whole "primary" / "secondary" cluster thing. Perhaps model this as a "leader election" cluster plus "functional" clusters (one of which may be the leader-election cluster)?

I guess we could change AddCluster to take a UseForLeaderElection bool opt: if leader election is enabled, use that cluster (and error out when starting if not exactly one cluster is configured for leader election; a sketch follows below). WDYT?
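A hedged sketch of what that could look like; every name here (AddClusterOptions, UseForLeaderElection, multiClusterManagerSketch) is hypothetical, taken from this comment rather than an agreed API:

```go
import (
	"fmt"

	"k8s.io/client-go/rest"
)

// AddClusterOptions is a hypothetical options struct for AddCluster.
type AddClusterOptions struct {
	// UseForLeaderElection marks this cluster as the one hosting the
	// leader-election lock. When leader election is enabled, Start would
	// error out unless exactly one cluster sets this.
	UseForLeaderElection bool
}

type multiClusterManagerSketch struct {
	leaderElectionCluster string
	clusterConfigs        map[string]*rest.Config
}

func (m *multiClusterManagerSketch) AddCluster(config *rest.Config, name string, opts AddClusterOptions) error {
	if _, exists := m.clusterConfigs[name]; exists {
		return fmt.Errorf("cluster %q already added", name)
	}
	if opts.UseForLeaderElection {
		if m.leaderElectionCluster != "" {
			return fmt.Errorf("leader-election cluster already set to %q", m.leaderElectionCluster)
		}
		m.leaderElectionCluster = name
	}
	m.clusterConfigs[name] = config
	return nil
}
```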

alvaroaleman (Member Author):

On second thought, after trying to update the sample, I would like to leave the MultiClusterManager out of this proposal, because it opens up a set of questions I would rather not answer just yet:

  • The Builder needs to be extended with something like WatchesFromCluster(source, "cluster-name", handler); otherwise we need to blindly dereference elements from the map[clusterName]cluster.Cluster, potentially causing nil-pointer dereferences
  • The Builder needs to be extended with something like WatchesFromAllClustersExcept(source, "excepted-cluster-name", handler)
  • Probably more use cases for watches I didn't think of
  • There should be a nice way to pass the clients for all clusters to a controller that doesn't involve writing a loop (probably the multi-cluster client abstraction that Solly suggested?)
  • When adding official support for building a controller that watches an arbitrary number of clusters, we also need an official way of encoding the cluster name into the reconcile.Request (a hedged sketch follows at the end of this comment)

And probably more.

IMHO, extracting the cluster specifics out of the manager won't block any of the work mentioned above, but it lets us start working on the topic without having to anticipate every use case already (and is already a big improvement over the status quo).
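On the reconcile.Request point from the list above, a hedged sketch of one possible encoding; ClusterAwareRequest is a hypothetical type for illustration, not something this proposal defines:

```go
import (
	"k8s.io/apimachinery/pkg/types"
	"sigs.k8s.io/controller-runtime/pkg/reconcile"
)

// ClusterAwareRequest wraps reconcile.Request with the name of the cluster
// the object lives in, leaving single-cluster code untouched.
type ClusterAwareRequest struct {
	reconcile.Request
	ClusterName string
}

func newClusterAwareRequest(cluster, namespace, name string) ClusterAwareRequest {
	return ClusterAwareRequest{
		Request: reconcile.Request{
			NamespacedName: types.NamespacedName{Namespace: namespace, Name: name},
		},
		ClusterName: cluster,
	}
}
```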

Contributor:

ok, fine with doing that in a follow-up


```go
// SetFields will set any dependencies on an object for which the object has implemented the inject
// interface - e.g. inject.Client.
SetFields(interface{}) error
```
Member:

Not sure what will happen when the embedded ClusterConnector also has a SetFields method.

alvaroaleman (Member Author):

AFAIK this is fine as long as they have the same signature
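A minimal sketch backing this up: Go allows overlapping methods from embedded interfaces as long as the signatures are identical (before Go 1.14 this was a compile error). The types here are illustrative stand-ins for the proposal's interfaces:

```go
package main

import "fmt"

type ClusterConnector interface {
	SetFields(interface{}) error
}

type managerCore interface {
	SetFields(interface{}) error
	Start(<-chan struct{}) error
}

// Manager embeds both interfaces; the duplicated SetFields is legal
// because the two signatures match exactly.
type Manager interface {
	ClusterConnector
	managerCore
}

type impl struct{}

func (impl) SetFields(interface{}) error { return nil }
func (impl) Start(<-chan struct{}) error { return nil }

func main() {
	var m Manager = impl{}
	fmt.Println(m.SetFields(nil))
}
```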




mengqiy (Member), Jul 28, 2020:
/assign @DirectXMan12
Thoughts?

DirectXMan12 (Contributor) left a review:
Sorry for the late comments. Mostly looks good, a couple of things inline.


```go
type ClusterConnector interface {
	// SetFields will set cluster-specific dependencies on an object for which the object has implemented the inject
	// interface, specifically inject.Client, inject.Cache, inject.Scheme, inject.Config and inject.APIReader
	SetFields(interface{}) error
```
Contributor:

Tangentially related aside: I'd like to see if we can refactor the internal DI stuff a bit before 1.0 -- the complete lack of type signatures, etc., bugs me a bit, but I've yet to figure out a better answer.

alvaroaleman (Member Author):

It's an unrelated topic, but IMHO we should try to get rid of all of it, because it turns mistakes into runtime errors instead of compile-time errors, which is very bad.

Contributor:

Agreed. I'd like to see a proposal for how to do that cleanly (this is not to sound dismissive or snarky -- I genuinely would like to see a proposal :-) ).

```go
	GetScheme() *runtime.Scheme

	// Start starts the ClusterConnector
	Start(<-chan struct{}) error
```
Contributor:

are we going to integrate the context work here?

alvaroaleman (Member Author):

Yeah, this proposal is orthogonal to the context work; the MultiClusterManager will use the same Runnable interface as the normal Manager, whatever that is at the time of implementation.
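For reference, a sketch of where the context work ended up taking Runnable (in controller-runtime v0.7+ the stop channel was replaced by a context):

```go
import "context"

// Runnable after the context work; at the time of this thread it was
// still Start(<-chan struct{}) error.
type Runnable interface {
	Start(ctx context.Context) error
}
```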




```go
	return reconcile.Result{}, err
}

if err := r.mirrorClusterClient.Get(context.TODO(), r.NamespacedName, &corev1.Secret{}); err != nil {
```
Contributor:

Tangent: not that we need to tackle it here, but we could even create a multicluster client that (ab)used the unused ClusterName field (maybe dangerous), or used a client option, context field, etc., to avoid needing multiple clients.

alvaroaleman (Member Author):

That sounds like a great idea (but I would prefer to keep that as an orthogonal work item)

```go
}

if err := r.mirrorClusterClient.Get(context.TODO(), r.NamespacedName, &corev1.Secret{}); err != nil {
	if !kerrors.IsNotFound(err) {
```
Contributor:

Nit: `client.IgnoreNotFound(err)` exists ;-)

alvaroaleman (Member Author):

I don't understand how that is of use here; we have to return on NotFound, so it's handled differently from no error.

Contributor:

Oh, the pattern just becomes `if client.IgnoreNotFound(err) != nil`. Mainly it just avoids having to remember another k8s package to import (sketch below).
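A hedged sketch of the two shapes side by side; fetchSecret, mirrorClient, and secretKey are placeholders for this example:

```go
import (
	"context"

	corev1 "k8s.io/api/core/v1"
	kerrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/apimachinery/pkg/types"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/reconcile"
)

func fetchSecret(ctx context.Context, mirrorClient client.Client, secretKey types.NamespacedName) (reconcile.Result, error) {
	var secret corev1.Secret

	// Spelled out with apimachinery's errors package:
	if err := mirrorClient.Get(ctx, secretKey, &secret); err != nil && !kerrors.IsNotFound(err) {
		return reconcile.Result{}, err
	}

	// The suggested shorthand: NotFound is swallowed, every other error is
	// returned, and no extra package needs importing.
	if err := mirrorClient.Get(ctx, secretKey, &secret); client.IgnoreNotFound(err) != nil {
		return reconcile.Result{}, err
	}

	return reconcile.Result{}, nil
}
```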

```go
	panic(err)
}

mirrorClusterConnector, err := clusterconnector.New(cfg2)
```
Contributor:

should we leave the possibility for options here?

alvaroaleman (Member Author):

IMHO we should add functional opts once we have a need for them.

Contributor:

Ack, but we need to specify that in the signature now so that adding options later isn't a breaking change.

We should probably do a review before v1 and figure out the spots where we should put functional options (e.g. manager.New probably wants functional options). A hedged sketch follows below.
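A hedged sketch of the functional-options shape under discussion; Option, WithScheme, and this New signature are illustrative, not what ultimately landed:

```go
import (
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/client-go/rest"
)

type ClusterConnector struct {
	scheme *runtime.Scheme
}

type options struct {
	scheme *runtime.Scheme
}

// Option mutates the construction-time options.
type Option func(*options)

// WithScheme is an example option (hypothetical name).
func WithScheme(s *runtime.Scheme) Option {
	return func(o *options) { o.scheme = s }
}

// New takes variadic opts from day one, so adding options later is not a
// breaking change to the signature.
func New(config *rest.Config, opts ...Option) (*ClusterConnector, error) {
	o := &options{scheme: runtime.NewScheme()}
	for _, opt := range opts {
		opt(o)
	}
	return &ClusterConnector{scheme: o.scheme}, nil
}
```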

alvaroaleman (Member Author):

Everyone, I think I have responded to all feedback, PTAL

DirectXMan12 (Contributor) left a review:

very minor nits, otherwise

/approve

(will add the LGTM once nits are addressed, since it'll be removed anyway. Really would love "lgtm with nits" in k8s :-/)

Three review threads on designs/move-cluster-specific-code-out-of-manager.md were marked outdated and resolved.

k8s-ci-robot (Contributor):

[APPROVALNOTIFIER] This PR is APPROVED

This pull request has been approved by: alvaroaleman, DirectXMan12

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [DirectXMan12,alvaroaleman]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

alvaroaleman (Member Author):

@DirectXMan12 updated to incorporate your suggestions

alvaroaleman (Member Author):

/retest

DirectXMan12 (Contributor):

/lgtm

DirectXMan12 (Contributor):

test failure is #1171

DirectXMan12 (Contributor):

/retest

k8s-ci-robot merged commit ea6a506 into kubernetes-sigs:master on Sep 22, 2020
ashishvya:
@alvaroaleman Has this proposal been implemented? If not, is there a tracking ticket for it?

alvaroaleman (Member Author):

@ashishvya it is implemented

taragu, Jun 27, 2022:

@alvaroaleman Could you kindly link the PR implementing this proposal? I'm struggling to find where it's implemented.

alvaroaleman (Member Author):

@taragu The implementation is in pkg/cluster; you can use the git history to find the relevant PRs.
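For anyone landing here, a hedged sketch of the API as implemented in sigs.k8s.io/controller-runtime/pkg/cluster; option details vary across controller-runtime versions, and reusing the primary config for the mirror cluster is just to keep the sketch self-contained:

```go
package main

import (
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/cluster"
	"sigs.k8s.io/controller-runtime/pkg/manager"
)

func main() {
	mgr, err := manager.New(ctrl.GetConfigOrDie(), manager.Options{})
	if err != nil {
		panic(err)
	}

	// In a real setup this would be the rest.Config of the mirror cluster.
	mirror, err := cluster.New(ctrl.GetConfigOrDie())
	if err != nil {
		panic(err)
	}

	// The manager starts the mirror cluster's cache alongside its own.
	if err := mgr.Add(mirror); err != nil {
		panic(err)
	}

	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		panic(err)
	}
}
```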
