🏃 Proposal to extract cluster-specifics out of the Manager #1075

Merged · 1 commit · Sep 22, 2020
designs/move-cluster-specific-code-out-of-manager.md (+228 additions, −0 deletions)

Move cluster-specific code out of the manager
===================

## Motivation

Today, it is already possible to use controller-runtime to build controllers that act on
more than one cluster. However, this is undocumented and not straightforward, requiring
users to dig into the implementation details to figure out how to make it work.

## Goals

* Provide an easy-to-discover way to build controllers that act on multiple clusters
* Decouple the management of `Runnables` from the construction of "things that require a kubeconfig"
* Do not introduce changes for users that build controllers that act on one cluster only

## Non-Goals

## Proposal

Currently, the `./pkg/manager.Manager` has two purposes:

* Running controllers and other runnables and managing their lifecycle
* Setting up various things to interact with the Kubernetes cluster,
  for example a `Client` and a `Cache`

This works very well when building controllers that talk to a single cluster,
however some use cases require controllers that interact with more than
one cluster. This multi-cluster use case is very awkward today, because it
requires constructing one manager per cluster and adding all subsequent
managers to the first one.
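
For illustration, a rough sketch of the status-quo workaround described above. The options used to quiet the second manager are assumptions for the sake of the example, and `cfg1`/`cfg2` stand in for the two clusters' `*rest.Config`:

```go
// Status quo: to talk to a second cluster, a second fully fledged manager is
// constructed and then registered as a Runnable on the first one so that its
// cache gets started alongside everything else.
mgr, err := manager.New(cfg1, manager.Options{})
if err != nil {
	panic(err)
}

secondaryMgr, err := manager.New(cfg2, manager.Options{
	// Assumed knobs: the secondary manager should not run its own leader
	// election or serve its own metrics endpoint.
	LeaderElection:     false,
	MetricsBindAddress: "0",
})
if err != nil {
	panic(err)
}

// A Manager itself satisfies the Runnable interface, so it can be added to
// another Manager and will be started by it.
if err := mgr.Add(secondaryMgr); err != nil {
	panic(err)
}
```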

This document proposes to move all cluster-specific code out of the manager
and into a new package and interface that then gets embedded into the manager.
This keeps the usage for single-cluster cases the same and introduces
the change in a backwards-compatible manner.

Furthermore, the manager gets extended to start all caches before any other
`runnables` are started.


The new `Cluster` interface will look like this:

```go
type Cluster interface {
	// SetFields will set cluster-specific dependencies on an object for which the object has implemented the inject
	// interface, specifically inject.Client, inject.Cache, inject.Scheme, inject.Config and inject.APIReader
	SetFields(interface{}) error

	// GetConfig returns an initialized Config
	GetConfig() *rest.Config

	// GetClient returns a client configured with the Config. This client may
	// not be a fully "direct" client -- it may read from a cache, for
	// instance. See Options.NewClient for more information on how the default
	// implementation works.
	GetClient() client.Client

	// GetFieldIndexer returns a client.FieldIndexer configured with the client
	GetFieldIndexer() client.FieldIndexer

	// GetCache returns a cache.Cache
	GetCache() cache.Cache

	// GetEventRecorderFor returns a new EventRecorder for the provided name
	GetEventRecorderFor(name string) record.EventRecorder

	// GetRESTMapper returns a RESTMapper
	GetRESTMapper() meta.RESTMapper

	// GetAPIReader returns a reader that will be configured to use the API server.
	// This should be used sparingly and only when the client does not fit your
	// use case.
	GetAPIReader() client.Reader

	// GetScheme returns an initialized Scheme
	GetScheme() *runtime.Scheme

	// Start starts the connection to the Cluster
	Start(<-chan struct{}) error
}
```

> **Review thread on `SetFields`**
>
> **Contributor:** Tangentially related aside: I'd like to see if we can maybe refactor the internal DI stuff a bit before 1.0 -- the complete lack of type signatures, etc. bugs me a bit, but I've yet to figure out a better answer.
>
> **Member (author):** It's an unrelated topic, but IMHO we should try to get rid of all of it, because it makes changes to it runtime errors and not compile-time errors, which is very bad.
>
> **Contributor:** Agreed. I'd like to see a proposal for how to cleanly do that (this is not to sound dismissive or snarky -- I genuinely would like to see a proposal :-) ).

> **Review thread on `Start`**
>
> **Contributor:** Are we going to integrate the context work here?
>
> **Member (author):** Yeah, this proposal is orthogonal to the context work; the MultiClusterManager will make use of the same Runnable interface as the normal Manager, whatever that is at the time of implementation.

And the current `Manager` interface will change to look like this:

```go
type Manager interface {
	// Cluster holds objects to connect to a cluster
	cluster.Cluster

	// Add will set requested dependencies on the component, and cause the component to be
	// started when Start is called. Add will inject any dependencies for which the argument
	// implements the inject interface - e.g. inject.Client.
	// Depending on if a Runnable implements the LeaderElectionRunnable interface, a Runnable can be run in either
	// non-leaderelection mode (always running) or leader election mode (managed by leader election if enabled).
	Add(Runnable) error

	// Elected is closed when this manager is elected leader of a group of
	// managers, either because it won a leader election or because no leader
	// election was configured.
	Elected() <-chan struct{}

	// SetFields will set any dependencies on an object for which the object has implemented the inject
	// interface - e.g. inject.Client.
	SetFields(interface{}) error

	// AddMetricsExtraHandler adds an extra handler served on path to the http server that serves metrics.
	// Might be useful to register some diagnostic endpoints e.g. pprof. Note that these endpoints are meant to be
	// sensitive and shouldn't be exposed publicly.
	// If the simple path -> handler mapping offered here is not enough, a new http server/listener should be added as
	// a Runnable to the manager via the Add method.
	AddMetricsExtraHandler(path string, handler http.Handler) error

	// AddHealthzCheck allows you to add a Healthz checker
	AddHealthzCheck(name string, check healthz.Checker) error

	// AddReadyzCheck allows you to add a Readyz checker
	AddReadyzCheck(name string, check healthz.Checker) error

	// Start starts all registered Controllers and blocks until the Stop channel is closed.
	// Returns an error if there is an error starting any controller.
	// If LeaderElection is used, the binary must be exited immediately after this returns,
	// otherwise components that need leader election might continue to run after the leader
	// lock was lost.
	Start(<-chan struct{}) error

	// GetWebhookServer returns a webhook.Server
	GetWebhookServer() *webhook.Server
}
```

> **Review thread on `SetFields`**
>
> **Member:** Not sure what will happen when the embedded ClusterConnector also has a SetFields method.
>
> **Member (author):** AFAIK this is fine as long as they have the same signature.
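
For readers wondering about that exchange: since Go 1.14, an interface may embed another interface whose method set overlaps with its own, as long as the duplicated methods have identical signatures. The names below are purely illustrative; this is a minimal sketch, not anything from the proposal:

```go
package main

import "fmt"

// Embedded plays the role of the embedded Cluster interface.
type Embedded interface {
	SetFields(interface{}) error
}

// Outer embeds Embedded and also declares SetFields itself. With identical
// signatures this compiles on Go 1.14+ (overlapping method sets in embedded
// interfaces); with differing signatures it would be a compile error.
type Outer interface {
	Embedded
	SetFields(interface{}) error
}

type impl struct{}

func (impl) SetFields(interface{}) error { return nil }

func main() {
	var o Outer = impl{}
	fmt.Println(o.SetFields(nil)) // <nil>
}
```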

> **@kramerul (Contributor), Jul 27, 2020:**
>
> Would it be possible to separate this into three interfaces? From my point of view, this would make the API clearer. In multi-cluster environments, you have to call `MultiClusterManager#Connect` once for each connected cluster, in contrast to having one implicit connection and n-1 explicit connections.
>
> ```go
> // package clusterconnector
> type ClusterConnector interface {
>     // same as above
>     ...
> }
>
> type MultiClusterManager interface {
>     // Connect creates a new connection to a cluster. It creates a new instance of
>     // ClusterConnector and adds it to this manager.
>     Connect(config *rest.Config, name string, opts ...Option) clusterconnector.ClusterConnector
>     // Same as Manager from the proposal, except for clusterconnector.ClusterConnector
>     Add(Runnable) error
>     Elected() <-chan struct{}
>     SetFields(interface{}) error
>     AddMetricsExtraHandler(path string, handler http.Handler) error
>     AddHealthzCheck(name string, check healthz.Checker) error
>     AddReadyzCheck(name string, check healthz.Checker) error
>     Start(<-chan struct{}) error
>     GetWebhookServer() *webhook.Server
> }
>
> type Manager interface {
>     clusterconnector.ClusterConnector
>     MultiClusterManager
> }
> ```
>
> Introducing a factory method `Connect` inside `MultiClusterManager` would also solve the problem of having to use type assertions (see the comment by @alvaroaleman).

> **Member (author):**
>
> @kramerul Generally I like the idea (although I would probably embed the Manager into the MultiClusterManager and not the other way round), but it opens up an interesting question: what config will we use for leader election?
>
> Right now leader election is the reason why it IMHO makes sense to define one primary cluster, which we implicitly do by requiring a config to construct a manager.

> **@kramerul (Contributor):**
>
> @alvaroaleman, I hadn't thought about this aspect. But it's absolutely correct that you need a primary connection for leader election. In this case I would agree to have only one Manager interface.
>
> The `Connect` method could ease the usage of the API.

> **Member:**
>
> I like the idea of having an explicit interface for a MultiCluster manager.
>
> > I would probably embed the Manager into the MultiClusterManager and not the other way round
>
> +1
>
> A possible alternative is:
>
> ```go
> type ClusterConnector interface {
>     // same as above
> }
>
> // For the single-cluster use case only
> type Manager interface {
>     // same as above
> }
>
> type MultiClusterManager interface {
>     Manager
>
>     // ConnectSecondaryCluster creates a new connection to a secondary cluster. It creates a new
>     // instance of ClusterConnector and adds it to this manager. Each ClusterConnector should have
>     // its own unique name.
>     ConnectSecondaryCluster(config *rest.Config, name string, opts ...Option) (clusterconnector.ClusterConnector, error)
>     // GetClusterConnectorFor looks up the ClusterConnector by its unique name.
>     GetClusterConnectorFor(name string) (clusterconnector.ClusterConnector, error)
> }
> ```

> **Member (author):**
>
> I've added it like this:
>
> ```go
> type MultiClusterManager interface {
>     Manager
>     // AddCluster adds another named cluster to the MultiClusterManager. The name
>     // must be unique. The MultiClusterManager will wait for all clusters'
>     // caches to be started before starting anything else.
>     AddCluster(config *rest.Config, name string) error
>     // GetClusters returns all Clusters this MultiClusterManager knows about.
>     // The cluster used to construct the MultiClusterManager is named `primary`
>     // and will be used for LeaderElection, if enabled.
>     GetClusters() map[string]clusterconnector.ClusterConnector
> }
> ```
>
> In all the multi-cluster controllers I've built so far, the actual controller just wants to watch in all clusters but doesn't necessarily know their number or names ahead of time, which is why a method that returns a map of all clusters is IMHO more useful. If there is one cluster that gets special treatment, this is still possible.
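
To make that usage pattern concrete, here is a rough, hypothetical sketch of wiring a watch for every cluster returned by `GetClusters()`. It assumes the `MultiClusterManager` interface from the comment above (which, per the later comments, did not end up in this proposal); all names are illustrative only:

```go
// Hypothetical: watch Secrets in every cluster a MultiClusterManager knows
// about, without knowing the number or names of the clusters ahead of time.
func addWatchesForAllClusters(mgr MultiClusterManager, r reconcile.Reconciler) error {
	bldr := ctrl.NewControllerManagedBy(mgr).For(&corev1.Secret{})
	for name, c := range mgr.GetClusters() {
		_ = name // a real controller would encode the cluster name into its reconcile.Requests
		bldr = bldr.Watches(
			source.NewKindWithCache(&corev1.Secret{}, c.GetCache()),
			&handler.EnqueueRequestForObject{},
		)
	}
	return bldr.Complete(r)
}
```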

> **Contributor:**
>
> A number of the patterns I've seen for multicluster end up having clusters come and go over time. Even if we're not going to tackle that now, we should keep that in mind while designing.
>
> I'm also not certain about the whole "primary"/"secondary" cluster thing. Perhaps model this as: there is a "leader election" cluster and there are "functional" clusters (one of which may be the leader election cluster)?

> **Member (author):**
>
> > A number of the patterns I've seen for multicluster end up having clusters come and go over time. Even if we're not going to tackle that now, we should keep that in mind while designing.
>
> IMHO this proposal does nothing to prevent that, but it also does not need to do anything yet to simplify it. Once we want such functionality, we would probably add a `RemoveCluster` to the interface.
>
> > I'm also not certain about the whole "primary"/"secondary" cluster thing. Perhaps model this as: there is a "leader election" cluster and there are "functional" clusters (one of which may be the leader election cluster)?
>
> I guess we could change `AddCluster` to take a `UseForLeaderElection` bool option: if leader election is enabled, use that cluster (and error out on start if not exactly one cluster is configured to be used for leader election). WDYT?

> **Member (author):**
>
> On second thought, after trying to update the sample, I would like to not make a MultiClusterManager part of this proposal, because it opens up a set of questions I would rather not answer just yet:
>
> * The builder needs to be extended with something like `WatchesFromCluster(source, "cluster-name", handler)`, otherwise we need to blindly dereference elements from the `map[clusterName]cluster.Cluster`, potentially getting nil-pointer dereferences
> * The builder needs to be extended with something like `WatchesFromAllClustersExcept(source, "excepted-cluster-name", handler)`
> * Probably more use cases for watches I didn't think about
> * There should be a nice way to pass all clients for all clusters on to a controller that doesn't involve writing a loop (probably the multi-cluster client abstraction that Solly suggested?)
> * When adding official support for building a controller that watches an arbitrary number of clusters, we also need an official way of encoding the cluster name into the `reconcile.Request`
>
> And probably more.
>
> IMHO, extracting the cluster-specifics out of the manager won't block any of the work mentioned above, but it allows us to start working on the topic without requiring us to already anticipate all use cases (and is already a big improvement over the status quo).

> **Contributor:**
>
> OK, fine with doing that in a follow-up.

Furthermore, during startup, the `Manager` will use a type assertion to find `Cluster`s
in order to start their caches before anything else:

```go
type HasCaches interface {
	GetCache() cache.Cache
}

if getter, hasCaches := runnable.(HasCaches); hasCaches {
	m.caches = append(m.caches, getter.GetCache())
}
```

```go
for idx := range cm.caches {
	go func(idx int) { cm.caches[idx].Start(cm.internalStop) }(idx)
}

for _, cache := range cm.caches {
	cache.WaitForCacheSync(cm.internalStop)
}

// Start all other runnables
```

## Example

Below is a sample `reconciler` that creates a secret in a `mirrorCluster` for each
secret found in a `referenceCluster`, if no secret of that name exists there yet. To keep
the sample short, it does not compare the contents of the secrets.

```go
type secretMirrorReconciler struct {
	referenceClusterClient, mirrorClusterClient client.Client
}

func (r *secretMirrorReconciler) Reconcile(req reconcile.Request) (reconcile.Result, error) {
	s := &corev1.Secret{}
	if err := r.referenceClusterClient.Get(context.TODO(), req.NamespacedName, s); err != nil {
		if kerrors.IsNotFound(err) {
			return reconcile.Result{}, nil
		}
		return reconcile.Result{}, err
	}

	if err := r.mirrorClusterClient.Get(context.TODO(), req.NamespacedName, &corev1.Secret{}); err != nil {
		if !kerrors.IsNotFound(err) {
			return reconcile.Result{}, err
		}

		mirrorSecret := &corev1.Secret{
			ObjectMeta: metav1.ObjectMeta{Namespace: s.Namespace, Name: s.Name},
			Data:       s.Data,
		}
		return reconcile.Result{}, r.mirrorClusterClient.Create(context.TODO(), mirrorSecret)
	}

	return reconcile.Result{}, nil
}

func NewSecretMirrorReconciler(mgr manager.Manager, mirrorCluster cluster.Cluster) error {
	return ctrl.NewControllerManagedBy(mgr).
		// Watch Secrets in the reference cluster
		For(&corev1.Secret{}).
		// Watch Secrets in the mirror cluster
		Watches(
			source.NewKindWithCache(&corev1.Secret{}, mirrorCluster.GetCache()),
			&handler.EnqueueRequestForObject{},
		).
		Complete(&secretMirrorReconciler{
			referenceClusterClient: mgr.GetClient(),
			mirrorClusterClient:    mirrorCluster.GetClient(),
		})
}

func main() {
	mgr, err := manager.New(cfg1, manager.Options{})
	if err != nil {
		panic(err)
	}

	mirrorCluster, err := cluster.New(cfg2)
	if err != nil {
		panic(err)
	}

	if err := mgr.Add(mirrorCluster); err != nil {
		panic(err)
	}

	if err := NewSecretMirrorReconciler(mgr, mirrorCluster); err != nil {
		panic(err)
	}

	if err := mgr.Start(signals.SetupSignalHandler()); err != nil {
		panic(err)
	}
}
```

> **Review thread on the `mirrorClusterClient.Get` call**
>
> **Contributor:** Tangent: not that we need to tackle it here, but we could even create a multicluster client that (ab)used the unused ClusterName field (maybe dangerous), or used a client option, context field, etc. to avoid needing multiple clients.
>
> **Member (author):** That sounds like a great idea (but I would prefer to keep that as an orthogonal work item).

> **Review thread on the `!kerrors.IsNotFound(err)` check**
>
> **Contributor:** Nit: `client.IgnoreNotFound(err)` exists ;-)
>
> **Member (author):** I don't understand how that is of use; we have to return on NotFound, so it's handled differently from no error.
>
> **Contributor:** Oh, the pattern just becomes `if client.IgnoreNotFound(err) != nil`. Mainly it just avoids remembering another k8s package to import.
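
Picking up on that last nit, a minimal sketch (not part of the proposal) of what the reference-cluster lookup would look like with `client.IgnoreNotFound`, where NotFound simply ends the reconcile without an error:

```go
// Sketch only: same semantics as the reference-cluster lookup above, but the
// NotFound check collapses into one expression via client.IgnoreNotFound.
s := &corev1.Secret{}
if err := r.referenceClusterClient.Get(context.TODO(), req.NamespacedName, s); err != nil {
	// IgnoreNotFound returns nil for NotFound errors and the original error otherwise.
	return reconcile.Result{}, client.IgnoreNotFound(err)
}
```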