
spec.Manifest != release.Manifest when helm "lookup" func used in template, causing repeat reconciliation #209

Open
DreamingRaven opened this issue Feb 17, 2023 · 11 comments · May be fixed by #344


@DreamingRaven

DreamingRaven commented Feb 17, 2023

Hey, I have been building a hybrid operator (https://gitlab.com/GeorgeRaven/authentik-manager/-/tree/staging/operator) with the helm controller which attempts to reconcile one of my helm charts (https://gitlab.com/GeorgeRaven/authentik-manager/-/tree/staging/operator/helm-charts/ak).

This works overall; however, on closer inspection I noticed that the reconciler repeatedly tries to reconcile a resource that it later recognizes as correct or requiring no change. This leads to extremely high resource usage.

I believe this is related to Helm lookup functions and attempting to impute values from already-existing resources. I say this because the affected resource appears to be one of my secrets:

{{- if .Values.secret.generate }}
{{- $existingSecret := (lookup "v1" "Secret" (printf "%s" .Release.Namespace) (printf "%s" .Values.secret.name) ) -}}
apiVersion: v1
kind: Secret
metadata:
  name: {{ .Values.secret.name }}
{{/* If the secret already exists then keep the data else gen new */}}
{{- if $existingSecret }}
  annotations: {{ toJson $existingSecret.metadata.annotations }}
# from existing secret
data: {{ toJson $existingSecret.data }}
{{- else }}
data:
  authJwtToken: {{ default 30 .Values.secret.randLength | int | randAlphaNum | b64enc }}
  authStorageEncryptionKey: {{ default 30 .Values.secret.randLength | int | randAlphaNum | b64enc }}
  ************** several more secrets ************
  {{- end }}
{{- end }}

I believe this may be affecting the following check, which is being met and causing repeat reconciles:

if specRelease.Manifest != currentRelease.Manifest ||
	currentRelease.Info.Status == release.StatusFailed ||
	currentRelease.Info.Status == release.StatusSuperseded {
	return currentRelease, stateNeedsUpgrade, nil
}

In contrast, the actual actionClient reconciler is likely generating an empty patch correctly:

func (c *actionClient) Reconcile(rel *release.Release) error {
	infos, err := c.conf.KubeClient.Build(bytes.NewBufferString(rel.Manifest), false)
	if err != nil {
		return err
	}
	return infos.Visit(func(expected *resource.Info, err error) error {
		if err != nil {
			return fmt.Errorf("visit error: %w", err)
		}
		helper := resource.NewHelper(expected.Client, expected.Mapping)
		existing, err := helper.Get(expected.Namespace, expected.Name)
		if apierrors.IsNotFound(err) {
			if _, err := helper.Create(expected.Namespace, true, expected.Object); err != nil {
				return fmt.Errorf("create error: %w", err)
			}
			return nil
		} else if err != nil {
			return fmt.Errorf("could not get object: %w", err)
		}
		patch, patchType, err := createPatch(existing, expected)
		if err != nil {
			return fmt.Errorf("error creating patch: %w", err)
		}
		if patch == nil {
			// nothing to do
			return nil
		}
		_, err = helper.Patch(expected.Namespace, expected.Name, patchType, patch,
			&metav1.PatchOptions{})
		if err != nil {
			return fmt.Errorf("patch error: %w", err)
		}
		return nil
	})
}

This also affects my sub-charts (Bitnami Redis, Bitnami PostgreSQL), which also use the lookup function and fail in the same way, although they are disabled here to focus only on the clearer example.

Here is a full log of a run where I apply my watched-for Helm resource:

> make generate manifests install run
test -s /home/archer/git/authentik-manager/operator/bin/controller-gen || GOBIN=/home/archer/git/authentik-manager/operator/bin go install sigs.k8s.io/controller-tools/cmd/controller-gen@v0.10.0
/home/archer/git/authentik-manager/operator/bin/controller-gen object:headerFile="" paths="./..."
/home/archer/git/authentik-manager/operator/bin/controller-gen rbac:roleName=manager-role crd webhook paths="./..." output:crd:artifacts:config=config/crd/bases
/home/archer/git/authentik-manager/operator/bin/kustomize build config/crd | kubectl apply -f -
customresourcedefinition.apiextensions.k8s.io/akblueprints.akm.goauthentik.io configured
customresourcedefinition.apiextensions.k8s.io/aks.akm.goauthentik.io unchanged
go fmt ./...
go vet ./...
# helm-operator run
go run ./main.go
1.6766436991798725e+09	INFO	controller-runtime.metrics	Metrics server is starting to listen	{"addr": ":8080"}
1.6766436991832378e+09	INFO	controllers.Helm	Watching resource	{"group": "akm.goauthentik.io", "version": "v1alpha1", "kind": "Ak"}
1.676643699183254e+09	INFO	setup	configured watch	{"gvk": "akm.goauthentik.io/v1alpha1, Kind=Ak", "chartPath": "helm-charts/ak", "maxConcurrentReconciles": 24, "reconcilePeriod": "1m0s"}
1.6766436991832697e+09	INFO	setup	starting manager
1.6766436991834452e+09	INFO	Starting server	{"path": "/metrics", "kind": "metrics", "addr": "[::]:8080"}
1.6766436991834517e+09	INFO	Starting server	{"kind": "health probe", "addr": "[::]:8081"}
1.6766436991835036e+09	INFO	Starting EventSource	{"controller": "akblueprint", "controllerGroup": "akm.goauthentik.io", "controllerKind": "AkBlueprint", "source": "kind source: *v1alpha1.AkBlueprint"}
1.6766436991835206e+09	INFO	Starting Controller	{"controller": "akblueprint", "controllerGroup": "akm.goauthentik.io", "controllerKind": "AkBlueprint"}
1.6766436991835217e+09	INFO	Starting EventSource	{"controller": "ak-controller", "source": "kind source: *unstructured.Unstructured"}
1.6766436991835382e+09	INFO	Starting EventSource	{"controller": "ak-controller", "source": "kind source: *v1.Secret"}
1.6766436991835427e+09	INFO	Starting Controller	{"controller": "ak-controller"}
1.6766436992841232e+09	INFO	Starting workers	{"controller": "akblueprint", "controllerGroup": "akm.goauthentik.io", "controllerKind": "AkBlueprint", "worker count": 1}
1.6766436992841737e+09	INFO	Starting workers	{"controller": "ak-controller", "worker count": 24}
1.6766437054854615e+09	DEBUG	controllers.Helm	Reconciliation triggered	{"ak": "default/ak-sample"}
1.6766437054908583e+09	DEBUG	controllers.Helm	Starting install
1.6766437056020834e+09	DEBUG	controllers.Helm	creating 11 resource(s)
1.676643705631722e+09	INFO	controllers.Helm	Release installed	{"ak": "default/ak-sample", "name": "ak-sample", "version": 1}
1.6766437056327908e+09	INFO	Starting EventSource	{"controller": "ak-controller", "source": "kind source: *unstructured.Unstructured"}
1.6766437056328113e+09	DEBUG	controllers.Helm	Watching dependent resource	{"ak": "default/ak-sample", "dependentAPIVersion": "v1", "dependentKind": "Secret"}
1.6766437056330194e+09	INFO	Starting EventSource	{"controller": "ak-controller", "source": "kind source: *unstructured.Unstructured"}
1.6766437056330316e+09	DEBUG	controllers.Helm	Watching dependent resource	{"ak": "default/ak-sample", "dependentAPIVersion": "v1", "dependentKind": "ConfigMap"}
1.6766437056331952e+09	INFO	Starting EventSource	{"controller": "ak-controller", "source": "kind source: *unstructured.Unstructured"}
1.6766437056332052e+09	DEBUG	controllers.Helm	Watching dependent resource	{"ak": "default/ak-sample", "dependentAPIVersion": "v1", "dependentKind": "Service"}
1.6766437056338887e+09	INFO	Starting EventSource	{"controller": "ak-controller", "source": "kind source: *unstructured.Unstructured"}
1.6766437056339014e+09	DEBUG	controllers.Helm	Watching dependent resource	{"ak": "default/ak-sample", "dependentAPIVersion": "apps/v1", "dependentKind": "Deployment"}
1.6766437056340444e+09	INFO	Starting EventSource	{"controller": "ak-controller", "source": "kind source: *unstructured.Unstructured"}
1.6766437056340532e+09	DEBUG	controllers.Helm	Watching dependent resource	{"ak": "default/ak-sample", "dependentAPIVersion": "networking.k8s.io/v1", "dependentKind": "Ingress"}
1.6766437056342168e+09	INFO	Starting EventSource	{"controller": "ak-controller", "source": "kind source: *unstructured.Unstructured"}
1.676643705634227e+09	DEBUG	controllers.Helm	Watching dependent resource	{"ak": "default/ak-sample", "dependentAPIVersion": "networking.k8s.io/v1", "dependentKind": "NetworkPolicy"}
1.676643705649836e+09	DEBUG	controllers.Helm	Reconciliation triggered	{"ak": "default/ak-sample"}
1.6766437056602938e+09	DEBUG	controllers.Helm	preparing upgrade for ak-sample
1.676643705733423e+09	DEBUG	predicate	Skipping reconciliation for dependent resource creation	{"name": "bootstrap-token-js20q4", "namespace": "kube-system", "apiVersion": "v1", "kind": "Secret"}
1.6766437057334795e+09	DEBUG	predicate	Skipping reconciliation for dependent resource creation	{"name": "sh.helm.release.v1.ak-sample.v1", "namespace": "default", "apiVersion": "v1", "kind": "Secret"}
1.6766437057334864e+09	DEBUG	predicate	Skipping reconciliation for dependent resource creation	{"name": "auth", "namespace": "default", "apiVersion": "v1", "kind": "Secret"}
1.676643705733534e+09	DEBUG	predicate	Skipping reconciliation for dependent resource creation	{"name": "authentik-server", "namespace": "default", "apiVersion": "v1", "kind": "Service"}
1.6766437057335584e+09	DEBUG	predicate	Skipping reconciliation for dependent resource creation	{"name": "kubernetes", "namespace": "default", "apiVersion": "v1", "kind": "Service"}
1.6766437057335649e+09	DEBUG	predicate	Skipping reconciliation for dependent resource creation	{"name": "kube-dns", "namespace": "kube-system", "apiVersion": "v1", "kind": "Service"}
1.676643705733571e+09	DEBUG	predicate	Skipping reconciliation for dependent resource creation	{"name": "authentik-worker", "namespace": "default", "apiVersion": "v1", "kind": "Service"}
1.676643705733582e+09	DEBUG	predicate	Skipping reconciliation for dependent resource creation	{"name": "kubelet-config", "namespace": "kube-system", "apiVersion": "v1", "kind": "ConfigMap"}
1.6766437057336082e+09	DEBUG	predicate	Skipping reconciliation for dependent resource creation	{"name": "kube-root-ca.crt", "namespace": "default", "apiVersion": "v1", "kind": "ConfigMap"}
1.676643705733655e+09	DEBUG	predicate	Skipping reconciliation for dependent resource creation	{"name": "kube-root-ca.crt", "namespace": "kube-system", "apiVersion": "v1", "kind": "ConfigMap"}
1.6766437057336686e+09	DEBUG	predicate	Skipping reconciliation for dependent resource creation	{"name": "kube-proxy", "namespace": "kube-system", "apiVersion": "v1", "kind": "ConfigMap"}
1.6766437057336805e+09	DEBUG	predicate	Skipping reconciliation for dependent resource creation	{"name": "extension-apiserver-authentication", "namespace": "kube-system", "apiVersion": "v1", "kind": "ConfigMap"}
1.676643705733691e+09	DEBUG	predicate	Skipping reconciliation for dependent resource creation	{"name": "coredns", "namespace": "kube-system", "apiVersion": "v1", "kind": "ConfigMap"}
1.6766437057337024e+09	DEBUG	predicate	Skipping reconciliation for dependent resource creation	{"name": "kube-root-ca.crt", "namespace": "kube-public", "apiVersion": "v1", "kind": "ConfigMap"}
1.6766437057337132e+09	DEBUG	predicate	Skipping reconciliation for dependent resource creation	{"name": "kube-root-ca.crt", "namespace": "kube-node-lease", "apiVersion": "v1", "kind": "ConfigMap"}
1.6766437057337236e+09	DEBUG	predicate	Skipping reconciliation for dependent resource creation	{"name": "authentik-config", "namespace": "default", "apiVersion": "v1", "kind": "ConfigMap"}
1.6766437057337363e+09	DEBUG	predicate	Skipping reconciliation for dependent resource creation	{"name": "kubeadm-config", "namespace": "kube-system", "apiVersion": "v1", "kind": "ConfigMap"}
1.676643705733748e+09	DEBUG	predicate	Skipping reconciliation for dependent resource creation	{"name": "cluster-info", "namespace": "kube-public", "apiVersion": "v1", "kind": "ConfigMap"}
1.676643705734698e+09	DEBUG	predicate	Skipping reconciliation for dependent resource creation	{"name": "auth-ldap", "namespace": "default", "apiVersion": "networking.k8s.io/v1", "kind": "Ingress"}
1.6766437057347128e+09	DEBUG	predicate	Skipping reconciliation for dependent resource creation	{"name": "auth", "namespace": "default", "apiVersion": "networking.k8s.io/v1", "kind": "Ingress"}
1.6766437057346978e+09	DEBUG	predicate	Skipping reconciliation for dependent resource creation	{"name": "general-netpol", "namespace": "default", "apiVersion": "networking.k8s.io/v1", "kind": "NetworkPolicy"}
1.6766437057347288e+09	DEBUG	predicate	Skipping reconciliation for dependent resource creation	{"name": "ldap-netpol", "namespace": "default", "apiVersion": "networking.k8s.io/v1", "kind": "NetworkPolicy"}
1.6766437057347398e+09	DEBUG	predicate	Skipping reconciliation for dependent resource creation	{"name": "authentik-worker", "namespace": "default", "apiVersion": "networking.k8s.io/v1", "kind": "NetworkPolicy"}
1.6766437057347512e+09	DEBUG	controllers.Helm	performing update for ak-sample
1.6766437057347138e+09	DEBUG	predicate	Skipping reconciliation for dependent resource creation	{"name": "authentik-server", "namespace": "default", "apiVersion": "apps/v1", "kind": "Deployment"}
1.6766437057347898e+09	DEBUG	predicate	Skipping reconciliation for dependent resource creation	{"name": "authentik-worker", "namespace": "default", "apiVersion": "apps/v1", "kind": "Deployment"}
1.6766437057348058e+09	DEBUG	predicate	Skipping reconciliation for dependent resource creation	{"name": "coredns", "namespace": "kube-system", "apiVersion": "apps/v1", "kind": "Deployment"}
1.6766437057447062e+09	DEBUG	controllers.Helm	dry run for ak-sample
1.6766437057501326e+09	DEBUG	controllers.Helm	preparing upgrade for ak-sample
1.676643705789205e+09	DEBUG	controllers.Helm	performing update for ak-sample
1.6766437057983747e+09	DEBUG	controllers.Helm	creating upgraded release for ak-sample
1.6766437058037639e+09	DEBUG	controllers.Helm	checking 11 resources for changes
1.676643705805758e+09	DEBUG	predicate	Skipping reconciliation for dependent resource creation	{"name": "sh.helm.release.v1.ak-sample.v2", "namespace": "default", "apiVersion": "v1", "kind": "Secret"}
1.6766437058108108e+09	DEBUG	controllers.Helm	Looks like there are no changes for NetworkPolicy "authentik-worker"
1.676643705814753e+09	DEBUG	controllers.Helm	Looks like there are no changes for NetworkPolicy "ldap-netpol"
1.6766437058186688e+09	DEBUG	controllers.Helm	Looks like there are no changes for NetworkPolicy "general-netpol"
1.676643705822401e+09	DEBUG	controllers.Helm	Looks like there are no changes for Secret "auth"
1.6766437058262606e+09	DEBUG	controllers.Helm	Looks like there are no changes for ConfigMap "authentik-config"
1.676643705831045e+09	DEBUG	controllers.Helm	Looks like there are no changes for Service "authentik-server"
1.6766437058344057e+09	DEBUG	controllers.Helm	Looks like there are no changes for Service "authentik-worker"
1.6766437058460007e+09	DEBUG	controllers.Helm	Looks like there are no changes for Deployment "authentik-server"
1.6766437058591201e+09	DEBUG	controllers.Helm	Looks like there are no changes for Deployment "authentik-worker"
1.6766437058633142e+09	DEBUG	controllers.Helm	Looks like there are no changes for Ingress "auth"
1.6766437058660803e+09	DEBUG	controllers.Helm	Looks like there are no changes for Ingress "auth-ldap"
1.6766437058724103e+09	DEBUG	controllers.Helm	updating status for upgraded release for ak-sample
1.6766437058739226e+09	DEBUG	predicate	Reconciling due to dependent resource update	{"name": "sh.helm.release.v1.ak-sample.v1", "namespace": "default", "apiVersion": "v1", "kind": "Secret"}
1.6766437058783867e+09	INFO	controllers.Helm	Release upgraded	{"ak": "default/ak-sample", "name": "ak-sample", "version": 2}

********************************************************** skipping many repeats ***************************************************************

1.6766437080106707e+09	DEBUG	predicate	Reconciling due to dependent resource update	{"name": "sh.helm.release.v1.ak-sample.v9", "namespace": "default", "apiVersion": "v1", "kind": "Secret"}
1.676643708014716e+09	DEBUG	controllers.Helm	Reconciliation triggered	{"ak": "default/ak-sample"}
1.676643708067962e+09	DEBUG	controllers.Helm	preparing upgrade for ak-sample
1.6766437082209888e+09	DEBUG	controllers.Helm	performing update for ak-sample
1.6766437082409153e+09	DEBUG	controllers.Helm	dry run for ak-sample
1.6766437082749157e+09	DEBUG	controllers.Helm	preparing upgrade for ak-sample
1.676643708349455e+09	DEBUG	controllers.Helm	performing update for ak-sample
1.6766437083658214e+09	DEBUG	controllers.Helm	creating upgraded release for ak-sample
1.6766437083739903e+09	DEBUG	controllers.Helm	checking 11 resources for changes
1.676643708375554e+09	DEBUG	predicate	Skipping reconciliation for dependent resource creation	{"name": "sh.helm.release.v1.ak-sample.v10", "namespace": "default", "apiVersion": "v1", "kind": "Secret"}
1.6766437083802536e+09	DEBUG	controllers.Helm	Looks like there are no changes for NetworkPolicy "authentik-worker"
1.6766437083839474e+09	DEBUG	controllers.Helm	Looks like there are no changes for NetworkPolicy "ldap-netpol"
1.6766437083875918e+09	DEBUG	controllers.Helm	Looks like there are no changes for NetworkPolicy "general-netpol"
1.6766437083913655e+09	DEBUG	controllers.Helm	Looks like there are no changes for Secret "auth"
1.6766437083955123e+09	DEBUG	controllers.Helm	Looks like there are no changes for ConfigMap "authentik-config"
1.6766437083996217e+09	DEBUG	controllers.Helm	Looks like there are no changes for Service "authentik-server"
1.6766437084061203e+09	DEBUG	controllers.Helm	Looks like there are no changes for Service "authentik-worker"
1.6766437084129076e+09	DEBUG	controllers.Helm	Looks like there are no changes for Deployment "authentik-server"
1.6766437084197428e+09	DEBUG	controllers.Helm	Looks like there are no changes for Deployment "authentik-worker"
1.6766437084236712e+09	DEBUG	controllers.Helm	Looks like there are no changes for Ingress "auth"
1.6766437084272254e+09	DEBUG	controllers.Helm	Looks like there are no changes for Ingress "auth-ldap"
1.6766437084342504e+09	DEBUG	controllers.Helm	updating status for upgraded release for ak-sample
1.6766437084358995e+09	DEBUG	predicate	Reconciling due to dependent resource update	{"name": "sh.helm.release.v1.ak-sample.v9", "namespace": "default", "apiVersion": "v1", "kind": "Secret"}
^C1.6766437084366193e+09	INFO	Stopping and waiting for non leader election runnables
1.6766437084366376e+09	INFO	Stopping and waiting for leader election runnables
1.676643708436671e+09	INFO	Shutdown signal received, waiting for all workers to finish	{"controller": "akblueprint", "controllerGroup": "akm.goauthentik.io", "controllerKind": "AkBlueprint"}
1.6766437084366868e+09	INFO	Shutdown signal received, waiting for all workers to finish	{"controller": "ak-controller"}
1.6766437084366996e+09	INFO	All workers finished	{"controller": "akblueprint", "controllerGroup": "akm.goauthentik.io", "controllerKind": "AkBlueprint"}
1.676643708444819e+09	INFO	controllers.Helm	Release upgraded	{"ak": "default/ak-sample", "name": "ak-sample", "version": 10}
1.6766437084463599e+09	DEBUG	predicate	Reconciling due to dependent resource update	{"name": "sh.helm.release.v1.ak-sample.v10", "namespace": "default", "apiVersion": "v1", "kind": "Secret"}
1.6766437084504726e+09	DEBUG	controllers.Helm	Reconciliation triggered	{"ak": "default/ak-sample"}
1.6766437084505365e+09	ERROR	Reconciler error	{"controller": "ak-controller", "object": {"name":"ak-sample","namespace":"default"}, "namespace": "default", "name": "ak-sample", "reconcileID": "95d10411-5865-4450-b6f4-e718640001ac", "error": "client rate limiter Wait returned an error: context canceled"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/home/archer/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.13.0/pkg/internal/controller/controller.go:326
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/home/archer/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.13.0/pkg/internal/controller/controller.go:273
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/home/archer/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.13.0/pkg/internal/controller/controller.go:234
1.6766437084505928e+09	INFO	All workers finished	{"controller": "ak-controller"}
1.6766437084506066e+09	INFO	Stopping and waiting for caches
1.6766437084507222e+09	INFO	Stopping and waiting for webhooks
1.6766437084507341e+09	INFO	Wait completed, proceeding to shutdown the manager
make: *** [Makefile:113: run] Error 1

I can confirm it is related to the secret, as when it is disabled I have no such reconciliation problems:

1.6766454866735418e+09	INFO	controllers.Helm	Release installed	{"ak": "default/ak-sample", "name": "ak-sample", "version": 1}
1.6766454866736445e+09	DEBUG	predicate	Reconciling due to dependent resource update	{"name": "authentik-worker", "namespace": "default", "apiVersion": "apps/v1", "kind": "Deployment"}
1.6766454866741118e+09	DEBUG	predicate	Reconciling due to dependent resource update	{"name": "authentik-server", "namespace": "default", "apiVersion": "apps/v1", "kind": "Deployment"}
1.676645486674792e+09	DEBUG	predicate	Reconciling due to dependent resource update	{"name": "sh.helm.release.v1.ak-sample.v1", "namespace": "default", "apiVersion": "v1", "kind": "Secret"}
1.67664548668227e+09	DEBUG	predicate	Reconciling due to dependent resource update	{"name": "authentik-worker", "namespace": "default", "apiVersion": "apps/v1", "kind": "Deployment"}
1.67664548668497e+09	DEBUG	predicate	Reconciling due to dependent resource update	{"name": "authentik-server", "namespace": "default", "apiVersion": "apps/v1", "kind": "Deployment"}
1.6766454866865325e+09	DEBUG	controllers.Helm	Reconciliation triggered	{"ak": "default/ak-sample"}
1.6766454866900659e+09	DEBUG	predicate	Reconciling due to dependent resource update	{"name": "authentik-worker", "namespace": "default", "apiVersion": "apps/v1", "kind": "Deployment"}
1.6766454866957946e+09	DEBUG	predicate	Reconciling due to dependent resource update	{"name": "authentik-server", "namespace": "default", "apiVersion": "apps/v1", "kind": "Deployment"}
1.6766454866978717e+09	DEBUG	controllers.Helm	preparing upgrade for ak-sample
1.6766454867697005e+09	DEBUG	controllers.Helm	performing update for ak-sample
1.6766454867792912e+09	DEBUG	controllers.Helm	dry run for ak-sample
1.6766454867954018e+09	INFO	controllers.Helm	Release reconciled	{"ak": "default/ak-sample", "name": "ak-sample", "version": 1}
1.6766454867980201e+09	DEBUG	controllers.Helm	Reconciliation triggered	{"ak": "default/ak-sample"}
1.6766454868060386e+09	DEBUG	controllers.Helm	preparing upgrade for ak-sample
1.676645486920131e+09	DEBUG	controllers.Helm	performing update for ak-sample
1.6766454869291651e+09	DEBUG	controllers.Helm	dry run for ak-sample
1.6766454869506211e+09	INFO	controllers.Helm	Release reconciled	{"ak": "default/ak-sample", "name": "ak-sample", "version": 1}
1.6766454987888744e+09	DEBUG	controllers.Helm	Reconciliation triggered	{"ak": "default/ak-sample"}
1.676645498802959e+09	DEBUG	controllers.Helm	preparing upgrade for ak-sample
1.6766454989271529e+09	DEBUG	controllers.Helm	performing update for ak-sample
1.6766454989450462e+09	DEBUG	controllers.Helm	dry run for ak-sample
1.6766454989689188e+09	INFO	controllers.Helm	Release reconciled	{"ak": "default/ak-sample", "name": "ak-sample", "version": 1}

If I am doing anything particularly wrong, please let me know. I will try to see if there is a specific in-code fix possible and potentially PR it if I find one.

@joelanford
Member

First, the approach you have for secret generation (generate new if not exist, otherwise use existing) is one I have not seen before. I didn't know that was possible with helm, and it's a very elegant solution to the "you can't use random values in the helm charts managed by the helm operator" problem. If we can figure out this other problem, I would really like to see this in our docs!

1.6766437058783867e+09 INFO controllers.Helm Release upgraded

^ That's a release upgrade which wouldn't follow the patch code path. The patch code path is followed when the manifests match.

On the release upgrade path, the specRelease.Manifest != currentRelease.Manifest check is most likely turning up a difference.

I wonder if there's some inconsequential difference (different spacing or base64 vs not base64)? Can you turn up log verbosity to get these diff log lines in the output:

if log.V(4).Enabled() {
	fmt.Println(diff.Generate(curRel.Manifest, rel.Manifest))
}

@DreamingRaven
Author

DreamingRaven commented Mar 9, 2023

Hey @joelanford, thanks, but I can't take any credit for it; it was an idea I adapted from some Stack Overflow post a while ago.

I actually moved away from the helm-operator-plugin a few days ago in favor of calling Helm directly:
API spec:

type AkSpec struct {

	// +kubebuilder:pruning:PreserveUnknownFields
	// +kubebuilder:validation:Schemaless

	// Values is the helm chart values map to override chart defaults. This is often further adapted by the controller
	// to add additional resources like declarative blueprints into the deployments. Values is a loose, and unstructured
	// datatype. It will not complain if the values do not override anything, or do anything at all.
	Values json.RawMessage `json:"values,omitempty"`
}

Reconciler logic (apologies for the length; I included it here for completeness):

func (r *AkReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	l := klog.FromContext(ctx)

	// Parsing options to make them available TODO: pass them in rather than read continuously
	o := utils.Opts{}
	arg.MustParse(&o)

	actionConfig, err := r.GetActionConfig(req.NamespacedName.Namespace, l)
	if err != nil {
		return ctrl.Result{}, err
	}

	// GET CRD
	crd := &akmv1a1.Ak{}
	err = r.Get(ctx, req.NamespacedName, crd)
	if err != nil {
		if errors.IsNotFound(err) {
			// Request object not found, could have been deleted after reconcile request.
			// Owned objects are automatically garbage collected. For additional cleanup logic use finalizers.
			// Return and don't requeue
			l.Info("Ak resource reconciliation triggered but disappeared. Checking for residual chart for uninstall then ignoring since object must have been deleted.")
			_, err := r.UninstallChart(req.NamespacedName, actionConfig)
			if err != nil {
				return ctrl.Result{}, err
			}
			return ctrl.Result{}, nil
		}
		// Error reading the object - requeue the request.
		l.Error(err, "Failed to get Ak resource. Likely fetch error. Retrying.")
		return ctrl.Result{}, err
	}
	l.Info(fmt.Sprintf("Found Ak resource `%v` in `%v`.", crd.Name, crd.Namespace))

	// Helm Chart Identification
	u, err := url.Parse(fmt.Sprintf("file://workspace/helm-charts/ak-%v.tgz", o.SrcVersion))
	if err != nil {
		return ctrl.Result{}, err
	}

	// Helm Install or Upgrade Chart
	var vals map[string]interface{}
	err = json.Unmarshal(crd.Spec.Values, &vals)
	if err != nil {
		return ctrl.Result{}, err
	}
	_, err = r.UpgradeOrInstallChart(req.NamespacedName, u, actionConfig, vals)
	if err != nil {
		return ctrl.Result{}, err
	}

	return ctrl.Result{}, nil
}

// UpgradeOrInstallChart upgrades a chart in cluster or installs it new if it does not already exist
// url format is [scheme:][//[userinfo@]host][/]path[?query][#fragment] e.g. file://workspace/helm-charts/ak-0.1.0.tgz
func (r *AkReconciler) UpgradeOrInstallChart(nn types.NamespacedName, u *url.URL, a *action.Configuration, o map[string]interface{}) (*release.Release, error) {
	// Helm List Action
	listAction := action.NewList(a)
	releases, err := listAction.Run()
	if err != nil {
		return nil, err
	}

	toUpgrade := false
	for _, release := range releases {
		// fmt.Println("Release: " + release.Name + " Status: " + release.Info.Status.String())
		if release.Name == nn.Name {
			toUpgrade = true
		}
	}

	c, err := r.LoadHelmChart(u)
	if err != nil {
		return nil, err
	}

	fmt.Println(o)

	var rel *release.Release
	if toUpgrade {
		// Helm Upgrade
		updateAction := action.NewUpgrade(a)
		rel, err = updateAction.Run(nn.Name, c, o)
		if err != nil {
			return nil, err
		}

	} else {
		// Helm Install
		installAction := action.NewInstall(a)
		installAction.Namespace = nn.Namespace
		installAction.ReleaseName = nn.Name
		rel, err = installAction.Run(c, o)
		if err != nil {
			return nil, err
		}
	}
	return rel, nil
}

func (r *AkReconciler) UninstallChart(nn types.NamespacedName, a *action.Configuration) (*release.UninstallReleaseResponse, error) {
	uninstallAction := action.NewUninstall(a)
	releaseResponse, err := uninstallAction.Run(nn.Name)
	if err != nil {
		return nil, err
	}
	return releaseResponse, nil
}

// GetActionConfig Get the Helm action config from in cluster service account
func (r *AkReconciler) GetActionConfig(namespace string, l logr.Logger) (*action.Configuration, error) {
	actionConfig := new(action.Configuration)
	var kubeConfig *genericclioptions.ConfigFlags
	config, err := rest.InClusterConfig()
	if err != nil {
		return nil, err
	}
	// Set properties manually from official rest config
	kubeConfig = genericclioptions.NewConfigFlags(false)
	kubeConfig.APIServer = &config.Host
	kubeConfig.BearerToken = &config.BearerToken
	kubeConfig.CAFile = &config.CAFile
	kubeConfig.Namespace = &namespace
	if err := actionConfig.Init(kubeConfig, namespace, os.Getenv("HELM_DRIVER"), log.Printf); err != nil {
		return nil, err
	}
	return actionConfig, nil
}

// Get Connection Client to Kubernetes
func (r *AkReconciler) GetKubeClient() (*kubernetes.Clientset, error) {
	config, err := rest.InClusterConfig()
	if err != nil {
		return nil, err
	}
	// creates the clientset
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		return nil, err
	}
	return clientset, nil
}

// LoadHelmChart loads a helm chart from a given file URL
func (r *AkReconciler) LoadHelmChart(u *url.URL) (*chart.Chart, error) {

	// GET HELM CHART
	if u.Scheme != "file" {
		err := errors.NewInvalid(
			schema.GroupKind{
				Group: "akm.goauthentik.io",
				Kind:  "Ak",
			},
			fmt.Sprintf("Url scheme `%v` != `file`, unsupported scheme.", u.Scheme),
			field.ErrorList{})
		return nil, err
	}
	// load chart from filepath (which is part of host in url)
	path, err := filepath.Abs(u.Host + u.Path)
	if err != nil {
		return nil, err
	}
	_, err = utils.Exists(path)
	if err != nil {
		return nil, err
	}
	return chartLoader.Load(path)
}

This has much the same effect.
I will check out an old commit and get you a more in-depth debug log if you are still interested in this, though.

@Jay-Madden

Jay-Madden commented Jul 26, 2023

This just bit us pretty badly; we spent a few hours on an infinite reconcile loop stemming from this code in the Helm chart:

  template:
    metadata:
      labels:
        app: {{ .Values.metadata.name }}
        backstage.io/kubernetes-id: {{ .Values.metadata.annotations.backstageName }}
        kuma.io/region: {{ include "webapp.cluster.region" . }} # <--- this line, which calls this helper:

{{- define "webapp.cluster.region" -}}
{{- $region:= "noregion" }}
{{- if $config :=  (lookup "v1" "ConfigMap" "runway-operator" "kubernetes-info") -}}
    {{- $region = $config.data.region_name -}}
{{- end }}
{{- printf "%v" $region -}} 
{{- end -}}

This line

When the release spec is generated, it doesn't account for lookup values and therefore sees the two manifests as different.

Is this considered a bug or expected behavior?

@joelanford
Member

Lookup values should be accounted for. The release secret of the current release should have the value populated from the lookup function. The next time reconcile happens, I would expect that the dry-run upgrade that happens would also populate the value from the lookup. If the value of the lookup changes, then that would trigger a diff in the manifest and a real upgrade would proceed.

Can you turn logging up to level 4 to get the release diff in the logs and see what the difference is (if there is a difference at least):

if log.V(4).Enabled() {
	fmt.Println(diff.Generate(curRel.Manifest, rel.Manifest))
}

@Jay-Madden

Update: it's because of this line

When DryRun is set to true, lookup values are not respected, and the upgrade sameness check will always fail because the template has default values while the actual release has the looked-up values.

@Jay-Madden

@joelanford Apologies for the delay I didn't see your comment.

Here is a short two-second log dump, run with go run cmd/main.go --zap-log-level 4:

https://pastebin.com/WKCRk8Ns The diff that it logs is identical every time.

The actual upgrade check, done here

if specRelease.Manifest != currentRelease.Manifest ||
	currentRelease.Info.Status == release.StatusFailed ||
	currentRelease.Info.Status == release.StatusSuperseded {
	return currentRelease, stateNeedsUpgrade, nil
}

is done with a dry-run flag, which does not actually look up values in the cluster, so the string comparison fails because the template fell back to the default value. As documented here:

https://helm.sh/docs/chart_template_guide/functions_and_pipelines/#using-the-lookup-function


@joelanford
Member

Oh, it is unfortunate that the lookup function is non-functional during dry runs.

The dry run and manifest comparison are fundamental aspects of the reconciliation, and we have not been able to find an alternative.

Some context is documented here: operator-framework/operator-sdk#5059 (comment)

@miquella

@joelanford: Would it be possible to use the new --dry-run=server introduced in v3.13.0?

@joelanford
Member

We can certainly try that out! My initial instinct is that we can just change the existing client-side dry-run used to calculate the manifest diff to be a server-side dry-run.

Do we know of any examples where switching from client dry-run to server dry-run would break existing usage?

The goal of the dry-run manifest generation is to simulate as closely as possible a real upgrade. So my personal feeling is that since a real upgrade obviously contacts the server and supports templates with the lookup function, it follows that using server-side dry run is a strict improvement and aligns with the purpose of the dry-run in the first place.

Can you try using the code from #344 to see if it would resolve this issue?

@acornett21

@miquella If you have a normal Helm operator, this feature is already in operator-sdk and can be enabled by updating the watches.yaml; please take a look at the PR below.

@miquella

@joelanford: I apparently crossed some wires in my research 😝 It was pointed out to me that our current solution isn't using the hybrid operator, so I don't think I have a way to test your change.

But it sounds like we can take advantage of the solution that @acornett21 suggested!

Thanks, guys!
