
Helm 3 resource installation order is not respected #2227

Closed
Tracked by #2847
PierreBeucher opened this issue Nov 8, 2022 · 3 comments
Assignees
Labels
area/helm kind/question Questions about existing features mro2 Monica's list of 2st tier overlay related issues resolution/fixed This issue was fixed

Comments

PierreBeucher commented Nov 8, 2022

What happened?

When deploying a Helm chart using the helm.v3.Chart resource, all resources are deployed in parallel. This causes issues when a Helm chart is designed around the expected deployment order documented here, which points to the InstallOrder var in code:

var InstallOrder KindSortOrder = []string{
	"Namespace",
	// ...
	"Secret",
	"ConfigMap",
	// ...
	"Service",
	"DaemonSet",
	"Deployment",
}

For example, under this order a Secret is deployed before a Pod or Deployment that uses it.

Steps to reproduce

Deploy any chart using multiple resources

Expected Behavior

Resources are deployed in the documented order

Actual Behavior

Resources are not deployed in the documented order

Output of pulumi about

CLI          
Version      3.42.0
Go Version   go1.19.1
Go Compiler  gc

Host     
OS       alpine
Version  3.16.0
Arch     x86_64

Pulumi locates its logs in /tmp by default

Additional context

This is probably related to #1671 in which @lblackstone said:

We did sort resources with that ordering in the original TypeScript implementation of our Helm support, but it got dropped at some point since Pulumi automatically retries resource creation.

Retrying does indeed solve the issue in most situations; however, some charts, like the Datadog Helm chart, rely on this ordering mechanism to update a Secret shared across Pods and ensure they are restarted, using this pattern:

  • Set a value in a Secret used by two different sets of Pods (let's call them A and B)
    • A and B rely on this secret for communication
    • More specifically: A expects B to provide the shared secret value at runtime
    • If B can't connect to A on startup, it fails
    • This is a common pattern for clustered systems like databases, Redis, etc.
  • To ensure all Pods are recreated when the Secret changes, add an annotation that changes whenever the secret does (such as a checksum of the Secret template or of the secret value)
    • This is also the typical approach to ensure consistency between a Secret/ConfigMap and the Pods using it

For this to work, the Secret/ConfigMap must be deployed before any Pod specs are applied; otherwise a new Pod may be created before the new Secret exists, resulting in some Pods using the old Secret and others using the new one, causing havoc (a sketch of the pattern follows below).
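For illustration, here is a minimal sketch of that checksum-annotation pattern written as a Pulumi TypeScript program rather than as a Helm template (the resource names, image, and authToken value are hypothetical placeholders, not taken from the Datadog chart): the pod template carries an annotation derived from the secret value, so any change to the secret changes the pod template and forces a rollout, and dependsOn makes Pulumi create the Secret before the Deployment.

import * as crypto from "crypto";
import * as k8s from "@pulumi/kubernetes";

// Hypothetical shared secret value; in the Datadog case this would be the auth
// token that Node Agents use to talk to the Cluster Agent.
const authToken = "change-me";

const sharedSecret = new k8s.core.v1.Secret("shared-auth-token", {
    stringData: { token: authToken },
});

// Checksum of the secret value: it changes whenever the value changes, so the
// pod template hash changes and the Deployment rolls its Pods.
const secretChecksum = crypto.createHash("sha256").update(authToken).digest("hex");

const componentA = new k8s.apps.v1.Deployment("component-a", {
    spec: {
        selector: { matchLabels: { app: "component-a" } },
        template: {
            metadata: {
                labels: { app: "component-a" },
                annotations: { "checksum/secret": secretChecksum },
            },
            spec: {
                containers: [{ name: "a", image: "example/component-a" }],
            },
        },
    },
}, { dependsOn: [sharedSecret] }); // ensure the Secret exists before the Pods are (re)created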

"Hey, but your pod won't be healthy and will be restarted using new secret". Not necessarily, considering that A is passively waiting for B to provide the shared secret. What may happen is:

  1. A is started using the old secret
  2. The new Secret is created
  3. B is started using the new secret
  4. B fails to contact A because of invalid auth (their secrets do not match). B is unhealthy and restarted (still using the new secret), again and again, ending up in CrashLoopBackOff
  5. A seems "healthy" but is in fact using the old secret and will never be restarted
  6. Have fun debugging :)

A simple workaround is of course to force-recreate (restart) A manually, but the issue will keep happening from time to time.


To be more specific, we had this issue with the Datadog Helm chart because the Datadog Cluster Agent expects an auth token from the Node Agents (each Node Agent reports to a single Cluster Agent), and that is exactly what happened: the Cluster Agent was restarted too soon with the old auth token, and all new Node Agents failed because they couldn't reach the Cluster Agent.

I suspect such situations are frequent elsewhere too, for example in clustered systems like databases, Redis, etc.

Contributing

Vote on this issue by adding a 👍 reaction.
To contribute a fix for this issue, leave a comment (and link to your pull request, if you've opened one already).

@PierreBeucher PierreBeucher added kind/bug Some behavior is incorrect or out of spec needs-triage Needs attention from the triage team labels Nov 8, 2022
@squaremo
Contributor

The Chart object is really more like a template expansion -- it applies its configuration to the templates of the chart, then treats the resulting YAML as just a bunch of YAML. Are you able to use the helm.v3.Release instead? That uses the Helm machinery to deploy a chart, meaning you get all the dependency ordering and other Helm-specific features.

I would say Chart is unlikely to fundamentally change, so Release is a better bet for using Helm with Pulumi if you can make it work.
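For reference, a minimal sketch of what switching to helm.v3.Release could look like in TypeScript, using the Datadog chart as an example (the namespace, values, and API key are placeholders, not a verified configuration):

import * as k8s from "@pulumi/kubernetes";

// helm.v3.Release drives an actual Helm install, so Helm's documented install
// order (Namespace, Secret, ConfigMap, ..., Deployment) and hooks apply.
const datadog = new k8s.helm.v3.Release("datadog", {
    chart: "datadog",
    namespace: "datadog",
    createNamespace: true,
    repositoryOpts: {
        repo: "https://helm.datadoghq.com",
    },
    values: {
        datadog: {
            apiKey: "<your-api-key>", // placeholder
        },
    },
});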

@squaremo squaremo added kind/question Questions about existing features area/helm and removed kind/bug Some behavior is incorrect or out of spec needs-triage Needs attention from the triage team labels Nov 10, 2022
LouisJULIEN commented Mar 24, 2023

I would like to advise a bit of caution when migrating from helm.v3.Chart to helm.v3.Release, as importing the existing resources doesn't seem possible. You should:

  1. First apply your changes to delete the helm.v3.Chart
  2. Then apply your changes to create the helm.v3.Release.

I want to stress that you should do it in two steps, in that order, to avoid concurrency issues. I encountered an issue with CRDs: they were not created by the helm.v3.Release, because they still existed when the helm.v3.Release was created, but they were later deleted by the removal of the helm.v3.Chart. This left us in a weird state where the helm.v3.Release hadn't created the CRDs and wouldn't. I had to delete the helm.v3.Release and recreate it to get the CRDs back on my cluster.

@EronWright
Contributor

The upcoming Chart v4 resource has improved support for resource ordering, based on cli-utils as described here. This means that you can apply a config.kubernetes.io/depends-on annotation to ensure that the deployment is installed first.
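As an illustration of the annotation format (the resource names here are hypothetical, and in practice the annotation would usually be added in the chart's own templates or via its values rather than in program code), the value follows the cli-utils object-reference format <group>/namespaces/<namespace>/<kind>/<name>, with an empty group segment for core kinds such as Secret:

import * as k8s from "@pulumi/kubernetes";

// Hypothetical sketch: a Deployment declaring that it depends on a core-group
// Secret in the same namespace. When a manifest carrying this annotation is
// applied through the ordering-aware machinery (e.g. Chart v4), the Secret is
// applied before the Deployment.
const agent = new k8s.apps.v1.Deployment("agent", {
    metadata: {
        namespace: "datadog",
        annotations: {
            "config.kubernetes.io/depends-on": "/namespaces/datadog/Secret/datadog-auth-token",
        },
    },
    spec: {
        selector: { matchLabels: { app: "agent" } },
        template: {
            metadata: { labels: { app: "agent" } },
            spec: {
                containers: [{ name: "agent", image: "example/agent:latest" }],
            },
        },
    },
});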
