
Helm 3 resource installation order is not respected #2227

Closed
Tracked by #2847
PierreBeucher opened this issue Nov 8, 2022 · 3 comments
Assignees
Labels
area/helm kind/question Questions about existing features mro2 Monica's list of 2st tier overlay related issues resolution/fixed This issue was fixed

Comments

PierreBeucher commented Nov 8, 2022

What happened?

When deploying a Helm chart using the helm.v3.Chart resource, all resources are deployed in parallel. This causes issues when a Helm chart is designed around the expected deployment order documented here, which points to the InstallOrder var in code:

var InstallOrder KindSortOrder = []string{
	"Namespace",
	// ...
	"Secret",
	"ConfigMap",
	// ...
	"Service",
	"DaemonSet",
	"Deployment",
}

For example, under this order a Secret is deployed before a Pod or Deployment that uses it.

Steps to reproduce

Deploy any chart using multiple resources

Expected Behavior

Resources are deployed in the documented order

Actual Behavior

Resources are not deployed in the documented order

Output of pulumi about

CLI          
Version      3.42.0
Go Version   go1.19.1
Go Compiler  gc

Host     
OS       alpine
Version  3.16.0
Arch     x86_64

Pulumi locates its logs in /tmp by default

Additional context

This is probably related to #1671 in which @lblackstone said:

We did sort resources with that ordering in the original TypeScript implementation of our Helm support, but it got dropped at some point since Pulumi automatically retries resource creation.

Retrying does indeed solve the issue in most situations; however, some charts, like the Datadog Helm chart, rely on this ordering mechanism to update a Secret shared across Pods and ensure they are restarted, using this pattern:

  • Set a value in a Secret used by two different sets of Pods (let's call them A and B)
    • A and B rely on this secret for communication
    • More specifically: A expects B to provide the shared secret value at runtime
    • If B can't connect to A on startup, it fails
    • This is a common pattern for clustered systems like databases, Redis, etc.
  • To ensure all Pods are recreated when the Secret changes, add an annotation that changes whenever the secret does (such as a checksum of the Secret template or of the secret value)
    • This is also the typical approach to ensure consistency between a Secret/ConfigMap and the Pods using it

For this to work, the Secret/ConfigMap must be deployed before any Pod specs are applied; otherwise a new Pod may be created before the new Secret exists, resulting in some Pods using the old Secret and others using the new one, causing havoc (a sketch of the pattern follows below).
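For illustration, here is a minimal sketch of that checksum-annotation pattern written as a Pulumi TypeScript program rather than as a Helm template (the resource names, image, and authToken value are hypothetical placeholders, not taken from the Datadog chart): the pod template carries an annotation derived from the secret value, so any change to the secret changes the pod template and forces a rollout, and dependsOn makes Pulumi create the Secret before the Deployment.

import * as crypto from "crypto";
import * as k8s from "@pulumi/kubernetes";

// Hypothetical shared secret value; in the Datadog case this would be the auth
// token that Node Agents use to talk to the Cluster Agent.
const authToken = "change-me";

const sharedSecret = new k8s.core.v1.Secret("shared-auth-token", {
    stringData: { token: authToken },
});

// Checksum of the secret value: it changes whenever the value changes, so the
// pod template hash changes and the Deployment rolls its Pods.
const secretChecksum = crypto.createHash("sha256").update(authToken).digest("hex");

const componentA = new k8s.apps.v1.Deployment("component-a", {
    spec: {
        selector: { matchLabels: { app: "component-a" } },
        template: {
            metadata: {
                labels: { app: "component-a" },
                annotations: { "checksum/secret": secretChecksum },
            },
            spec: {
                containers: [{ name: "a", image: "example/component-a" }],
            },
        },
    },
}, { dependsOn: [sharedSecret] }); // ensure the Secret exists before the Pods are (re)created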

"Hey, but your pod won't be healthy and will be restarted using new secret". Not necessarily, considering that A is passively waiting for B to provide the shared secret. What may happen is:

  1. A is started using the old secret
  2. The new Secret is created
  3. B is started using the new secret
  4. B fails to contact A because of invalid auth (their secrets do not match). B is unhealthy and restarted (still using the new secret), again and again, ending up in CrashLoopBackOff
  5. A seems "healthy" but is in fact using the old secret and will never be restarted
  6. Have fun debugging :)

A simple workaround is of course to force-recreate (restart) A manually, but the issue will keep happening from time to time.


To be more specific, we had this issue with the Datadog Helm chart because the Datadog Cluster Agent expects an auth token from the Node Agents (each Node Agent reports to a single Cluster Agent), and that is exactly what happened: the Cluster Agent was restarted too soon with the old auth token, and all new Node Agents failed because they couldn't reach the Cluster Agent.

I suspect such situations are frequent elsewhere too, for example in clustered systems like databases, Redis, etc.

Contributing

Vote on this issue by adding a 👍 reaction.
To contribute a fix for this issue, leave a comment (and link to your pull request, if you've opened one already).

@PierreBeucher PierreBeucher added kind/bug Some behavior is incorrect or out of spec needs-triage Needs attention from the triage team labels Nov 8, 2022
@squaremo
Contributor

The Chart object is really more like a template expansion -- it applies its configuration to the templates of the chart, then treats the resulting YAML as just a bunch of YAML. Are you able to use the helm.v3.Release instead? That uses the Helm machinery to deploy a chart, meaning you get all the dependency ordering and other Helm-specific features.

I would say Chart is unlikely to fundamentally change, so Release is a better bet for using Helm with Pulumi if you can make it work.
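For reference, a minimal sketch of what switching to helm.v3.Release could look like in TypeScript, using the Datadog chart as an example (the namespace, values, and API key are placeholders, not a verified configuration):

import * as k8s from "@pulumi/kubernetes";

// helm.v3.Release drives an actual Helm install, so Helm's documented install
// order (Namespace, Secret, ConfigMap, ..., Deployment) and hooks apply.
const datadog = new k8s.helm.v3.Release("datadog", {
    chart: "datadog",
    namespace: "datadog",
    createNamespace: true,
    repositoryOpts: {
        repo: "https://helm.datadoghq.com",
    },
    values: {
        datadog: {
            apiKey: "<your-api-key>", // placeholder
        },
    },
});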

@squaremo squaremo added kind/question Questions about existing features area/helm and removed kind/bug Some behavior is incorrect or out of spec needs-triage Needs attention from the triage team labels Nov 10, 2022
LouisJULIEN commented Mar 24, 2023

I would like to advise a bit of caution when migrating from helm.v3.Chart to helm.v3.Release, as importing the existing resources doesn't seem possible. You should:

  1. First apply your changes to delete the helm.v3.Chart
  2. Then apply your changes to create the helm.v3.Release.

I want to stress that you should do it in two steps, in that order, to avoid concurrency issues. I encountered an issue with CRDs: they were not created by the helm.v3.Release, because they still existed when the helm.v3.Release was created, but they were later deleted by the removal of the helm.v3.Chart. This left us in a weird state where the helm.v3.Release hadn't created the CRDs and wouldn't. I had to delete the helm.v3.Release and recreate it to get the CRDs back on my cluster.

@EronWright
Contributor

The upcoming Chart v4 resource has improved support for resource ordering, based on cli-utils as described here. This means that you can apply a config.kubernetes.io/depends-on annotation to ensure that the deployment is installed first.
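As an illustration of the annotation format (the resource names here are hypothetical, and in practice the annotation would usually be added in the chart's own templates or via its values rather than in program code), the value follows the cli-utils object-reference format <group>/namespaces/<namespace>/<kind>/<name>, with an empty group segment for core kinds such as Secret:

import * as k8s from "@pulumi/kubernetes";

// Hypothetical sketch: a Deployment declaring that it depends on a core-group
// Secret in the same namespace. When a manifest carrying this annotation is
// applied through the ordering-aware machinery (e.g. Chart v4), the Secret is
// applied before the Deployment.
const agent = new k8s.apps.v1.Deployment("agent", {
    metadata: {
        namespace: "datadog",
        annotations: {
            "config.kubernetes.io/depends-on": "/namespaces/datadog/Secret/datadog-auth-token",
        },
    },
    spec: {
        selector: { matchLabels: { app: "agent" } },
        template: {
            metadata: { labels: { app: "agent" } },
            spec: {
                containers: [{ name: "agent", image: "example/agent:latest" }],
            },
        },
    },
});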
