Terraform state is updated when update apply fails #265

Open

RoryCrispin opened this issue Mar 27, 2023 · 0 comments

RoryCrispin commented Mar 27, 2023

When the provider fails while applying an update to a kubectl resource, the change is still persisted in the Terraform state.
Subsequent plans then report no changes and the inconsistency remains silently present.

To mitigate this, the administrator has to manually identify every case where the state has become out of sync and then force a change, for example by making a superfluous edit to the YAML definition (such as adding a tmp annotation) so that the provider updates the resource again.

Steps to reproduce:

Using Rancher Desktop as an example

With the following definition:

terraform {
  required_version = ">= 0.13"

  required_providers {
    kubectl = {
      source  = "gavinbunney/kubectl"
      version = ">= 1.7.0"
    }
  }
}

provider "kubectl" {
  host                   = "127.0.0.1:6443"
  load_config_file       = true
  config_context = "rancher-desktop"
  insecure = true
}

resource "kubectl_manifest" "test" {
  yaml_body = <<YAML
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  annotations:
    tmp: one
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 1
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
        ports:
        - containerPort: 80
YAML
}
  1. Run terraform init, then run terraform apply and confirm it to create the resource
  2. Update spec.replicas to 2 and the annotation to tmp: two
  3. Run terraform apply to generate the plan, but don't type 'yes' to run it yet
  4. Simulate a network partition somehow. As I'm using a local cluster, I will just shut it down
  5. Type 'yes' to apply the plan
  6. Observe that the apply fails, and that terraform state pull shows that replicas: 2 and tmp: two were persisted to the Terraform state
  7. Resolve the network partition
  8. Observe that the replica count and annotation are still 1/one on the cluster
  9. Run terraform plan and observe that the provider reports 'no changes'

Workaround:
Apply another change to the YAML and apply it, then reverse it:
10. Change the replicas and annotation to 3/three
11. Run terraform apply and confirm it
12. Observe that the plan shows a diff from 2->3 rather than the 1->3 change that will actually be applied to the cluster (see the plan output below)
13. Change them back to 2/two and apply; you are now at the desired state.

  # kubectl_manifest.test will be updated in-place
  ~ resource "kubectl_manifest" "test" {
        id                      = "/apis/apps/v1/namespaces/default/deployments/nginx-deployment"
        name                    = "nginx-deployment"
      ~ yaml_body               = (sensitive value)
      ~ yaml_body_parsed        = <<-EOT
            apiVersion: apps/v1
            kind: Deployment
            metadata:
              annotations:
          -     tmp: two
          +     tmp: three
              name: nginx-deployment
            spec:
          -   replicas: 2
          +   replicas: 3
              selector:
                matchLabels:
                  app: nginx
              template:
                metadata:
                  labels:
                    app: nginx
                spec:
                  containers:
                  - image: nginx:1.14.2
                    name: nginx
                    ports:
                    - containerPort: 80
        EOT
        # (13 unchanged attributes hidden)
    }

Shutting down the cluster is, of course, a contrived example.
This bug was actually found on a real cluster because the CI worker's K8s credentials expired during a long task that ran before the terraform apply, causing an Unauthorized response from K8s.

I suspect this bug is related to the following section of the documentation:
https://developer.hashicorp.com/terraform/plugin/framework/diagnostics#how-errors-affect-state

How Errors Affect State
Returning an error diagnostic does not stop the state from being updated. Terraform will still persist the returned state even when an error diagnostic is returned with it. This is to allow Terraform to persist the values that have already been modified when a resource modification requires multiple API requests or an API request fails after an earlier one succeeded.

When returning error diagnostics, we recommend resetting the state in the response to the prior state available in the configuration.
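
For reference, here is a minimal sketch of the pattern those docs recommend, written against the plugin framework's Update hook. The names (manifestModel, manifestResource, applyManifest) are hypothetical and this is not the kubectl provider's actual code, which may not use the plugin framework at all; it only illustrates the recommended "reset to prior state on error" behaviour:

// Sketch only: restore the prior state when the update fails, so Terraform
// does not record the planned values as if they had been applied.
package provider

import (
	"context"

	"github.com/hashicorp/terraform-plugin-framework/resource"
	"github.com/hashicorp/terraform-plugin-framework/types"
)

// manifestModel is a hypothetical state model; the real resource has more attributes.
type manifestModel struct {
	YAMLBody types.String `tfsdk:"yaml_body"`
}

type manifestResource struct{}

func (r *manifestResource) Update(ctx context.Context, req resource.UpdateRequest, resp *resource.UpdateResponse) {
	var plan manifestModel
	resp.Diagnostics.Append(req.Plan.Get(ctx, &plan)...)
	if resp.Diagnostics.HasError() {
		return
	}

	if err := applyManifest(ctx, plan.YAMLBody.ValueString()); err != nil {
		resp.Diagnostics.AddError("failed to apply manifest", err.Error())
		// The important part: reset the response state to the prior state so a
		// failed apply is not persisted to terraform.tfstate as a success.
		resp.State = req.State
		return
	}

	resp.Diagnostics.Append(resp.State.Set(ctx, &plan)...)
}

// applyManifest stands in for the provider's real "apply to the cluster" call.
func applyManifest(ctx context.Context, yaml string) error {
	// ... server-side apply of the manifest would happen here ...
	return nil
}

With a reset like that in place, a failed apply would leave the recorded state equal to the prior state, so the next terraform plan would still show the pending change instead of 'no changes'.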
