
"panic: duplicate checkable objects report" in terraform apply #31846

Closed
SimonG-Kontentai opened this issue Sep 22, 2022 · 24 comments · Fixed by #31890
Assignees
Labels
bug confirmed a Terraform Core team member has reproduced this issue core v1.3 Issues (primarily bugs) reported against v1.3 releases

Comments

@SimonG-Kontentai

SimonG-Kontentai commented Sep 22, 2022

Terraform Version

Terraform: 1.3
AzureRM: 3.23.0
AzureAD: 2.28.1
Fastly: 2.3.2

Terraform Configuration Files

Debug Output

Crash output in "Actual Behavior"

Expected Behavior

Everything is fine with Terraform 1.2.9

Actual Behavior

First crash problem

No changes. Your infrastructure matches the configuration.

Terraform has compared your real infrastructure against your configuration
and found no differences, so no changes are needed.

!!!!!!!!!!!!!!!!!!!!!!!!!!! TERRAFORM CRASH !!!!!!!!!!!!!!!!!!!!!!!!!!!!

Terraform crashed! This is always indicative of a bug within Terraform.
Please report the crash with Terraform[1] so that we can fix this.

When reporting bugs, please include your terraform version, the stack trace
shown below, and any additional information which may help replicate the issue.

!!!!!!!!!!!!!!!!!!!!!!!!!!! TERRAFORM CRASH !!!!!!!!!!!!!!!!!!!!!!!!!!!!

duplicate checkable objects report for module.webhooks_api.module.location.output.keyvault_location
goroutine 138210 [running]:
runtime/debug.Stack()
/usr/local/go/src/runtime/debug/stack.go:24 +0x65
runtime/debug.PrintStack()
/usr/local/go/src/runtime/debug/stack.go:16 +0x19
github.com/hashicorp/terraform/internal/logging.PanicHandler()
/home/circleci/project/project/internal/logging/panic.go:55 +0x153
panic({0x2c01360, 0xc030a317e0})
/usr/local/go/src/runtime/panic.go:884 +0x212
github.com/hashicorp/terraform/internal/checks.(*State).ReportCheckableObjects(0x2ebd400?, {0x3639ca8?, 0xc006502fc0}, 0xc003a69f50?)
/home/circleci/project/project/internal/checks/state_report.go:29 +0x45d
github.com/hashicorp/terraform/internal/terraform.(*nodeExpandOutput).DynamicExpand(0xc00b7c9100, {0x3650618, 0xc01197b500})
/home/circleci/project/project/internal/terraform/node_output.go:102 +0x2dd
github.com/hashicorp/terraform/internal/terraform.(*Graph).walk.func1({0x2ecdc00, 0xc00b7c9100})
/home/circleci/project/project/internal/terraform/graph.go:84 +0x4d4
github.com/hashicorp/terraform/internal/dag.(*Walker).walkVertex(0xc0063cad80, {0x2ecdc00, 0xc00b7c9100}, 0xc003c99540)
/home/circleci/project/project/internal/dag/walk.go:381 +0x2f6
created by github.com/hashicorp/terraform/internal/dag.(*Walker).Update
/home/circleci/project/project/internal/dag/walk.go:304 +0xf65

Second crash problem

Steps to Reproduce

The first problem occurs in the existing environment

  1. terraform init
  2. terraform apply

Additional Context

Not sure if this is a problem with the new Terraform version or if AzureRM is incompatible with the new Terraform.

References

No response

@SimonG-Kontentai SimonG-Kontentai added bug new new issue not yet triaged labels Sep 22, 2022
@Laffs2k5

Seeing a similar issue:

Version

$ terraform -version 
Terraform v1.3.0
on linux_amd64

Providers

provider "registry.terraform.io/hashicorp/azuread" {
  version     = "2.28.1"
}
provider "registry.terraform.io/hashicorp/azurerm" {
  version     = "3.23.0"
}
provider "registry.terraform.io/hashicorp/random" {
  version     = "3.4.3"
}

Stack trace:

duplicate checkable objects report for module.main.module.ship_logs_to_event_hub.azurerm_policy_definition.policy 
goroutine 741 [running]:
runtime/debug.Stack()
        /usr/local/go/src/runtime/debug/stack.go:24 +0x65
runtime/debug.PrintStack()
        /usr/local/go/src/runtime/debug/stack.go:16 +0x19
github.com/hashicorp/terraform/internal/logging.PanicHandler()
        /home/circleci/project/project/internal/logging/panic.go:55 +0x153
panic({0x2140d20, 0xc001760e50})
        /usr/local/go/src/runtime/panic.go:884 +0x212
github.com/hashicorp/terraform/internal/checks.(*State).ReportCheckableObjects(0x24a8880?, {0x2b667b8?, 0xc0011d7180}, 0xc001707270?)
        /home/circleci/project/project/internal/checks/state_report.go:29 +0x45d
github.com/hashicorp/terraform/internal/terraform.(*NodePlannableResource).DynamicExpand(0xc000fedc20, {0x2b7ce98, 0xc001c10380})
        /home/circleci/project/project/internal/terraform/node_resource_plan.go:333 +0x852
github.com/hashicorp/terraform/internal/terraform.(*Graph).walk.func1({0x25231c0, 0xc000fedc20})
        /home/circleci/project/project/internal/terraform/graph.go:84 +0x4d4
github.com/hashicorp/terraform/internal/dag.(*Walker).walkVertex(0xc0018d0f00, {0x25231c0, 0xc000fedc20}, 0xc001911d40)
        /home/circleci/project/project/internal/dag/walk.go:381 +0x2f6
created by github.com/hashicorp/terraform/internal/dag.(*Walker).Update
        /home/circleci/project/project/internal/dag/walk.go:304 +0xf65

@apparentlymart apparentlymart added the v1.3 Issues (primarily bugs) reported against v1.3 releases label Sep 22, 2022
@joe-a-t

joe-a-t commented Sep 22, 2022

We are also seeing this duplicate checkable objects report for module.xxxxx issue with certain root modules, but it appears to be quite random (e.g. two different root modules that should be basically identical behave differently). We are not using Azure; we are using

Terraform v1.3.0
on darwin_amd64
+ provider registry.terraform.io/hashicorp/google v4.37.0
+ provider registry.terraform.io/hashicorp/google-beta v4.37.0
+ provider registry.terraform.io/hashicorp/http v3.1.0
+ provider registry.terraform.io/hashicorp/random v3.4.3

This crash is happening without any changes to the configuration that works successfully with 1.2.7, and switching back to 1.2.7 gets rid of it. We will post more details here once we have narrowed down the issue and have something reproducible and sharable; we just wanted to flag that the issue appears to be more widespread than the Azure provider.

@joe-a-t

joe-a-t commented Sep 22, 2022

Seems to be coming from the new checks introduced in 9dea198#diff-cb64485c666c1dfa5eccfd55aee6959f8c133a440107e2456f9511328bb68b68R29. We suspect that it has something to do with the existing state file since very similar root modules don't have the same experience with this crash, but even looking at the raw state file, we aren't seeing anything that looks suspicious.

@joe-a-t

joe-a-t commented Sep 22, 2022

We are still unable to provide sample code that reproduces this issue, but so far we have seen it in multiple places with google_storage_bucket and google_compute_subnetwork resources. However, we also have many usages of those resources that are totally fine, and we have so far been unable to identify any unique features or meaningful differences between the affected and unaffected resources, since even usages of the same module that holds the raw resource call behave differently.

@pndurette

pndurette commented Sep 22, 2022

Maybe irrelevant, but in my case (and in the stack traces so far in this thread), the crash happens on nested modules, i.e. module.<sub>.module.<sub-sub>.<resource>

Edit: it's apparently not the case for everyone.

@joe-a-t

joe-a-t commented Sep 22, 2022

Most of our usages are also with nested modules, but we do have some occurring with single-layer module paths (e.g. module.foo.google_storage_bucket.bar).

@baguasquirrel

baguasquirrel commented Sep 22, 2022

Seeing this as well. For us, it happened during a terraform plan where we used -target, and it crashed on a vault_generic_secret data resource (i.e. a HashiCorp Vault data source).

@apparentlymart
Member

Hi all!

Are any of you able to share a TF_LOG=trace report as a GitHub Gist?

The first of these is indeed coming from the new checks implementation (which is a generalization of what we added for preconditions and postconditions in v1.1) but this is supposed to be an unreachable codepath so clearly there's something different about all of your configurations that our test cases don't cover.

I understand that it's hard to produce a small reproduction configuration here -- reducing it is probably removing whatever special situation is allowing this to happen 😖 -- but in the absence of a configuration we may be able to understand from the series of actions in the trace log why exactly this occurred.

What I can tell from just the stack trace is that for some reason the module "expansion" logic (which handles count and for_each, if present) seems to be happening twice for at least one of the output values in the configuration, but I'm not sure yet why that would be, since each output value graph node should be visited only once per graph walk.

The second panic here, from terraform import, seems to be unrelated to the first. @SimonG-Kontentai would you mind splitting that one out into a separate issue so that we can focus on the "duplicate checkable objects report" problem here? I expect the solutions to these two will be independent of one another so having them tracked separately will help make sure they can both get addressed. Thanks!

@apparentlymart apparentlymart changed the title Crash in TF 1.3.0 "duplicate checkable objects report" in terraform apply in v1.3.0 Sep 22, 2022
@apparentlymart apparentlymart changed the title "duplicate checkable objects report" in terraform apply in v1.3.0 "panic: duplicate checkable objects report" in terraform apply in v1.3.0 Sep 22, 2022
@apparentlymart apparentlymart added waiting for reproduction unable to reproduce issue without further information core and removed new new issue not yet triaged labels Sep 22, 2022
@joe-a-t

joe-a-t commented Sep 23, 2022

Hey @apparentlymart, thanks a lot for the quick attention. Here's a gist from a targeted plan that generates the crash. Please LMK when you've had a chance to copy it so I can delete it.

We are also happy to hop on a call and debug in a less public forum where we don't have to worry about sanitizing as much. I've previously chatted with Korinne via email and @mhaggerty042 also has my contact info.

@apparentlymart
Member

Thanks @joe-a-t! I've taken a copy of your log, so you can delete it now if you wish.

For anyone on our team who wants to see the log once it's been deleted: contact me via one of our internal communication channels and I'll tell you where I put this. 😀

@apparentlymart
Member

Okay, I think I see what's going on here, but I've run out of time today so I'm just going to record what I know and then I or someone else will dig in some more and try to produce a smaller repro on another day.

The problem reflected in the trace log is that resources inside modules actually "expand" twice: first to handle the count and for_each arguments for all of the module calls along the way, and then again to handle the count and for_each arguments on the resources themselves.

The call that's creating this crash is currently happening at the wrong level: it's being reported inside the second level of expansion, and so it ends up being called once for each instance of the module it's inside rather than just once to represent the static resource block itself.

The original issue's stack trace is instead originating in an output value graph node, so this explanation doesn't suit that stack trace. So far it seems like there are two independent bugs here, and so we should be careful not to consider it solved just by addressing the situation that's covered by the trace log. We'll need to separately find out what's causing the output value situation and address that one too before this is fully resolved.

@apparentlymart
Member

apparentlymart commented Sep 23, 2022

I have a minimal reproduction for the resource variant of this problem.

root.tf:

module "child" {
  source = "./child"
  count  = 2 # Must be 2 or more to reproduce
}

./child/child.tf:

locals {
  two = 2
}

resource "null_resource" "test" {
  lifecycle {
    # At least one resource in the child module must have a precondition
    # or postcondition to activate the bug. This condition won't actually
    # be checked because Terraform crashes before evaluating it.
    postcondition {
      condition     = (1 == local.two)
      error_message = "Forced condition failure"
    }
  }
}

As noted in the comments, it appears that the two requirements to reproduce are:

  • There be a module with two or more instances.
  • That module contains at least one resource that has a precondition or postcondition declared.

The crash occurs because Terraform Core is incorrectly trying to capture the information about what instances of the resource exist separately for each module instance, rather than just once with all of the module instances and all of the resource instances taken into account together.

duplicate checkable objects report for module.child.null_resource.test 
goroutine 49 [running]:
runtime/debug.Stack()
        /home/me/.goenv/versions/1.19.1/src/runtime/debug/stack.go:24 +0x65
runtime/debug.PrintStack()
        /home/me/.goenv/versions/1.19.1/src/runtime/debug/stack.go:16 +0x19
github.com/hashicorp/terraform/internal/logging.PanicHandler()
        /home/me/Devel/terraform/terraform/internal/logging/panic.go:55 +0x153
panic({0x2146960, 0xc0006eb480})
        /home/me/.goenv/versions/1.19.1/src/runtime/panic.go:884 +0x212
github.com/hashicorp/terraform/internal/checks.(*State).ReportCheckableObjects(0x24af180?, {0x2b6d758?, 0xc0000a5380}, 0xc00090edc0?)
        /home/me/Devel/terraform/terraform/internal/checks/state_report.go:29 +0x45d
github.com/hashicorp/terraform/internal/terraform.(*NodePlannableResource).DynamicExpand(0xc0000e77a0, {0x2b83e98, 0xc000877180})
        /home/me/Devel/terraform/terraform/internal/terraform/node_resource_plan.go:333 +0x852
github.com/hashicorp/terraform/internal/terraform.(*Graph).walk.func1({0x2529ac0, 0xc0000e77a0})
        /home/me/Devel/terraform/terraform/internal/terraform/graph.go:84 +0x4d4
github.com/hashicorp/terraform/internal/dag.(*Walker).walkVertex(0xc00082b800, {0x2529ac0, 0xc0000e77a0}, 0xc0008de500)
        /home/me/Devel/terraform/terraform/internal/dag/walk.go:381 +0x2f6
created by github.com/hashicorp/terraform/internal/dag.(*Walker).Update
        /home/me/Devel/terraform/terraform/internal/dag/walk.go:304 +0xf65
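For readers unfamiliar with the check-reporting internals, the guard that raises this panic can be sketched roughly like this. This is a simplified, hypothetical illustration of the invariant behind checks.State.ReportCheckableObjects, not Terraform's actual code; all type and function names here are invented:

```go
package main

import "fmt"

// checkState is a hypothetical sketch of the checks registry: each static
// checkable object (e.g. a resource block with a postcondition) must announce
// its expanded instances exactly once per graph walk.
type checkState struct {
	reported map[string]bool
}

func newCheckState() *checkState {
	return &checkState{reported: make(map[string]bool)}
}

// reportCheckableObjects records the instances of one static object, or
// panics if that object was already reported during this walk.
func (s *checkState) reportCheckableObjects(staticAddr string) {
	if s.reported[staticAddr] {
		panic(fmt.Sprintf("duplicate checkable objects report for %s", staticAddr))
	}
	s.reported[staticAddr] = true
}

// reportTwice simulates the bug described above: the resource node's report
// ran once per module instance (module.child[0], module.child[1]) instead of
// once for the static resource, so the second call trips the guard. It
// returns the recovered panic message.
func reportTwice() (msg string) {
	s := newCheckState()
	defer func() {
		if r := recover(); r != nil {
			msg = fmt.Sprint(r)
		}
	}()
	for _, inst := range []string{"module.child[0]", "module.child[1]"} {
		_ = inst // each module instance re-reports the same static address
		s.reportCheckableObjects("module.child.null_resource.test")
	}
	return msg
}

func main() {
	fmt.Println(reportTwice())
}
```

Under this model the fix is to report at the static-resource level, once, with all module instances and resource instances accounted for together.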

@joe-a-t

joe-a-t commented Sep 23, 2022

As noted in the comments, it appears that the two requirements to reproduce are:

  • There be a module with two or more instances.
  • That module contains at least one resource that has a precondition or postcondition declared.

Can confirm: that appears to be the case in all of the instances where we are seeing this crash.

@apparentlymart
Member

apparentlymart commented Sep 23, 2022

I've also confirmed that the output value crash seems to have a different cause. I wasn't able to reproduce it by just trivially changing my previous example to include an output value instead of a resource.

I'm going to focus on the resource variant for the moment since we already have a good reproduction case for it, but we'll need to also track down the output value variant and solve that before this is truly fixed. If anyone can share a GitHub Gist with a TF_LOG=trace log leading to the panic in the original issue comment (where the stack trace mentions *nodeExpandOutput instead of *NodePlannableResource), I expect that would accelerate fixing both.

@apparentlymart
Member

Thanks again for the trace log for the resource-related variant of this bug! I have a proposed fix over in #31860 so hopefully we'll get that particular one addressed soon; it seems like that is what most of you in this discussion are seeing.

I've not yet developed any leads on how to reproduce the variant where the address in the panic message is for an output value rather than a resource: output values don't support expansion (count and for_each) of their own so the situation I fixed for resources doesn't apply to output values.

It's clear from the panic stacktrace that somehow the output value module.webhooks_api.module.location.output.keyvault_location in the original issue was asked to expand twice, but it isn't yet clear to me how that can happen since each graph node is only visited once per graph walk.

@SimonG-Kontentai if you are able to produce a trace log by running with TF_LOG=trace, or share some relevant configuration, that would be a big help to narrow down what is special about your configuration which is making this crash occur for you. Alternatively, if anyone else is seeing this problem with output values rather than resources then I'd love if you could share a TF_LOG=trace trace and/or an example configuration that reproduces the problem.
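In case it helps anyone gather that, the usual way to capture a trace log is via the TF_LOG and TF_LOG_PATH environment variables (the file name here is just an example; redact any sensitive values before sharing):

```shell
# Verbose internal logging goes to stderr, so redirect it to a file...
TF_LOG=trace terraform apply 2> trace.log

# ...or have Terraform write the log to a file directly:
TF_LOG=trace TF_LOG_PATH=trace.log terraform apply
```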

Thanks!

@joe-a-t

joe-a-t commented Sep 23, 2022

Awesome, thanks a lot for the speed here. We are only seeing the resource related variant so unfortunately I cannot help with the output variant. Would it be possible to cut a new Terraform release as soon as your PR gets merged or very shortly afterwards since this is fixing a panic?

@apparentlymart
Member

We typically release Terraform at most weekly, so the earliest a fix would be available is this Wednesday if the PR is merged in time. If you are currently affected by this then I recommend staying on the most recent v1.2 release until a fixed version is available.

@apparentlymart
Member

The resource-related problem is now fixed in the v1.3 branch ready to be included in the forthcoming v1.3.1 release.

I've still not been able to reproduce the output-related version of this, so we don't yet have a proposed fix for that. If anyone has an example of it and is able to share a trace log and/or an example configuration, please let us know. Thanks!

@apparentlymart
Member

Since the resource-related variant of this should be fixed in the forthcoming v1.3.1 release, this issue now represents only the output value variant that originally opened it.

However, we don't yet have any leads on how to reproduce it in order to determine the cause and develop a fix, so I'm going to remove this from the v1.3.1 milestone to represent that it should not block the v1.3.1 release. Once we're able to gather enough information to transition this to the "explained" state I'll target it to the next available milestone assuming that this happens soon enough that we're still using milestones for tracking. (We typically use milestones only around x.y.0 and early patch releases where there are often many different streams of work all trying to land at once; in later patch releases it tends to just become a small trickle of bug fixes and so the overhead of milestone tracking isn't worth it.)

@apparentlymart apparentlymart removed this from the v1.3.1 milestone Sep 27, 2022
@apparentlymart apparentlymart changed the title "panic: duplicate checkable objects report" in terraform apply in v1.3.0 "panic: duplicate checkable objects report" in terraform apply Sep 27, 2022
@Vince-Chenal

Vince-Chenal commented Sep 28, 2022

Hello @apparentlymart
I was able to reproduce the problem with an output, if it helps.

main.tf

terraform {
  required_version = "~> 1.3.1"
}

module "my_module" {
  source = "./module"
  input  = "hello"
}

output "my_output" {
  value = module.my_module.output
}

module/main.tf

variable "input" {
  type = string
}

output "output" {
  value = var.input

  precondition {
    condition     = var.input != ""
    error_message = "some error"
  }
}

Debug Output

~/dev/1.3.1-crash ❯ terraform version
Terraform v1.3.1
on linux_amd64

~/dev/1.3.1-crash ❯ terraform apply

Changes to Outputs:
  + my_output = "hello"

You can apply this plan to save these new output values to the Terraform state, without changing any real infrastructure.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes


!!!!!!!!!!!!!!!!!!!!!!!!!!! TERRAFORM CRASH !!!!!!!!!!!!!!!!!!!!!!!!!!!!

Terraform crashed! This is always indicative of a bug within Terraform.
Please report the crash with Terraform[1] so that we can fix this.

When reporting bugs, please include your terraform version, the stack trace
shown below, and any additional information which may help replicate the issue.

[1]: https://github.com/hashicorp/terraform/issues

!!!!!!!!!!!!!!!!!!!!!!!!!!! TERRAFORM CRASH !!!!!!!!!!!!!!!!!!!!!!!!!!!!

duplicate checkable objects report for module.my_module.output.output 
goroutine 104 [running]:
runtime/debug.Stack()
        /usr/local/go/src/runtime/debug/stack.go:24 +0x65
runtime/debug.PrintStack()
        /usr/local/go/src/runtime/debug/stack.go:16 +0x19
github.com/hashicorp/terraform/internal/logging.PanicHandler()
        /home/circleci/project/project/internal/logging/panic.go:55 +0x153
panic({0x2140e00, 0xc000830500})
        /usr/local/go/src/runtime/panic.go:884 +0x212
github.com/hashicorp/terraform/internal/checks.(*State).ReportCheckableObjects(0x23fab00?, {0x2b665a0?, 0xc00082db30}, 0xc000136076?)
        /home/circleci/project/project/internal/checks/state_report.go:29 +0x45d
github.com/hashicorp/terraform/internal/terraform.(*nodeExpandOutput).DynamicExpand(0xc0008991c0, {0x2b7ccb8, 0xc00089c8c0})
        /home/circleci/project/project/internal/terraform/node_output.go:102 +0x2dd
github.com/hashicorp/terraform/internal/terraform.(*Graph).walk.func1({0x240b3e0, 0xc0008991c0})
        /home/circleci/project/project/internal/terraform/graph.go:84 +0x442
github.com/hashicorp/terraform/internal/dag.(*Walker).walkVertex(0xc0005cac60, {0x240b3e0, 0xc0008991c0}, 0xc0008994c0)
        /home/circleci/project/project/internal/dag/walk.go:381 +0x2f6
created by github.com/hashicorp/terraform/internal/dag.(*Walker).Update
        /home/circleci/project/project/internal/dag/walk.go:304 +0xf65

@apparentlymart
Member

Thanks @Vince-Chenal!

I was able to reproduce the crash with the configuration you shared. I also see from your full output why I wasn't able to reproduce this before: I was testing only with terraform plan and not with terraform apply, because the resource-related variant of this bug was a plan-time bug rather than an apply-time bug.

This particular variant crashes only during the apply phase, which actually makes more sense in retrospect because the apply phase is supposed to be announcing check results for the checkable objects that were planned during the planning phase, and not announcing new objects to run during the apply step. (In this case it's trying to re-announce an object we previously planned, but the logic error is still the same.)

I'll dig in to this now and see if I can find the root cause for this inconsistency. Thanks again!

@apparentlymart
Member

Hi again, all!

We tracked down the cause of the output value panic now too, and the change will be in the v1.3 branch shortly and so will be included in the forthcoming v1.3.2 release.

Thanks again for reporting this, and for sharing the reproduction cases!

@AnitaErnszt

Thanks for fixing the issue! Do you have an ETA on v1.3.2?

@github-actions

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 14, 2022