"panic: duplicate checkable objects report" in terraform apply #31846
Comments
Seeing a similar issue.

Version

Providers:

provider "registry.terraform.io/hashicorp/azuread" {
  version = "2.28.1"
}
provider "registry.terraform.io/hashicorp/azurerm" {
  version = "3.23.0"
}
provider "registry.terraform.io/hashicorp/random" {
  version = "3.4.3"
}

Stack trace:
We are also seeing this
This crash is happening without making any changes to our configuration from what works successfully on an earlier Terraform release.
Seems to be coming from the new checks introduced in 9dea198#diff-cb64485c666c1dfa5eccfd55aee6959f8c133a440107e2456f9511328bb68b68R29. We suspect it has something to do with the existing state file, since very similar root modules don't hit this crash, but even looking at the raw state file we aren't seeing anything that looks suspicious.
We are still unable to provide sample code that reproduces this issue, but so far we have seen it in multiple places.
Edit: it's apparently not the case for everyone.
Most of our usages are also with nested modules, but we do have some that are occurring with single-layer module paths.
Seeing this as well. For us, it happened during a terraform plan.
Hi all! Are any of you able to share a trace log?

The first of these is indeed coming from the new checks implementation (which is a generalization of what we added for preconditions and postconditions in v1.1), but this is supposed to be an unreachable codepath, so clearly there's something different about all of your configurations that our test cases don't cover.

I understand that it's hard to produce a small reproduction configuration here -- reducing it is probably removing whatever special situation is allowing this to happen 😖 -- but in the absence of a configuration we may be able to understand from the series of actions in the trace log why exactly this occurred.

What I can tell from just the stack trace is that for some reason the module "expansion" logic (which handles count and for_each on module blocks) is ending up announcing the same checkable object more than once, though I can't yet tell why.

The second panic here looks like a separate problem that we'll need to investigate independently.
Hey @apparentlymart, thanks a lot for the quick attention. Here's a gist from a targeted plan that generates the crash. Please LMK when you've had a chance to copy it so I can delete it. We are also happy to hop on a call and debug in a less public forum where we don't have to worry about sanitizing as much. I've previously chatted with Korinne via email and @mhaggerty042 also has my contact info.
Thanks @joe-a-t! I've taken a copy of your log, so you can delete it now if you wish. For anyone on our team who wants to see the log once it's been deleted: contact me via one of our internal communication channels and I'll tell you where I put this. 😀
Okay, I think I see what's going on here, but I've run out of time today, so I'm just going to record what I know and then I or someone else will dig in some more and try to produce a smaller repro on another day.

The problem reflected in the trace log is that resources inside modules actually "expand" twice: first to handle count or for_each on the module block itself, and then again to handle count or for_each on the resource within each module instance.

The call that's creating this crash is currently happening at the wrong level: it's being reported inside the second level of expansion, and so it ends up being called once for each instance of the module it's inside, rather than just once to represent the static resource as a whole.

The original issue's stack trace is instead originating in an output value graph node, so this explanation doesn't suit that stack trace. So far it seems like there are two independent bugs here, and so we should be careful not to consider it solved just by addressing the situation that's covered by the trace log. We'll need to separately find out what's causing the output value situation and address that one too before this is fully resolved.
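As an editorial illustration (not part of the original comment; the module and resource names below are arbitrary), the two expansion levels being described look roughly like this, with the resulting instance addresses noted in comments:

# main.tf
module "example" {
  source = "./example"
  count  = 2
}

# example/main.tf
resource "null_resource" "worker" {
  count = 2
}

# The first expansion handles count/for_each on the module block, producing
# module.example[0] and module.example[1]. The second expansion handles
# count/for_each on the resource, once per module instance, producing:
#   module.example[0].null_resource.worker[0]
#   module.example[0].null_resource.worker[1]
#   module.example[1].null_resource.worker[0]
#   module.example[1].null_resource.worker[1]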
I have a minimal reproduction for the resource variant of this problem.
module "child" {
source = "./child"
count = 2 # Must be 2 or more to reproduce
}
locals {
two = 2
}
resource "null_resource" "test" {
lifecycle {
# At least one resource in the child module must have a precondition
# or postcondition to activate the bug. This condition won't actually
# be checked because Terraform crashes before evaluating it.
postcondition {
condition = (1 == local.two)
error_message = "Forced condition failure"
}
}
} As noted in the comments, it appears that the two requirements to reproduce are:
The crash occurs because Terraform Core is incorrectly trying to capture the information about what instances of the resource exist separately for each module instance, rather than just once with all of the module instances and all of the resource instances taken into account together.
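Restated in terms of the reproduction above (again an editorial illustration, not part of the original comment), the report is expected to happen once for the static object rather than once per module instance:

Expected: a single report for the static checkable object
    module.child.null_resource.test
covering both of its instances:
    module.child[0].null_resource.test
    module.child[1].null_resource.test

Buggy behavior: the report is made inside the per-module-instance expansion,
so it happens once for module.child[0] and again for module.child[1]; the
second report triggers "panic: duplicate checkable objects report".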
Can confirm: that appears to be the case in all of the instances where we are seeing this crash.
I've also confirmed that the output value crash seems to have a different cause: I wasn't able to reproduce it by just trivially changing my previous example to include an output value instead of a resource.

I'm going to focus on the resource variant for the moment since we already have a good reproduction case for it, but we'll need to also track down the output value variant and solve that before this is truly fixed. If anyone can share a GitHub Gist with a trace log and/or a configuration that reproduces the output value variant, that would be very helpful.
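For context, a "trivially changed" variant of the earlier example along those lines might look like the sketch below (my illustration, not the maintainer's actual test; per the comment above, this did not reproduce the crash in his testing):

# main.tf
module "child" {
  source = "./child"
  count  = 2
}

# child/main.tf
output "test" {
  value = "ok"

  precondition {
    condition     = (1 == 1)
    error_message = "Forced condition failure"
  }
}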
Thanks again for the trace log for the resource-related variant of this bug! I have a proposed fix over in #31860, so hopefully we'll get that particular one addressed soon; it seems like that is what most of you in this discussion are seeing.

I've not yet developed any leads on how to reproduce the variant where the address in the panic message is for an output value rather than a resource: output values don't support expansion (they have no count or for_each of their own), so the explanation above for resources doesn't directly apply. It's clear from the panic stacktrace that somehow the output value is being announced as a checkable object more than once, but I don't yet know how that can happen.

@SimonG-Kontentai, if you are able to produce a trace log by running with TF_LOG=trace and share it, that would help us track this one down.

Thanks!
Awesome, thanks a lot for the speed here. We are only seeing the resource-related variant, so unfortunately I cannot help with the output variant. Would it be possible to cut a new Terraform release as soon as your PR gets merged, or very shortly afterwards, since this is fixing a panic?
We typically release Terraform at most weekly, so the earliest a fix would be available is this Wednesday if the PR is merged in time. If you are currently affected by this then I recommend staying on the most recent v1.2 release until a fixed version is available.
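If you want to enforce that recommendation in the configuration itself, one option (an illustration, not something suggested in the thread) is to pin the CLI version until a fixed 1.3.x release ships:

terraform {
  # Stay on 1.2.x until a Terraform release containing the fix is available.
  required_version = "~> 1.2.9"
}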
The resource-related problem is now fixed in the v1.3 branch, ready to be included in the forthcoming v1.3.1 release. I've still not been able to reproduce the output-related version of this, so we don't yet have a proposed fix for that. If anyone has an example of it and is able to share a trace log and/or an example configuration, please let us know. Thanks!
Since the resource-related variant of this should be fixed in the forthcoming v1.3.1 release, this issue now represents only the output value variant that originally opened it. However, we don't yet have any leads on how to reproduce it in order to determine the cause and develop a fix, so I'm going to remove this from the v1.3.1 milestone to represent that it should not block the v1.3.1 release.

Once we're able to gather enough information to transition this to the "explained" state, I'll target it to the next available milestone, assuming that this happens soon enough that we're still using milestones for tracking. (We typically use milestones only around x.y.0 and early patch releases, where there are often many different streams of work all trying to land at once; in later patch releases it tends to become just a small trickle of bug fixes, and so the overhead of milestone tracking isn't worth it.)
Hello @apparentlymart

main.tf

module/main.tf

Debug Output
Thanks @Vince-Chenal! I was able to reproduce the crash with the configuration you shared. I also see from your full output why I wasn't able to reproduce this before: I was testing only with the plan step, not a full apply.

This particular variant crashes only during the apply phase, which actually makes more sense in retrospect, because the apply phase is supposed to be announcing check results for the checkable objects that were planned during the planning phase, not announcing new objects to run during the apply step. (In this case it's trying to re-announce an object we previously planned, but the logic error is still the same.)

I'll dig in to this now and see if I can find the root cause for this inconsistency. Thanks again!
Hi again, all! We tracked down the cause of the output value panic now too, and the change will be in the v1.3 branch shortly and so will be included in the forthcoming v1.3.2 release. Thanks again for reporting this, and for sharing the reproduction cases!
Thanks for fixing the issue! Do you have an ETA on v1.3.2?
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. |
Terraform Version
Terraform Configuration Files
Debug Output
Crash output in "Actual Behavior"
Expected Behavior
Everything is fine with Terraform 1.2.9
Actual Behavior
First crash problem
No changes. Your infrastructure matches the configuration.
Terraform has compared your real infrastructure against your configuration
and found no differences, so no changes are needed.
!!!!!!!!!!!!!!!!!!!!!!!!!!! TERRAFORM CRASH !!!!!!!!!!!!!!!!!!!!!!!!!!!!
Terraform crashed! This is always indicative of a bug within Terraform.
Please report the crash with Terraform[1] so that we can fix this.
When reporting bugs, please include your terraform version, the stack trace
shown below, and any additional information which may help replicate the issue.
!!!!!!!!!!!!!!!!!!!!!!!!!!! TERRAFORM CRASH !!!!!!!!!!!!!!!!!!!!!!!!!!!!
duplicate checkable objects report for module.webhooks_api.module.location.output.keyvault_location
goroutine 138210 [running]:
runtime/debug.Stack()
/usr/local/go/src/runtime/debug/stack.go:24 +0x65
runtime/debug.PrintStack()
/usr/local/go/src/runtime/debug/stack.go:16 +0x19
github.com/hashicorp/terraform/internal/logging.PanicHandler()
/home/circleci/project/project/internal/logging/panic.go:55 +0x153
panic({0x2c01360, 0xc030a317e0})
/usr/local/go/src/runtime/panic.go:884 +0x212
github.com/hashicorp/terraform/internal/checks.(*State).ReportCheckableObjects(0x2ebd400?, {0x3639ca8?, 0xc006502fc0}, 0xc003a69f50?)
/home/circleci/project/project/internal/checks/state_report.go:29 +0x45d
github.com/hashicorp/terraform/internal/terraform.(*nodeExpandOutput).DynamicExpand(0xc00b7c9100, {0x3650618, 0xc01197b500})
/home/circleci/project/project/internal/terraform/node_output.go:102 +0x2dd
github.com/hashicorp/terraform/internal/terraform.(*Graph).walk.func1({0x2ecdc00, 0xc00b7c9100})
/home/circleci/project/project/internal/terraform/graph.go:84 +0x4d4
github.com/hashicorp/terraform/internal/dag.(*Walker).walkVertex(0xc0063cad80, {0x2ecdc00, 0xc00b7c9100}, 0xc003c99540)
/home/circleci/project/project/internal/dag/walk.go:381 +0x2f6
created by github.com/hashicorp/terraform/internal/dag.(*Walker).Update
/home/circleci/project/project/internal/dag/walk.go:304 +0xf65
Second crash problem
Steps to Reproduce
The first problem occurs in the existing environment
Additional Context
Not sure if this is a problem with the new Terraform version or whether AzureRM is incompatible with the new Terraform.
References
No response