second destroy (after successful first) returns 'The "count" value depends on resource attributes that cannot be determined until apply, so Terraform cannot predict how many instances will be created.' #32126
While this appears to be a harmless error, the erroneously failing `terraform destroy` command's exit status is 1, which makes our build/test pipeline report failure from its last step (global cleanup), despite having successfully destroyed all of the resources in the previous step.
Hi @jaffel-lc! Thanks for reporting this.

I think what's going on here is that Terraform creates a normal plan as a precursor to creating a destroy plan, because a normal plan refreshes the previous run state and can therefore detect whether something has already been destroyed and so doesn't need to be "re-destroyed". However, a normal plan also needs to expand all resource blocks that have repetition arguments, so it can run into this problem if the repetition depends on something that hasn't been created yet (in this case, because you've literally just destroyed it).

If I'm right about the cause, then the good news is that we changed the approach in #32051 for an unrelated reason, and so as of the next release Terraform will internally use a refresh-only plan for that initial refreshing step. Terraform didn't behave this way before only because the destroy feature has been around longer than the possibility of refresh-only plans, and so it was implemented in terms of the primitives available at the time.

That change was backported into the v1.3 branch and so will be included in the forthcoming v1.3.4 release. Once that's out (which should be in the next week or so), could you give it a try and see if the problem still occurs? Thanks!
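As a concrete illustration of the failure mode described above (the resource types and names here are hypothetical, not taken from the reporter's configuration), a `count` expression that depends on an attribute known only after apply cannot be expanded when planning against a just-destroyed, empty state:

```hcl
# Hypothetical reproduction sketch: "count" depends on an attribute of
# random_id.suffix that is unknown until that resource has been applied.
resource "random_id" "suffix" {
  byte_length = 4
}

resource "aws_sqs_queue" "example" {
  # Against an empty state, random_id.suffix.hex is "(known after apply)",
  # so Terraform cannot predict how many instances will be created.
  count = length(random_id.suffix.hex) > 0 ? 1 : 0
  name  = "queue-${random_id.suffix.hex}"
}
```

With a refresh-only plan, as described in the comment above, no such expansion of resource blocks is required, which is why the change in #32051 avoids the error.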
That is great news.
Tested. Now I am getting several "invalid index" errors. Here is one example:
[updated error formatting for readability]
Thanks @jaffel-lc! It seems that Terraform is noticing that there are now no instances of that resource in the state.

I would agree that this doesn't seem right, but I'm also not really sure what Terraform ought to do instead here. It is true that there is no index zero in the state, but there is still presumably an index zero declared in the configuration. It's weird to evaluate something in the configuration against the current state rather than the desired state, but in this context Terraform isn't actually building a desired state, so it can't refer to that.

With that said: it does seem like there's an opportunity to improve this case, but it's not clear to me exactly what change would be valid to make here while still allowing the refresh phase prior to destroy to work. We'll need to think more about that before deciding how to proceed. In the meantime, I think the most viable strategy with today's Terraform is, unfortunately, to somehow avoid running the second destroy at all.
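To make the "invalid index" case concrete (again with hypothetical names rather than the reporter's actual configuration), the error arises from a shape like this once the counted resource has zero instances left in state:

```hcl
variable "create_queue" {
  type    = bool
  default = true
}

resource "aws_sqs_queue" "this" {
  count = var.create_queue ? 1 : 0
  name  = "example"
}

# During the refresh that precedes the second destroy there are zero
# instances of aws_sqs_queue.this in state, so evaluating index [0]
# fails with "Invalid index", even though the configuration still
# declares an index zero.
output "queue_name" {
  value = aws_sqs_queue.this[0].name
}
```

This is the tension described above: the output expression is valid against the desired state, but during a pre-destroy refresh it is evaluated against the current (now empty) state.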
One thought is that an error caused by outputs could have a different exit value than one caused by, e.g., a syntax error, and then our automations could choose to treat that specific exit code as not-an-error. Another is that Terraform could skip looking up an output value if its originating resource is about to be destroyed, or does not exist, since the output will be cleared anyway. We have a cleanup task that runs a second destroy as a workaround for failures we have encountered when ASGs, ECS services, and/or ECS clusters do not complete their destroys before Terraform times out waiting for them. The second destroy usually clears the resources, or manages to resync the state file.
But it occurs to me now that I could wrap the `resource[0].name` reference in a `try()` call.
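That idea would look roughly like this (hypothetical names); `try()` returns its first argument that evaluates without an error, so the output degrades gracefully instead of failing the whole run:

```hcl
# Fall back to null when index 0 does not exist in state, e.g. during
# the refresh that precedes a second destroy.
output "queue_name" {
  value = try(aws_sqs_queue.this[0].name, null)
}
```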
I'm going to take a look into this because it's very similar to some related destroy-time errors. The problem generally arises while attempting to refresh the instances, which during destroy is primarily done to ensure providers have the most recent values for their configuration, and to remove any instances which may have already been deleted. In the meantime, a better workaround may be to use
This has also hit us on 1.3.4; downgrading to 1.2.9 appears to work for now. Fingers crossed for a proper fix soon! 😄 Interestingly, I found that downgrading to 1.3.2 initially appeared to fix the issue for one case I was seeing, but it then failed in other cases, whereas 1.2.9 seems to work properly.
Spent half of my day today investigating our integration test suite failing with the following error after upgrading from
Trying to hunt down the issue, I now see the culprit is indeed in the destroy fixes in the
Hi, in my case the resource that triggers the fault is https://github.com/terraform-aws-modules/terraform-aws-sqs/blob/cf30bb3498d39969590e4d47bbce56b02f1dc9a5/main.tf#L30. This is the line that fails: This is only broken in
Cheers,
In my case, even the first destroy fails if the root module uses a child module that uses a data source and
In our use case we always destroy without refreshing to avoid data source failures, and using the
@sbocinec, thanks! That would be a different issue, and it is unlikely to be affected by any fix to this one. The current problem happens during the pre-destroy refresh, which is skipped with
Happy to join your ranks, fellows. Like @sbocinec, I've been troubleshooting this issue for days now. I was first focusing on the changes I had made in forks of the modules I'm using from my root module (eks_blueprints, terraform-aws-eks, eks_blueprints_kubernetes_addons). After I fixed a few unrelated bugs, I started seeing the 'the collection has no elements' error pattern at destroy time, which I fixed by applying this pattern to each case that popped up, whack-a-mole style:
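The commenter's actual snippet is not captured here, but one common hardening idiom for this class of error (sketched with hypothetical resource names, and the same idea the terraform-aws-eks fix mentioned below is based on) is to avoid a hard `[0]` index on a variable-length list by appending a fallback element first:

```hcl
locals {
  # concat() guarantees the list has at least one element, so element()
  # yields "" instead of raising "the collection has no elements" when
  # the counted resource has zero instances in state.
  cluster_name = element(concat(aws_eks_cluster.this[*].name, [""]), 0)
}
```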
After "hardening" all the direct references to the first element of variable-length resource lists in my modules, I then started seeing the same pattern in the outer modules I hadn't made any changes to. Finally convinced "it's you, not me", I was able to find this issue. Happy to be here. :-)

What can I do to help? I've inferred from the comments of @apparentlymart and @jbardin that this "variable-length list hardening" should be unnecessary, and that this misbehavior is caused by the new destroy-time refresh-only plan behavior in 1.3.4. For now I will try @jbardin's recommendation of passing

**Nota Bene**

**Terraform Destroy Sequence**

In my case, and for many other people I've seen hitting this issue who are using the eks_blueprints module, there is not a single

**Same Misbehavior Reported in EKS Blueprints**

This failure mode is being tracked here for eks_blueprints.

**Same Misbehavior Reported in terraform-aws-eks**

Here is a closed 3-year-old issue in terraform-aws-eks which I thought the core devs/contributors @apparentlymart @jbardin may find interesting, since it's the same misbehavior. The changeset that resolved that issue is where I got the idea for my "variable-length list hardening" patch above.

**Module Forks/Branches Used to Test My Patches**

My root module currently uses this fork/branch of eks_blueprints, which originally was created to add support for Crossplane Helm and Terraform providers. I had validated all its functionality and was about to submit a PR when I started noticing "collection has no elements" errors. (I upgraded to 1.3.4 somewhere in that process.)

My eks_blueprints fork
@jbardin, in my first test using
However :-( there remains a problem with not refreshing during destroy: after successfully completing my destroy sequence, data sources are not destroyed and remain in the state. I've tried to work around this by appending YADS (Yet Another Destroy Stage) to the end of my sequence to run a normal, non-targeted, with-refresh
The complete destroy sequence I'm using is:
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. |
Terraform Version
Terraform Configuration Files
Debug Output
https://gist.github.com/jaffel-lc/78be590f8fddae0426adbffba844f374
Expected Behavior
second destroy should complete without error (just like the first) and without destroying anything.
Actual Behavior
Steps to Reproduce
Additional Context
No response
References
No response