
core: Don't try to register separate checkable resources for each module instance #31860

Merged (3 commits, Sep 26, 2022)

Conversation

@apparentlymart (Member) commented on Sep 23, 2022:

Over in #31846 we learned that the logic for registering the "checkable object instances" for resources with custom condition checks was running once per containing module instance, rather than once total to cover all instances of a resource across all module instances.

The checks infrastructure expects to be told the instances for each checkable object exactly once so that we can ensure it always either has complete information or no information. This is part of how we guarantee that we'll either always return the full set of instances for a checkable object or mark the check as "error" so that check-presenting UIs can produce a consistent set of instances on each run.

Unfortunately our previous approach to handling the two-level expansion of resources inside modules was to use two levels of DynamicExpand, which is a special inversion-of-control technique we use as part of the concurrent graph walk. That meant that there was no place in the code which had access to all of the instances of a particular resource at once.

It turns out that the double-DynamicExpand isn't really necessary here, and was largely just a convenience when we were implementing multi-instance modules to avoid disrupting the existing resource expansion logic too much. The module expansion logic is all just local computation anyway, and it was holding a lock on the state for its duration, so it was getting no material benefit from being called concurrently by the graph walker.

To correct the bug, this reworks the logic to just be one big DynamicExpand that uses normal straight-through code to gather all of the instances of a resource together into a dynamic subgraph, and register them with the checks system. This makes the control flow clearer while also ensuring that by the end of this function we know exactly which set of resource instances we're expecting.
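
To illustrate the shape of that change, here is a minimal, self-contained Go sketch. The types and function names are hypothetical stand-ins, not Terraform's real internals; the point is gathering every instance across all module instances in one pass and registering them exactly once:

```go
package main

import "fmt"

// ResourceInstance is a hypothetical stand-in for an absolute resource
// instance address (addrs.AbsResourceInstance in Terraform itself).
type ResourceInstance string

// expandAndRegister sketches the one-pass expansion: walk every instance of
// the containing module, collect every instance of the resource, and only
// then report the complete set to the checks system, exactly once.
func expandAndRegister(
	moduleInstances []string,
	expandResource func(moduleInstance string) []ResourceInstance,
	reportCheckableObjects func(all []ResourceInstance),
) []ResourceInstance {
	var all []ResourceInstance
	for _, mi := range moduleInstances {
		// Previously this inner expansion happened in a second, nested
		// DynamicExpand, so no single function ever saw the full set.
		all = append(all, expandResource(mi)...)
	}
	reportCheckableObjects(all) // once per resource, as the checks infrastructure expects
	return all
}

func main() {
	expand := func(mi string) []ResourceInstance {
		return []ResourceInstance{
			ResourceInstance(mi + ".aws_instance.a[0]"),
			ResourceInstance(mi + ".aws_instance.a[1]"),
		}
	}
	report := func(all []ResourceInstance) {
		fmt.Printf("registering %d instances at once\n", len(all))
	}
	expandAndRegister([]string{`module.m["x"]`, `module.m["y"]`}, expand, report)
}
```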


I had originally intended to find a less invasive fix to backport into the v1.3 branch, but the graph walker's inversion of control meant there was previously no correct place to call ReportCheckableObjects from.

To minimize the risk of this refactoring I made an effort to modify the actual logic as little as possible: what were previously the Execute and DynamicExpand methods of NodePlannableResource are now nodeExpandPlannableResource.expandResourceInstances and nodeExpandPlannableResource.resourceInstanceSubgraph respectively, chained together as direct method calls rather than indirectly via an intermediate dynamic subgraph.

Moving some logic that was previously in an Execute method to instead be in a DynamicExpand method exposed a logic error in the graph walker's handling of DynamicExpand which I corrected here in the initial commit.

This refactoring did not regress any existing tests except the ones that were directly testing NodePlannableResource, which are now removed altogether because that graph node type no longer exists. I added a new test to cover the resource-related portion of #31846.

Issue #31846 seems to actually be capturing two separate but related bugs, where the other is about output values. This PR does not attempt to address the output value bug, since its root cause seems unrelated and so I intend to tackle it as a separate PR.

We were previously _trying_ to handle diagnostics here but were not quite
doing it right because we were testing whether the resulting error was
nil rather than appending it to the diagnostics and then seeing if the
result has errors.

The difference here is important because it allows DynamicExpand to return
warnings without associated errors when needed. Previously the graph
walker would treat a warnings-only result as if it were an error.

Ideally we'd change DynamicExpand to return diagnostics directly, but we
previously decided against that because there were so many implementors
to update, and my intent for this change is to be surgical in the update
so we minimize risk of backporting the change into patch releases.
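
A paraphrased sketch of the pattern being fixed (not the literal diff; it assumes Terraform's tfdiags package, and the surrounding walker types are elided):

```go
import "github.com/hashicorp/terraform/internal/tfdiags"

// walkDynamicExpand shows the corrected error handling at the call site.
func walkDynamicExpand(expand func() (interface{}, error)) error {
	g, err := expand()
	_ = g // the walker would recurse into this subgraph here

	// Before: `if err != nil { return err }`, which treated a
	// warnings-only ErrWithWarnings result as if it were an error.

	// After: append to diagnostics and test for actual errors, so a
	// warnings-only result no longer aborts the walk.
	var diags tfdiags.Diagnostics
	diags = diags.Append(err)
	if diags.HasErrors() {
		return diags.Err()
	}
	return nil
}
```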
@apparentlymart added the bug, core, and 1.3-backport labels on Sep 23, 2022
@apparentlymart added this to the v1.3.1 milestone on Sep 23, 2022
@apparentlymart self-assigned this on Sep 23, 2022
We previously did two levels of DynamicExpand to go from ConfigResource to
AbsResource and then from AbsResource to AbsResourceInstance.

We'll now do the full expansion from ConfigResource to AbsResourceInstance
in a single DynamicExpand step inside nodeExpandPlannableResource.

The new approach is essentially functionally equivalent to the old except
that it fixes a bug in the previous implementation: we will now call
checkState.ReportCheckableObjects only once for the entire set of
instances for a particular resource, which is what the checkable objects
infrastructure expects so that it can always mention all of the checkable
objects in the check report even if we bail out partway through due to
a downstream error.

This is essentially the same code but now turned into additional methods on
nodeExpandPlannableResource instead of a separate graph node type. A further
advantage is that this is now straight-through code with standard control
flow, instead of the unusual inversion of control we used before, bouncing
in and out of different Execute and DynamicExpand implementations to get
this done.

(The review thread below is attached to the diff; the context lines include the removed `func (n *NodePlannableResource) Name() string` method and the new `return &g, diags.ErrWithWarnings()` return.)
Member commented:

Because the process here combines multiple distinct graphs, the resulting graph often won't have a single root. The single root probably doesn't matter in practice (it's vestigial from very early Terraform), but it does show that we're not actually validating the graph returned from DynamicExpand. These have always been very simple and had no way to introduce cycles, but since you're already touching the DynamicExpand call site it might be worth mentioning.

@apparentlymart (Author) commented:

Hmm indeed... I typically think of these DynamicExpand graphs as just an over-engineered way to iterate over the multiple expanded items, but you're right that the surrounding infrastructure at least thinks it's dealing with single-rooted acyclic graphs even though these graphs never contain any real dependency edges in practice.

I'm going to first see if there's a straightforward way to add an additional root node just to maintain the contract, but if that isn't easy and low-risk then I'll find a suitable place to add a comment describing this hazard.

Member commented:

Yeah, these have definitely been reduced to an obfuscated iteration, and in other cases we've removed the graph builder entirely. It does still have to get fed into the graph walker for concurrent operations, but maybe in the future we get rid of these last few specialized DynamicExpand graph transformers.

@apparentlymart (Author) commented:

While investigating this I noticed two interesting things:

  • Because the test case I added here has only one instance of one resource in the module, the root transformer didn't add an extra root node anyway because that single resource instance was already the root.
  • After adding another resource to force the addition of a root, I noticed that we do actually still end up with only one root because all of the root nodes are equal to one another as currently implemented: the node type is an empty struct and we use a value of that type (rather than a pointer) as the node. Therefore it naturally coalesces together into a single node anyway.

With those two observations in mind, I'm going to add a comment to the RootTransformer that records that all root node values being equal to each other is part of its contract and leave it at that for now. If we want to do something more involved later on then of course we can, but the current code is generating a valid graph already and I don't want to add more complexity to it just for a bug fix.
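
A minimal sketch of that coalescing behavior, assuming (as Terraform's graph implementation effectively does) that vertices live in a map keyed by the node value; `nodeRoot` is a hypothetical stand-in for the real root node type:

```go
package main

import "fmt"

// nodeRoot is a hypothetical stand-in for the root node type described
// above: an empty struct added to the graph by value, so every instance
// compares equal to every other.
type nodeRoot struct{}

func main() {
	// A toy vertex set, keyed the way a map-backed graph would key it.
	vertices := map[interface{}]struct{}{}

	// "Adding" a root from each per-module-instance subgraph: because all
	// nodeRoot{} values are equal, the map keys collide and the subsumed
	// mega-graph naturally ends up with exactly one root vertex.
	for i := 0; i < 3; i++ {
		vertices[nodeRoot{}] = struct{}{}
	}
	fmt.Println(len(vertices)) // 1

	// By contrast, if each subgraph contributed a distinct pointer node,
	// every one would remain a separate vertex and the combined graph
	// could end up with multiple roots.
	type namedRoot struct{ name string }
	vertices[&namedRoot{name: "a"}] = struct{}{}
	vertices[&namedRoot{name: "b"}] = struct{}{}
	fmt.Println(len(vertices)) // 3
}
```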

@jbardin (Member) commented on Sep 26, 2022:

If there are multiple levels of expansion, you still end up with multiple roots (I verified by calling Validate on the expanded subgraphs, which would error with "multiple roots"). I'm not saying this is necessarily something which needs to be fixed here, but we definitely don't always get a single root.
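
For context, this is roughly the kind of check being described, as a fragment rather than a complete program: it assumes the `Validate` method on Terraform's dag.AcyclicGraph (which rejects graphs with multiple roots or cycles), and the surrounding names are illustrative:

```go
// Validate the subgraph that DynamicExpand returns before walking it.
g, err := n.DynamicExpand(ctx)
if err == nil && g != nil {
	if err := g.Validate(); err != nil {
		// With two or more subsumed single-node module subgraphs and no
		// shared root node, this fails with a "multiple roots" error.
		log.Printf("[ERROR] invalid dynamic subgraph: %s", err)
	}
}
```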

@apparentlymart (Author) commented:

Hmm interesting... I wonder why that isn't reproducing on my end. It's working okay for me with the test case updated to generate multiple instances in each module. 🤔

I'll poke at it some more and see if I can get it to fail in the way you're describing or explain why it doesn't before merging.

@apparentlymart (Author) commented:

Ahh, I see what the problem is: it's actually related to the first observation I made above.

If there is only one resource instance in a particular module instance then RootTransformer won't insert a new root node at all, because there's already only one node in the graph. But that node isn't a root node so when we subsume the graphs all together we end up with just a pile of resource instances and no root, which therefore fails validation.

I managed to defeat my own reproduction by already having changed the test to generate multiple resource instances per resource before I tried calling .Validate on the result, and therefore masked the problem. 🤦

It does seem to work regardless and so I'm going to go ahead with this change as a minimal v1.3 backport but I'll open a follow-up PR (intended only for the main branch, not backporting) to actually validate the DynamicExpand results and then make sure it does indeed produce valid graphs in all cases.

We use a non-pointer value for this particular node, which means that
there can never be two root nodes in the same graph: the graph
implementation will just coalesce them together when a second one is added.

Our resource expansion code is relying on that coalescing so that it can
subsume together multiple graphs for different module instances into a
single mega-graph with all instances across all module instances, with
any root nodes coalescing together to produce a single root.

This also updates one of the context tests that exercises resource
expansion so that it will generate multiple resource instance nodes per
module and thus potentially have multiple roots to coalesce together.
However, we aren't currently explicitly validating the return values from
DynamicExpand and so this test doesn't actually fail if the coalescing
doesn't happen. We may choose to validate the DynamicExpand result in a
later commit in order to make it more obvious if future modifications fail
to uphold this invariant.
@eczy-tempus commented:

This branch is confirmed working in several locations where we were previously seeing duplicate checkable objects report... errors for resources. Thank you for addressing this so quickly!

@github-actions (bot) commented:

Reminder for the merging maintainer: if this is a user-visible change, please update the changelog on the appropriate release branch.

mariuskiessling added a commit to metro-digital/terraform-google-cf-projectcfg that referenced this pull request Sep 29, 2022
This change is required because Terraform v1.3.0 was affected by a bug
that caused the module to crash Terraform (see
hashicorp/terraform#31860 for more details).

In addition, we now finally have a version constraint for the bucket
module.
@github-actions (bot) commented:

I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions (bot) locked as resolved and limited conversation to collaborators on Oct 27, 2022