Backport of core: Don't try to register separate checkable resources for each module instance into v1.3 #31875
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Backport
This PR is auto-generated from #31860 to be assessed for backporting due to the inclusion of the label 1.3-backport.
The below text is copied from the body of the original PR.
Over in #31846 we learned that the logic for registering the "checkable object instances" for resources with custom condition checks was running once per containing module instance, rather than once total to cover all instances of a resource across all module instances.
The checks infrastructure expects to be told the instances for each checkable object exactly once so that we can ensure it always either has complete information or no information. This is part of how we guarantee that we'll either always return the full set of instances for a checkable object or mark the check as "error" so that check-presenting UIs can produce a consistent set of instances on each run.
Unfortunately our previous approach to handling the two-level expansion of resources inside modules was to use two levels of
DynamicExpand
, which is a special inversion-of-control technique we use as part of the concurrent graph walk. That meant that there was no place in the code which had access to all of the instances of a particular resource at once.It turns out that the double-
DynamicExpand
isn't really necessary here, and was largely just a convenience when we were implementing multi-instance modules to avoid disrupting the existing resource expansion logic too much. The module expansion logic is all just local computation anyway, and it was holding a lock on the state for its duration, so it was getting no material benefit from being called concurrently by the graph walker.To correct the bug, this reworks the logic to just be one big DynamicExpand that uses normal straight-through code to gather all of the instances of a resource together into a dynamic subgraph, and register them with the checks system. This makes the control flow clearer while also ensuring that by the end of this function we know exactly which set of resource instances we're expecting.
I had originally intended to find a less invasive fix for this for backporting into the v1.3 branch to fix the bug, but the graph walker inversion of control meant that before there was no reasonable correct place to call
ReportCheckableObjects
objects from.To minimize the risk of doing such refactoring I made an effort to modify the actual logic as little as possible: what were previously the
Execute
andDynamicExpand
methods ofNodePlannableResource
are nownodeExpandPlannableResource.expandResourceInstances
andnodeExpandPlannableResource.resourceInstanceSubgraph
respectively, with them chained together as direct method calls rather than indirectly chained using an intermediate dynamic subgraph.Moving some logic that was previously in an
Execute
method to instead be in aDynamicExpand
method exposed a logic error in the graph walker's handling ofDynamicExpand
which I corrected here in the initial commit.This refactoring did not regress any existing tests except the ones that were directly testing
NodePlannableResource
, which are now removed altogether because that graph node type no longer exists. I added a new test to cover the resource-related portion of #31846.Issue #31846 seems to actually be capturing two separate but related bugs, where the other is about output values. This PR does not attempt to address the output value bug, since its root cause seems unrelated and so I intend to tackle it as a separate PR.