Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extra feedback for a variety of "move" consequences and edge-cases #29637

Merged
merged 5 commits into from Sep 23, 2021

Conversation

apparentlymart
Copy link
Member

Although in most cases Terraform's handling of the new moved blocks should be straightforward and intuitive, there are inevitably some less common cases where the behavior may be surprising enough to warrant further explanation.

This set of changes aims to add a first round of explanations of that sort, focused on the following situations:

  • Colliding moves, where two objects try to move to the same address in spite of our basic static validation rules.

    This can typically apply only when someone's previously used terraform state mv or terraform apply -target=... to create a situation that wouldn't be reachable through the standard workflow.

    However, there is at least one situation where it's reachable even in the standard workflow: if we need to resolve both module.foo moves to module.bar and aws_instance.a moves to module.bar.aws_instance.a then both of those moves would be considered structurally valid, but if our prior state already had an object at module.foo.aws_instance.a then the two aws_instance.a objects would conflict with one another to try to occupy the address module.bar.aws_instance.a.

    This is not a situation we expect folks to reach regularly, and so one of the changes here adds an additional warning message to the plan result in that unusual case, which is intended to help explain why Terraform is planning to destroy an object that the user might have expected to be moved to a new address which is still declared in configuration. We don't have enough information to derive a good sense of user intent in that situation, and so the warning just makes a general observation that you can use terraform state subcommands to make things less ambiguous if you're not satisfied with the resolution Terraform proposed.

  • Moving to an address that isn't declared in the configuration.

    The main expected use of moved blocks is to set the to argument to refer to an address which has a corresponding declaration in the configuration. In that case, the happy path is for Terraform to simply pretend that an existing object had originally been created at the new address and not propose any other changes at all.

    However, because moved blocks are a historical record that will (in most cases) remain in modules indefinitely, we do need to allow for a later version of the module possibly removing the declaration of that object. Our rules for moves are that we should always generate the same effect for two actions handled together in the same plan as we would for those actions separated into two runs, and so therefore we must tolerate moving into an undeclared address and plan to delete any object that moved into that undeclared address.

    An unfortunate consequence of this rule is that if an author makes a mistake when writing the to address then Terraform will typically propose to destroy the object. As a compromise then, the changes here add some additional context to a planned destroy to help draw attention to the mismatch between configuration and state.

    Although the motivation to do this now was to give better feedback about incorrect moved statements, adding this additional feedback for planned destroy has been on my wishlist for a while now, and so I implemented it in a generic way which works even in the normal case where there's no moving going on. I hope that this will be useful to help users disambiguate the several different reasons that can cause Terraform to declare a particular resource instance as an "orphan" (no longer declared in configuration) and thus figure out what they need to change in the configuration if they don't want to delete the object.

While I was in the area anyway, I also made some minor adjustments to how we present the extra note about a resource instance having moved, which we'd planned to do in an earlier PR but accidentally overlooked it.

The following example output shows a rather gnarly combination of these situations all together, just for illustrative purposes. In practice I would expect it to be extremely rare to see all of these appear together in the same proposed plan. (I created the conflict between null_resource.a[0] and null_resource.a through some malicious use of terraform state mv prior to requesting this plan.)

Screenshot of a particularly unfortunate combination of edge-cases, reproduced as plain text below

Terraform will perform the following actions:

  # null_resource.a[0] will be destroyed
  # (because resource does not use count)
  - resource "null_resource" "a" {
      - id = "6362033012182345903" -> null
    }

  # null_resource.d will be destroyed
  # (because null_resource.d is not in configuration)
  # (moved from null_resource.c)
  - resource "null_resource" "d" {
      - id = "7027155152011416384" -> null
    }

Plan: 0 to add, 0 to change, 2 to destroy.
╷
│ Warning: Unresolved resource instance address changes
│ 
│ Terraform tried to adjust resource instance addresses in the prior
│ state based on change information recorded in the configuration,
│ but some adjustments did not succeed due to existing objects
│ already at the intended addresses:
│   - null_resource.a[0] could not move to null_resource.a
│ 
│ Terraform has planned to destroy these objects. If Terraform's
│ proposed changes aren't appropriate, you must first resolve the
│ conflicts using the "terraform state" subcommands and then create a
│ new plan.
╵

Making the implicit zero-to-none and none-to-zero key moves more visible in the UI here drew attention to a quirk of my earlier work in #29605, in the situation of moving from for_each to count rather than for_each to no repetition. I also fixed that while here, because otherwise the reason given for proposing to delete the old object didn't actually make sense.

Our previous rule for implicitly moving from IntKey(0) to NoKey would
apply that move even when the current resource configuration uses
for_each, because we were only considering whether "count" were set.

Previously this was relatively harmless because the resource instance in
question would end up planned for deletion anyway: neither an IntKey nor
a NoKey are valid keys for for_each.

Now that we're going to be announcing these moves explicitly in the UI,
it would be confusing to see Terraform report that IntKey moved to NoKey
in a situation where the config changed from count to for_each, so to
address that we'll only generate the implied statement if neither
repetition argument is set.
There are a few different reasons why a resource instance tracked in the
prior state might be considered an "orphan", but previously we reported
them all identically in the planned changes.

In order to help users understand the reason for a surprising planned
delete, we'll now try to specify an additional reason for the planned
deletion, covering all of the main reasons why that could happen.

This commit only introduces the new detail to the plans.Changes result,
though it also incidentally exposes it as part of the JSON plan result
in order to keep that working without returning errors in these new
cases. We'll expose this information in the human-oriented UI output in
a subsequent commit.
The core runtime is now able to specify a reason for some situations when
Terraform plans to delete a resource instance.

This commit makes that information visible in the human-oriented UI. A
previous commit already made the underlying data informing these new hints
visible as part of the machine-oriented (JSON) plan output.

This also removes the bold formatting from the existing "has moved to"
hints, because subjectively it seemed like the result was emphasizing too
many parts of the output and thus somewhat defeating the benefit of the
emphasis in trying to create additional visual hierarchy for sighted users
running Terraform in a terminal. Now only the first line containing the
main action statement will be in bold, and all of the parenthesized
follow-up notes will be unformatted.
In most cases Terraform will be able to automatically fully resolve all
of the pending move statements before creating a plan, but there are some
edge cases where we can end up wanting to move one object to a location
where another object is already declared.

One relatively-obvious example is if someone uses "terraform state mv" in
order to create a set of resource instance bindings that could never have
arising in normal Terraform use.

A less obvious example arises from the interactions between moves at
different levels of granularity. If we are both moving a module to a new
address and moving a resource into an instance of the new module at the
same time, the old module might well have already had a resource of the
same name and so the resource move will be unresolvable.

In these situations Terraform will move the objects as far as possible,
but because it's never valid for a move "from" address to still be
declared in the configuration Terraform will inevitably always plan to
destroy the objects that didn't find a final home. To give some additional
explanation for that result, here we'll add a warning which describes
what happened.

This is not a particularly actionable warning because we don't really
have enough information to guess what the user intended, but we do at
least prompt that they might be able to use the "terraform state" family
of subcommands to repair the ambiguous situation before planning, if they
want a different result than what Terraform proposed.
Because our validation rules depend on some dynamic results produced by
actually running the plan, we deal with moves in a "backwards" order where
we try to apply them first -- ignoring anything strange we might find --
and then validate the original statements only after planning.

An unfortunate consequence of that approach is that when the move
statements are invalid it's likely that move execution will not fully
complete, and so the generated plan is likely to be incorrect and might
well include errors resulting from the unresolved moves.

To mitigate that, here we let any move validation errors supersede all
other diagnostics that the plan phase might've generated, in the hope that
it'll help the user focus on fixing the incorrect move statements without
creating confusing by reporting errors that only appeared as a quick of
how Terraform worked around the invalid move statements earlier.
@apparentlymart apparentlymart merged commit d97ef10 into main Sep 23, 2021
@apparentlymart apparentlymart deleted the f-moved-unusual-feedback branch September 23, 2021 21:37
// arguments. This difference in detail is out of pragmatism, because
// potentially multiple nested modules could all contribute conflicting
// specific reasons for a particular instance to no longer be declared.
ResourceInstanceDeleteBecauseNoModule ResourceInstanceChangeActionReason = 'M'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When adding new reasons here, I think we should also update the streaming JSON UI output to match:

const (
ReasonNone ChangeReason = ""
ReasonTainted ChangeReason = "tainted"
ReasonRequested ChangeReason = "requested"
ReasonCannotUpdate ChangeReason = "cannot_update"
ReasonUnknown ChangeReason = "unknown"
)
func changeReason(reason plans.ResourceInstanceChangeActionReason) ChangeReason {
switch reason {
case plans.ResourceInstanceChangeNoReason:
return ReasonNone
case plans.ResourceInstanceReplaceBecauseTainted:
return ReasonTainted
case plans.ResourceInstanceReplaceByRequest:
return ReasonRequested
case plans.ResourceInstanceReplaceBecauseCannotUpdate:
return ReasonCannotUpdate
default:
// This should never happen, but there's no good way to guarantee
// exhaustive handling of the enum, so a generic fall back is better
// than a misleading result or a panic
return ReasonUnknown
}
}

I'll submit a PR for these changes shortly. Can you think of any automated way to keep these two enums in sync for future changes? Maybe adding nishanths/exhaustive as a CI check?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@github-actions
Copy link

I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 25, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants