Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

helper/schema: Schema.DiffSuppressOnRefresh #882

Merged
merged 1 commit into from Feb 17, 2022
Merged

Conversation

apparentlymart
Copy link
Member

@apparentlymart apparentlymart commented Feb 4, 2022

Unfortunately in the original design of DiffSuppressFunc we made it work only during planning, and so the normalization vs. drift classification it does isn't honored in any other situation.

Unilaterally enabling it for both refresh and apply would be too risky at this late stage where so many existing provider behaviors rely on these subtle details, but here we introduce a more surgical opt-in mechanism whereby specific attributes can be set to also check the Refresh result against DiffSuppressFunc, and so avoid reporting any normalization done by the remote system as if it were drift.

    DiffSuppressFunc: /* as before, unmodified */,
    DiffSuppressOnRefresh: true, // enables the new behavior

This behavior will be primarily useful for strings containing complex data serialized into some particular microsyntax which allows expressing the same information multiple ways. In particular, existing situations where we classify whitespace changes and other such immaterial changes in JSON values as normalization vs. drift would be a good candidate to enable this flag, so that we can get the same results during refreshing and thus avoid churning any downstream resources that refer to the
unexpectedly-updated result.

This deals with only the most visible problem previously discussed over in hashicorp/terraform-plugin-framework#70 -- creating value churn during the refresh phase -- because the risk of a more systematic change to the legacy SDK at this point is too high. Hopefully eventually the new framework will address this more completely (also handling it during the apply step, for example), but I intend this PR as a pragmatic way to help provider developers more easily resolve bug reports related to incorrect classification of normalization by relying on the DiffSuppressFunc implementations already written against this old SDK, so they can be addressed with only minimal schema changes and in particular without switching to the new framework.

My primary goal here is that this should cause absolutely no observable difference in result for any existing schema that doesn't set this new flag, and so the presence of this new code would have no effect on any existing provider unless they also opt in a particular attribute to it. I tried by best to code defensively to avoid e.g. introducing new panics in weird edge cases, but this legacy SDK does have a fair number of weird edge cases and so if you can spot one I didn't think of please let me know!

Although not exactly what was requested there, I think this closes #801 by providing a way to explicitly opt in to the requested behavior, even though it can't be the default behavior.

@apparentlymart apparentlymart requested a review from a team as a code owner February 4, 2022 03:24
@apparentlymart apparentlymart self-assigned this Feb 4, 2022
@apparentlymart
Copy link
Member Author

This lint rule about not naming a variable new is catching cases where I'd followed some existing precedent elsewhere, so I'm going to leave it for now but I'd be happy to update my new additions here to use a different name if maintainers consider it more important to pass lint for new code than to be consistent with existing code.

@detro
Copy link
Contributor

detro commented Feb 4, 2022

I have actually experienced this issue (#801) myself, with the same kind of resource: AWS policies.

So, a way for the provider to deal with this would be lovely.

Copy link
Contributor

@detro detro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside from me asking questions that I then self-answered (and please, let me know if I'm missing something or some nuance), I think this looks good (thank you for the long description - that context helped me a lot).

I guess the only thing to deal with are the linter issues, but they should be very easy to address.

helper/schema/schema.go Outdated Show resolved Hide resolved
continue // no schema? weird, but not our responsibility to handle
}
schema := schemaList[len(schemaList)-1]
if !schema.DiffSuppressOnRefresh || schema.DiffSuppressFunc == nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: why isn't the branching if DiffSuppressOnRefresh handled here instead of, say, at the beginning of this method OR before this method is even called?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nevermind, I think I can answer my own question: this is to ensure this logic is applied selectively to sub-attributes of a schema. Even if most of it doesn't use it, if there is even just 1 attribute that does, we want to check it for each.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, unfortunately I think we do need to visit everything here to make sure none of them are in need of this work.

An additional resource-level opt-in could potentially mitigate that but it seems like an annoying and unprecedented extra step, and in practice most (though certainly not all) resource types have only tens of attributes and so I don't expect this to be prohibitively expensive in the common case.

Copy link
Contributor

@detro detro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@bflad bflad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I apologize for the delayed review on my part since my attention has been elsewhere the last few days.

In general, I think this is likely the best approach to take here given the current compatibility promises we need to make, while offering the most flexibility. Creating a separate ResourceData "Set with DiffSuppressFunc" method or "Set with Options" method and having this as an available option is likely a non-starter given the level of effort and potential provider developer confusion.

Most of my comments here are more around making the functionality more debuggable and understandable for provider developers, who may not be familiar with certain terminology that the SDK has hidden away.

We should likely setup a separate terraform-provider-corner case that verifies this new functionality in the unlikely event there are missing nuances in the SDK logic that would prevent it from working with real world providers. Worst case, it could be used just to ensure we maintain backwards compatibility should we need to change this area of functionality in the future. That can be done outside this pull request though, since it involves a separate repository and a dance of merges to properly become a usable integration test.

helper/schema/schema.go Outdated Show resolved Hide resolved
helper/schema/schema.go Outdated Show resolved Hide resolved
@@ -638,6 +638,7 @@ func (r *Resource) RefreshWithoutUpgrade(
state = nil
}

schemaMap(r.Schema).handleDiffSuppressOnRefresh(s, state)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside: It may be beneficial to offer this as a ResourceData method on top of the schema field implementation, so provider developers can opt into setting this once per resource (or potentially across the provider, if we did some other implementation work). Not saying anything different needs to be done here now, unless we think that is something we want to consider upfront in terms of implementation and doing it outside of schemaMap.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did originally consider making this a whole-resource-type-level settable, but decided to go more surgical so that it would be less risky to enable for a long-standing existing resource type implementation.

I'm not sure if I follow exactly what you mean by a ResourceData method here, but perhaps my comment on the other question above addresses why it's considerably harder to get a good result for this at the ResourceData/FieldWriter layer and why I ultimately elected to make it a postprocessing step. If you're thinking of something different than what I described there then I'm happy to consider another approach!

helper/schema/schema.go Outdated Show resolved Hide resolved
helper/schema/schema.go Outdated Show resolved Hide resolved
helper/schema/schema.go Show resolved Hide resolved
@apparentlymart
Copy link
Member Author

apparentlymart commented Feb 8, 2022

(I'm hoisting this design note out of one of the resolved review conversation threads from above, so it's easier to see if we look back on this later to understand why it's implemented this way.)

I originally intended to make ResourceData.Set be the one to handle ignoring writes that don't materially change the existing value.

There were two big challenges with doing this on Set which caused me to abandon that direction in favor of this one:

  • DiffSuppressFunc's signature is designed to deal with already-flatmapped data, because that's what's in the legacy diff data structure it was originally built to filter. ResourceData is built around the "field reader" abstraction which works with arbitrary interface{} and so I found myself having to essentially reimplement the flatmap functionality at that layer in order to get suitable inputs for the DiffSuppressFunc call.
  • The ResourceData.Set API allows (and, in a lot of cases, requires) setting an entire nested data structure in a single call, which means visiting every leaf value in that data structure to decide whether to include it or not, and no part of the call stack in that area has all of the context necessary to make that decision.

While doing it after the fact against the InstanceState object is non-ideal, it benefits from more closely matching how the Diff method works with the InstanceDiff object as a post-processing step, and so it's a better fit for the existing DiffSuppressFunc design.

Copy link
Member

@bflad bflad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much for working on this, @apparentlymart, looks good to me from a code perspective (just one small note about the terraform-plugin-log usage). Super excited that we can get a opt-in solution in place for this framework. 🚀 If you could also add a changelog entry, that would be great.

e.g. Adding a .changelog/882.txt file with something like:

```release-note:note
helper/schema: The `Schema` type `DiffSuppressOnRefresh` field can be enabled to check `DiffSuppressFunc` for normalization changes during refresh, similar to planning. This can prevent unintended "Values changed outside of Terraform" messages. For many attributes, this is likely desirable and expected behavior when implementing `DiffSuppressFunc`, however it is opt-in for backwards compatibility reasons.
```

```release-note:enhancement
helper/schema: Added the `DiffSuppressOnRefresh` field to the `Schema` type
```

@@ -985,6 +1024,7 @@ func (m schemaMap) diff(
continue
}

log.Printf("[DEBUG] ignoring change of %q due to DiffSuppressFunc", attrK)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much for adding this.

}

if schema.DiffSuppressFunc(k, oldV, newV, d) {
tfsdklog.Debug(ctx, "ignoring change of %q due to DiffSuppressFunc", k)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A quirk of the current terraform-plugin-log implementation and its logging functions such as tfsdklog.Debug() is that the final variadic argument is expected as key-value pairs rather than operating like the f suffix of other string format functions. There is nothing to catch an errant unpaired key implementation at compile-time. 😖 We'll either need to fmt.Sprintf() the value into the message parameter or switch it to a structured key-value pair:

tfsdklog.Debug(ctx, fmt.Sprintf("ignoring change of %q due to DiffSuppressFunc", k))
// or
tfsdklog.Debug(ctx, "ignoring change due to DiffSuppressFunc", "tf_attribute", k)

I'll take this as real world feedback to create an issue in terraform-plugin-log to enforce this at compile time (e.g. convert those key-value expectations into a ...map[string]interface{} function parameter instead of the variadic ...interface{}) or otherwise create static analysis tooling to warn of this type of issue. I think we'd shy away from a Debugf() variant as it would discourage the structured key-value pairs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, okay! Sorry... should've looked more closely for other examples before jumping in here.

I think I'm going to follow your first example here just because it'll then match with the other log-based one I added in the Diff function, and because it doesn't seem super valuable to be able to filter by this key and thus structured logging is not super important here.

Unfortunately in the original design of DiffSuppressFunc we made it work
only during planning, and so the normalization vs. drift classification
it does isn't honored in any other situation.

Unilaterally enabling it for both refresh and apply would be too risky at
this late stage where so many existing provider behaviors rely on these
subtle details, but here we introduce a more surgical _opt-in_ mechanism
whereby specific attributes can be set to also check the Refresh result
against DiffSuppressFunc, and so avoid reporting any normalization done
by the remote system as if it were drift.

This behavior will be primarily useful for strings containing complex data
serialized into some particular microsyntax which allows expressing the
same information multiple ways. In particular, existing situations where
we classify whitespace changes and other such immaterial changes in JSON
values as normalization vs. drift would be a good candidate to enable
this flag, so that we can get the same results during refreshing and thus
avoid churning any downstream resources that refer to the
unexpectedly-updated result.
@apparentlymart
Copy link
Member Author

Oh whoops 🤦‍♂️ I just noticed that I didn't actually push up my updated commit after I addressed the previous round of feedback. Sorry about that! I think I've now belatedly addressed everything we discussed so far. 🤞

@apparentlymart
Copy link
Member Author

For what it's worth, I have made a small attempt to add a terraform-provider-corner test case that exercises this today, but I can't figure out how to actually describe the situation because the test harness is built for testing only final states and doesn't offer any way to make assertions about the refresh or planning steps alone.

What I've attempted is to add some normalization to the domain part of the email address in the fake "user" object, to observe that normalization being misclassified as drift and then cascading downstream during refresh. I can see it fail before I add the DiffSuppressFunc, but I can't see any way to distinguish between the previous DiffSuppressFunc behavior and the new DiffSuppressOnRefresh: true behavior, because by the time the Check function is running the results from ReadResource and PlanResourceChange are long forgotten and we only have access to the ApplyResourceChange result.

I've run out of time to spend on that for now, so I'm afraid I won't be able to contribute a corner-provider test for this, for now at least. 😖

Copy link
Member

@bflad bflad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still looks good to me and going to pull this in. Thanks so much, @apparentlymart 🚀

Aside: A design goal of our next generation of provider-level acceptance testing would be to offer multiple layers of abstraction during testing. Theoretically, this would include low level functionality for choosing individual Terraform CLI commands to execute and inspecting any available Terraform CLI data, if necessary. Once that's available, we can convert terraform-provider-corner over to it and hopefully be able to test many more of these types of scenarios that the current acceptance testing framework cannot handle due to its solely high level abstractions.

@github-actions
Copy link

I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 20, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DiffSuppressFunc should be used for "Objects have changed outside of Terraform" logic
3 participants