Call PlanResourceChange for destroy operations #31179

jbardin · 2022-06-02T13:37:00Z

Terraform currently tries to plan all destroy operations offline, since in principle a destroy operation is always the same: setting the resource value to null. This, however, doesn't give providers the ability to verify the change, and it skips their chance to inspect and modify the private data held in the Terraform state.

Here we add a call to PlanResourceChange in the destroy plan codepath, which will include a null value for the ProposedNewState, as well as the stored private data for the provider. The primary benefit here is for providers that can verify the change and alert users that the operation may fail before an apply is even started. Providers may also return other useful diagnostics, including warnings.
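As a rough illustration of what a provider could do with this call, here is a minimal sketch. The types, the `deletion_protection` attribute, and the diagnostic wording are all hypothetical and are not taken from any SDK; a nil `ProposedNewState` stands in for the null value sent for destroys.

```go
package main

import "fmt"

// Diagnostic is a hypothetical, simplified diagnostic type.
type Diagnostic struct {
	Severity string // "error" or "warning"
	Summary  string
}

// PlanRequest and PlanResponse are simplified stand-ins for the protocol
// messages. A nil ProposedNewState models the null value sent for a destroy.
type PlanRequest struct {
	PriorState       map[string]string
	ProposedNewState map[string]string
	PriorPrivate     []byte
}

type PlanResponse struct {
	PlannedState   map[string]string
	PlannedPrivate []byte
	Diagnostics    []Diagnostic
}

// planResourceChange shows only the destroy branch: the provider checks the
// prior state and reports a problem before any apply is started.
func planResourceChange(req PlanRequest) PlanResponse {
	resp := PlanResponse{
		PlannedState:   req.ProposedNewState,
		PlannedPrivate: req.PriorPrivate,
	}
	if req.ProposedNewState != nil {
		return resp // not a destroy; normal planning would happen here
	}
	// Assumed attribute name, used purely for illustration.
	if req.PriorState["deletion_protection"] == "true" {
		resp.Diagnostics = append(resp.Diagnostics, Diagnostic{
			Severity: "error",
			Summary:  "deletion protection is enabled; destroy would fail",
		})
	}
	return resp
}

func main() {
	resp := planResourceChange(PlanRequest{
		PriorState: map[string]string{"deletion_protection": "true"},
	})
	for _, d := range resp.Diagnostics {
		fmt.Printf("%s: %s\n", d.Severity, d.Summary)
	}
}
```

The same hook is where a provider could rewrite its private data for the destroy plan instead of passing `PriorPrivate` through unchanged.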

In concept this change is quite simple, and since the ProposedNewState is documented as being null for destroys, adding this call should produce no unexpected side effects. In practice, as we can see with the minimal mocks used in testing, it's possible some unknown providers may not be equipped to handle destroy plans. To account for this, a Capabilities field is added to the provider schema response in both protocol versions. Capabilities can be extended to indicate when a provider supports an optional feature or behavior which cannot be tested for directly. This means that providers will need to opt in to use this feature, most likely via an updated version of the SDK. Since the SDK will need to be updated to allow access to the planning of destroy operations anyway, the need to upgrade is expected, so this should pose no problem for adoption.

To simplify the core implementation, the negotiation of the PlanDestroy capability is done entirely within the grpc handlers, leaving the core codepath the same for all destroyPlan implementations. The capabilities are still exposed in the internal provider schema structure, and further extensions can still be accessed directly by core if the logic needs to reside there.
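The negotiation described above can be sketched roughly as follows. This is a simplified model, not the code from `internal/plugin/grpc_provider.go`: the types and the `callProvider` function are hypothetical, and a nil `ProposedNewState` stands in for a null value.

```go
package main

import "fmt"

// ServerCapabilities holds the opt-in flags a provider reports with its
// schema. Only PlanDestroy comes from this PR; the Go shape is assumed.
type ServerCapabilities struct {
	PlanDestroy bool
}

// PlanRequest and PlanResponse are simplified, hypothetical stand-ins for the
// real request/response types. A nil ProposedNewState models the null value
// that marks a destroy.
type PlanRequest struct {
	ProposedNewState map[string]string
	PriorPrivate     []byte
}

type PlanResponse struct {
	PlannedState   map[string]string
	PlannedPrivate []byte
}

// planResourceChange forwards the request to the provider unless this is a
// destroy plan and the provider has not opted in. In that case it copies the
// proposed state and prior private data into the response and returns early.
func planResourceChange(caps ServerCapabilities, req PlanRequest, callProvider func(PlanRequest) PlanResponse) PlanResponse {
	if req.ProposedNewState == nil && !caps.PlanDestroy {
		return PlanResponse{
			PlannedState:   req.ProposedNewState,
			PlannedPrivate: req.PriorPrivate,
		}
	}
	return callProvider(req)
}

func main() {
	calls := 0
	provider := func(req PlanRequest) PlanResponse {
		calls++
		return PlanResponse{PlannedState: req.ProposedNewState, PlannedPrivate: []byte("updated")}
	}
	destroy := PlanRequest{PriorPrivate: []byte("stored")}

	legacy := planResourceChange(ServerCapabilities{}, destroy, provider)
	fmt.Printf("legacy: calls=%d private=%s\n", calls, legacy.PlannedPrivate)

	optedIn := planResourceChange(ServerCapabilities{PlanDestroy: true}, destroy, provider)
	fmt.Printf("opted in: calls=%d private=%s\n", calls, optedIn.PlannedPrivate)
}
```

Because the check lives in the handler, the caller issues the same call for every provider and never needs to branch on the capability itself.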

Fixes: #30140

bflad

Proposed protocol addition looks good to me conceptually and in the definition. The additional comment about gRPC message sizes looks good too. 👍

As I mentioned out of band, it'd be neat if we could get the protocol major/minor version in the protocol definition as well for logging on the provider side, but certainly doesn't need to be part of this change.

kmoe · 2022-07-06T15:23:43Z

docs/plugin-protocol/tfplugin5.3.proto

+    // supported protocol features. This is used to indicate availability of
+    // certain forward-compatible changes which may be optional in a major
+    // protocol version, but cannot be tested for directly.
+    message Capabilities {


We've discussed a capability framework a few times, including during development of terraform-plugin-framework (e.g. hashicorp/terraform-plugin-framework#85), and decided each time to handle this via protocol versioning, shims, etc.

Do we definitely need it now, and if so, does it merit a separate design to ensure it's useful for any possible future features?

> it's possible some unknown providers may not be equipped to handle destroy plans

This is true for internal testing mocks but I'm curious to see providers in the wild, not using SDKv2 or plugin-framework, that would be broken by destroy plans. The existence of protocol v5 providers complicates this of course if we are not willing to declare this a breaking change and therefore protocol version 7.

The reason we added this was specifically because of terraform-plugin-go (and to a lesser extent, theoretical providers built directly on grpc). While both the legacy shims and the framework handle this case correctly, we couldn't rule out unknown providers which don't. Since capabilities are essentially simple flags, I'm not sure what other design you had in mind. The choice for boolean fields was somewhat arbitrary, but IMO does make the resulting client code easier to read than a series of enums which need to be iterated over individually, though I have no strong opposition if there's another reason to change the data structure.

We chose the overall protobuf+grpc style specifically to allow backwards-compatible additions whenever possible. The alternative here was a new method, PlanResourceDestroy, but that seemed unnecessary for the small edge case of an unsupported provider, and it would have broken from the consistency of the overall API.
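To make the boolean-versus-enum point concrete, here is a small comparison of what client code might look like under each encoding. Both Go shapes are assumptions for illustration; only the `plan_destroy` flag itself comes from the proposed protocol.

```go
package main

import "fmt"

// ServerCapabilities models the boolean-field encoding proposed in this PR.
type ServerCapabilities struct {
	PlanDestroy bool
}

// Capability models a hypothetical alternative: a repeated enum field.
type Capability int

const CapabilityPlanDestroy Capability = 1

// hasCapability is the helper an enum list would require: every check has to
// scan the list for the value of interest.
func hasCapability(list []Capability, want Capability) bool {
	for _, c := range list {
		if c == want {
			return true
		}
	}
	return false
}

func main() {
	flags := ServerCapabilities{PlanDestroy: true}
	list := []Capability{CapabilityPlanDestroy}

	// Boolean field: a direct field read.
	fmt.Println("bool field:", flags.PlanDestroy)

	// Enum list: a search through the advertised values.
	fmt.Println("enum list:", hasCapability(list, CapabilityPlanDestroy))
}
```

With boolean fields, an older provider that never sets the field reads as false by default, which matches the opt-in behavior described in the PR.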

apparentlymart · 2022-06-07T20:36:30Z

docs/plugin-protocol/tfplugin5.3.proto

+    message Capabilities {
+        // The plan_destroy capability signals that a provider expects a call
+        // to PlanResourceChange when a resource is going to be destroyed.
+        bool plan_destroy = 1;
+    }


I wonder if we should call this ServerCapabilities just in case we end up wanting to add a similar ClientCapabilities field to GetProviderSchema.Request in future in order for Terraform Core to announce to the provider that it supports something.

I don't have anything specific in mind right now but it's been my experience that capability negotiation systems like this often end up needing to be two-way at some point, e.g. so that the client can make a set of "offers" of something it supports in the request and then the server can choose zero or more of them to accept in the response.

ServerCapabilities sounds good to me, and leaves open the possibility for client, without making client/server specific flags within the block :D
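The two-way negotiation mentioned in this thread is not part of this PR, but the offer/accept pattern could look roughly like the sketch below. The function and the capability names are hypothetical.

```go
package main

import "fmt"

// negotiate models an offer/accept exchange: the client lists what it can do
// in the request, and the server answers with the subset it also supports.
// The result preserves the order of the client's offers.
func negotiate(offers []string, supported map[string]bool) []string {
	accepted := []string{}
	for _, offer := range offers {
		if supported[offer] {
			accepted = append(accepted, offer)
		}
	}
	return accepted
}

func main() {
	// Hypothetical capability names, for illustration only.
	clientOffers := []string{"feature_a", "feature_b", "feature_c"}
	serverSupports := map[string]bool{"feature_b": true, "feature_d": true}

	fmt.Println(negotiate(clientOffers, serverSupports))
}
```

Keeping separate client and server messages, as the rename to ServerCapabilities allows, means neither side needs direction-specific flags inside a shared block.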

apparentlymart · 2022-06-07T20:37:26Z

docs/plugin-protocol/tfplugin6.3.proto

+    message Capabilities {
+        // The plan_destroy capability signals that a provider expects a call
+        // to PlanResourceChange when a resource is going to be destroyed.
+        bool plan_destroy = 1;
+    }
+}


Same feedback as above about ServerCapabilities here, of course. 😀

apparentlymart · 2022-06-07T20:47:03Z

internal/plugin/grpc_provider.go

+
+	// If the provider doesn't support planning a destroy operation, we can
+	// return immediately.
+	if r.ProposedNewState.IsNull() && !capabilities.PlanDestroy {
 		return resp


Should we copy r.ProposedNewState into resp.PlannedState here, so that it'll end up being more similar to what would happen if we followed through and did a "real" PlanResourceChange below?

I think you may have commented on a stale or intermediate commit. PlannedState and PlannedPrivate are copied here.

I think I had some old draft comments lurking which got posted when I approved here. For some reason they looked like they were already posted for me until now. GitHub's UI confuses me. 😖

apparentlymart · 2022-07-06T17:15:59Z

docs/plugin-protocol/tfplugin5.3.proto

+    // supported protocol features. This is used to indicate availability of
+    // certain forward-compatible changes which may be optional in a major
+    // protocol version, but cannot be tested for directly.
+    message Capabilities {


I think our experience with protocol version 6 led us to conclude that we basically never want to have another major version of the protocol again, and that backward-compatible additions to protocol version 6 are going to be our main focus moving forward.

Granular capability negotiation like this feels like a well-trodden path with other similar protocols, but I will admit that I didn't interrogate that deeply and just suggested it because it was familiar to me as a solution to similar problems elsewhere.

While I can understand that it might feel quite drastic to introduce something entirely new here, I think in practice the cost of us being "wrong" about this is relatively low: if we never use this capability mechanism ever again then it will be slightly annoying to have this extra message type here in the protocol specification forever, but I don't think it will have any cross-cutting impact to any other codepaths and therefore we can safely ignore it if we find that it's never useful again in practice, or if we find a better way in future.

kmoe · 2022-07-06T17:37:42Z

docs/plugin-protocol/tfplugin5.3.proto

+    // supported protocol features. This is used to indicate availability of
+    // certain forward-compatible changes which may be optional in a major
+    // protocol version, but cannot be tested for directly.
+    message Capabilities {


Putting these things together, a breaking change to the protocol now involves incrementing the minor version and adding a new Capabilities field. These can never be removed or changed. I agree with Martin that this is at worst annoying and that the general granular client/server capabilities design is well known.

Call PlanResourceChange during a destroy plan. This gives providers two new abilities:
- They can evaluate whether the plan is valid, notifying users of potential errors before starting an apply that may not be able to complete.
- They can inspect and modify their private data during a destroy plan, just like they can with any other plan operation.

This is most easily handled in the plugin code, without involving Terraform core. The biggest change here, other than checking the PlanDestroy capability, is the removal of the schema helper methods in the plugins. With the addition of the capabilities field, combined with the necessity of checking diagnostics from the schema, the helpers have outlived their usefulness. Perhaps there's a better pattern for these repetitive calls, but for now there isn't too much extra verbosity involved.

github-actions · 2022-07-06T17:57:26Z

Reminder for the merging maintainer: if this is a user-visible change, please update the changelog on the appropriate release branch.

github-actions · 2022-08-06T02:31:22Z

I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

jbardin changed the title ~~Plan Destroy operations~~ Call PlanResourceChange for destroy operations Jun 2, 2022

jbardin force-pushed the jbardin/plan-destroy branch from baf069f to 9621abc Compare June 2, 2022 14:25

jbardin force-pushed the jbardin/plan-destroy branch from 6b18b3f to 6dc4866 Compare June 3, 2022 14:12

jbardin requested review from bflad and a team June 3, 2022 14:19

jbardin self-assigned this Jun 3, 2022

jbardin marked this pull request as ready for review June 3, 2022 14:20

bflad approved these changes Jun 3, 2022

Base automatically changed from jbardin/plan-destroy-configure-provider to main June 13, 2022 13:05

jbardin force-pushed the jbardin/plan-destroy branch from 6dc4866 to 5e0ce12 Compare June 23, 2022 15:56

jbardin force-pushed the jbardin/plan-destroy branch from 5e0ce12 to 7fb1a40 Compare June 29, 2022 14:08

kmoe reviewed Jul 6, 2022

apparentlymart approved these changes Jul 6, 2022

kmoe approved these changes Jul 6, 2022

jbardin added 9 commits July 6, 2022 13:47

fix test mocks to behave when planning destroys

e95bfe6

add provider metas to destroy plan

acba115

fixup broken test fixtures

96c7205

some of the minimal test provider implementations didn't check for null values.

add test for planned private data in destroy

9487cfb

add Schema Capabilities to protocol

b9f1a5a

add e2e test with provider schema capabilities

fd742cd

enable destroy planning for the simple providers used in the e2e tests

s/Capabilities/ServerCapabilities/

26c569e

jbardin force-pushed the jbardin/plan-destroy branch from 7fb1a40 to 26c569e Compare July 6, 2022 17:47

jbardin merged commit 1fba244 into main Jul 6, 2022

jbardin deleted the jbardin/plan-destroy branch July 6, 2022 17:57

bflad mentioned this pull request Jul 6, 2022

Implement Protocol Version 5.3 and 6.3 (ServerCapabilities and PlanResourceChange on Destroy) hashicorp/terraform-plugin-go#204

Closed

github-actions bot locked as resolved and limited conversation to collaborators Aug 6, 2022
