Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Call PlanResourceChange for destroy operations #31179

Merged
merged 9 commits into from Jul 6, 2022
Merged

Conversation

jbardin
Copy link
Member

@jbardin jbardin commented Jun 2, 2022

Terraform currently tries to plan all destroy operations offline, since in principal a destroy operation is always the same, setting the resource value to null. This however doesn't give providers the ability to verify the change, and skips their chance to inspect and modify the private data held in the Terraform state.

Here we add a call to PlanResourceChange in the destroy plan codepath, which will include a null value for the ProposedNewState, as well as the stored private data for the provider. The primary benefit here is for providers that can verify the change, and alert users the operation may fail before an apply is even started. Other diagnostics, including warnings may be added as well that can prove useful.

In concept this change is quite simple, and since the ProposedNewState is documented as being null for destroys, adding this call should produce no unexpected side effects. In practice, as we can see with the minimal mocks used in testing, it's possible some unknown providers may not be equipped to handle destroy plans. To account for this, a Capabilities field is added to the provider schema response in both protocol versions. Capabilities can be extended to indicate when a provider supports an optional feature or behavior which cannot be tested for directly. This means that providers will need to opt-in to use this feature, most likely via an updated version of the SDK. Since the SDK will need to be updated to allow access to the planning of destroy operations anyway, the need to upgrade is expected, so should pose no problems to adoption.

To simplify the core implementation, the negotiation of the PlanDestroy capability is done entirely within the grpc handlers, leaving the core codepath the same for all destroyPlan implementations. The capabilities are still exposed in the internal provider schema structure, and further extensions can still be accessed directly by core if the logic needs to reside there.

Fixes: #30140

@jbardin jbardin changed the title Plan Destroy operations Call PlanResourceChange for destroy operations Jun 2, 2022
@jbardin jbardin requested review from bflad and a team June 3, 2022 14:19
@jbardin jbardin self-assigned this Jun 3, 2022
@jbardin jbardin marked this pull request as ready for review June 3, 2022 14:20
Copy link
Member

@bflad bflad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Proposed protocol addition looks good to me conceptually and in the definition. The additional comment about gRPC message sizes looks good too. 👍

As I mentioned out of band, it'd be neat if we could get the protocol major/minor version in the protocol definition as well for logging on the provider side, but certainly doesn't need to be part of this change.

// supported protocol features. This is used to indicate availability of
// certain forward-compatible changes which may be optional in a major
// protocol version, but cannot be tested for directly.
message Capabilities {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've discussed a capability framework a few times, including during development of terraform-plugin-framework (e.g. hashicorp/terraform-plugin-framework#85), and decided each time to handle this via protocol versioning, shims, etc.

Do we definitely need it now, and if so, does it merit a separate design to ensure it's useful for any possible future features?

it's possible some unknown providers may not be equipped to handle destroy plans

This is true for internal testing mocks but I'm curious to see providers in the wild, not using SDKv2 or plugin-framework, that would be broken by destroy plans. The existence of protocol v5 providers complicates this of course if we are not willing to declare this a breaking change and therefore protocol version 7.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason we added this was specifically because of terraform-plugin-go (and to a lesser extent, theoretical providers built directly on grpc). While both the legacy shims and the framework handle this case correctly, we couldn't rule out unknown providers which don't. Since capabilities are essentially simple flags, I'm not sure what other design you had in mind. The choice for boolean fields was somewhat arbitrary, but IMO does make the resulting client code easier to read than a series of enums which need to be iterated over individually, though I have no strong opposition if there's another reason to change the data structure.

The basis for using the overall style of protobuf+grpc was specifically for backwards compatible additions whenever possible. While the alternative here was a new method, PlanResourceDestroy, that seemed unnecessary for the small edge case of an unsupported provider, and broke from the consistency of the overall API.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think our experience with protocol version 6 led us to conclude that we basically never want to have another major version of the protocol again, and that backward-compatible additions to protocol version 6 are going to be our main focus moving forward.

Granular capability negotiation like this feels like a well-trodden path with other similar protocols, but I will admit that I didn't interrogate that deeply and just suggested it because it was familiar to me as a solution to similar problems elsewhere.

While I can understand that it might feel quite drastic to introduce something entirely new here, I think in practice the cost of us being "wrong" about this is relatively low: if we never use this capability mechanism ever again then it will be slightly annoying to have this extra message type here in the protocol specification forever, but I don't think it will have any cross-cutting impact to any other codepaths and therefore we can safely ignore it if we find that it's never useful again in practice, or if we find a better way in future.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Putting these things together, a breaking change to the protocol now involves incrementing the minor version and adding a new Capabilities field. These can never be removed or changed. I agree with Martin that this is at worst annoying and that the general granular client/server capabilities design is well known.

Comment on lines 168 to 172
message Capabilities {
// The plan_destroy capability signals that a provider expects a call
// to PlanResourceChange when a resource is going to be destroyed.
bool plan_destroy = 1;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should call this ServerCapabilities just in case we end up wanting to add a similar ClientCapabilities capabililties field to GetProviderSchema.Request in future in order for Terraform Core to announce to the provider that it supports something.

I don't have anything specific in mind right now but it's been my experience that capability negotiation systems like this often end up needing to be two-way at some point, e.g. so that the client can make a set of "offers" of something it supports in the request and then the server can choose zero or more of them to accept in the response.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ServerCapabilities sounds good to me, and leaves open the possibility for client, without making client/server specific flags within the block :D

Comment on lines 187 to 192
message Capabilities {
// The plan_destroy capability signals that a provider expects a call
// to PlanResourceChange when a resource is going to be destroyed.
bool plan_destroy = 1;
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same feedback as above about ServerCapabilities here, of course. 😀


// If the provider doesn't support planning a destroy operation, we can
// return immediately.
if r.ProposedNewState.IsNull() && !capabilities.PlanDestroy {
return resp
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we copy r.ProposedNewState into resp.PlannedState here, so that it'll end up being more similar to what would happen if we followed through and did a "real" PlanResourceChange below?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you may have commented on a stale or intermediate commit. PlannedState and PlannedPrivate are copied here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I had some old draft comments lurking which got posted when I approved here. For some reason they looked like they were already posted for me until now. GitHub's UI confuses me. 😖

// supported protocol features. This is used to indicate availability of
// certain forward-compatible changes which may be optional in a major
// protocol version, but cannot be tested for directly.
message Capabilities {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think our experience with protocol version 6 led us to conclude that we basically never want to have another major version of the protocol again, and that backward-compatible additions to protocol version 6 are going to be our main focus moving forward.

Granular capability negotiation like this feels like a well-trodden path with other similar protocols, but I will admit that I didn't interrogate that deeply and just suggested it because it was familiar to me as a solution to similar problems elsewhere.

While I can understand that it might feel quite drastic to introduce something entirely new here, I think in practice the cost of us being "wrong" about this is relatively low: if we never use this capability mechanism ever again then it will be slightly annoying to have this extra message type here in the protocol specification forever, but I don't think it will have any cross-cutting impact to any other codepaths and therefore we can safely ignore it if we find that it's never useful again in practice, or if we find a better way in future.

// supported protocol features. This is used to indicate availability of
// certain forward-compatible changes which may be optional in a major
// protocol version, but cannot be tested for directly.
message Capabilities {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Putting these things together, a breaking change to the protocol now involves incrementing the minor version and adding a new Capabilities field. These can never be removed or changed. I agree with Martin that this is at worst annoying and that the general granular client/server capabilities design is well known.

Call PlanResourceDestroy during a destroy plan.

This allows providers two new abilities:
- They can evaluate if the plan is valid, notifying users of any
  potential errors before an apply is started, which may not be able to
  complete.
- They can inspect and modify their private data during a destroy plan
  just like they can with an other plan operation.
some of the minimal test provider implementations didn't check for null
values.
This is most easily handled in the plugin code, without involving
Terraform core.

The biggest change here other than checking the PlanDestroy capability,
is the removal of the schema helper methods in the plugins. With the
addition of the capabilities field, combined with the necessity of
checking diagnostics from the schema, the helpers have outlived their
usefulness. Perhaps there's a better pattern for these repetitive calls,
but for now there isn't too extra verbosity involved.
enable destroy planning for the simple providers used in the e2e tests
@jbardin jbardin merged commit 1fba244 into main Jul 6, 2022
@jbardin jbardin deleted the jbardin/plan-destroy branch July 6, 2022 17:57
@github-actions
Copy link

github-actions bot commented Jul 6, 2022

Reminder for the merging maintainer: if this is a user-visible change, please update the changelog on the appropriate release branch.

@github-actions
Copy link

github-actions bot commented Aug 6, 2022

I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 6, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Feature proposal: Allow providers to review planned resource deletion
4 participants