Easier way to remove/clear pending operations #4265

mitchellmaler · 2020-04-01T19:25:41Z

Today I am testing a bunch with Pulumi which includes some resources not coming up correctly or even me just canceling it to change some code to restart it.
Currently I have to export, delete the pending operations, then import the stack almost each time which is pretty tedious.
Could a stack command be added to just automatically clear pending operations that way it is more user friendly?

lukehoban · 2020-10-06T22:01:58Z

We should also seriously reconsider whether we should block updates on "pending operations" at all. We could choose to warn instead of error - allowing updates to continue. By far the most common case for "pending operations" is a leaked resource creation - which does not actually put a stack in a "stuck" state - it just suggests there may be an out-of-band cleanup step needed.

lukehoban · 2020-11-03T19:36:20Z

My concrete suggestion:

Make the current error a warning - and clean up the text of the message
Add a pulumi state clear-pending-operations that takes care of removing these pending operations (and mention it in the warning)

pgavlin · 2020-11-03T19:39:03Z

We could choose to warn instead of error - allowing updates to continue. By far the most common case for "pending operations" is a leaked resource creation

We could be a bit more precise here and only warn on pending creates. We could continue to error on pending deletes and updates.

We should also fix refresh s.t. it clears pending operations: it was always the intent that that command would resolve these operations.

pgavlin · 2020-11-03T19:42:21Z

For more context, refresh was the intended solution because the problem with pending operations is that we don't know the actual state of the resource being changed. In the case of updates and deletes, refresh can give us the actual state and resolve the pending operation.

lukehoban · 2020-11-03T19:49:47Z

Good points - updated suggestion:

Make the current error a warning
Make pulumi refresh report a warning on pending creates, and clear out pending_operations.
Simplify the warning in (1) to (a) point users to run pulumi refresh (b) be simpler and more specific.

hugbubby · 2021-02-24T04:13:35Z

The comments in this thread almost speak as if this is an unresolved IaC problem. I'm confused as to why there doesn't seem to be a mirror issue in terraform, where I have literally never encountered issues interrupting an apply before. What do they do and why isn't it applicable here?

pgavlin · 2021-02-24T18:04:05Z

I'm confused as to why there doesn't seem to be a mirror issue in terraform, where I have literally never encountered issues interrupting an apply before. What do they do and why isn't it applicable here?

AFAIK, Terraform does not use a model its state as a journal. Instead, their approach relies on refreshing prior to each update in order to detect the current state of resources that might have been interrupted while updating. That approach leaves out the possibility of warning on interrupted creates, etc., and imposes the overhead of refreshing by default.

hugbubby · 2021-04-09T18:59:51Z

@pgavlin

I'm confused as to why there doesn't seem to be a mirror issue in terraform, where I have literally never encountered issues interrupting an apply before. What do they do and why isn't it applicable here?

AFAIK, Terraform does not use a model its state as a journal. Instead, their approach relies on refreshing prior to each update in order to detect the current state of resources that might have been interrupted while updating. That approach leaves out the possibility of warning on interrupted creates, etc., and imposes the overhead of refreshing by default.

We ran an automated system for around a year that regularly interrupted terraform in the middle of resource creation several times a day, and there were no problems with interrupted creates/dangling resources. There was no need to warn in these cases, because terraform always made sure next time to determine what happened to the resource being created from the last apply. Part of the reason we went with terraform for creating this system was its robustness in this respect; no matter what bugs we introduced, we could always make sure that when the dust settled, what was in the terraform state accurately represented what was deployed. As far as I can tell, pulumi's chosen decision to make this event a warning instead of fixing via a refresh doesn't lead to any increase in safety; in fact it's increasing the room for error, because it's relying on (sometimes nonhuman) operators to remove potentially very expensive resources instead of forget about them.

We very much enjoy the speed, but since pulumi does have a mostly equivalent refresh function, it seems to me like it should be able to automatically get() and import schrodinger's resources like terraform does. It doesn't need to do so every time, but in the case of an interrupted apply, someone is going to perform this refresh - pulumi or me. Without this feature, the automation API becomes very tricky to run in any kind of real production environment where containers/processes are expected to be killable at all times. And that previously desirable property of terraform where we could always terraform destroy to clean up all of our resources is pretty much gone.

farvour · 2021-11-16T19:17:06Z

This happens often enough to warrant commenting on this that one has to create an out-of-band job that basically does all the above described:

1.) Export project and stack file.
2.) Use jq or something to wipe pending_operations
3.) Re-import state back into stack.

As hinted above, a nice flag like --ignore-pending-operations could (and should) do all of this internally for us. Maybe have it depend on the --refresh flag even, since during an interrupted run we generally would want to trust the live state of the provider.

farvour · 2021-11-16T19:18:39Z

I'm confused as to why there doesn't seem to be a mirror issue in terraform, where I have literally never encountered issues interrupting an apply before. What do they do and why isn't it applicable here?

AFAIK, Terraform does not use a model its state as a journal. Instead, their approach relies on refreshing prior to each update in order to detect the current state of resources that might have been interrupted while updating. That approach leaves out the possibility of warning on interrupted creates, etc., and imposes the overhead of refreshing by default.

I'm totally fine with that overhead of requiring --refresh if another flag to purge pending ops is included in the CLI. We do that anyway for production deployment pipelines because it's just a Good Idea anyway.. never solely trust the state file IMO.

mitchellmaler · 2021-11-16T19:21:26Z

My current workaround is to run this each time I run into the issue. I created a bash alias/function to make it easy to run.
pulumi stack export | jq '.deployment.pending_operations = []' | pulumi stack import --

Just what it says on the tin. This is implemented by moving the check for pending operations in the last statefile into the deployment executor and making it conditional on whether or not a refresh is being performed (either via `pulumi refresh` or `pulumi up -r`). Because pending operations are not carried over from the base statefile, this has the effect of clearing pending operations if a refresh is performed. Fixes #4265.

nesl247 · 2022-04-06T01:12:49Z

Given that there is still manual work to be done for create operations, shouldn't this remain open then @Zaid-Ajaj ?

Zaid-Ajaj · 2022-04-06T08:47:22Z

Given that there is still manual work to be done for create operations, shouldn't this remain open then?

Hi @nesl247, the reason manual work is still required for pending CREATE operations is because we need the users to explicitly decide what to do with them. If pulumi just clears those, you might end up with live cloud resources that are not tracked by pulumi (orphaned resources) and it's better if the users see this and manually check these live resources before clearing them from the pulumi checkpoint files

nesl247 · 2022-04-06T17:19:06Z

Given that there is still manual work to be done for create operations, shouldn't this remain open then?

Hi @nesl247, the reason manual work is still required for pending CREATE operations is because we need the users to explicitly decide what to do with them. If pulumi just clears those, you might end up with live cloud resources that are not tracked by pulumi (orphaned resources) and it's better if the users see this and manually check these live resources before clearing them from the pulumi checkpoint files

Understood, but it still requires more manual work than should be required, therefore this issue should remain open IMO. I would expect Pulumi to try to refresh to see if the resource was actually created, if it can't get an answer, then to ask the user if they want to remove it from the stack. There should not be any requirement to manually edit the stack.

Frassle · 2022-04-06T18:20:08Z

I would expect Pulumi to try to refresh to see if the resource was actually created

We can't. The big problem with pending Creates is that we don't have an ID to lookup because the Create call never returned to give us one, so we can't look up to see if it was created or not.

Having said that there is scope for us at least to provide easier to use commands to manually resolve pending creates as either "did not create" or "created with id ", and maybe we can come up with something better. I'll leave this open to track that.

hcharley · 2022-04-08T20:55:52Z

Regarding pending creates, is it too much scope creep to use fuzzy name matching and create times when there is no ID?

I realize this probably involves listing and/or searching resources from their provider, but this functionality could also be useful for auto-importing during the Pulumi onboarding process.

A UI or CLI could allow users of Pulumi to more easily import their Cloud resources.

I looked in the docs and while I saw get function, I didn't see a list or similar function, so I assume this functionality doesn't exist and would be quite an undertaking given the number of resources available. So totally understand if this is scope creep, especially in regards to this ticket.

Frassle · 2022-04-11T08:09:06Z

Regarding pending creates, is it too much scope creep to use fuzzy name matching and create times when there is no ID?

It's a really nice idea, but yes it's probably massive scope creep given we'd also need to go and update all the providers to support this as well.

I was thinking of adding commands that we would describe in the warning text, letting you do things like:
pulumi state resolve-create <urn> <id> to resolve the create as successful to the given ID
pulumi state clear-create <urn1> <urn2> to clear the pending creates as unsuccessful.
pulumi state clear-create --all to clear all pending creates.

Not 100% sure that's the right set of commands, but seems like a good starting point to come up with something to help with this last bit of pending-operations pain.

Eventually we might be able to do the list idea as well, but that is hard.

dixler · 2022-07-07T16:46:44Z

pulumi state delete --pending might be a more idiomatic command

edit: just saw that there's a PR #9778 with a lot of feedback

iwahbe · 2022-09-01T14:42:54Z

The resolution we agreed upon is different than what is contained in this thread. For a full description of the resolution, see closing pr: #10394.

leezen added the kind/enhancement Improvements or new features label Apr 2, 2020

leezen mentioned this issue May 11, 2020

Improved recovery for invalid stacks #1094

Closed

leezen mentioned this issue Oct 6, 2020

Is there a way to tell pulumi that resources which are in pending_operations has been created #5517

Closed

This was referenced Nov 13, 2020

Docs for manually resolving pending operations pulumi/docs#1017

Closed

Handle stack pending_operations pulumi/pulumi-kubernetes-operator#9

Open

leezen mentioned this issue Dec 16, 2020

Errors are leaving pending operations forcing manual intervention #5945

Closed

t0yv0 mentioned this issue May 20, 2021

token has expired. #7094

Open

pgavlin mentioned this issue Nov 16, 2021

[engine] Clear pending operations with refresh. #8435

Merged

emiliza assigned pgavlin Nov 16, 2021

emiliza added this to the 0.65 milestone Nov 16, 2021

emiliza added capacity/community-customer size/S Estimated effort to complete (1-2 days). labels Nov 16, 2021

emiliza modified the milestones: 0.65, 0.66 Dec 6, 2021

emiliza mentioned this issue Dec 9, 2021

Resources in a "reading" state can leave pending operations #8559

Open

leezen modified the milestones: 0.66, 0.67 Jan 5, 2022

Frassle reopened this Apr 6, 2022

mikhailshilkov removed this from the 0.69 milestone May 16, 2022

mikhailshilkov unassigned pgavlin and Zaid-Ajaj May 31, 2022

mikhailshilkov added area/cli UX of using the CLI (args, output, logs) and removed resolution/fixed This issue was fixed labels May 31, 2022

iwahbe self-assigned this Jun 2, 2022

iwahbe mentioned this issue Jun 3, 2022

Provide a command for handling pending creates #9778

Closed

3 tasks

mikhailshilkov added this to the 0.74 milestone Jun 7, 2022

mikhailshilkov modified the milestones: 0.74, 0.75 Jul 1, 2022

mikhailshilkov mentioned this issue Jul 12, 2022

Pulumi stack stuck after a resource provisioning issue #5377

Closed

mikhailshilkov removed the capacity/community-customer label Jul 13, 2022

blampe mentioned this issue Jul 14, 2022

Pending operations management pulumi/pulumi-cloud-requests#22

Open

iwahbe removed the size/S Estimated effort to complete (1-2 days). label Jul 20, 2022

mikhailshilkov modified the milestones: 0.75, 0.76 Jul 25, 2022

mikhailshilkov modified the milestones: 0.76, 0.77 Aug 15, 2022

iwahbe mentioned this issue Aug 15, 2022

Enable pulumi refresh to solve pending creates #10394

Merged

3 tasks

iwahbe closed this as completed in #10394 Aug 24, 2022

pulumi-bot added the resolution/fixed This issue was fixed label Aug 24, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Easier way to remove/clear pending operations #4265

Easier way to remove/clear pending operations #4265

mitchellmaler commented Apr 1, 2020

lukehoban commented Oct 6, 2020

lukehoban commented Nov 3, 2020

pgavlin commented Nov 3, 2020

pgavlin commented Nov 3, 2020

lukehoban commented Nov 3, 2020

hugbubby commented Feb 24, 2021 •

edited

pgavlin commented Feb 24, 2021

hugbubby commented Apr 9, 2021 •

edited

farvour commented Nov 16, 2021

farvour commented Nov 16, 2021

mitchellmaler commented Nov 16, 2021

nesl247 commented Apr 6, 2022

Zaid-Ajaj commented Apr 6, 2022

nesl247 commented Apr 6, 2022

Frassle commented Apr 6, 2022

hcharley commented Apr 8, 2022 •

edited

Frassle commented Apr 11, 2022

dixler commented Jul 7, 2022 •

edited

iwahbe commented Sep 1, 2022

Easier way to remove/clear pending operations #4265

Easier way to remove/clear pending operations #4265

Comments

mitchellmaler commented Apr 1, 2020

lukehoban commented Oct 6, 2020

lukehoban commented Nov 3, 2020

pgavlin commented Nov 3, 2020

pgavlin commented Nov 3, 2020

lukehoban commented Nov 3, 2020

hugbubby commented Feb 24, 2021 • edited

pgavlin commented Feb 24, 2021

hugbubby commented Apr 9, 2021 • edited

farvour commented Nov 16, 2021

farvour commented Nov 16, 2021

mitchellmaler commented Nov 16, 2021

nesl247 commented Apr 6, 2022

Zaid-Ajaj commented Apr 6, 2022

nesl247 commented Apr 6, 2022

Frassle commented Apr 6, 2022

hcharley commented Apr 8, 2022 • edited

Frassle commented Apr 11, 2022

dixler commented Jul 7, 2022 • edited

iwahbe commented Sep 1, 2022

hugbubby commented Feb 24, 2021 •

edited

hugbubby commented Apr 9, 2021 •

edited

hcharley commented Apr 8, 2022 •

edited

dixler commented Jul 7, 2022 •

edited