-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement up --continue-on-error #15740
Conversation
Changelog[uncommitted] (2024-04-22)Features
|
6fb4216
to
7fadab1
Compare
pkg/resource/deploy/source_eval.go
Outdated
return &pulumirpc.RegisterResourceResponse{ | ||
Urn: string(result.State.URN), | ||
Id: string(result.State.ID), | ||
Object: nil, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the SDK gets a response that the resource did register, but just has no outputs, and might have an empty ID?
That seems suspect, I don't trust SDKs are going to handle this well vis-a-vis working out unknown vs undefined vs error'd output states.
4c87d7c
to
841a964
Compare
sdk/nodejs/output.ts
Outdated
@@ -307,6 +307,7 @@ To manipulate the value of this Output, use '.apply' instead.`); | |||
public apply<U>(func: (t: T) => Input<U>, runWithUnknowns?: boolean): Output<U> { | |||
// we're inside the modern `output` code, so it's safe to call `.allResources!` here. | |||
|
|||
runWithUnknowns = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think to minimise blast radius we could do a pass removing the ifDryRun
apply logic from nodejs as it's own PR. Nothing should really depend on that, and we can sanity check just that change is safe.
4c2cab2
to
575a9c5
Compare
cmd/pulumi-test-language/testdata/l2-failed-create-continue-on-error/main.pp
Outdated
Show resolved
Hide resolved
4f8cf84
to
7913740
Compare
538a6ef
to
e7a7fb1
Compare
Similar to what we're doing to the other SDKs in pulumi/pulumi#15740, this enables dealing with the SkipReason field in the RegisterResource response.
Similar to what we're doing to the other SDKs in pulumi/pulumi#15740, this enables dealing with the SkipReason field in the RegisterResource response.
pkg/resource/deploy/source.go
Outdated
Failed bool // true if the resource registration failed. | ||
Skipped bool // true if the resource registration was skippeg. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dislike that this is a tri-state express as two bools :( Maybe enum it instead as something like SUCCESS/FAILED/SKIPPED?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah an enum would be much better. Done in 9ac2fde
2b2ffb4
to
39f6e58
Compare
} | ||
|
||
programF := deploytest.NewLanguageRuntimeF(func(_ plugin.RunInfo, monitor *deploytest.ResourceMonitor) error { | ||
stackURN, _, _, _, err := monitor.RegisterResource(resource.RootStackType, "test", false) | ||
failing, _, _, _, err := monitor.RegisterResource("pkgB:m:typB", "failing", true, deploytest.ResourceOptions{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we update RegisterResource here to return the Reason field? Maybe join all the fields into a struct at this point rather than adding another return value, could struct-ify in another PR to keep changes small then just rebase and add Reason to the struct in this PR. I don't like that there's a critical response being returned here that we're not even looking at.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that sounds like a good idea: done in #15988.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done now. I've rebased this PR and temporarily based this PR on that branch to make reviewing easier until that can go through the merge queue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PR merged now, so this is again based on master
.
err = monitor.RegisterResourceOutputs(stackURN, outputs) | ||
independent1, _, _, _, err := monitor.RegisterResource( | ||
"pkgA:m:typA", "independent1", true, deploytest.ResourceOptions{ | ||
SupportsResultReporting: true, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we have another test that does the same as this one but without SupportsResultReporting and check the engine still handles that correctly (for old sdks)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea, done for this and below ✔️
programF := deploytest.NewLanguageRuntimeF(func(_ plugin.RunInfo, monitor *deploytest.ResourceMonitor) error { | ||
stackURN, _, _, _, err := monitor.RegisterResource(resource.RootStackType, "test", false) | ||
_, _, _, _, err := monitor.RegisterResource("pkgB:m:typB", "failing", true, deploytest.ResourceOptions{ | ||
SupportsResultReporting: true, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above, we should have a version of this without SupportsResultReporting for testing old SDKs
@@ -445,8 +449,36 @@ func (ex *deploymentExecutor) handleSingleEvent(event SourceEvent) error { | |||
if err != nil { | |||
return err | |||
} | |||
// Exclude the steps that depend on errored steps if ContinueOnError is set. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the deployment_executor responsible for this rather than the step generator, or even the source eval? We know the step generator/source eval will not see the new registration till the step_executor has finished it because it's dependent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a few reasons I kept it here:
- It matches what
destroy --continue-on-error
does, and I think that needs to live in the deployment executor because the step generator generates steps for all deletes and doesn't know which ones failed. - These steps do exist, they just don't get executed for deployment. So I think the deployment executor is best suited to make a decision on that. The step generator also has no access to the step executors failed steps, so we'd need to inject them there somewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough
39f6e58
to
93cc756
Compare
Co-authored-by: Fraser Waters <fraser@pulumi.com>
c2f68e5
to
a519bc2
Compare
* make dotnet SDK support SkipReason Similar to what we're doing to the other SDKs in pulumi/pulumi#15740, this enables dealing with the SkipReason field in the RegisterResource response. * add changelog * import latest proto files and do fixups
Users are already able to use --continue-on-error for destroy in the automation API. Implement the same for `up` as well. This should only be merged after #15740 is. (Note that this still includes the commits from there, I will rebase once that PR is merged, hence this is only a draft PR for now)
Tentative changelog, but also planning to include #16057 ### Features - [auto/{go,nodejs,python}] Add support for the continue-on-error parameter of the up command to the Automation API [#15953](#15953) - [engine] Add a --continue-on-error flag to pulumi up [#15740](#15740) ### Bug Fixes - [sdk/nodejs] Fix a race condition that could cause the NodeJS runtime to terminate before finishing all work [#16005](#16005) - [sdk/python] Fix an exception when setting providers resource option with a dict [#16022](#16022) - [sdk/python] Fix event loop tracking in the python SDK when using remote transforms [#16039](#16039) - [sdk/python] Workaround lazy module loading regression [#16038](#16038) ### Miscellaneous - [cli/plugin] Move PluginKind type definition into apitype and re-export for backward compatibility
Will wait to merge this until after #16057 merges ### Features - [auto/{go,nodejs,python}] Add support for the continue-on-error parameter of the up command to the Automation API [#15953](#15953) - [engine] Add a --continue-on-error flag to pulumi up [#15740](#15740) ### Bug Fixes - [sdk/nodejs] Fix a race condition that could cause the NodeJS runtime to terminate before finishing all work [#16005](#16005) - [sdk/python] Fix an exception when setting providers resource option with a dict [#16022](#16022) - [sdk/python] Fix event loop tracking in the python SDK when using remote transforms [#16039](#16039) - [sdk/python] Workaround lazy module loading regression [#16038](#16038) ### Miscellaneous - [cli/plugin] Move PluginKind type definition into apitype and re-export for backward compatibility [#15946](#15946)
Similar to destroy --continue-on-error, this flag allows
pulumi up
to continue if any errors are encountered.
Currently when we encounter an error while creating/updating a
resource, we cancel the context of the deployment executor, and thus
the deployment stops once the resources that are being processed in
parallel with the failed one finish being updated.
For --continue-on-error, we ignore these errors, and let the
deployment executor continue. In order for the deployment executor to
exit eventually we also have to mark these steps as done, as the
deployment executor will otherwise just hang, and callers with open
channels waiting for it to finish/report back will hang indefinitely.
The errors in the step will still be reported back to the user by the
OnResourceStepPost callback.
Fixes #14515