Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Plan Tracking for Vacate can result in a failure to the consumer when the plan is successful #407

Open
1 of 2 tasks
Z3r0Sum opened this issue Jul 28, 2022 · 1 comment
Open
1 of 2 tasks

Comments

@Z3r0Sum
Copy link
Contributor

Z3r0Sum commented Jul 28, 2022

Overview

There are certain situations where we can have non-fatal errors in a plan; however, currently the way the tracking code works is that any and all responses along the way that might contain errors/warnings get bubbled back up as a fatal issue for that plan. In reality, there are situations where these things can happen, but the plan is still successful.

Possible Implementation

It seems to me we need monitor the stream of events coming from the response tracking in order to evaluate whether the above is true.

The relevant code paths:

How Vacates consume it (AFAICT it's the only consumer):

POC (WIP): master...Z3r0Sum:cloud-sdk-go:feat-overall-plan-success-tracking. Summary:

  • This is an attempt to avoid sending what would be perceived as a 'fatal'
    set of errors back to the consumer of the TrackChange() func.
  • The scenario we are trying to account for:
    • There are errors/warnings along the life of a plan as it relates
      to a vacate
    • The above do not cause the plan to halt
    • The plan eventually succeeds
    • The consumer receives these errors in a fatal manner when in
      reality there is no cause for action
  • Problems with current approach:
    • Does not make the end user aware of any problems at all if the
      plan succeeded. Not sure we care?
  • TODO: thorough testing and confirm assumptions about the following:
    • Do all fatal errors manifest themselves properly in the
      plan-completed step, or prevent us from getting to it. If both
      are true, this approach appears safe other than the problem
      outlined above.

Testing

  • Added unit tests
  • Integration testing is needed/required

Context

Vacates report a failure when in reality it was a non-issue and the plan succeeded along with the vacate of the instance.

Your Environment

Internal SaaS.

@gheorghepucea
Copy link

As discussed with @Z3r0Sum we can test if the fatal errors propagate to the plan-completed step and put this behind an integration test (https://github.com/elastic/cloud/tree/master/scala-services/integration-tests) which can act as a contract.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants