CI with bors is slow, still flaky, and provides no way to manually override #11449

Frassle · 2022-11-23T16:25:31Z

Pretty much the title.

We're finding that bors has not really improved our issues with CI; at best it has moved them around, and more likely it has made them worse.

For reasons unrelated to bors, our CI actions are flaky and often time out or crash the GitHub runners. With bors, that requires a complete retry of the bors action, whereas previously we could use GitHub's "Re-run failed jobs" feature to retry just the flaky parts.

Bors itself seems to be slow; I've seen PRs sit for hours after a "bors merge" comment with no action taken by bors.

Bors also obstructs the option to manually push things through. For example, if we have a P0 bug in a release with a trivial fix, we might want to push that fix and release faster, accepting the risk of not running all the tests in exchange for getting a P0 bug fix into customers' hands sooner.

Comments like the above have come up repeatedly over the last few months since switching to bors, and we were generally happy to accept that there would be some teething pain with the switch. However, there haven't been any concrete proposals put forward to fix these issues.

As such, this issue is a commitment to get concrete solutions put in place for the above, that is:

Fixing the need for so many retries, or improving the retry experience so that a retry does not cost a whole run
Speeding up bors merge so that the wait time is less than an hour
Providing a way to override the merge queue on a per-PR basis

Failing fixes for the above, we should move away from using bors and return to the old system of PR tests and master tests with retries. We will still want to investigate flakiness at some point in that system, but it's a lot less pressing and doesn't have the other negatives of bors. It also makes our KPI of green master meaningful again, giving us back a metric we've lost for tracking how stability is going.

Frassle · 2022-11-23T16:31:36Z

Oh also it's littering our release page with tens of draft releases that aren't meaningful.

dixler · 2022-11-23T16:43:58Z

Some community PRs are currently in a weird state where bors won't merge them.
They're tracked in this issue: #11423

justinvp · 2023-07-18T06:49:51Z

Given #13501, do we still need to keep this open?

Frassle · 2023-07-18T07:16:50Z

Probably not, especially if GitHub merge queues allow partial retries of failed jobs, which was the main complaint for this issue.

blampe · 2023-08-09T05:39:23Z

> Bors also obstructs the option to manually push things through. For example, if we have a P0 bug in a release with a trivial fix, we might want to push that fix and release faster, accepting the risk of not running all the tests in exchange for getting a P0 bug fix into customers' hands sooner.

You can jump the queue, but I don't think it will let you skip tests (since that's the whole point of the queue). Instead, for P0 situations you could disable the merge queue requirement from the UI -- maybe not ideal, but very easy in a pinch.

> especially if GitHub merge queues allow partial retries of failed jobs, which was the main complaint for this issue.

I think this is addressed by the "Only merge non-failing pull requests" option, but I haven't used it personally.

> Leaving this checkbox unselected can be useful if you have intermittent test failures, but don't want false negatives to hold up the queue.

Frassle added the area/build (CI/CD for pulumi/pulumi) label Nov 23, 2022

justinvp added the kind/engineering (Work that is not visible to an external user) label Jul 18, 2023

Frassle mentioned this issue Aug 9, 2023

Migrate from Bors to GitHub merge queues #13501

Closed
