Resiliency/noise reduction for ephemeral failures #24805
Replies: 6 comments
-
One solution could be to file the issues only after repeated requests have failed over a period of, say, 4-6 hours. In my case, I have a bunch of repos configured to use a single org-wide config repo. Maybe file an issue there, and not on any of the repos where the fix isn't actionable?
-
I agree with @ljharb: maybe the issue should not be raised on the first failure, but rather on the second or so. Also, as I stated in #12226:
-
The challenge with these solutions is that Renovate would have to change how it stores state: instead of keeping state within the platform/repo itself, it would need awareness along the lines of "this repo has failed X times in the past Y minutes". There's nothing about a 404'd preset which we can "timestamp", unlike releases, where we do have logic about not raising things immediately in case they settle down later. In practice it also means "do not alert users to real problems for 4-6 hours either", which is a huge downside. For example, say a user renamed or deleted one of their presets while it was still in use. There's no way for us to tell the difference between that and GitHub mistakenly returning 404s, as it did earlier this week.
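To make the "failed X times in the past Y minutes" idea concrete, here is a minimal sketch of the kind of state that would be needed. Everything in it is hypothetical: `FailureTracker`, its methods, and the in-memory `Map` are illustrative only and are not part of Renovate. Real state would also have to persist between runs.

```typescript
// Hypothetical sketch: none of these names exist in Renovate.
// Counts failures per key inside a sliding time window.
class FailureTracker {
  private readonly failures = new Map<string, number[]>();
  private readonly threshold: number;
  private readonly windowMs: number;

  constructor(threshold: number, windowMs: number) {
    this.threshold = threshold;
    this.windowMs = windowMs;
  }

  // Records a failure at time `now` (epoch ms) and returns true once
  // `threshold` failures have occurred within the last `windowMs`.
  recordFailure(key: string, now: number): boolean {
    const recent = (this.failures.get(key) ?? []).filter(
      (t) => now - t < this.windowMs,
    );
    recent.push(now);
    this.failures.set(key, recent);
    return recent.length >= this.threshold;
  }

  // A successful lookup clears the history for that key.
  recordSuccess(key: string): void {
    this.failures.delete(key);
  }
}
```

Even with such a tracker, the trade-off above remains: a preset that really was renamed or deleted is only reported once the threshold is reached.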
-
Agreed; for probably all cases other than the "strange 404" of the last few days, this might be a problem.
Again, agreed. So... I guess it comes down to wording the issue better. I feel, like I said above:
would go a long way. If I had had those two (well, the link to the logs is probably not strictly required, only better UX), I would have seen the 404 in the logs, tested the link (like I did), seen that it worked, and assumed that someone might have been working on something that happened to collide with one of my repos being checked...
-
Even with no other changes, the issues shouldn’t be opened where they aren’t actionable - in my case, they should have been opened on the config repo and not on the hundreds of repos that use it.
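As a rough illustration of what "open it where it's actionable" could mean, the sketch below groups preset failures by the repository hosting the preset, so that one issue per preset repo could list the affected repos. The `PresetFailure` shape and `groupByPresetRepo` are assumptions made for this example, not Renovate APIs, and the sketch presumes something has visibility across repos.

```typescript
// Hypothetical sketch: these types and names are illustrative only.
interface PresetFailure {
  repo: string; // repository whose config extends the preset
  presetRepo: string; // repository hosting the preset that failed to resolve
}

// Maps each preset-hosting repo to the list of repos affected by its failure.
function groupByPresetRepo(failures: PresetFailure[]): Map<string, string[]> {
  const grouped = new Map<string, string[]>();
  for (const { repo, presetRepo } of failures) {
    const affected = grouped.get(presetRepo) ?? [];
    affected.push(repo);
    grouped.set(presetRepo, affected);
  }
  return grouped;
}
```

With hundreds of repos extending one org-wide config repo, this yields a single entry keyed by the config repo rather than hundreds of separate warnings.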
-
Just to be sure: was a ticket or similar created with GitHub itself? It sounds like something GitHub may actually be interested in fixing.
-
What would you like Renovate to be able to do?
Ideally not create thousands of config warning issues during ephemeral failures, e.g. when GitHub glitches and returns 404 for a preset.
See #12226
In this case GitHub returned a 404 for the Merge Confidence preset, even though it still existed, and this resulted in Renovate treating people's configs as invalid. Although in this case it was an implicit preset, the same could have happened to any preset.
If you have any ideas on how this should be implemented, please tell us here.
It's a hard problem to solve, because it was a 404 error and not a 429 or 5xx. In short, I don't have any great ideas, but I'm raising this issue to discuss it and get ideas from the community.
One idea is to use a lazy cache where "expired" results are re-queried, but if the re-query fails then the last working version is returned. Even this is challenging, because normally you'd trust GitHub not to return 404s and would plan this kind of fallback only for timeouts or 5xx errors.
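A minimal sketch of that lazy-cache idea, under stated assumptions: `StaleOnErrorCache` and its `get` signature are invented for illustration and are not Renovate's cache API, and the fetcher is synchronous only to keep the example short (a real preset fetch would be async).

```typescript
// Hypothetical sketch: not Renovate's actual cache implementation.
interface CacheEntry<T> {
  value: T;
  fetchedAt: number; // epoch ms of the last successful fetch
}

class StaleOnErrorCache<T> {
  private readonly entries = new Map<string, CacheEntry<T>>();
  private readonly ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  // Returns a fresh cached value if one exists; otherwise re-queries.
  // If the re-query throws, falls back to the last working value, and
  // rethrows only when there is nothing cached at all.
  get(key: string, fetcher: () => T, now: number): T {
    const cached = this.entries.get(key);
    if (cached && now - cached.fetchedAt < this.ttlMs) {
      return cached.value;
    }
    try {
      const value = fetcher();
      this.entries.set(key, { value, fetchedAt: now });
      return value;
    } catch (err) {
      if (cached) {
        return cached.value; // serve the last working version
      }
      throw err;
    }
  }
}
```

Note that this fallback would also keep serving a preset that was genuinely deleted or renamed, since a 404 gives no way to tell the two cases apart.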
Is this a feature you are interested in implementing yourself?
Yes