fix: avoid overkilling execa and child processes asynchronously on error #1138
The recently merged #1117 (and the 12.3.6 release itself) introduced a quite dangerous defect in the clean-up logic for failed multi-process linter runs.
I've made an attempt to fix those, but unfortunately I'm not sure I can quickly come up with suitable test coverage for the fixes provided. The reason is that I'm not sure how to incorporate such low-level checks into the integration-like output matchers I saw in the existing suites. Maybe you don't typically cover such low-level behavior at all?
1. Unexpected arguments (fixed by 1c0e6c0)
The first, quite simple issue is just a missed code bug: the kill callback forwards unexpected extra arguments to `process.kill` (see the sketch below). Maybe it was doing fine during testing, but in my environment, on a Windows machine, it caused the whole thing to halt with `Error: kill ENOSYS`.
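Since the offending lines were elided above, here is a hedged reconstruction of the general pitfall (the pids in `ids` are hypothetical):

```js
// Hypothetical list of child process ids, e.g. from `await pidTree(pid)`
const ids = [1234, 5678]

// Array.prototype.forEach invokes its callback as (value, index, array),
// so passing process.kill directly forwards the array index as the
// `signal` argument; for any nonzero index, Windows rejects the numeric
// signal with `Error: kill ENOSYS`:
ids.forEach(process.kill) // effectively process.kill(id, index)

// Forwarding only the pid falls back to the default termination signal:
ids.forEach((id) => process.kill(id))
```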
I've researched it a bit at first. Naturally, similar issues have appeared in various other Node utility libraries. The cause was that a signal (second argument) passed to `process.kill` was carrying a value unexpected by Windows, so maintainers usually had to switch from `SIGHUP` to `SIGINT` or something similar. However, our situation is quite different, and a plain `process.kill(id)` would definitely suffice for our needs.

2. Unwanted asynchronousness (fixed by a2b2ae8)
The whole `interruptExecutionOnError` function is based solely on the false expectation that the `loop` function's lifetime will be less than `ERROR_CHECK_INTERVAL`.

Guess what: my work laptop is bloated with an immense amount of "endpoint protection" stuff. To put it shortly, due to the "protection" installed by my employer, an `await pidTree(id)` lookup takes almost 2 full seconds on average. That is usually 8-10 invocations of the `loop` function.

The implication is that by the time some invocation of `await pidTree(execaPid)` resolves, it is already too late: `execa` is already dead in the water, and the clean-up execution blows up once more with an uncaught error.
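To make the failure mode concrete, here is a hedged illustration (not the library code; `slowPidLookup` is a stand-in for the slow `pidTree` call) of how `setInterval` keeps firing while a slow async callback is still in flight:

```js
// setInterval fires on schedule even while a previous async callback is
// still awaiting, so slow callbacks overlap instead of running one at a time.
const slowPidLookup = () => new Promise((resolve) => setTimeout(resolve, 2000))

const intervalId = setInterval(async () => {
  console.log('tick started at', Date.now())
  await slowPidLookup() // ~2s, i.e. several interval-sized ticks pile up
  console.log('tick finished at', Date.now())
}, 250)

// Stop the demo after a few overlapping ticks have accumulated.
setTimeout(() => clearInterval(intervalId), 3000)
```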
I was thinking through different ways to fix it:
- check that `execa` is still alive on each error-passing `loop` invocation (unnecessary invocations of a few `await pidTree` calls until finally one of them resolves and the processes are finally killed, with some error-catches for overkills);
- clear the interval early in the `loop` function (unnecessary interval triggers still fire);
- prevent interleaving asynchronous invocations of the `loop` altogether.

The last option is basically `while (running) { await delay(interval); await invoke() }`, as sketched below, but it still should be implemented carefully. I hope this approach does not contradict other logic around it. Otherwise we may try to fall back to the other options until one fits the environment.
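A minimal sketch of that non-overlapping pattern, assuming a hypothetical async `check()` that returns `true` once the error-driven clean-up has run; the next iteration is scheduled only after the previous one fully settles, so invocations can never interleave:

```js
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms))

// Poll on a fixed interval, but never start a new iteration while a
// previous one is still awaiting (unlike setInterval).
const pollWithoutOverlap = async (interval, check) => {
  let running = true
  while (running) {
    await delay(interval)
    running = !(await check()) // stop once clean-up has been performed
  }
}

// Hypothetical usage:
// await pollWithoutOverlap(ERROR_CHECK_INTERVAL, async () => {
//   if (ctx.errors.size === 0) return false
//   await killChildProcesses() // hypothetical clean-up helper
//   return true
// })
```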
PS
All in all, I was actually surprised there is no ultimate fail-safe wrapper around the linter runs. Yes, we do try to restore the last stash on failure, but the bugs above were halting the execution of the whole tool.

I also saw some related issues in the repository. The causes might be different, but the result is the same: if there's an uncaught error during the last phases, we can fail to restore the repository state. Even worse, sometimes `index.lock` (the git lockfile) was still left in the `.git` folder after a failed run.

Although I can probably understand that this is actually not a very high-priority issue, because the user can simply restore the stash on their own. I guess it wouldn't actually break anything.