
fix: avoid overkilling execa and child processes asynchronously on error #1138

Merged

Conversation

lildeadprince
Contributor

The recently merged #1117 (and the 12.3.6 release itself) introduced some quite dangerous defects in the clean-up logic for failed multi-process linter runs.

I've made an attempt to fix them, but unfortunately I'm not sure I can quickly come up with suitable test coverage for the fixes provided.
The reason is that I'm not sure how to incorporate such low-level checks into the integration-like output matchers I saw in the existing suites.
Maybe you do not even cover such stuff typically?

1. Unexpected arguments (fixed by 1c0e6c0)

The first issue is a simple code bug. We have

ids.forEach(process.kill)

which actually runs as

ids.forEach((id, index) => process.kill(id, index))

Maybe it was doing fine during testing, but in my environment, on a Windows machine, it caused the whole thing to halt with Error: kill ENOSYS.

I researched it a bit at first:
Similar issues have naturally appeared in various other Node utility libraries. The cause was that the signal (second argument) passed to process.kill carried a value unexpected by Windows, so maintainers usually had to switch from SIGHUP to SIGINT or something similar.

However, our situation is quite different, and a plain process.kill(id) would definitely suffice for our needs.
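
To make the difference concrete, here is a minimal sketch (same ids variable as above; the exact surrounding code may differ):

```js
// Passing process.kill by reference forwards forEach's extra arguments, so the
// element index gets interpreted as a signal number:
//   process.kill(id, 0), process.kill(id, 1), process.kill(id, 2), ...
// On Windows that unexpected "signal" value is what halts everything with
// `Error: kill ENOSYS`.

// Wrapping the call passes only the pid and lets Node use the default SIGTERM:
ids.forEach((id) => process.kill(id))
```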

2. Unwanted asynchrony (fixed by a2b2ae8)

The whole interruptExecutionOnError function is based solely on the false expectation that the loop function's lifetime will be shorter than ERROR_CHECK_INTERVAL.

Guess what. My work laptop is bloated with an immense amount of "endpoint protection" software.
In short: because of the "protection" installed by my employer, an await pidTree(id) lookup takes almost 2 full seconds on average. That is usually 8-10 invocations of the loop function.

The implication is that by some invocation of await pidTree(execaPid) it will already be too late: execa will already be dead in the water, and the clean-up execution will again blow up with an uncaught error.

I was thinking through different ways to fix it:

  • 🚫 a lot of inner try-catches (just messy overall);
  • 🚫 checking whether execa is still alive on each error-handling loop invocation (still a few unnecessary await pidTree invocations until one of them finally resolves and the processes are killed, plus error-catches for over-kills);
  • 🚫 introducing a "deadman semaphore" variable: much the same as the previous option, but we set dead = true immediately on the first error-catching invocation of the loop function (the interval still triggers unnecessarily);
  • ✅ finally, I settled on simply clearing the interval on the first error-catching loop invocation (sketched right after this list).
  • ❔ the last idea came to my mind only now [as I'm writing this]: we could also properly use promises instead of an interval, which should have been considered dangerous for any async function in the first place, so you may want to consider this approach in the future. The idea revolves around while (running) { await delay(interval); await invoke(); } but it still needs to be implemented carefully.
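
A rough sketch of the chosen approach, assuming hypothetical names (ctx.errors as the shared error set, execaChildProcess as the spawned process) around the existing ERROR_CHECK_INTERVAL and pidTree pieces; the actual code in lib/resolveTaskFn.js may differ:

```js
import pidTree from 'pidtree'

const interruptExecutionOnError = (ctx, execaChildProcess) => {
  const errorListener = setInterval(async () => {
    if (ctx.errors.size === 0) return

    // Clear the interval on the first error-handling invocation, *before* the
    // slow `pidTree` lookup, so later ticks can never pile up and try to kill
    // an execa process that is already dead.
    clearInterval(errorListener)

    const ids = await pidTree(execaChildProcess.pid)
    ids.forEach((id) => process.kill(id))
    execaChildProcess.kill()
  }, ERROR_CHECK_INTERVAL)

  // The caller can still clear the interval when the task finishes cleanly.
  return errorListener
}
```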

I hope this approach does not contradict other logic around it. Otherwise we can try falling back to the other options until something fits the environment.
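
For completeness, a minimal sketch of the promise-based alternative mentioned in the last bullet. This is only an illustration of the idea (not what this PR implements), using hypothetical names and Node's timers/promises:

```js
import { setTimeout as delay } from 'timers/promises'

// Awaiting each step guarantees invocations never overlap, no matter how slow
// the kill routine is. A real implementation would also need an exit condition
// for the case where the task finishes without any error.
const watchForErrors = async (ctx, invoke) => {
  while (ctx.errors.size === 0) {
    await delay(ERROR_CHECK_INTERVAL)
  }
  await invoke() // kill execa and its child processes exactly once
}
```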

PS

All in all, I was actually surprised there is no ultimate fail-safe wrapping around the linters run. Yes, we do try to restore the last stash on failure, but the bugs above were halting the execution of the whole tool.
I also saw some related issues in the repository. The causes might be different, but the result is the same: if there's an uncaught error during the last phases, we can fail to restore the repository state. Even worse: sometimes an index.lock (git lockfile) was still left in the .git folder after a failed run.

Although I can probably understand that this is actually not a very high priority issue, because the user can simply restore the stash on their own. I guess it wouldn't actually break anything.

@lildeadprince lildeadprince changed the title Fix/interrupt execution watcher fix: avoid overkilling execa and child processes asynchronously on error Apr 15, 2022
@iiroj
Member

iiroj commented Apr 15, 2022

Thank you for your PR and the good explanation! Seems like the previous feature had some defects in it.

@codecov

codecov bot commented Apr 15, 2022

Codecov Report

Merging #1138 (1c0e6c0) into master (d327873) will not change coverage.
The diff coverage is 100.00%.

@@            Coverage Diff            @@
##            master     #1138   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           25        25           
  Lines          700       701    +1     
  Branches       182       182           
=========================================
+ Hits           700       701    +1     
Impacted Files         Coverage Δ
lib/resolveTaskFn.js   100.00% <100.00%> (ø)


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@iiroj iiroj merged commit 1b1f0e4 into lint-staged:master Apr 15, 2022
@github-actions
Contributor

🎉 This PR is included in version 12.3.8 🎉

The release is available on:

Your semantic-release bot 📦🚀

@okonet
Collaborator

okonet commented Apr 15, 2022

I just wanted to say "wow" and thank everyone involved in this work.
