
fix: avoid overkilling execa and child processes asynchronously on error #1138

Merged

Conversation

lildeadprince
Contributor

The recently merged #1117 (and the 12.3.6 release itself) introduced some quite dangerous defects in the clean-up logic for failed multi-process linter runs.

I've made an attempt to fix them, but unfortunately I'm not sure I can quickly come up with suitable test coverage for the fixes provided.
The reason is that I'm not sure how to incorporate such low-level checks into the integration-like output matchers I saw in the existing suites.
Maybe you do not even cover such stuff typically?

1. Unexpected arguments (fixed by 1c0e6c0)

The first issue is a simple code bug. We have

ids.forEach(process.kill)

which actually runs as

ids.forEach((id, index) => process.kill(id, index))

Maybe it was doing fine during testing, but in my environment, on a Windows machine, it caused the whole thing to halt with Error: kill ENOSYS.

I researched it a bit at first:
Similar issues have naturally appeared in various other Node utility libraries. The cause was that the signal (second argument) passed to process.kill carried a value unexpected by Windows, so maintainers usually had to switch from SIGHUP to SIGINT or something similar.

However, our situation is quite different, and a plain process.kill(id) would definitely suffice for our needs.
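
To make the difference concrete, here is a minimal sketch (same ids variable as above; the exact surrounding code may differ):

```js
// Passing process.kill by reference forwards forEach's extra arguments, so the
// element index gets interpreted as a signal number:
//   process.kill(id, 0), process.kill(id, 1), process.kill(id, 2), ...
// On Windows that unexpected "signal" value is what halts everything with
// `Error: kill ENOSYS`.

// Wrapping the call passes only the pid and lets Node use the default SIGTERM:
ids.forEach((id) => process.kill(id))
```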

2. Unwanted asynchrony (fixed by a2b2ae8)

The whole interruptExecutionOnError function is based solely on the false expectation that the loop function's lifetime will be shorter than ERROR_CHECK_INTERVAL.

Guess what. My work laptop is bloated with an immense amount of "endpoint protection" software.
In short: because of the "protection" installed by my employer, an await pidTree(id) lookup takes almost 2 full seconds on average. That is usually 8-10 invocations of the loop function.

The implication is that by some invocation of await pidTree(execaPid) it will already be too late: execa will already be dead in the water, and the clean-up execution will again blow up with an uncaught error.

I was thinking through different ways to fix it:

  • 🚫 a lot of inner try-catches (just messy overall);
  • 🚫 checking whether execa is still alive on each error-handling loop invocation (still a few unnecessary await pidTree invocations until one of them finally resolves and the processes are killed, plus error-catches for over-kills);
  • 🚫 introducing a "deadman semaphore" variable: much the same as the previous option, but we set dead = true immediately on the first error-catching invocation of the loop function (the interval still triggers unnecessarily);
  • ✅ finally, I settled on simply clearing the interval on the first error-catching loop invocation (sketched right after this list).
  • ❔ the last idea came to my mind only now [as I'm writing this]: we could also properly use promises instead of an interval, which should have been considered dangerous for any async function in the first place, so you may want to consider this approach in the future. The idea revolves around while (running) { await delay(interval); await invoke(); } but it still needs to be implemented carefully.
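
A rough sketch of the chosen approach, assuming hypothetical names (ctx.errors as the shared error set, execaChildProcess as the spawned process) around the existing ERROR_CHECK_INTERVAL and pidTree pieces; the actual code in lib/resolveTaskFn.js may differ:

```js
import pidTree from 'pidtree'

const interruptExecutionOnError = (ctx, execaChildProcess) => {
  const errorListener = setInterval(async () => {
    if (ctx.errors.size === 0) return

    // Clear the interval on the first error-handling invocation, *before* the
    // slow `pidTree` lookup, so later ticks can never pile up and try to kill
    // an execa process that is already dead.
    clearInterval(errorListener)

    const ids = await pidTree(execaChildProcess.pid)
    ids.forEach((id) => process.kill(id))
    execaChildProcess.kill()
  }, ERROR_CHECK_INTERVAL)

  // The caller can still clear the interval when the task finishes cleanly.
  return errorListener
}
```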

I hope this approach does not contradict other logic around it. Otherwise we can try falling back to the other options until something fits the environment.
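
For completeness, a minimal sketch of the promise-based alternative mentioned in the last bullet. This is only an illustration of the idea (not what this PR implements), using hypothetical names and Node's timers/promises:

```js
import { setTimeout as delay } from 'timers/promises'

// Awaiting each step guarantees invocations never overlap, no matter how slow
// the kill routine is. A real implementation would also need an exit condition
// for the case where the task finishes without any error.
const watchForErrors = async (ctx, invoke) => {
  while (ctx.errors.size === 0) {
    await delay(ERROR_CHECK_INTERVAL)
  }
  await invoke() // kill execa and its child processes exactly once
}
```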

PS

All in all, I was actually surprised there is no ultimate fail-safe wrapping around the linters run. Yes, we do try to restore the last stash on failure, but the bugs above were halting the execution of the whole tool.
I also saw some related issues in the repository. The causes might be different, but the result is the same: if there's an uncaught error during the last phases, we can fail to restore the repository state. Even worse: sometimes an index.lock (git lockfile) was still left in the .git folder after a failed run.

Although I can probably understand that this is actually not a very high priority issue, because the user can simply restore the stash on their own. I guess it wouldn't actually break anything.

@lildeadprince lildeadprince changed the title Fix/interrupt execution watcher fix: avoid overkilling execa and child processes asynchronously on error Apr 15, 2022
@iiroj
Member

iiroj commented Apr 15, 2022

Thank you for your PR and the good explanation! Seems like the previous feature had some defects in it.

@codecov

codecov bot commented Apr 15, 2022

Codecov Report

Merging #1138 (1c0e6c0) into master (d327873) will not change coverage.
The diff coverage is 100.00%.

@@            Coverage Diff            @@
##            master     #1138   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           25        25           
  Lines          700       701    +1     
  Branches       182       182           
=========================================
+ Hits           700       701    +1     
Impacted Files         Coverage Δ
lib/resolveTaskFn.js   100.00% <100.00%> (ø)


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@iiroj iiroj merged commit 1b1f0e4 into lint-staged:master Apr 15, 2022
@github-actions
Contributor

🎉 This PR is included in version 12.3.8 🎉

The release is available on:

Your semantic-release bot 📦🚀

@okonet
Collaborator

okonet commented Apr 15, 2022

I just wanted to say "wow" and thank everyone involved in this work.
