
[Bug]: Worker exits but jest process never finishes (continuation) #13976

Closed
juan-fernandez opened this issue Mar 1, 2023 · 7 comments · Fixed by #14015

juan-fernandez commented Mar 1, 2023

Version

29.4.3

Steps to reproduce

  1. Clone my repo: git clone git@github.com:juan-fernandez/test-jest-worker-killed-repro.git (thanks @gluxon for the inspiration!). You can also browse it at https://github.com/juan-fernandez/test-jest-worker-killed-repro.
  2. Run npm install
  3. Run npm run test

Expected behavior

The process ends.

Actual behavior

The process hangs indefinitely.

Additional context

This seems to be related to the bug described in #13183 and fixed (maybe only partially) in #13566. It is also probably related to #13864.

The difference between https://github.com/juan-fernandez/test-jest-worker-killed-repro and https://github.com/gluxon/test-jest-worker-killed-repro (the original reproduction scenario) is that more than one worker is now suddenly killed.

From the looks of it, jest "runs out of workers" to run the suites and it just hangs forever:

[Screen recording "hanging-2": the jest run hangs after the workers are killed]

Environment

  System:
    OS: macOS 13.2.1
    CPU: (10) arm64 Apple M1 Max
  Binaries:
    Node: 16.17.0 - ~/.volta/tools/image/node/16.17.0/bin/node
    Yarn: 1.22.19 - ~/.volta/tools/image/yarn/1.22.19/bin/yarn
    npm: 8.15.0 - ~/.volta/tools/image/node/16.17.0/bin/npm
  npmPackages:
    jest: ^29.4.3 => 29.4.3

gluxon commented Mar 4, 2023

Hey @juan-fernandez — This is a fantastic bug report. I was able to reproduce easily after cloning your repo.

Is this a new problem?

I was a bit worried I made things worse with my bug fix, but I do think this is an existing bug that the fix in #13566 simply uncovered. Before the bug fix, the worker pool coordinator didn't recognize when child processes were killed at all.

Early hypothesis

The coordinator now does recognize killed workers and prints an error message, but I think it's not performing any followup actions as a result. Specifically, I have a feeling it's not reassigning jobs that were delegated to the killed worker.

Setting maxWorkers=2

I noticed that there need to be more test files than the value of maxWorkers for the hang to happen.

For debugging, I was able to simplify the repro by setting maxWorkers to 2 and running only 3 test files. As long as the killed worker runs first, I'm able to see consistent hanging. Similar to your theory @juan-fernandez, I think the simple-2.test.ts file below was assigned to run on the killed worker.

[Screenshot (2023-03-04, 3:00:44 PM): terminal output of the simplified repro with maxWorkers=2 and 3 test files]

Possible Solutions

Assuming that hypothesis is correct, we need to either:

  1. Spawn a new worker when one is killed.
  2. Or, more simply, exit the entire test run when any worker is killed.

I think option 1 makes more sense.
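To make the trade-off concrete, here is a toy model of option 1 (hypothetical code, not jest-worker's actual API): the dead worker's job goes back on the queue and a replacement worker is spawned, so the run still finishes. Dropping the respawn step reproduces the hang, since suites remain queued with no workers left to take them.

```javascript
// Toy coordinator for option 1: respawn killed workers and requeue
// their in-flight jobs. Worker ids and the Set of doomed workers are
// stand-ins for real child processes.
function runSuites(suites, maxWorkers, killedWorkers) {
  const queue = [...suites];
  const done = [];
  let nextWorkerId = 0;
  const pool = Array.from({ length: maxWorkers }, () => nextWorkerId++);

  while (queue.length > 0) {
    if (pool.length === 0) {
      // Without respawning, this is the observed hang: jobs remain,
      // but no workers are left to run them.
      throw new Error('ran out of workers');
    }
    const worker = pool.shift();
    const suite = queue.shift();
    if (killedWorkers.has(worker)) {
      // Worker died mid-job (e.g. SIGKILL): requeue the job and
      // spawn a replacement worker.
      killedWorkers.delete(worker);
      queue.unshift(suite);
      pool.push(nextWorkerId++);
    } else {
      done.push(suite);
      pool.push(worker); // worker is free for the next job
    }
  }
  return done;
}

// maxWorkers=2, 3 suites, worker 0 killed on its first job: every
// suite still completes because the job is reassigned.
console.log(runSuites(['a', 'b', 'c'], 2, new Set([0]))); // [ 'a', 'b', 'c' ]
```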

Timelines

I would like to come back and fix this, but need to wrap up a few other commitments first. If any Jest maintainers would like to take over, I'd definitely appreciate the help. Otherwise, I'll try to get a fix open as soon as possible.

@axelchauvin

Seeing the exact same problem. In our scenario, the SIGKILL is sent by the Linux OOM killer: it kills one of the jest child workers, and jest then hangs forever.

@github-actions

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 30 days.

@github-actions github-actions bot added the Stale label Apr 13, 2023

gluxon commented Apr 14, 2023

Not stale! Still on my list, but wouldn't mind if anyone wants to take over.

@github-actions github-actions bot removed the Stale label Apr 14, 2023
@PeteTheHeat

#14015 fixes this, and seems like folks are okay with the approach.

I'll try to get it merged soon.

@github-actions

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
Please note this issue tracker is not a help forum. We recommend using StackOverflow or our discord channel for questions.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators May 20, 2023
SimenB commented Jul 7, 2023
