Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jest worker hanging #1

Closed
wants to merge 1 commit into from
Closed

Jest worker hanging #1

wants to merge 1 commit into from

Conversation

gluxon
Copy link
Owner

@gluxon gluxon commented Oct 29, 2022

Summary

Fixes jestjs#13183.

Before

Before this change, if a Jest worker process is killed externally (e.g. through kill <PID>), the Jest test harness is never informed and the test runs forever. This happens even if a finite timeout (default 5s) is set.

For my team specifically, we were observing OOM-killer on Linux kill jest-worker processes and cause CI tests to hang forever. Since OOM-killer is included on all Linux distributions, I believe this bug affects any users of Jest on Linux.

hanging

This was difficult to root cause since there were no logs of the failure. Once we realized more memory was required on CI for our project, we bumped our container's memory limit. Previously hanging tests completed successfully from there.

After

After the change in this PR, the test above exits with an error. See https://github.com/gluxon/test-jest-worker-killed-repro for the source code of this repro.

Screenshot 2022-11-06 at 1 35 02 PM

Test plan

This change contains a new test. Before the fix, the test fails with:

Execution

❯ yarn jest WorkerEdgeCases
 FAIL  packages/jest-worker/src/workers/__tests__/WorkerEdgeCases.test.ts (23.206 s)
  ✓ workers/processChild.js should exist (1 ms)
  ✓ workers/threadChild.js should exist
  ✓ types.js should exist
  ProcessWorker
    ✓ should get memory usage (45 ms)
    ✓ should recycle on idle limit breach (652 ms)
    should automatically recycle on idle limit breach
      ✓ initial state (1 ms)
      ✓ new worker starts (71 ms)
      ✓ worker continues to run after kill delay (601 ms)
      ✓ expected state order (1 ms)
    should cleanly exit on out of memory crash
      ✓ starting state
      ✓ worker ready (1 ms)
      ✓ worker crashes and exits (127 ms)
      ✓ worker stays dead (3 ms)
      ✓ expected state order
    should handle regular fatal crashes
      ✓ starting state (1 ms)
      ✓ processes restart (4003 ms)
      ✓ processes exits (2 ms)
  ThreadWorker
    ✓ should get memory usage (75 ms)
    ✓ should recycle on idle limit breach (644 ms)
    should automatically recycle on idle limit breach
      ✓ initial state
      ✓ new worker starts (69 ms)
      ✓ worker continues to run after kill delay (601 ms)
      ✓ expected state order (1 ms)
    should cleanly exit on out of memory crash
      ✓ starting state (1 ms)
      ✓ worker ready
      ✓ worker crashes and exits (137 ms)
      ✓ worker stays dead
      ✓ expected state order
    should handle regular fatal crashes
      ✓ starting state (1 ms)
      ✓ processes restart (4002 ms)
      ✓ processes exits (2 ms)
  should not hang on external process kill
    ✕ onEnd callback is called (10505 ms)

  ● should not hang on external process kill › onEnd callback is called

    thrown: "Exceeded timeout of 10000 ms for a test.
    Use jest.setTimeout(newTimeout) to increase the timeout value, if this is a long-running test."

      at test (packages/jest-worker/src/workers/__tests__/WorkerEdgeCases.test.ts:421:3)

Test Suites: 1 failed, 1 total
Tests:       1 failed, 31 passed, 32 total
Snapshots:   0 total
Time:        23.351 s
Ran all test suites matching /WorkerEdgeCases/i.

After the fix, the test passes:

❯ yarn jest WorkerEdgeCases

> @jest/monorepo@0.0.0 jest /Users/gluxon/Developer/jest
> node ./packages/jest-cli/bin/jest.js "WorkerEdgeCases"

 PASS  packages/jest-worker/src/workers/__tests__/WorkerEdgeCases.test.ts (13.28 s)
  ✓ workers/processChild.js should exist
  ✓ workers/threadChild.js should exist
  ✓ types.js should exist
  ProcessWorker
    ✓ should get memory usage (44 ms)
    ✓ should recycle on idle limit breach (649 ms)
    should automatically recycle on idle limit breach
      ✓ initial state (2 ms)
      ✓ new worker starts (80 ms)
      ✓ worker continues to run after kill delay (601 ms)
      ✓ expected state order (1 ms)
    should cleanly exit on out of memory crash
      ✓ starting state
      ✓ worker ready (1 ms)
      ✓ worker crashes and exits (141 ms)
      ✓ worker stays dead (4 ms)
      ✓ expected state order
    should handle regular fatal crashes
      ✓ starting state (1 ms)
      ✓ processes restart (4007 ms)
      ✓ processes exits (3 ms)
  ThreadWorker
    ✓ should get memory usage (78 ms)
    ✓ should recycle on idle limit breach (644 ms)
    should automatically recycle on idle limit breach
      ✓ initial state (1 ms)
      ✓ new worker starts (91 ms)
      ✓ worker continues to run after kill delay (601 ms)
      ✓ expected state order
    should cleanly exit on out of memory crash
      ✓ starting state
      ✓ worker ready
      ✓ worker crashes and exits (141 ms)
      ✓ worker stays dead
      ✓ expected state order (1 ms)
    should handle regular fatal crashes
      ✓ starting state (1 ms)
      ✓ processes restart (4003 ms)
      ✓ processes exits (2 ms)
  should not hang on external process kill
    ✓ onEnd callback is called (588 ms)

Test Suites: 1 passed, 1 total
Tests:       32 passed, 32 total
Snapshots:   0 total
Time:        13.45 s, estimated 24 s
Ran all test suites matching /WorkerEdgeCases/i.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
1 participant