Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[jest-worker] Detect infinite loops #10701

Closed
wants to merge 20 commits into from

Conversation

ayshiff
Copy link

@ayshiff ayshiff commented Oct 25, 2020

Summary

Closes #9785.

As said with @jeysal, here is the Draft PR to show where I am.
As I said here, I have been stuck for a moment on this and I will need some help 👍

I would like to have a TDD oriented approach to be able to make sure that the modifications I make are valid. Can someone give me any hints where to start to test the jest-worker heartbeat behavior as this is the part I'm having a hard time moving forward on ?

Currently the inspector initialization is done in jest-haste-map but I think it should be done in jest-worker or in jest-runner.

I made a graph to show the logic i tried to set up.

Test plan

@jeysal
Copy link
Contributor

jeysal commented Oct 25, 2020

Alright, thanks for the work so far, it looks promising, let me try to respond to some things :)

Currently the inspector initialization is done in jest-haste-map but I think it should be done in jest-worker or in jest-runner.

I agree with that jest-haste-map doesn't really have anything to do with this. I think jest-worker itself would be the best place since we can then also offer this functionality with a configurable timeout in its public interface to other consumers (it is actually a package used by many projects other than Jest!)

Can someone give me any hints where to start to test the jest-worker heartbeat behavior as this is the part I'm having a hard time moving forward on ?

Hmm so I think this could be added to the existing tests of processChild and ChildProcessWorker (the latter would mock the processChild to emit the heartbeat messages). Later, once everything works, integration tests can be added.
A processChild test would probably use jest.useFakeTimers to check that the heartbeat is periodically sent.
Note that I think we will need to be able to pass the interval into the process. I think adding this number to the initialize message would make sense.

BTW, do you expect any difficulties with the Node thread workers or can they support the inspector in the same way that child process workers do?

if (inspectorUrl !== undefined) {
session = new inspector.Session();
await session.connect();
await session.post('Debugger.enable');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also doesn't return a promise, but you can promisify post

@SimenB
Copy link
Member

SimenB commented Oct 25, 2020

(just left some nits on the current code, feel free to ignore while you figure out the approach to take 🙂)

@ayshiff
Copy link
Author

ayshiff commented Oct 26, 2020

@jeysal @SimenB Thanks for your review 👍
I am still investigating about the proper way to implement this.

I moved the inspector initialization inside jest-worker and added a configurable timeout in its public interface.
I also added the same logic as the ChildProcessWorker to the Node thread worker. I don't know yet if they can support the inspector the same way that child process workers do.

I keep testing the heartbeat logic between workers and parents but I am still a little lost in the way they communicate with each other.

Tell me if I am wrong but for me the PARENT_MESSAGE_OK indicates that the child process has succeed and it means that we can clear the heartbeat timeout.

@jeysal
Copy link
Contributor

jeysal commented Oct 26, 2020

I suppose there are multiple ways to go about this - either the child always sends heartbeats from when it is spawned until it is destroyed, or the child only sends heartbeats while working on a task. I suspect the former is easier, and the overhead of an interval sending one small message usually every couple of seconds seems negligible. So in that case the interval would be scheduled in initialize and cleared in exitProcess.

@ayshiff
Copy link
Author

ayshiff commented Oct 26, 2020

I suppose there are multiple ways to go about this - either the child always sends heartbeats from when it is spawned until it is destroyed, or the child only sends heartbeats while working on a task. I suspect the former is easier, and the overhead of an interval sending one small message usually every couple of seconds seems negligible. So in that case the interval would be scheduled in initialize and cleared in exitProcess.

You are right, it seems easier to handle !
I'll look in that direction then ! Thanks.

@ayshiff ayshiff force-pushed the feature/detect-infinite-loops branch from ceae148 to 328bdae Compare October 27, 2020 10:01
packages/jest-worker/src/index.ts Outdated Show resolved Hide resolved
packages/jest-worker/src/index.ts Outdated Show resolved Hide resolved
@@ -72,18 +74,23 @@ export default class JestWorker {
private _farm: Farm;
private _options: FarmOptions;
private _workerPool: WorkerPoolInterface;
private _inspector: Session | undefined;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
private _inspector: Session | undefined;
private _inspectorSession: Session | undefined;

easier to grok when we have the inspector import

packages/jest-worker/src/workers/ChildProcessWorker.ts Outdated Show resolved Hide resolved
packages/jest-worker/src/workers/ChildProcessWorker.ts Outdated Show resolved Hide resolved
packages/jest-worker/src/workers/ChildProcessWorker.ts Outdated Show resolved Hide resolved
packages/jest-worker/src/workers/NodeThreadsWorker.ts Outdated Show resolved Hide resolved
packages/jest-worker/src/workers/processChild.ts Outdated Show resolved Hide resolved
@@ -59,6 +66,12 @@ const messageListener: NodeJS.MessageListener = request => {
};
process.on('message', messageListener);

function sendParentMessageHeartbeat() {
if (process && process.send) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (process && process.send) {
if (process?.send) {

when is process not defined?

@@ -42,6 +45,10 @@ const messageListener = (request: any) => {
const init: ChildMessageInitialize = request;
file = init[2];
setupArgs = request[3];
heartbeatIntervalValue = request[4];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comments as for process

@SimenB
Copy link
Member

SimenB commented Oct 27, 2020

again, feel free to ignore my nits, I was just stuck in line at the grocery store with nothing better to do 😀

@ayshiff
Copy link
Author

ayshiff commented Oct 28, 2020

I try to test the current implementation with a simple test file containing an infinite loop to see how the workers and the child processes handle it:

// infinite_loop.test.ts
it ('should handle infinite loop') {
  while (true) {}
}

But unfortunately when I try to test if the child processes emit correctly the PARENT_MESSAGE_HEARTBEAT to the parent within the setInterval, nothing is logged.

I don’t know if the logs are stopped when the child processes are working or if they don’t emit anything at all. (Maybe they get stopped before the first interval ends.)

if (inspectorUrl !== undefined) {
session = new inspector.Session();
session.connect();
session.post('Debugger.enable');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is async

Suggested change
session.post('Debugger.enable');
await util.promisfy(session.post)('Debugger.enable');

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TS is complaining about the return type of promisify:
Argument of type '"Debugger.enable"' is not assignable to parameter of type '"NodeWorker.disable"'.

It returns something with this type:

// util.promisify(session.post)('Debugger.enable')
function util.promisify<"NodeWorker.disable", unknown>(fn: (arg1: "NodeWorker.disable", callback: (err: Error | null, result: unknown) => void) => void): (arg1: "NodeWorker.disable") => Promise<unknown> (+13 overloads)

Also, what is the best way to handle the asynchronous assignment of the inspector inside constructor ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah right, I had issues with promisify myself #8596 (comment)

Manual new Promise should work 👍

Regarding best way of async init is probably a setup function of some sort... Unless we can intercept the user's function calls as they are all async?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or exposing a factory function instead of the class directly. Can do that in Jest 27 even if this PR doesn't land in time for it

});
}
// @ts-expect-error: adding custom properties to errors.
error.type = 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1? should we add a name?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add another PARENT_MESSAGE_ERROR ?

// types.ts
export const PARENT_MESSAGE_HEARTBEAT_ERROR: 5 = 5;

export type PARENT_MESSAGE_ERROR =
  ...
  | typeof PARENT_MESSAGE_HEARTBEAT_ERROR;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense to me

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this actually worth it though? It's not a message that is actually ever sent is it? It's just to set a property on an error that can further identify the error but that I don't think we really need?
I think declaring messages that are not actually messages ever being passed between parent and child is confusing

@ayshiff ayshiff force-pushed the feature/detect-infinite-loops branch from cba2c2f to 603769f Compare October 28, 2020 15:26
@jeysal
Copy link
Contributor

jeysal commented Oct 28, 2020

@ayshiff isn't the problem you described already integration testing level? I would imagine the first tests for processChild and ChildProcessWorker as I roughly described them earlier wouldn't test such a real use case but only test that the interval is scheduled correctly in the processChild and handled correctly in the ChildProcessWorker

@ayshiff
Copy link
Author

ayshiff commented Oct 31, 2020

@jeysal I come to ask for help if you have time 👍
I am trying to correct the tests that do not pass.
As Simen said, I'm exposing a factory function (create) for jest-worker instead of the class directly.

Inside the tests of jest-haste-map I mock jest-worker like this:

jest.mock('jest-worker', () => ({
  create: jest.fn(worker => {
    mockWorker = jest.fn((...args) => require(worker).worker(...args));
    mockEnd = jest.fn();

    return {
      end: mockEnd,
      worker: mockWorker,
    };
  }),
}));

But in the test 'distributes work across workers', the jestWorker.create.mock.calls has a length of 5 instead of 1 and mockWorker.mock.calls has a length of 1 instead of 5 and it contains only the last worker instead of the five.

@ayshiff ayshiff force-pushed the feature/detect-infinite-loops branch from 870d5c5 to aa595aa Compare October 31, 2020 21:33
@jeysal
Copy link
Contributor

jeysal commented Nov 2, 2020

Hmm that's odd, I thought maybe it would have to be jest.mock('jest-worker', () => ({ default: class Worker { create = jest.fn(...)}})) because jest-worker is a default export, but if that's the problem I'd expect it to fail with 'create is not a function' ... @SimenB ideas?
@ayshiff can you perhaps push any test changes you made so that we can check it out

@ayshiff
Copy link
Author

ayshiff commented Nov 2, 2020

Hmm that's odd, I thought maybe it would have to be jest.mock('jest-worker', () => ({ default: class Worker { create = jest.fn(...)}})) because jest-worker is a default export, but if that's the problem I'd expect it to fail with 'create is not a function' ... @SimenB ideas?
@ayshiff can you perhaps push any test changes you made so that we can check it out

@jeysal My test changes are already pushed inside ChildProcessWorker.test.js and processChild.test.js.
I will push the changes I mentioned above 👍

@jeysal
Copy link
Contributor

jeysal commented Nov 2, 2020

Haven't taken a look yet, but #9054 just randomly popped into my inbox because someone commented, perhaps related?

@jeysal
Copy link
Contributor

jeysal commented Nov 3, 2020

Nope, #9054 was a false report

@jeysal
Copy link
Contributor

jeysal commented Nov 3, 2020

@ayshiff your _getWorker change in jest-haste-map was racy because of the way it accessed the shared this._worker. I pushed a fix for you. Let me know when you need anything else :)

@ayshiff
Copy link
Author

ayshiff commented Nov 6, 2020

@jeysal Now that I have been able to test that the behavior between process childs and workers was consistent (heartbeat logic inside ChildProcessWorker.test.js and processChild.test.js) I would like to be able to:

  1. Correct failed tests

I suspect that some setTimeout are not well cleared.
When I run yarn jest, some tests fail with the heartbeat timeout message Test worker was unresponsive for 5000 milliseconds. There was no inspector connected so we were unable to capture stack frames before it was terminated. whereas when I run the test individually the problem does not appear.

Also, I tried to test if the inspector was properly initialized when we call the Worker.create() factory function and again when I run the test individually it works but when I run it with the others it tells me that the address on which it is launched is already in use: Starting inspector on 127.0.0.1:9229 failed: address already in use.

I imagine that the inspector is not properly closed between tests, but i can't find where.
(I tried to clear the timeout when we are into the other cases of the switch statement inside the _onMessage child message handler but without much success.)

  1. Process error with the stack trace inside monitorHeartbeatTimeout

  2. Add e2e tests for infinite loops detection

Thank you again for taking time to help me 👍

@jeysal
Copy link
Contributor

jeysal commented Nov 8, 2020

2. Add e2e tests for infinite loops detection

I imagine this could be as straightforward as creating a test that does while(true);, running Jest on it and checking that it errors with the stack trace and everything.

  • Process error with the stack trace inside monitorHeartbeatTimeout

I suppose we would do something similar to when receiving a CLIENT_ERROR, that is call _onProcessEnd (in this case we should probably check that there even is a current task running, which may in theory not be the case at the time the infinite loop runs, e.g. if the task scheduled a leftover timeout that then causes the infinite loop), then run the existing force exit logic?


Regarding the test timeouts: Could it be that some tests really just block that long? We have a very high test timeout set, and also consider that the e2e tests are basically doing execSync('jest'). It seems that at the moment the worker heartbeat timeout is fixed, but we may need to pass something more dynamic in, probably based on the testTimeout plus a certain value?
Perhaps it's not actually an issue with any clearing of timeouts. The fact that it passes when running one test could also be due to the limit not being exceeded when the machine is not under heavy load.
@SimenB this leads me to think: Don't we have a problem with dynamic jest.setTimeout()s, or even individual test timeouts, because we won't be able to tell the worker what the new maximum heartbeat interval is? :/

@jeysal jeysal force-pushed the feature/detect-infinite-loops branch from ec57961 to 0c8f158 Compare April 5, 2021 18:17
@jeysal
Copy link
Contributor

jeysal commented Apr 5, 2021

Rebased and pushed some minor cleanup.
The error I'm getting on the e2e test is about the inspector being opened when there is already an open inspector. However looking at the code related to the inspector/stacktrace bit, I'm not quite sure I understand how it's supposed to work. Aren't you attaching the inspector to the worker parent, not the processChildren?

@ayshiff
Copy link
Author

ayshiff commented Apr 5, 2021

Rebased and pushed some minor cleanup.
The error I'm getting on the e2e test is about the inspector being opened when there is already an open inspector. However looking at the code related to the inspector/stacktrace bit, I'm not quite sure I understand how it's supposed to work. Aren't you attaching the inspector to the worker parent, not the processChildren?

Thanks for your help 👍
I am attaching the inspector to the worker parent. Should I attach it to processChild ?

@jeysal
Copy link
Contributor

jeysal commented Apr 6, 2021

I'm not very familiar with inspector, but I thought that's what you would want, because we're interested in the stack trace of the child when it is stopped in an infinite loop, right?

@ayshiff
Copy link
Author

ayshiff commented Apr 6, 2021

Oh yes you are right.
My logic was probably wrong then, I'll try to move the inspector logic inside processChild.

I'll ping you when I'm done with it 👍

@github-actions
Copy link

github-actions bot commented Sep 8, 2022

This PR is stale because it has been open 1 year with no activity. Remove stale label or comment or this will be closed in 30 days.

@github-actions github-actions bot added the Stale label Sep 8, 2022
@github-actions
Copy link

github-actions bot commented Oct 8, 2022

This PR was closed because it has been stalled for 30 days with no activity. Please open a new PR if the issue is still relevant, linking to this one.

@github-actions github-actions bot closed this Oct 8, 2022
@github-actions
Copy link

github-actions bot commented Nov 8, 2022

This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
Please note this issue tracker is not a help forum. We recommend using StackOverflow or our discord channel for questions.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 8, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Detect infinite loops and stop tests
4 participants