Flaky test-benchmark-path #18254

Closed
BridgeAR opened this issue Jan 19, 2018 · 11 comments
Labels
arm: Issues and PRs related to the ARM platform.
benchmark: Issues and PRs related to the benchmark subsystem.
flaky-test: Issues and PRs related to tests with unstable failures on the CI.

Comments

@BridgeAR
Member

Seems like this test sometimes times out.

https://ci.nodejs.org/job/node-test-binary-arm/12927/RUN_SUBSET=6,label=pi3-raspbian-jessie/console

20:28:04 not ok 15 parallel/test-benchmark-path
20:28:04   ---
20:28:04   duration_ms: 240.61
20:28:04   severity: fail
20:28:04   stack: |-
20:28:04     timeout
@BridgeAR added the arm and flaky-test labels Jan 19, 2018
@joyeecheung added the benchmark label Jan 24, 2018
@joyeecheung
Member

Looks more like a bug within the benchmark tools

@apapirovski
Member

I think this one got resolved at some point? At least that's my recollection.

@gireeshpunathil
Member

Started to see this again, for example: https://ci.nodejs.org/job/node-test-commit-aix/nodes=aix61-ppc64/15848/consoleFull

I will see if I can recreate locally.

@gireeshpunathil
Member

06:32:20 not ok 235 parallel/test-benchmark-path
06:32:20   ---
06:32:20   duration_ms: 120.191
06:32:20   severity: fail
06:32:20   stack: |-
06:32:20     timeout

Trott added a commit to Trott/io.js that referenced this issue Jun 18, 2018
@Trott
Member

Trott commented Jun 18, 2018

Failing on AIX in parallel with a timeout suggests that perhaps it should be moved to sequential.

I see on successful runs that it still can take 25 seconds or so, and that's a lot for a test in parallel. Get the machine busy doing something else and have this test compete with just the wrong other tests in parallel, and it times out. Or at least, that would be the first theory.

So I'd say move this to sequential, or check whether there's a way to reduce the time it takes to run this test. I already looked into that, though, and I don't think there's anything to do there unless we want to combine benchmarks into fewer files. It seems like launching processes is where the time is taken up.

PR to move this test to sequential: #21393

@gireeshpunathil
Member

Agree that it is a manifestation of parallelism, not much to do with the test as such. Running independently, it comes out in ~9 seconds.

Do we know which tunables decide the number of tests that run in parallel? I am just wondering if there is a mismatch between those tunables and the resource limits on the failing AIX box: for example, if the number of parallel tests is derived from the number of CPUs, but the amount of usable memory, user processes, etc. is otherwise constrained through ulimit settings, the processes could end up looking like they time out. /cc @mhdawson

@gireeshpunathil
Member

On a related note: what is the rationale behind running benchmarks in parallel? What inference can one derive from them, and what purpose do they serve?

@Trott
Member

Trott commented Jun 19, 2018

Do we know which tunables decide the number of tests that run in parallel?

If JOBS is set as an environment variable, that's the number of parallel jobs that get run. If that variable is not set, then the Makefile invokes test.py with -J, which causes the number of parallel jobs to be determined by Python's multiprocessing.cpu_count().

On the AIX failure linked above (https://ci.nodejs.org/job/node-test-commit-aix/nodes=aix61-ppc64/15848/consoleFull), JOBS is explicitly set to 5. I don't know where that value comes from.
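
For illustration, here is a minimal Python sketch of the job-count precedence described above. It is not the actual tools/test.py code, and the helper name parallel_jobs is made up; it only shows the order: an explicit JOBS environment variable wins, otherwise the count falls back to Python's multiprocessing.cpu_count(), which is what -J effectively gives you.

import multiprocessing
import os

def parallel_jobs():
    # An explicit JOBS environment variable takes precedence.
    jobs = os.environ.get("JOBS")
    if jobs:
        return int(jobs)
    # Otherwise fall back to the machine's CPU count, mirroring what
    # `make test` gets by passing -J to the test runner.
    return multiprocessing.cpu_count()

print(parallel_jobs())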

@Trott
Member

Trott commented Jun 19, 2018

On a related note: what is the rationale behind running benchmarks in parallel?

The general practice is: If a test can be in parallel, then put it in parallel because that will keep the duration of test runs shorter. So, until a test causes problems, it stays in parallel. I'm not saying that's a good approach (although perhaps it is), but that's how we ended up with the current set of test/parallel/test-benchmark-* tests.

@Trott closed this as completed in 8944a4f Jun 19, 2018
@gireeshpunathil
Member

JOBS is explicitly set to 5

Thanks for the info. This is a minimal value and should not collide with the resource limitations that would have been set for a user. So I guess we can do nothing about it other than moving the test to sequential.

If a test can be in parallel, then put it in parallel

Sure, looks like a good strategy, but the only issue is that such tests won't reveal any benchmark insights; they only serve as unit tests, or perhaps as some stress testing?

@Trott
Member

Trott commented Jun 19, 2018

such tests won't reveal any benchmark insights; they only serve as unit tests, or perhaps as some stress testing?

Yeah, that's their purpose. The benchmark tests are not used to provide benchmark results. They are minimal tests to make sure that benchmarks included in source code aren't totally broken (which is definitely a thing that has happened). For example, it will catch a breaking change to an API that did not get propagated to a benchmark that uses the API.
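
To illustrate the idea, here is a rough Python sketch of such a smoke test; it is not the real harness (the real tests are JavaScript files under test/), and the benchmark path and settings below are placeholders. The point is that the benchmark is run as a child process with the smallest possible workload and only the exit code is checked; the timing numbers are ignored entirely.

import subprocess
import sys

def smoke_test_benchmark(benchmark_file, settings=("n=1",)):
    # Run one benchmark with tiny parameters; we only care that it does
    # not crash, not how fast it runs.
    result = subprocess.run(
        ["node", benchmark_file, *settings],
        capture_output=True,
        timeout=120,
    )
    if result.returncode != 0:
        sys.exit(f"benchmark {benchmark_file} looks broken:\n"
                 f"{result.stderr.decode()}")

# Placeholder path for illustration only.
smoke_test_benchmark("benchmark/path/parse-posix.js")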

targos pushed a commit that referenced this issue Jun 20, 2018
Fixes: #18254

PR-URL: #21393
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Trivikram Kamat <trivikr.dev@gmail.com>
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: Yuta Hiroto <hello@hiroppy.me>
MylesBorins pushed a commit that referenced this issue Dec 22, 2018
Fixes: #18254

PR-URL: #21393
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Trivikram Kamat <trivikr.dev@gmail.com>
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: Yuta Hiroto <hello@hiroppy.me>