Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

investigate flaky test-http2-large-writes-session-memory-leak on Windows CI #31089

Closed
Trott opened this issue Dec 25, 2019 · 15 comments
Closed
Labels
flaky-test Issues and PRs related to the tests with unstable failures on the CI. http2 Issues or PRs related to the http2 subsystem. windows Issues and PRs related to the Windows platform.

Comments

@Trott
Copy link
Member

Trott commented Dec 25, 2019

https://ci.nodejs.org/job/node-test-binary-windows-js-suites/634/RUN_SUBSET=3,nodes=win2012r2-COMPILED_BY-vs2019-x86/console

test-rackspace-win2012r2_vs2019-x64-2

00:24:35 not ok 295 parallel/test-http2-large-writes-session-memory-leak
00:24:35   ---
00:24:35   duration_ms: 0.266
00:24:35   severity: fail
00:24:35   exitcode: 1
00:24:35   stack: |-
00:24:35     events.js:297
00:24:35           throw er; // Unhandled 'error' event
00:24:35           ^
00:24:35     
00:24:35     Error [ERR_HTTP2_STREAM_ERROR]: Stream closed with error code NGHTTP2_ENHANCE_YOUR_CALM
00:24:35         at ClientHttp2Stream._destroy (internal/http2/core.js:2117:13)
00:24:35         at ClientHttp2Stream.destroy (internal/streams/destroy.js:39:8)
00:24:35         at ClientHttp2Stream.[kMaybeDestroy] (internal/http2/core.js:2133:12)
00:24:35         at Http2Stream.onStreamClose (internal/http2/core.js:500:26)
00:24:35     Emitted 'error' event on ClientHttp2Stream instance at:
00:24:35         at emitErrorNT (internal/streams/destroy.js:94:8)
00:24:35         at emitErrorCloseNT (internal/streams/destroy.js:66:3)
00:24:35         at processTicksAndRejections (internal/process/task_queues.js:84:21) {
00:24:35       code: 'ERR_HTTP2_STREAM_ERROR'
00:24:35     }
00:24:35   ...

@nodejs/http2 @nodejs/platform-windows

@Trott Trott added flaky-test Issues and PRs related to the tests with unstable failures on the CI. http2 Issues or PRs related to the http2 subsystem. windows Issues and PRs related to the Windows platform. labels Dec 25, 2019
@richardlau
Copy link
Member

https://ci.nodejs.org/job/node-test-binary-windows-js-suites/1088/RUN_SUBSET=1,nodes=win2012r2-COMPILED_BY-vs2019-x86/console

test-rackspace-win2012r2_vs2017-x64-3

21:55:20 not ok 298 parallel/test-http2-large-writes-session-memory-leak
21:55:20   ---
21:55:20   duration_ms: 0.401
21:55:20   severity: fail
21:55:20   exitcode: 1
21:55:20   stack: |-
21:55:20     events.js:298
21:55:20           throw er; // Unhandled 'error' event
21:55:20           ^
21:55:20     
21:55:20     Error [ERR_HTTP2_STREAM_ERROR]: Stream closed with error code NGHTTP2_ENHANCE_YOUR_CALM
21:55:20         at ClientHttp2Stream._destroy (internal/http2/core.js:2160:13)
21:55:20         at ClientHttp2Stream.destroy (internal/streams/destroy.js:41:8)
21:55:20         at Http2Stream.onStreamClose (internal/http2/core.js:506:12)
21:55:20     Emitted 'error' event on ClientHttp2Stream instance at:
21:55:20         at emitErrorNT (internal/streams/destroy.js:96:8)
21:55:20         at emitErrorCloseNT (internal/streams/destroy.js:68:3)
21:55:20         at processTicksAndRejections (internal/process/task_queues.js:84:21) {
21:55:20       code: 'ERR_HTTP2_STREAM_ERROR'
21:55:20     }
21:55:20   ...

@lundibundi
Copy link
Member

Didn't fail on my local Windows vs2017 machine with -j 32 --repeat 1000, trying out CI
https://ci.nodejs.org/job/node-stress-single-test/38/

@lundibundi
Copy link
Member

Well, stress CI is green, perhaps the machine should be under heavy load (from other tests) for this to actually fail. Anyway, this doesn't seem to fail often (at all anymore?) so closing this for now.

@bzoz bzoz reopened this Jun 1, 2020
@jasnell jasnell added this to Known flakes in Flakees in CI via automation Jun 25, 2020
@Trott
Copy link
Member Author

Trott commented Jul 7, 2020

https://ci.nodejs.org/job/node-test-binary-windows-js-suites/4745/RUN_SUBSET=2,nodes=win2012r2-COMPILED_BY-vs2019-x86/console

test-rackspace-win2012r2_vs2017-x64-4

06:21:27 not ok 320 parallel/test-http2-large-writes-session-memory-leak
06:21:27   ---
06:21:27   duration_ms: 0.257
06:21:27   severity: fail
06:21:27   exitcode: 1
06:21:27   stack: |-
06:21:27     events.js:291
06:21:27           throw er; // Unhandled 'error' event
06:21:27           ^
06:21:27     
06:21:27     Error [ERR_HTTP2_STREAM_ERROR]: Stream closed with error code NGHTTP2_ENHANCE_YOUR_CALM
06:21:27         at new NodeError (internal/errors.js:254:15)
06:21:27         at ClientHttp2Stream._destroy (internal/http2/core.js:2151:13)
06:21:27         at _destroy (internal/streams/destroy.js:62:8)
06:21:27         at ClientHttp2Stream.destroy (internal/streams/destroy.js:55:5)
06:21:27         at Http2Stream.onStreamClose (internal/http2/core.js:515:12)
06:21:27     Emitted 'error' event on ClientHttp2Stream instance at:
06:21:27         at emitErrorNT (internal/streams/destroy.js:133:8)
06:21:27         at emitErrorCloseNT (internal/streams/destroy.js:98:3)
06:21:27         at processTicksAndRejections (internal/process/task_queues.js:80:21) {
06:21:27       code: 'ERR_HTTP2_STREAM_ERROR'
06:21:27     }
06:21:27   ...

@Trott
Copy link
Member Author

Trott commented Jul 7, 2020

Should the test gracefully handle NGHTTP2_ENHANCE_YOUR_CALM? @nodejs/http2

@jasnell
Copy link
Member

jasnell commented Jul 7, 2020

Given what the test is doing, this is most likely hitting up against the max session memory limit. If the test has not accounted for that already, perhaps try increasing that limit significantly and seeing if the failure continues

@Trott
Copy link
Member Author

Trott commented Jul 7, 2020

Hmmm, this comment in the test suggests that the failures may indicate a memory leak.

// Regression test for https://github.com/nodejs/node/issues/29223.
// There was a "leak" in the accounting of session memory leading
// to streams eventually failing with NGHTTP2_ENHANCE_YOUR_CALM.

@Trott
Copy link
Member Author

Trott commented Jul 8, 2020

https://ci.nodejs.org/job/node-test-binary-windows-js-suites/4749/RUN_SUBSET=2,nodes=win2012r2-COMPILED_BY-vs2019-x86/console failed too. test-rackspace-win2012r2_vs2017-x64-4

As with the other one I posted earlier and all the ones before that, it's a rackspace host. I haven't seen this fail on an azure host yet. If I recall correctly, the rackspace hosts are configured with different memory and CPU than the azure hosts.

@Trott
Copy link
Member Author

Trott commented Jul 8, 2020

I have been unable to duplicate this in the node-stress-single-test, running things in parallel, etc. This is frustrating....

@Trott
Copy link
Member Author

Trott commented Jul 8, 2020

Given what the test is doing, this is most likely hitting up against the max session memory limit. If the test has not accounted for that already, perhaps try increasing that limit significantly and seeing if the failure continues

The test sets maxSessionMemory to 1 Mb:

    // Set maxSessionMemory to 1MB so the leak causes errors faster.

Trott added a commit to Trott/io.js that referenced this issue Jul 8, 2020
Add some debugging output for flaky http2 test to try to figure out
what's going on the next time it fails in CI.

Refs: nodejs#31089
@Trott
Copy link
Member Author

Trott commented Jul 11, 2020

I've figured out how to replicate this in CI. I've applied 5a8e165 which makes changes to vcbuild.bat such that it only runs this test a bunch of times, and then I run node-test-commit-windows-fanned jobs against the branch with that patch applied.

Here's the master branch with that patch applied showing many failures: https://ci.nodejs.org/job/node-test-commit-windows-fanned/37176/

Will apply to the first branch that introduced this test and see if it was failing from the beginning. If not, I'll bisect to find the commit that introduced the failures.

@Trott
Copy link
Member Author

Trott commented Jul 11, 2020

Argh, I can't get 18a1796 to compile in CI so I can't test it. Here's the result: https://ci.nodejs.org/view/All/job/node-compile-windows/34942/ Any chance someone knowledgable from @nodejs/build and/or @nodejs/platform-windows can take a look and maybe make a suggestion for getting it to compile? I guess something changed with how we build in CI at some point?

I suspect the issue is this, but I'm also pretty ignorant about building on Windows:

23:57:12 > msbuild "C:\workspace\node-compile-windows\node\tools\msvs\msi\nodemsi.sln" /m /t:Clean,Build /p:WindowsTargetPlatformVersion=10.0.18362.0 /p:PlatformToolset=v142 /p:GypMsvsVersion=2019 /p:Configuration=Release /p:Platform=x64 /p:NodeVersion=14.0.0 /p:FullVersion=14.0.0 /p:DistTypeDir=release  /clp:NoSummary;NoItemAndPropertyList;Verbosity=minimal /nologo 
23:57:13   custom_actions.cc
23:57:14 C:\workspace\node-compile-windows\node\tools\msvs\msi\custom_actions.cc(5,10): fatal error C1083: Cannot open include file: 'wcautil.h': No such file or directory [C:\workspace\node-compile-windows\node\tools\msvs\msi\custom_actions.vcxproj]

@Trott
Copy link
Member Author

Trott commented Jul 11, 2020

Argh, I can't get 18a1796 to compile in CI so I can't test it. Here's the result: https://ci.nodejs.org/view/All/job/node-compile-windows/34942/ Any chance someone knowledgable from @nodejs/build and/or @nodejs/platform-windows can take a look and maybe make a suggestion for getting it to compile? I guess something changed with how we build in CI at some point?

Turns out I don't need to compile that commit because a later commit (415bba) built just fine and stress test was good, so I'm going to bisect and find the bad commit now....

@Trott
Copy link
Member Author

Trott commented Jul 11, 2020

Bisected! The problematic commit is 51ccf1b.

Trott added a commit to Trott/io.js that referenced this issue Jul 11, 2020
@Trott Trott closed this as completed in e1b336f Jul 14, 2020
Flakees in CI automation moved this from Known flakes to Closed Jul 14, 2020
MylesBorins pushed a commit that referenced this issue Jul 14, 2020
This reverts commit 51ccf1b.

Fixes: #31089

PR-URL: #34315
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: David Carlier <devnexen@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Denys Otrishko <shishugi@gmail.com>
MylesBorins pushed a commit that referenced this issue Jul 16, 2020
This reverts commit 51ccf1b.

Fixes: #31089

PR-URL: #34315
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: David Carlier <devnexen@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Denys Otrishko <shishugi@gmail.com>
cjihrig pushed a commit that referenced this issue Jul 23, 2020
This reverts commit 51ccf1b.

Fixes: #31089

PR-URL: #34315
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: David Carlier <devnexen@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Denys Otrishko <shishugi@gmail.com>
clshortfuse pushed a commit to clshortfuse/node that referenced this issue Oct 7, 2020
This reverts commit 51ccf1b.

Fixes: nodejs#31089

PR-URL: nodejs#34315
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: David Carlier <devnexen@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Denys Otrishko <shishugi@gmail.com>
MylesBorins pushed a commit that referenced this issue Oct 13, 2020
This reverts commit 51ccf1b.

Fixes: #31089

Backport-PR-URL: #34845
PR-URL: #34315
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: David Carlier <devnexen@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Denys Otrishko <shishugi@gmail.com>
MylesBorins pushed a commit that referenced this issue Nov 16, 2020
This reverts commit 51ccf1b.

Fixes: #31089

Backport-PR-URL: #34845
PR-URL: #34315
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: David Carlier <devnexen@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Denys Otrishko <shishugi@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
flaky-test Issues and PRs related to the tests with unstable failures on the CI. http2 Issues or PRs related to the http2 subsystem. windows Issues and PRs related to the Windows platform.
Projects
Flakees in CI
  
Closed
Development

Successfully merging a pull request may close this issue.

5 participants