Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NGHTTP2_ENHANCE_YOUR_CALM when starting with version 10.16.3 #29223

Closed
jellelute opened this issue Aug 20, 2019 · 22 comments
Closed

NGHTTP2_ENHANCE_YOUR_CALM when starting with version 10.16.3 #29223

jellelute opened this issue Aug 20, 2019 · 22 comments
Labels
http2 Issues or PRs related to the http2 subsystem.

Comments

@jellelute
Copy link

  • Version:
  • Platform:
  • Subsystem:

Version: 10.16.3
Platform: 4.15.0-1044-aws #46-Ubuntu SMP Thu Jul 4 13:38:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Subsystem: http2

I'm reusing a session for performance reasons and opening 10 new requests to a server per second. On node 10.16.2 it runs forever with no problems. On 10.16.3 it starts throwing "Error [ERR_HTTP2_STREAM_ERROR]: Stream closed with error code NGHTTP2_ENHANCE_YOUR_CALM" after a minute or two +- 1000 requests.

Is this a newly introduced bug or how to prevent this?

@addaleax addaleax added the http2 Issues or PRs related to the http2 subsystem. label Aug 20, 2019
@addaleax
Copy link
Member

Can you share any more information? Is the server the program running Node.js, is the client, or are both? What is the value of the maxSessionMemory option on either side (if it is set)? Does running node as node --security-reverts=CVE-2019-9514 help?

(See also #29122 (comment) for some background.)

@jasnell
Copy link
Member

jasnell commented Aug 20, 2019

This is most likely related to the new denial of service attack mitigations. The revert workaround @addaleax suggests should make it work but this is something we're going to need to revisit.

@jellelute
Copy link
Author

jellelute commented Aug 29, 2019

@addaleax I'm not aware of the server site implementation. You can use this url to reproduce:
https://api.binance.com/api/v1/depth?symbol=BTCUSDT. maxSessionMemory was not set explicitly. Running node with --security-reverts=CVE-2019-9514 does not solve the issue.
Probably another commit did cause this issue.

@mikelehen
Copy link

We are experiencing a similar issue in the Google Cloud Firestore SDK. We have 10 or so customers reporting their production apps hitting this error where our http/2 stream is closed with NGHTTP2_ENHANCE_YOUR_CALM.

I can reproduce the issue with node v10.16.3 but not v10.16.2 so it seems to be a regression. Running node with --security-reverts=CVE-2019-9514 does not help.

I've created a repro project here: https://github.com/mikelehen/node-http2-repro

Unfortunately it is not a minimal repro, but it is what I was able to piece together based on our customer reports. Is it enough to debug the issue and isolate a root cause? We would very much appreciate a workaround and/or a fix to node so that we can unblock our customers. Thank you!

@MylesBorins
Copy link
Member

We managed to get a reproducible version of the bug we are experiencing. I can confirm that including the flags --security-reverts=CVE-2019-9514 --security-reverts=CVE-2019-9512 --security-reverts=CVE-2019-9516 --security-reverts=CVE-2019-9518 does not fix the issue.

I was able to bisect and identify this commit as the one causing the failure... @addaleax any thoughts?

/cc @nodejs/http2

@jasnell
Copy link
Member

jasnell commented Oct 2, 2019

Couple of quick tests... can you try setting the maxSessionMemory option significantly higher and seeing if the problem persists? There are a few other knobs there that can be turned also but let's try that one first. The default value for maxSessionMemory is 10, indicating a max of ten megabytes per session, which could be eaten away quickly with a large number of streams.

@addaleax
Copy link
Member

addaleax commented Oct 2, 2019

I was able to bisect and identify this commit as the one causing the failure... @addaleax any thoughts?

@MylesBorins Which commit?

@mikelehen
Copy link

The commit @MylesBorins meant to link is 164ac5b . 😄

@addaleax
Copy link
Member

addaleax commented Oct 2, 2019

Okay, that’s interesting – I’ll take a look. It still would be good to know if turning the maxSessionMemory knob makes a difference here, as @jasnell suggested.

@murgatroid99
Copy link
Contributor

Based on that commit message, specifically "If we are waiting for the ability to send more output, we should not process more input", I think the overall goal of that commit is a bad idea. With that logic implemented on both sides of a connection, if one side ever pushes back on flow control for any reason, the other side will stop reading and start pushing back the other way, which will result in a deadlock where neither side will clear out their read buffer because they're waiting for the other side to do so.

And if just the client implements that logic, it's easy to imagine a scenario where the client doesn't read enough, resulting in the server being unable to write while still receiving more data. It wouldn't surprise me if some servers send ENHANCE_YOUR_CALM when that happens.

@mikelehen
Copy link

@addaleax @jasnell It looks like increasing maxSessionMemory does solve the problem. If I set it to 100, my repro works. If I set it to 1 then it fails much faster.

A couple notes:

  1. Our repro is using http2.connect() not http2.createServer().
  2. FWIW the general usage pattern of our repro is basically: send a request to the backend, wait for the response, send the next request to the backend, wait for the response, etc. It ends up failing after 300+ request/response pairs. Each request / response pair is roughly the same size (requests are < 1k, responses are maybe 1MB). Running on my machine / network, we end up doing around 2-3 requests / second.

@mikelehen
Copy link

FYI- If you need to run my repro in https://github.com/mikelehen/node-http2-repro you'll need access to my project which I'll have to grant you out-of-band. But hopefully the info above may be enough to resolve the issue? Thanks!

@jasnell
Copy link
Member

jasnell commented Oct 2, 2019

@murgatroid99 ... the scenario you describe is entirely possible and worth exploring to see how much of an issue that is in our implementation. I'm thinking, however, that for this specific problem, it's more of a memory management issue. Specifically: we need to get better at it. The current maxSessionMemory limit was really just a finger in the air guesstimate and it's looking like we either need to (a) increase it or (b) find other ways of reducing the memory load per QuicSession.

@mikelehen
Copy link

mikelehen commented Oct 2, 2019

@jasnell Any guesses why 164ac5b would have exacerbated the memory management issue?

FWIW the more I increase maxSessionMemory, the longer my repro runs before failing, even though we are doing fixed-size requests and responses and only have 1 outstanding at a time. So I would expect the necessary sessionMemory to be pretty much flat. So I'm wondering if 164ac5b has introduced a mismatch in IncrementCurrentSessionMemory() / DecrementCurrentSessionMemory() calls.

Like maybe 164ac5b#diff-a2de252da810db89f574ecafa0f8e95aR1901 should instead be:

DecrementCurrentSessionMemory(stream_buf_.len - stream_buf_offset_);

(this is a wild guess. I don't really understand the code; I'm just trying to apply some pattern matching...)

@jasnell
Copy link
Member

jasnell commented Oct 2, 2019

Definitely sounds like a memory accounting issue then... or a straight up memory leak. That gives a good place to start the investigation. First place to check would be to rule out a memory leak, and if that looks good, then we can look at the accounting.

@schmidt-sebastian
Copy link

@jasnell Thanks for all your investigations so far.

I work with @mikelehen on Firestore and was wondering if you could let us know if you had an estimated timeline for a fix? We have a couple of users that are blocked on this. Thank you so much!

@jasnell
Copy link
Member

jasnell commented Oct 9, 2019

With my current workload, it won't be until later this week at the earliest, but definitely haven't forgotten!

@mikelehen
Copy link

In case it helps, I actually tried out my guess above (#29223 (comment)) last week and with that tweak, the error seems to go away. That said, I don't know how to verify that all the accounting is actually coming out correct. It's possible I just messed up the accounting in a different way.

@schmidt-sebastian
Copy link

Thank you for the update @jasnell! Let us know if you need help verifying a fix.

@jellelute
Copy link
Author

Hi, is there any news about this issue? Also could the fix be ported to NodeJS 12, as this version became the LTS version recently. Thanks for your efforts!

@mikelehen
Copy link

Any update here? Would it help if I sent a PR based on my guess in #29223 (comment) ? I don't know the code well enough to know if it's correct or how to definitively validate / test it, though. :-)

mikelehen pushed a commit to mikelehen/node that referenced this issue Nov 27, 2019
The ability to pause input processing was added in 8a4a193 but
introduced a session memory accounting mismatch leading to potential
NGHTTP2_ENHANCE_YOUR_CALM errors.

After pausing
(https://github.com/nodejs/node/blob/f36331c1bfa4c4c202346b05dc3bd672f653e4df/src/node_http2.cc#L871),
the early return on line 873 skips the
DecrementCurrentSessionMemory(stream_buf_.len) call below (line 878).

When we later finished processing the input chunk
(https://github.com/nodejs/node/blob/f36331c1bfa4c4c202346b05dc3bd672f653e4df/src/node_http2.cc#L1858),
we were calling DecrementCurrentSessionMemory(stream_buf_offset_) [line
1875] which was a no-op since we just set stream_buf_offset_ to 0 [line
1873].

The correct amount to decrement by is still stream_buf_.len, since
that's the amount we skipped previously (line 878).

Fixes: nodejs#29223
Refs: nodejs@164ac5b
@mikelehen
Copy link

I'v opened a PR (#30684) that addresses the issue. 🤞 we can get it merged? 😄

addaleax pushed a commit that referenced this issue Nov 30, 2019
The ability to pause input processing was added in 8a4a193 but
introduced a session memory accounting mismatch leading to potential
NGHTTP2_ENHANCE_YOUR_CALM errors.

After pausing
(https://github.com/nodejs/node/blob/f36331c1bfa4c4c202346b05dc3bd672f653e4df/src/node_http2.cc#L871),
the early return on line 873 skips the
DecrementCurrentSessionMemory(stream_buf_.len) call below (line 878).

When we later finished processing the input chunk
(https://github.com/nodejs/node/blob/f36331c1bfa4c4c202346b05dc3bd672f653e4df/src/node_http2.cc#L1858),
we were calling DecrementCurrentSessionMemory(stream_buf_offset_) [line
1875] which was a no-op since we just set stream_buf_offset_ to 0 [line
1873].

The correct amount to decrement by is still stream_buf_.len, since
that's the amount we skipped previously (line 878).

Fixes: #29223
Refs: 164ac5b

PR-URL: #30684
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Denys Otrishko <shishugi@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: David Carlier <devnexen@gmail.com>
targos pushed a commit that referenced this issue Dec 5, 2019
The ability to pause input processing was added in 8a4a193 but
introduced a session memory accounting mismatch leading to potential
NGHTTP2_ENHANCE_YOUR_CALM errors.

After pausing
(https://github.com/nodejs/node/blob/f36331c1bfa4c4c202346b05dc3bd672f653e4df/src/node_http2.cc#L871),
the early return on line 873 skips the
DecrementCurrentSessionMemory(stream_buf_.len) call below (line 878).

When we later finished processing the input chunk
(https://github.com/nodejs/node/blob/f36331c1bfa4c4c202346b05dc3bd672f653e4df/src/node_http2.cc#L1858),
we were calling DecrementCurrentSessionMemory(stream_buf_offset_) [line
1875] which was a no-op since we just set stream_buf_offset_ to 0 [line
1873].

The correct amount to decrement by is still stream_buf_.len, since
that's the amount we skipped previously (line 878).

Fixes: #29223
Refs: 164ac5b

PR-URL: #30684
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Denys Otrishko <shishugi@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: David Carlier <devnexen@gmail.com>
BethGriggs pushed a commit that referenced this issue Dec 6, 2019
The ability to pause input processing was added in 8a4a193 but
introduced a session memory accounting mismatch leading to potential
NGHTTP2_ENHANCE_YOUR_CALM errors.

After pausing
(https://github.com/nodejs/node/blob/f36331c1bfa4c4c202346b05dc3bd672f653e4df/src/node_http2.cc#L871),
the early return on line 873 skips the
DecrementCurrentSessionMemory(stream_buf_.len) call below (line 878).

When we later finished processing the input chunk
(https://github.com/nodejs/node/blob/f36331c1bfa4c4c202346b05dc3bd672f653e4df/src/node_http2.cc#L1858),
we were calling DecrementCurrentSessionMemory(stream_buf_offset_) [line
1875] which was a no-op since we just set stream_buf_offset_ to 0 [line
1873].

The correct amount to decrement by is still stream_buf_.len, since
that's the amount we skipped previously (line 878).

Fixes: #29223
Refs: 164ac5b

PR-URL: #30684
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Denys Otrishko <shishugi@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: David Carlier <devnexen@gmail.com>
MylesBorins pushed a commit that referenced this issue Dec 17, 2019
The ability to pause input processing was added in 8a4a193 but
introduced a session memory accounting mismatch leading to potential
NGHTTP2_ENHANCE_YOUR_CALM errors.

After pausing
(https://github.com/nodejs/node/blob/f36331c1bfa4c4c202346b05dc3bd672f653e4df/src/node_http2.cc#L871),
the early return on line 873 skips the
DecrementCurrentSessionMemory(stream_buf_.len) call below (line 878).

When we later finished processing the input chunk
(https://github.com/nodejs/node/blob/f36331c1bfa4c4c202346b05dc3bd672f653e4df/src/node_http2.cc#L1858),
we were calling DecrementCurrentSessionMemory(stream_buf_offset_) [line
1875] which was a no-op since we just set stream_buf_offset_ to 0 [line
1873].

The correct amount to decrement by is still stream_buf_.len, since
that's the amount we skipped previously (line 878).

Fixes: #29223
Refs: 164ac5b

PR-URL: #30684
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Denys Otrishko <shishugi@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: David Carlier <devnexen@gmail.com>
MylesBorins pushed a commit that referenced this issue Dec 17, 2019
The ability to pause input processing was added in 8a4a193 but
introduced a session memory accounting mismatch leading to potential
NGHTTP2_ENHANCE_YOUR_CALM errors.

After pausing
(https://github.com/nodejs/node/blob/f36331c1bfa4c4c202346b05dc3bd672f653e4df/src/node_http2.cc#L871),
the early return on line 873 skips the
DecrementCurrentSessionMemory(stream_buf_.len) call below (line 878).

When we later finished processing the input chunk
(https://github.com/nodejs/node/blob/f36331c1bfa4c4c202346b05dc3bd672f653e4df/src/node_http2.cc#L1858),
we were calling DecrementCurrentSessionMemory(stream_buf_offset_) [line
1875] which was a no-op since we just set stream_buf_offset_ to 0 [line
1873].

The correct amount to decrement by is still stream_buf_.len, since
that's the amount we skipped previously (line 878).

Fixes: #29223
Refs: 164ac5b

PR-URL: #30684
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Denys Otrishko <shishugi@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: David Carlier <devnexen@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
http2 Issues or PRs related to the http2 subsystem.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants