Reduce likelihood of race conditions on keep-alive timeout calculatio… #52653

Open
wants to merge 1 commit into
base: main
from

Conversation

zanettea

Added a 1-second threshold to the keep-alive timeout on the client side.
Added a 1-second threshold to the keep-alive timeout on the server side (the socket timeout expires 1 second after the announced timeout).

It would probably be better to use a configurable threshold, like keepAliveTimeoutThreshold in undici (nodejs/undici#291).
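
To make the two margins concrete, here is a minimal sketch of the arithmetic, assuming a configurable threshold in the spirit of undici's keepAliveTimeoutThreshold. The helper names and the thresholdMs parameter are hypothetical and are not part of this PR or of Node.js.

```javascript
// Hypothetical helpers (not part of lib/_http_agent.js or lib/_http_server.js).
// thresholdMs plays the role that keepAliveTimeoutThreshold plays in undici.

// Client side: stop reusing an idle socket thresholdMs before the server's hint.
function clientIdleTimeoutMs(hintSeconds, thresholdMs = 1000) {
  return Math.max(hintSeconds * 1000 - thresholdMs, 0);
}

// Server side: keep the socket open thresholdMs longer than advertised.
function serverSocketTimeoutMs(advertisedMs, thresholdMs = 1000) {
  return advertisedMs + thresholdMs;
}

const advertisedKeepAliveMs = 5000; // e.g. "Keep-Alive: timeout=5"
const clientWindowMs = clientIdleTimeoutMs(advertisedKeepAliveMs / 1000);
const serverWindowMs = serverSocketTimeoutMs(advertisedKeepAliveMs);

// The gap between the two windows is what absorbs network and scheduling delays.
console.log({ clientWindowMs, serverWindowMs, gapMs: serverWindowMs - clientWindowMs });
```

With both sides applying the same margin, the client gives up on the socket well before the server closes it, which is the race this change tries to make less likely.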

@nodejs-github-bot
Collaborator

Review requested:

  • @nodejs/http
  • @nodejs/net

@nodejs-github-bot nodejs-github-bot added the http and needs-ci labels Apr 23, 2024
Member

@mcollina mcollina left a comment

lgtm

@mcollina mcollina added the request-ci label Apr 23, 2024
@mcollina
Member

ping @mweberxyz

@github-actions github-actions bot removed the request-ci label Apr 23, 2024
@nodejs-github-bot
Collaborator

// Let the timer expire before the announced timeout to reduce
// the likelihood of ECONNRESET errors
let serverHintTimeout = (NumberParseInt(hint) * 1000) - 1000;
serverHintTimeout = serverHintTimeout > 0 ? serverHintTimeout : 0;

if (serverHintTimeout < agentTimeout) {

This conditional needs to be fixed: agentTimeout defaults to 0, so the condition is never true and serverHintTimeout is never applied.

Author

In this case the socket shouldn't be reused at all. Is setting socket.setTimeout(0) the right way to do it?

Take a look at socket docs - socket.setTimeout(0) doesn't mean immediate, it means never.
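
A small sketch of that behavior using an unconnected net.Socket. It only inspects the socket.timeout property after each call; per the net documentation, a value of 0 disables the existing idle timeout rather than expiring the socket right away. The variable names are illustrative.

```javascript
const net = require('node:net');

// An unconnected socket is enough to observe the timeout bookkeeping.
const probeSocket = new net.Socket();

// Arm an idle timer: a 'timeout' event would fire after 1500 ms of inactivity.
probeSocket.setTimeout(1500);
const timeoutAfterArm = probeSocket.timeout;

// Passing 0 disables the idle timer; it does NOT mean "time out immediately".
probeSocket.setTimeout(0);
const timeoutAfterZero = probeSocket.timeout;

console.log({ timeoutAfterArm, timeoutAfterZero });
probeSocket.destroy();
```

So a computed timeout of 0 cannot be used to signal "do not reuse this socket"; that decision has to be made separately.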

Author

@zanettea zanettea Apr 23, 2024

Sorry, I have limited understanding of the network internals. In undici ( https://github.com/nodejs/undici/pull/291/files ) if the keepAliveTimeout goes down to 0 they flag the connection as reset:

if (!keepAliveTimeout || keepAliveTimeout < 1e3) {
  client[kReset] = true
}

I don't know how to achieve the same result here. Maybe:

socket.setKeepAlive(false);

?

I also have a limited understanding of network internals, but I think:

if (!agentTimeout || serverHintTimeout < agentTimeout) {

is what you want.
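
For comparison, a minimal sketch of the two conditions side by side. The function names are hypothetical; only the boolean expressions come from the thread, and the assumption that an unset agent timeout is 0 follows the review comment above.

```javascript
// Original condition: with agentTimeout === 0 this can never be true,
// because serverHintTimeout is clamped to be >= 0.
function shouldUseServerHintOriginal(serverHintTimeout, agentTimeout) {
  return serverHintTimeout < agentTimeout;
}

// Suggested condition: also use the server hint when no agent timeout is set.
function shouldUseServerHintSuggested(serverHintTimeout, agentTimeout) {
  return !agentTimeout || serverHintTimeout < agentTimeout;
}

const hintMs = 4000;
console.log(shouldUseServerHintOriginal(hintMs, 0));      // agent timeout unset
console.log(shouldUseServerHintSuggested(hintMs, 0));     // agent timeout unset
console.log(shouldUseServerHintSuggested(hintMs, 3000));  // agent timeout is shorter
console.log(shouldUseServerHintSuggested(hintMs, 10000)); // server hint is shorter
```

The suggested form keeps the "shorter timeout wins" rule when both values are set and falls back to the server hint when the agent has no timeout of its own.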

socket.setTimeout(server.keepAliveTimeout);
// Increase the internal timeout relative to the advertised value to reduce the likelihood of ECONNRESET errors
// due to race conditions between the client and server timeout calculations
socket.setTimeout(server.keepAliveTimeout + 1000);

I don't think this server change is necessary.

Might be the cause of all the test failures.

Author

This fix protects clients that use the hinted timeout as is (the current Node.js implementation) without adjusting it for network or CPU-load delays.

If server.keepAliveTimeout is set to 0 (never time out) this will change it to 1 second.

Author

If server.keepAliveTimeout == 0, the if condition on line 1012 is false. The timeout on the socket is set only if server.keepAliveTimeout != 0.

You're correct, sorry I missed that.
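
The point settled above can be sketched with a stub socket. This assumes the surrounding guard is a truthiness check on server.keepAliveTimeout, as the author describes; armKeepAliveTimer and makeStubSocket are hypothetical names, not functions from lib/_http_server.js.

```javascript
// Hypothetical stand-in for the guarded call discussed above: the extra
// margin is only added when a keep-alive timeout is actually configured.
function armKeepAliveTimer(server, socket, marginMs = 1000) {
  if (server.keepAliveTimeout && typeof socket.setTimeout === 'function') {
    socket.setTimeout(server.keepAliveTimeout + marginMs);
    return true;
  }
  return false; // keepAliveTimeout of 0 means "never time out": nothing is armed
}

// Minimal stub that records what would have been passed to socket.setTimeout().
function makeStubSocket() {
  const calls = [];
  return { calls, setTimeout(ms) { calls.push(ms); } };
}

const stubForZero = makeStubSocket();
const stubForFiveSeconds = makeStubSocket();
const armedForZero = armKeepAliveTimer({ keepAliveTimeout: 0 }, stubForZero);
const armedForFiveSeconds = armKeepAliveTimer({ keepAliveTimeout: 5000 }, stubForFiveSeconds);

console.log(armedForZero, stubForZero.calls);
console.log(armedForFiveSeconds, stubForFiveSeconds.calls);
```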

const serverHintTimeout = NumberParseInt(hint) * 1000;
// Let the timer expire before the announced timeout to reduce
// the likelihood of ECONNRESET errors
let serverHintTimeout = (NumberParseInt(hint) * 1000) - 1000;

@mweberxyz mweberxyz Apr 23, 2024

If the server responds with a 1-second keep-alive, this will disable the timeout because serverHintTimeout will be 0. Maybe go with - 500 in place of - 1000?

serverHintTimeout = serverHintTimeout > 0 ? serverHintTimeout : 0;
if (serverHintTimeout === 0) {
// cannot safely reuse the socket because the server timeout is too short
canKeepSocketAlive = false;
Author

I have updated the PR to just skip keep-alive if the timeout is too short. In my local build the tests now pass.
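
Putting the pieces of this thread together, the updated client-side decision can be sketched roughly as follows. This is an illustration under assumptions, not the code in lib/_http_agent.js: evaluateKeepAliveHint, its return shape, the header parsing, and the marginMs parameter are all hypothetical.

```javascript
// Hypothetical sketch: decide whether a socket can be kept alive and which
// idle timeout to apply, given the server's Keep-Alive header value.
function evaluateKeepAliveHint(keepAliveHeader, agentTimeout = 0, marginMs = 1000) {
  const match = /timeout=(\d+)/.exec(keepAliveHeader ?? '');
  if (!match) {
    // No hint from the server: fall back to the agent's own timeout.
    return { canKeepSocketAlive: true, timeoutMs: agentTimeout };
  }

  // Expire the client timer before the announced timeout.
  const serverHintTimeout = Number.parseInt(match[1], 10) * 1000 - marginMs;
  if (serverHintTimeout <= 0) {
    // The server timeout is too short to reuse the socket safely.
    return { canKeepSocketAlive: false, timeoutMs: null };
  }

  // Honor the server hint when no agent timeout is set or when it is shorter.
  const useHint = !agentTimeout || serverHintTimeout < agentTimeout;
  return { canKeepSocketAlive: true, timeoutMs: useHint ? serverHintTimeout : agentTimeout };
}

console.log(evaluateKeepAliveHint('timeout=1'));
console.log(evaluateKeepAliveHint('timeout=5, max=100'));
console.log(evaluateKeepAliveHint('timeout=5', 2000));
```

Returning a separate canKeepSocketAlive flag avoids overloading a timeout of 0, which socket.setTimeout() treats as "no timeout".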

lib/_http_agent.js Outdated
@meyfa

This comment has been minimized.

@zanettea zanettea changed the title Reduce likeliwood of race conditions on keep-alive timeout calculatio… Reduce likelihood of race conditions on keep-alive timeout calculatio… Apr 24, 2024
@zanettea zanettea force-pushed the main branch 2 times, most recently from 8407fd7 to bc34680 Compare April 24, 2024 10:12
@mweberxyz

Found the same issue in dotnet -- they went with a 1 second offset as well, so that was a good choice. 👍

@zanettea zanettea force-pushed the main branch 3 times, most recently from ba0d895 to 2124b55 Compare April 24, 2024 15:30

@mweberxyz mweberxyz left a comment

Thanks for the updates! I verified that the test cases in #52649 pass without error.

I don't have Approve permission - lgtm @mcollina

Contributor

@ShogunPanda ShogunPanda left a comment

LGTM!

reduce likelihood of race conditions on keep-alive timeout
calculation between http1.1 servers and clients and honor server
keep-alive timeout when agentTimeout is not set

Fixes: nodejs#47130
Fixes: nodejs#52649
7 participants