
Leaking 'connect' handlers with new timeout support #2438

Closed
mdlavin opened this issue Oct 26, 2016 · 28 comments

mdlavin commented Oct 26, 2016

After adopting 2.76.0 I noticed that the memory on some of our servers using request started to grow when it was not growing before. I took a heap snapshot, and it looked like there were 1000+ 'connect' listeners on a TLS socket, holding on to other memory as a result.

It seems like the 'connect' listener on a socket might not be removed by the new timeout implementation if a timeout occurs while connecting.

Once I reverted back to 2.75.0, the memory leak was gone.

mdlavin (Author) commented Oct 26, 2016

I haven't yet written a test to reproduce it, but I believe the connect listener should be removed somewhere near here:

self.abort()

simov (Member) commented Oct 26, 2016

ping @mscdex

mdlavin (Author) commented Oct 26, 2016

Thanks for the quick fix. I'll deploy a server that uses that branch and see if the memory usage is stable again. It will take a couple of hours to get good data.

mdlavin (Author) commented Oct 26, 2016

Your code looked good to me, but when I deployed, I still see leaking 'connect' handlers. I'm sure that I'm running the right version because the leaked functions are named 'onReqSockConnect'.

I don't know why it would matter, but something I didn't mention before is that my sockets are all TLS sockets.

Looking at the code again, perhaps the 'connect' listener also leaks if an error occurs while connecting. Could clearTimeout(self.timeoutTimer) execute, clearing the timer but leaving the 'connect' listener in place?

mscdex added a commit to mscdex/request that referenced this issue Oct 26, 2016
mdlavin (Author) commented Oct 27, 2016

@mscdex with the latest fix, including the cleanup on error, the memory leak seems much smaller. I let the patched version run overnight and looked at a heap dump this morning. Instead of 500+ leaked 'connect' handlers, I now see 20-50 leaked onReqSockConnect 'connect' handlers per TLS socket.

I don't know the code well, but I would have expected 0-1 handlers. Maybe there is still another edge case that's not cleaning up quite right, but it's much less severe than it was.

mdlavin (Author) commented Oct 27, 2016

I'm sorry to add confusion. I think I was testing the wrong patch in my second test. Let me try again to see if it improves.

Scratch that, the correct patch was deployed and the leak remains. After more time during the day, I see a high number of leaked 'connect' listeners.


morus12 commented Oct 27, 2016

I've also noticed the 'connect' listener leak after updating to 2.76. It's a plain socket with an http.Agent and keep-alive enabled. Disabling keep-alive fixed the issue for me.


robinjmurphy commented Oct 28, 2016

I've spotted this as well when making TLS connections. I'm getting regular warnings with 2.76.0:

possible EventEmitter memory leak detected. 11 listeners added. Use emitter.setMaxListeners() to increase limit.

I don't see the same issue in 2.75.0.

simov (Member) commented Oct 28, 2016

/cc @mscdex

simov (Member) commented Oct 31, 2016

@mdlavin @morus12 @robinjmurphy can you provide a short code example that reproduces the issue?
Currently my best bet is that onReqSockConnect should be attached outside of the self.req.on('socket', ...) handler.

@robinjmurphy

Hi @simov, I'll try and put a little script together to reproduce it.


robinjmurphy commented Oct 31, 2016

This code triggers a warning after 11 requests.

The warning doesn't appear if I remove the timeout or forever options.

```js
'use strict';

const request = require('request').defaults({
  forever: true,
  timeout: 2000
});

function makeRequest() {
  request('https://raw.githubusercontent.com/request/request/master/README.md', (err, res) => {
    if (err) throw err;

    console.log((new Date()).toISOString(), res.statusCode, res.request.href);

    setTimeout(makeRequest, 100);
  });
}

makeRequest();
```
```
2016-10-31T16:24:41.692Z 200 https://raw.githubusercontent.com/request/request/master/README.md
2016-10-31T16:24:41.803Z 200 https://raw.githubusercontent.com/request/request/master/README.md
2016-10-31T16:24:41.911Z 200 https://raw.githubusercontent.com/request/request/master/README.md
2016-10-31T16:24:42.024Z 200 https://raw.githubusercontent.com/request/request/master/README.md
2016-10-31T16:24:42.135Z 200 https://raw.githubusercontent.com/request/request/master/README.md
2016-10-31T16:24:42.247Z 200 https://raw.githubusercontent.com/request/request/master/README.md
2016-10-31T16:24:42.354Z 200 https://raw.githubusercontent.com/request/request/master/README.md
2016-10-31T16:24:42.465Z 200 https://raw.githubusercontent.com/request/request/master/README.md
2016-10-31T16:24:42.576Z 200 https://raw.githubusercontent.com/request/request/master/README.md
2016-10-31T16:24:42.686Z 200 https://raw.githubusercontent.com/request/request/master/README.md
2016-10-31T16:24:42.798Z 200 https://raw.githubusercontent.com/request/request/master/README.md
(node) warning: possible EventEmitter memory leak detected. 11 connect listeners added. Use emitter.setMaxListeners() to increase limit.
Trace
    at TLSSocket.addListener (events.js:239:17)
    at TLSSocket.Readable.on (_stream_readable.js:680:33)
    at TLSSocket.once (events.js:265:8)
    at ClientRequest.<anonymous> (/Users/murphr08/tmp/request-test/node_modules/request/request.js:770:14)
    at emitOne (events.js:82:20)
    at ClientRequest.emit (events.js:169:7)
    at tickOnSocket (_http_client.js:502:7)
    at onSocketNT (_http_client.js:514:5)
    at nextTickCallbackWith2Args (node.js:442:9)
    at process._tickCallback (node.js:356:17)
```

Request 2.76.0. Node 4.4.7. Tested on macOS and CentOS.

mscdex added a commit to mscdex/request that referenced this issue Oct 31, 2016
mscdex (Contributor) commented Oct 31, 2016

@robinjmurphy Thanks for the repro. I've updated my PR at #2439

@mdlavin Can you also try my PR again?

mscdex added a commit to mscdex/request that referenced this issue Oct 31, 2016

This commit ensures that both a 'connect' timer only starts when a new socket is being used for a request and that the 'connect' timer is stopped in more places.

Fixes: request#2438
@cgrayson

I'm not sure if the problem discussed in this issue is the same as what we found, but we have some unit tests that started failing with 2.76.0 (they still pass with 2.75.0). We're simulating a series of requests with nock, using .delayConnection() to force a timeout after the 2nd. But with request 2.76.0, the timeout never occurs.

I've tried the set of tests with the latest commit on this PR (001eae3) and see the same broken behavior.

mscdex (Contributor) commented Oct 31, 2016

@cgrayson Can you provide a simple example that exhibits the incorrect behavior?

@robinjmurphy

@cgrayson we saw the same with nock using delayConnection.


cgrayson commented Nov 1, 2016

Can you provide a simple example that exhibits the incorrect behavior?

Assuming this is in repro.js, I ran it with node repro.js:

```js
var nock = require('nock');
var request = require('request');

var runTest = function() {
  nock('http://foobar.com').post('/baz').delayConnection(2000).reply(200, '{"answer": 42}', {});
  var timeoutMs = 1000;

  request({uri: 'http://foobar.com/baz', method: 'POST', timeout: timeoutMs}, function(err, response, body) {
    if (err)
      console.log("Yay, the request timed out (as it should have): ", err);
    else
      console.log("Oops, there was no `err`; the request should have timed out");
  });
};

runTest();
```

When that runs with 2.76.0, the timeout does not occur; with 2.75.0, it does:

```
Yay, the request timed out (as it should have):  { [Error: ETIMEDOUT] code: 'ETIMEDOUT', connect: false }
```

mscdex (Contributor) commented Nov 2, 2016

Ok, the problem here is that nock's fake Socket object is deficient. IMHO it should at least support one or both of the _connecting/connecting properties on the socket object and update them accordingly.

EDIT: I've submitted an issue about this here.
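A sketch of the incompatibility as described (the exact check in request's source may differ): the new code decides whether a connect timer is needed by asking the socket whether it is still connecting, and a stub that exposes neither `connecting` nor `_connecting` looks already-connected, so no connect timeout is ever armed.

```javascript
// Illustrative check (the real condition in request may differ): decide
// whether a connect timer is needed from the socket's connecting flags.
function needsConnectTimer(socket) {
  return socket.connecting === true || socket._connecting === true;
}

const nockLikeStub = {};                       // exposes neither property
const midConnectSocket = { connecting: true }; // what a real connecting net.Socket reports

console.log(needsConnectTimer(nockLikeStub));     // false -> connect timeout never armed
console.log(needsConnectTimer(midConnectSocket)); // true
```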

mdlavin (Author) commented Nov 2, 2016

@mscdex I missed your comment about re-attempting with the new patch. I'm deploying the change now and will watch it today.

mdlavin (Author) commented Nov 2, 2016

@mscdex the new version looks good to me. I deployed it this morning and it ran for a couple hours under user load without any leaking 'connect' handlers.

ZJONSSON (Contributor) commented Nov 3, 2016

Any ETA on a patch version on npm?

mscdex (Contributor) commented Nov 3, 2016

@ZJONSSON If @robinjmurphy can confirm the latest changes to the PR are working for them too, then I think it should be good to merge. There will still be a problem with nock, though, until they have a fix.

@robinjmurphy

I'll get this tested today for you @mscdex.

@robinjmurphy

I re-ran my own test and it's now working against your fix-timeout-regression branch @mscdex.

mscdex (Contributor) commented Nov 3, 2016

@robinjmurphy Awesome, thanks for the feedback!

simov (Member) commented Nov 3, 2016

The patch is published. Good job everyone!

ZJONSSON added a commit to ZJONSSON/pquest that referenced this issue Nov 3, 2016
ZJONSSON added a commit to ZJONSSON/impersonate that referenced this issue Nov 3, 2016

kazaff commented Dec 29, 2016

@morus12 hi, can you show the code to disable keep-alive? Thank you.


morus12 commented Jan 3, 2017

@kazaff I'm using Node.js 4.x, where keep-alive is off by default:
https://nodejs.org/dist/latest-v4.x/docs/api/http.html#http_class_http_agent
