Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

http: only destroy socket once error is sent #26356

Closed
wants to merge 1 commit into from

Conversation

yannh
Copy link
Contributor

@yannh yannh commented Feb 28, 2019

This patch fixes the race condition: sometimes the socket is
destroyed before the data is sent. If node is proxied by, for
example, an AWS ELB, a 504 error would be returned instead
of the expected 400.

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • documentation is changed or added
  • commit message follows commit guidelines

This patch fixes a race condition: sometimes the socket is
destroyed before the data is sent. If node is proxied by, for
example, an AWS ELB, a 504 error would be returned instead
of the expected 400.
@nodejs-github-bot nodejs-github-bot added the http Issues or PRs related to the http subsystem. label Feb 28, 2019
Copy link
Member

@BridgeAR BridgeAR left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you be so kind and provide a regression test for this as well?

@lpinca
Copy link
Member

lpinca commented Feb 28, 2019

This popped up in review comments here #24757.

@lpinca
Copy link
Member

lpinca commented Feb 28, 2019

It would be nice to have a regression test but I think it's not easy because the test added in #24757 works and it's not flaky. I mean, testing with no delay does not reproduce the issue.

@yannh
Copy link
Contributor Author

yannh commented Mar 1, 2019

This does not completely fix the 504 errors I am seeing after all 😞 Not sure if much better than without the fix.

@BridgeAR BridgeAR requested a review from mcollina March 7, 2019 09:11
Copy link
Member

@mcollina mcollina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think we should land this without a unit test.
Moreover, this is

this.write(response);
this.write(response, () => {
this.destroy(e);
});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately I don't think there is a guarantee that this callback is actually called.

@dosentmatter
Copy link

dosentmatter commented Mar 7, 2019

Let me know if this comment is too much of a support issue and not a bug issue and I can move it somewhere else. I guess I don't know if this is a bug in node or not and I don't know how to find out if it is. I also can't find anyone else with a similar issue.

Hey @yannh, I think I have a similar or same problem as you and I'm a bit lost on how I could fix it. I feel like I have exhausted all of my options and maybe the bug might be in AWS ELB or node itself.

I have an AWS ELB pointing to a node express application running in kubernetes and I get intermittent 504s. I have simplified my code to a basic hello world server like so:

const express = require('express');
const app = express();
const port = 3000;

app.use((req, res) => {
  const msg = `${req.originalUrl}`;
  console.log(msg);
  res.send(msg + '\n');
});
app.listen(port);

I get my app deployed and I do run this while loop curl that just outputs the HTTP response and highlights any 504s. ggrep is GNU grep.
while :; do curl -i -s -X GET 'https://api.{domain}.com/{path}' | head -1; done | ggrep --color -E '^|504'

I still get 504s.

I've also tried hapi with the following code with no success:

const Hapi = require('hapi');

const server = Hapi.server({
    port: 3000,
    host: '0.0.0.0'
});

server.route({
    method: '*',
    path: '/{blah*}',
    handler: (req, h) => {
      const msg = `${req.method} ${req.url}`;
      console.log(msg);
      return msg + '\n';
    }
});

const init = async () => {
  await server.start();
  console.log(`Server running at: ${server.info.uri}`);
};

process.on('unhandledRejection', (err) => {
  console.log(err);
  process.exit(1);
});

init();

Then I thought that maybe the problem exists in express or hapi libraries, so I tried using the http module directly, but still get 504s:

const http = require('http');
http.createServer(function (req, res) {
  const msg = `${req.method} ${req.url}`;
  console.log(msg);
  res.write(msg + '\n');
  res.end();
}).listen(3000, function(){
  console.log("server start at port 3000");
});

I tried using http-echo-server, a TCP echo server, which uses the net module directly. It doesn't seem to have any issues.

I also tried a python server with python -m http.server 3000 and that seems to run fine too.

Keep in mind, I can't reproduce the 504s consistently, so I can confirm that it doesn't work, but it's hard to say that it works 100%. I just poll it for ~ 1 hour with no 504s.


Another thing I tried is to kubectl port-forward the kubernetes pod or service to my local machine and then make requests to localhost. I didn't get any 504s, so it looks like it's an issue between the ELB and node - node isn't responding with 504s itself.

I did a tcpdump -w httpdebug.pcap -i eth0 port 3000 and noticed that when I get a 504, the HTTP packet is sent to the container but for some reason it doesn't get to node. Notice in the image below, at the top, there are 3 GETs (which correspond to 3 504s), followed by a successful request and HTTP 200. If you look at the payloads of the 3 requests, wireshark will show that they have no matching response.
image

I don't know what else I can try to trace the packet from being received on the conatiner -> being sent to the node application. I don't know if node or the ELB is closing the tcp connection and causing a 504.


I'm not sure what else to try. I think the AWS ALB might have worked. I have to do more testing with it before I switch to it.

EDIT:

I tried the ALB and polled it for more than an hour without any 504s. I'm still not sure if this is an ELB, node, or my own code issue. I think the ALB uses HTTP/2, at least that's what I see in the HTTP response. Not sure if that is relevant.

@mcollina
Copy link
Member

mcollina commented Mar 8, 2019

@dosentmatter you should handle the 'clientError'  case and log the error you are getting. You are getting an error related to too many HTTP headers or some other conditions.

@yannh
Copy link
Contributor Author

yannh commented Mar 8, 2019

@dosentmatter > Yes, this is the same exact issue I've been chasing too (also ELBs, express, Kubernetes). I have been doing the same tests that you are doing. There is one thing you should ensure - is that the Express connection timeout it set to a higher number than the ELB timeout (search for ELB / timeout). Still that hasn't solved it for me.
I am down to this blog post about some form of ELB pre-connect which I want to troubleshoot: https://medium.com/@liquidgecka/a-tale-of-unexpected-elb-behavior-5281db9e5cb4

I will close this ticket as I have seen this happen on regular requests too.

@yannh yannh closed this Mar 8, 2019
@dosentmatter
Copy link

@yannh, I haven't tried changing the timeout for express. I think that would be server.setTimeout(msecs). It says the default is 2min for 6.x. Originally, my ELB timeout was 1800s (accidental), but I changed it back to 60s (less than 2min), with no success.

I read the troubleshooting ELB 504 page

Enable keep-alive settings on your EC2 instances and make sure that the keep-alive timeout is greater than the idle timeout settings of your load balancer.

#13391 isn't exactly the same issue, but I tried npm start -- --keepAliveTimeout 20000, and that didn't work either.

I've also tried the last 3 LTS - 6, 8, and 10.

Hmm, thanks for the article. I'll check it out.

If you don't have anymore time to debug, maybe you can switch to an Application Load Balancer (ALB) and come back to the issue later. It seems to work for me.


Thanks, @mcollina, I'll try out the clientError case and see if I find anything. I don't think I have too many HTTP headers. However, I do currently have one pretty large header, that needs to be shrunk. Perhaps that is the issue. I wonder why it would only break when it's deployed with an ELB though.

I read that the HTTP header size for node was 80KB and is currently 8KB. My large header is smaller than that, ~5KB. The rest of the headers are pretty small - less than 100 bytes.

@yannh, do you have anything out of the ordinary going on with your headers?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
http Issues or PRs related to the http subsystem.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants