Missing HTTP status codes for cancelled search requests #73849

Closed · jporter-dev opened this issue Jul 30, 2020 · 24 comments
Labels: bug (Fixes for quality problems that affect the customer experience), Feature:Search (Querying infrastructure in Kibana), regression

jporter-dev commented Jul 30, 2020

Kibana version: 7.8.1

Elasticsearch version: 7.8.1

Server OS version: Ubuntu 16.04.6 LTS

Browser version: Firefox 79.0b9

Browser OS version: macOS Catalina 10.15.6

Original install method (e.g. download page, yum, from source, etc.): apt

Describe the bug: Some responses from /internal/search/es come back with no status code, no headers, and no body, presumably from cancelled queries.

Steps to reproduce:

  1. Load up a dashboard and query/filter
  2. Remove filters
  3. Re-add filters
  4. Observe network tab with blank status codes, zero headers, and no response body.

Expected behavior:
Network log should contain proper queries to the ES API with corresponding responses. Cancelled queries should respond with a proper HTTP status code (204 seems appropriate).

Screenshots (if relevant):
[Screenshot: kibana-statuscode-issue]

Errors in browser console (if relevant):
Failed to load resource: the server responded with a status of 502 (Proxy Error) - this only occurs when Kibana is reverse proxied.

Any additional context:
This is primarily an issue when reverse proxying Kibana. Apache complains when it receives an upstream response that has an invalid header or is missing the status code.

A temporary remediation with Apache is setting ProxyBadHeader Ignore in the server config. Obviously this is not the ideal solution, since cancelled in-flight requests should return a valid HTTP status code even when they respond with no content.

Edit: The ProxyBadHeader fix doesn't fully resolve the problem. I'm not 100% sure why, but I suspect the upstream server is both sending empty responses and terminating requests early.

@bhavyarm added the Feature:Security/Authentication and Team:Security labels Jul 30, 2020
@elasticmachine (Contributor)

Pinging @elastic/kibana-security (Team:Security)

@bhavyarm added the bug label Jul 30, 2020
@legrego added the Team:AppArch label and removed the Team:Security label Jul 30, 2020
@elasticmachine (Contributor)

Pinging @elastic/kibana-app-arch (Team:AppArch)

@legrego removed the Feature:Security/Authentication label Jul 30, 2020
@kibanamachine added this to To triage in kibana-app-arch Jul 30, 2020
@lukeelmers added the Feature:Search and regression labels Jul 30, 2020
@BBQigniter

We see a similar problem - I tried the ProxyBadHeader Ignore suggestion but it didn't help. I also tried setting the additional parameters retry=1 acquire=3000 timeout=600 KeepAlive=On on Apache's ProxyPass line.

It happens, for example, every few clicks on the "view single document" button on a message. Sometimes it even says that there is no document with that ID - a refresh then works. Sometimes we also get a "proxy error" message with the following log output:

_construct@https://test-kibana.REMOVED/32141/bundles/commons.bundle.js:3:466674
Wrapper@https://test-kibana.REMOVED/32141/bundles/commons.bundle.js:3:466064
_createSuper/<@https://test-kibana.REMOVED/32141/bundles/commons.bundle.js:3:464852
HttpFetchError@https://test-kibana.REMOVED/32141/bundles/commons.bundle.js:3:467831
_callee3$@https://test-kibana.REMOVED/32141/bundles/commons.bundle.js:3:1292397
l@https://test-kibana.REMOVED/32141/bundles/kbn-ui-shared-deps/kbn-ui-shared-deps.js:288:969217
s/o._invoke</<@https://test-kibana.REMOVED/32141/bundles/kbn-ui-shared-deps/kbn-ui-shared-deps.js:288:968971
_/</e[t]@https://test-kibana.REMOVED/32141/bundles/kbn-ui-shared-deps/kbn-ui-shared-deps.js:288:969574
asyncGeneratorStep@https://test-kibana.REMOVED/32141/bundles/commons.bundle.js:3:1285920
_next@https://test-kibana.REMOVED/32141/bundles/commons.bundle.js:3:1286249
promise callback*asyncGeneratorStep@https://test-kibana.REMOVED/32141/bundles/commons.bundle.js:3:1286038
_next@https://test-kibana.REMOVED/32141/bundles/commons.bundle.js:3:1286249
promise callback*asyncGeneratorStep@https://test-kibana.REMOVED/32141/bundles/commons.bundle.js:3:1286038
_next@https://test-kibana.REMOVED/32141/bundles/commons.bundle.js:3:1286249
_asyncToGenerator/</<@https://test-kibana.REMOVED/32141/bundles/commons.bundle.js:3:1286386
_asyncToGenerator/<@https://test-kibana.REMOVED/32141/bundles/commons.bundle.js:3:1286143
fetchResponse@https://test-kibana.REMOVED/32141/bundles/commons.bundle.js:3:1292730
_callee$@https://test-kibana.REMOVED/32141/bundles/commons.bundle.js:3:1288895
l@https://test-kibana.REMOVED/32141/bundles/kbn-ui-shared-deps/kbn-ui-shared-deps.js:288:969217
s/o._invoke</<@https://test-kibana.REMOVED/32141/bundles/kbn-ui-shared-deps/kbn-ui-shared-deps.js:288:968971
_/</e[t]@https://test-kibana.REMOVED/32141/bundles/kbn-ui-shared-deps/kbn-ui-shared-deps.js:288:969574
asyncGeneratorStep@https://test-kibana.REMOVED/32141/bundles/commons.bundle.js:3:1285920
_next@https://test-kibana.REMOVED/32141/bundles/commons.bundle.js:3:1286249
promise callback*asyncGeneratorStep@https://test-kibana.REMOVED/32141/bundles/commons.bundle.js:3:1286038
_next@https://test-kibana.REMOVED/32141/bundles/commons.bundle.js:3:1286249
_asyncToGenerator/</<@https://test-kibana.REMOVED/32141/bundles/commons.bundle.js:3:1286386
_asyncToGenerator/<@https://test-kibana.REMOVED/32141/bundles/commons.bundle.js:3:1286143
_callee2$/</<@https://test-kibana.REMOVED/32141/bundles/commons.bundle.js:3:1289593
_callee2$@https://test-kibana.REMOVED/32141/bundles/commons.bundle.js:3:1288382
l@https://test-kibana.REMOVED/32141/bundles/kbn-ui-shared-deps/kbn-ui-shared-deps.js:288:969217
s/o._invoke</<@https://test-kibana.REMOVED/32141/bundles/kbn-ui-shared-deps/kbn-ui-shared-deps.js:288:968971
_/</e[t]@https://test-kibana.REMOVED/32141/bundles/kbn-ui-shared-deps/kbn-ui-shared-deps.js:288:969574
asyncGeneratorStep@https://test-kibana.REMOVED/32141/bundles/commons.bundle.js:3:1285920
_next@https://test-kibana.REMOVED/32141/bundles/commons.bundle.js:3:1286249
_asyncToGenerator/</<@https://test-kibana.REMOVED/32141/bundles/commons.bundle.js:3:1286386
_asyncToGenerator/<@https://test-kibana.REMOVED/32141/bundles/commons.bundle.js:3:1286143
Fetch/</<@https://test-kibana.REMOVED/32141/bundles/commons.bundle.js:3:1289716
search@https://test-kibana.REMOVED/32141/bundles/plugin/data/data.plugin.js:6:47756
search/</<@https://test-kibana.REMOVED/32141/bundles/plugin/dataEnhanced/dataEnhanced.plugin.js:1:145473
__kbnSharedDeps__</d</t.prototype._tryNext@https://test-kibana.REMOVED/32141/bundles/kbn-ui-shared-deps/kbn-ui-shared-deps.js:271:391171
__kbnSharedDeps__</d</t.prototype._next@https://test-kibana.REMOVED/32141/bundles/kbn-ui-shared-deps/kbn-ui-shared-deps.js:271:391073
__kbnSharedDeps__</u</t.prototype.next@https://test-kibana.REMOVED/32141/bundles/kbn-ui-shared-deps/kbn-ui-shared-deps.js:21:41156
l@https://test-kibana.REMOVED/32141/bundles/kbn-ui-shared-deps/kbn-ui-shared-deps.js:271:1151083
__kbnSharedDeps__</o</t.prototype._execute@https://test-kibana.REMOVED/32141/bundles/kbn-ui-shared-deps/kbn-ui-shared-deps.js:271:409061
__kbnSharedDeps__</o</t.prototype.execute@https://test-kibana.REMOVED/32141/bundles/kbn-ui-shared-deps/kbn-ui-shared-deps.js:271:408877
__kbnSharedDeps__</o</t.prototype.flush@https://test-kibana.REMOVED/32141/bundles/kbn-ui-shared-deps/kbn-ui-shared-deps.js:271:405111
setInterval handler*__kbnSharedDeps__</o</t.prototype.requestAsyncId@https://test-kibana.REMOVED/32141/bundles/kbn-ui-shared-deps/kbn-ui-shared-deps.js:271:408580
__kbnSharedDeps__</o</t.prototype.schedule@https://test-kibana.REMOVED/32141/bundles/kbn-ui-shared-deps/kbn-ui-shared-deps.js:271:408478
__kbnSharedDeps__</r</e.prototype.schedule@https://test-kibana.REMOVED/32141/bundles/kbn-ui-shared-deps/kbn-ui-shared-deps.js:271:920108
__kbnSharedDeps__</o</t.prototype.schedule@https://test-kibana.REMOVED/32141/bundles/kbn-ui-shared-deps/kbn-ui-shared-deps.js:271:404983
s/<@https://test-kibana.REMOVED/32141/bundles/kbn-ui-shared-deps/kbn-ui-shared-deps.js:271:1150975

The Apache access and error logs look like this:

# access_log
10.115.50.90 - my.user [18/Aug/2020:10:21:31 +0200] "POST /api/index_management/indices/reload HTTP/1.1" 502 341 "https://test-kibana.REMOVED/app/kibana" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:79.0) Gecko/20100101 Firefox/79.0" 689 536

# error_log
[Tue Aug 18 10:21:31.566807 2020] [proxy_http:error] [pid 26954:tid 139893653104384] (104)Connection reset by peer: [client 10.115.50.90:0] AH01102: error reading status line from remote server REPLACED.NAME:5601, referer https://test-kibana.REMOVED/app/kibana
[Tue Aug 18 10:21:31.566936 2020] [proxy:error] [pid 26954:tid 139893653104384] [client 10.115.50.90:0] AH00898: Error reading from remote server returned by /api/index_management/indices/reload, referer https://test-kibana.REMOVED/app/kibana

It also happens when connecting directly to Kibana, without the in-between Apache reverse proxy.

This happens with the latest Firefox 79.0 and Chrome 84.0.4147.125 on Windows 10.

With ELK 6.8 we did not have those problems.

@jporter-dev (Author)

@BBQigniter - I have been seeing the errors even after adding the ProxyBadHeader setting. I think there are two things happening here: some responses are blank with the missing header (ProxyBadHeader should resolve those), and some responses are being terminated prematurely by Kibana (connection reset by peer) - I think those account for the remaining errors.

The errors are less common after setting ProxyBadHeader, but that could also be a placebo. This is a pretty glaring issue, since errors popping up while using Kibana are not at all ideal.

@BBQigniter

Yes, those errors are super annoying, and thankfully we currently only have them on our test cluster, where I'm testing the upgrade procedure.

Just upgraded the whole cluster to version 7.9 - same errors :(

lizozom (Contributor) commented Aug 23, 2020

@joshporter1 In terms of the search feature, unnecessary requests (for example, when running a new search before the previous one has finished) get canceled from the browser using an AbortController.

It is typical for a request canceled in flight to show an empty response body and no status code (status code 0).

Here is a command-line example fetching and cancelling a request to a random online API in Chrome. You can see that the body is empty and the request is marked as canceled:

[Screenshot: the request shown as canceled in Chrome]

I was surprised, however, to find out that Firefox doesn't mark the request as canceled, but it's still the same scenario:

[Screenshot: the same request in Firefox]

So anyway, the empty responses are part of a normal workflow. I wonder, however, how we can work around this in a reverse proxy scenario.
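
For reference, a minimal sketch of the cancelation pattern described above - this is illustrative only, not Kibana's actual search client code, and the URL is a placeholder:

// Illustrative only - start a fetch, then abort it before the response arrives.
const controller = new AbortController();

fetch('https://example.com/slow-api', { signal: controller.signal })
  .then((res) => console.log('completed with status', res.status))
  .catch((err) => {
    // An aborted request rejects with an AbortError; in the network tab it
    // appears with no status code and an empty response body.
    if (err.name === 'AbortError') {
      console.log('request was canceled');
    } else {
      throw err;
    }
  });

// e.g. a newer search supersedes this one, so cancel the in-flight request
controller.abort();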

@jporter-dev (Author)

@lizozom I kind of figured that's what was happening; the fact that Firefox wasn't showing it as such is a tad confusing. However, Kibana still shows errors in other browsers such as Chrome and Brave, since the reverse proxy server returns 502s on some of the aborted requests.

I'm hoping to find a good solution at the reverse proxy layer, but I've combed through the docs a few times and haven't found anything useful.

lizozom (Contributor) commented Aug 24, 2020

Can you share some of the resources you've been looking into? I'm curious to try and do this setup, to see if I can reproduce the issue.

@jporter-dev (Author)

Sure thing - I've mostly been digging through the mod_proxy docs for the options available on the Apache reverse proxy.

The ProxyBadHeader option is the only one I found that had any impact: setting it to Ignore resolved the invalid-header error message, but we are still left with the connection reset by peer message.

@BBQigniter

Like I wrote in my first comment, I also saw "proxy errors" and/or "no document with that id" while pointing my browser directly at Kibana, without going through the reverse proxy. It just doesn't happen that often and is hard to reproduce.

The next thing I sometimes experience is that clicking on links etc. does not work correctly, as in this gif.

This also happens with and without the reverse proxy. When it happens I have to refresh the browser tab in Chrome or Firefox. Kibana 7.9.

lizozom (Contributor) commented Aug 25, 2020

@BBQigniter can you please explain what the gif should be showing?

BBQigniter commented Aug 25, 2020

@lizozom sure - sorry if it's not clear:

  1. I opened Firefox and browsed to our test Kibana
  2. I added the _id column so that only it is shown in "Discover"
  3. I expanded one of the events
  4. I clicked on "View single document" and expected to see only that document's info
  5. It showed me "Discover" again and also reset the chosen column
  6. After refreshing the page, everything works again as expected.

BBQigniter commented Aug 25, 2020

I now found this wonderful blog post: https://newweaver.com/the-mystery-of-the-502-kibana-errors/

It describes EXACTLY the same problem!

I applied the proposed change to /usr/share/kibana/src/core/server/http/http_tools.js, in the createServer function starting at line 109:

function createServer(serverOptions, listenerOptions) {
  const server = new _hapi.Server(serverOptions);
  server.listener.keepAliveTimeout = listenerOptions.keepaliveTimeout;
  /**
   * Fix proposed in https://newweaver.com/the-mystery-of-the-502-kibana-errors/:
   * keep the headers timeout slightly longer than the keepalive timeout.
   */
  server.listener.headersTimeout = listenerOptions.keepaliveTimeout + 1000;
  server.listener.setTimeout(listenerOptions.socketTimeout);
  server.listener.on('timeout', socket => {
    socket.destroy();
  });
  server.listener.on('clientError', (err, socket) => {
    if (socket.writable) {
      socket.end(Buffer.from('HTTP/1.1 400 Bad Request\r\n\r\n', 'ascii'));
    } else {
      socket.destroy(err);
    }
  });
  return server;
}

After that I restarted Kibana. The "fix" has been in place for a few minutes now and I haven't seen any proxy errors. Hopefully that's it.

@jporter-dev (Author)

@BBQigniter thanks for that link - please keep me updated on if it does the trick!

@BBQigniter

@joshporter1 it seems the fix really helps - no "proxy error" issues yet while clicking around. Only the issues described in the gif and my later comment still happen occasionally.

In the linked blog post the author also mentions #44062, so if Elastic updates the underlying Node.js to > v12.17.0 it should be fixed without needing to add that line to the function.

lizozom (Contributor) commented Aug 26, 2020

We constantly update Node.js versions, so I think it's safe to assume this will be resolved soon 🙏

lizozom (Contributor) commented Oct 5, 2020

@joshporter1 any updates on this? Can this be closed?

@jporter-dev (Author)

@lizozom I applied the workaround "fix" but have not upgraded or removed the code to verify whether it is solved in a newer version. So as far as I can tell, it shouldn't be closed unless it is truly resolved without requiring users to alter the core code.

If the underlying Node.js version has been upgraded, the issue should probably be re-evaluated.

kobelb (Contributor) commented Oct 13, 2020

As far as I can tell, this bug is being caused by nodejs/node#27363. Once Kibana is upgraded to Node > 12.9.0, this will be resolved. There are also potential workarounds that are discussed later.

Node.js has three timeouts that apply to HTTP requests (a sketch of where each one lives on a plain Node HTTP server follows the list):

  • Idle socket timeout: how long the TCP socket itself can be idle; this is a sliding timeout.
  • Headers timeout: how long the client has to send its headers; this is a fixed timeout that is supposed to be restarted for every HTTP request.
  • Keepalive timeout: how long after an HTTP request has finished the client has to begin sending TCP keepalives before the socket is destroyed; this is a sliding timeout.
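
A minimal sketch, on a plain Node.js http.Server rather than Kibana's Hapi listener, of where each of those timeouts is configured (the values below are illustrative, not Kibana's defaults):

const http = require('http');

const server = http.createServer((req, res) => {
  res.end('ok');
});

// Idle socket timeout: sliding timeout on the underlying TCP socket.
server.setTimeout(120000);

// Headers timeout: how long the client gets to finish sending request headers.
server.headersTimeout = 40000;

// Keepalive timeout: how long an idle kept-alive socket is held open after a response.
server.keepAliveTimeout = 120000;

server.listen(5601);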

There's a bug in Node.js < 12.9.0 where, when multiple HTTP requests occur over the same TCP socket with keepalives in use, the headers timeout is not reset correctly. This is what the headers timeout is supposed to control:

[Screenshot: expected behavior - the headers timeout restarts for each request]

But with the bug, this is how it's used:

[Screenshot: buggy behavior - the headers timeout is not reset across requests on the socket]

I've been able to replicate this behavior against a test Node.js HTTP server, and against Kibana itself using the default timeouts and the basic test client included below. This effectively means that Kibana is currently pretty much unusable behind a reverse proxy that reuses TCP sockets.

const http = require('http');

const agent = new http.Agent({
    keepAlive: true,
    keepAliveMsecs: 30000
});

let n = 0;

const request = () => {
    const req = http.request({
        protocol: 'http:',
        host: 'localhost',
        port: 5601,
        path: '/',
        method: 'GET',
        agent,
    }, function (res) {
        let data = '';
        res.on('data', (chunk) => {
            data += chunk;
        });
        res.on('end', () => {
            console.log(`Done: [${res.statusCode}] - ${data}`);
            if (++n == 1) {
                // after the first response, wait 58 seconds and then reuse the kept-alive socket
                setTimeout(request, 58000);
            }
        })
    });

    req.on('socket', (socket) => {
        // write the request line and headers by hand, spread over ~3 seconds
        socket.write("GET / HTTP/1.1\r\n");
        setTimeout(() => {
            socket.write(`user-agent: ${Math.random() * 1000}\r\n`);
        }, 1000);
        setTimeout(() => {
            socket.write(`user-agent: ${Math.random() * 1000}\r\n`);
        }, 2000);
        setTimeout(() => {
            // finish the headers and let the request complete
            socket.write('\r\n');
            req.end();
        }, 3000);
    });
    
    req.on('error', (error) => {
        console.error('error', error);
    });
};

request();
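
For what it's worth, running this with node against a local Kibana on port 5601 should, on an affected Node version, let the first request complete and then have the second request - issued after the 58-second keep-alive pause, with its headers trickled out over ~3 seconds - fail with a socket error rather than a response, matching the connection resets seen through the proxy.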

Potential workarounds

I don't really like any of these, but I'm interested in hearing your opinions @restrry.

Allow the end-user to configure the headers timeout

This would allow the end-user to work around this bug when Kibana is behind a reverse proxy by setting the headers timeout to a value higher than server.keepaliveTimeout. This is the workaround suggested in this issue and in the blog post. It leads to nondeterministic behavior, where the first HTTP request over a TCP socket has a longer period of time to send its headers than subsequent HTTP requests. This is rather awkward because once we upgrade to Node 12.9.0, this behavior will change and headers timeouts will be applied consistently and appropriately everywhere, potentially requiring users to adjust the value they had set in kibana.yml.
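
As a sketch only - the config value and helper name below are hypothetical, not an existing Kibana setting - this option would boil down to something like the following in the listener setup:

// Hypothetical sketch: `headersTimeoutFromConfig` would come from a new
// kibana.yml setting that does not exist today.
function applyHeadersTimeout(server, listenerOptions, headersTimeoutFromConfig) {
  const headersTimeout =
    headersTimeoutFromConfig !== undefined
      ? headersTimeoutFromConfig
      : listenerOptions.keepaliveTimeout + 1000;
  server.listener.headersTimeout = headersTimeout;
}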

Automatically configure the headers timeout to be larger than the keepalive timeout

This would still have the nondeterministic behavior described above, but at least we wouldn't end up violating end-users' expectations when we do get to upgrade to Node 12.9.0. It would potentially lead to the headers timeout being set to a much larger value than it would be by default, weakening the protection it provides against the slowloris attack (the headers timeout was originally added to protect against slowloris, so we would be reducing that protection).

Disable the headers timeout

It's possible to disable the headers timeout by setting it to 0. This removes the slowloris protection entirely. Removing this protection even for users who aren't using TCP keepalives and reusing sockets is a potential change in expectations if they were relying on Kibana itself to provide it.
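
For reference, on the same listener object as in the createServer snippet above, this option amounts to a single assignment:

// A headersTimeout of 0 disables the headers timeout entirely,
// and with it the built-in slowloris protection.
server.listener.headersTimeout = 0;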

mshustov (Contributor) commented Oct 18, 2020

I read through the Node.js thread nodejs/node#27363 and it seems that:

  • Disable the headers timeout is not an option, for security reasons.
  • Allow the end-user to configure the headers timeout requires end-users to identify the problem and find the recommended solution in the Kibana docs. Furthermore, we are going to update the Node.js version to the fixed v12.19 in Kibana v7.11, but we would have to maintain the headersTimeout setting until v8.0 for BWC reasons.
  • Automatically configure the headers timeout to be larger than the keepalive timeout seems to be the optimal solution: Kibana works out of the box, and we can remove the internal workaround as soon as we update the Node.js version in an upcoming release. Regarding "This will potentially lead to the headers timeout being set to a much larger value than it would be by default, minimizing the protection that it provides against the slowloris attack": per the official docs, headersTimeout is 40 sec (increased to 60 sec in v12), while the keepalive timeout for Kibana is 120 sec. The existing difference is not an order of magnitude, so we would not be disabling the slowloris protection completely.

@kobelb WDYT about this?

const headersTimeout = listenerOptions.keepaliveTimeout + 2 * server.listener.headersTimeout;
server.listener.headersTimeout = headersTimeout;
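
With the numbers mentioned above - Kibana's 120-second keepalive timeout and Node 12's default 60-second headersTimeout - that formula works out to 120000 + 2 * 60000 = 240000 ms, i.e. a four-minute headers timeout.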

kobelb (Contributor) commented Oct 19, 2020

@restrry I think your suggestion is the "least bad" option that we have, unfortunately... I'll take it :)

@mshustov moved this from In Progress to 7.11 in kibana-core [DEPRECATED] Oct 27, 2020
@mshustov (Contributor)

@lizozom Could you confirm this problem has been fixed by #61587 ?

lizozom (Contributor) commented Nov 23, 2020

IMO yes, but I'd let @kobelb confirm as he did most of the research

kobelb (Contributor) commented Nov 23, 2020

Everything is looking good from my perspective. I just tested out master and I'm seeing these timeouts working properly with TCP keepalives.

kibana-app-arch automation moved this from To triage to Done in current release Nov 23, 2020
kibana-core [DEPRECATED] automation moved this from 7.11 to Done (7.11) Nov 23, 2020