Performance degradation with increase of the number of concurrent connections #590

Closed

keshonok opened this issue Jul 21, 2016 · 3 comments

keshonok (Contributor) commented Jul 21, 2016

Recently a series of tests was run on real hardware to compare the performance of Tempesta with that of Nginx, one of the fastest and most widespread web servers. The tests revealed that Tempesta's performance degrades, sometimes significantly, as the number of concurrent connections increases. That should not happen.

While the degradation is observed, the server that runs Tempesta doesn't hit any memory limits: the system has plenty of free memory (about 3Gb out of the 4Gb RAM available on the server), and there are no warnings from the TCP stack about a lack of memory. So this issue doesn't seem to be related to memory pressure.
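
For reference, a minimal way to double-check this on the Tempesta host (the exact commands are not part of the original report, just the usual tools):

    # Overall memory usage.
    free -m
    # Per-protocol socket memory accounting (the "mem" counters are in pages).
    cat /proc/net/sockstat
    # TCP memory pressure warnings such as "TCP: out of memory -- consider tuning tcp_mem"
    # would appear in the kernel log.
    dmesg | grep -i tcp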

Below is a collection of all settings and tools used to run the tests.

  • The servers used in the tests were connected directly (without a switch) by 10G Ethernet links. The base hardware configurations were as follows:
    • (227) Xeon E5620 @ 2.40GHz with 4Gb RAM and Intel 82599ES 10G Ethernet adapter.
    • (228) Xeon E5620 @ 2.40GHz with 2Gb RAM and Intel 82599ES 10G Ethernet adapter.
    • (229) Xeon E5405 @ 2.00GHz with 8Gb RAM and Intel X520 10G Ethernet adapter.
  • The following software environment was installed on the servers:
    • (227) Debian GNU/Linux 8 (jessie)
    • (228) Debian GNU/Linux 8 (jessie)
    • (229) Ubuntu 14.04.4 LTS
  • A vanilla Linux 4.1.24 kernel was used to run Nginx without Tempesta and to run the ab benchmark. The Tempesta Linux 4.1.12 kernel was used to run Tempesta. The pre-installed 3.16.0-30-generic #40~14.04.1-Ubuntu SMP Thu Jan 15 17:43:14 UTC 2015 x86_64 GNU/Linux kernel was used to run the ab benchmark on the (229) server.
  • The latest ixgbe driver, version 4.4.6, was used in the tests.
  • The Nginx and Tempesta tests were conducted on the (227) and (228) machines. The ab benchmark was run on all three machines.
  • ab installed from distribution packages was used as the benchmark tool:
    • (227) Version 2.3 <$Revision: 1604373 $> md5 a0410a0ea05f60bcd6335c56803f88eb
    • (228) Version 2.3 <$Revision: 1604373 $> md5 a0410a0ea05f60bcd6335c56803f88eb
    • (229) Version 2.3 <$Revision: 1528965 $> md5 6e901c0c908ac1cee3f44c2a855753ea
  • Nginx installed from distribution packages was used as the web server for comparison (and as the back end behind Tempesta):
    • (227) nginx/1.6.2 md5 8a5c305627c72066a50947072091c9a4
    • (228) nginx/1.6.2 md5 8a5c305627c72066a50947072091c9a4
  • Tempesta was at commit c178468.
  • Tempesta was configured to work from cache. Nginx was also configured to use cache.
  • The test and benchmark servers were tuned to expand the TCP stack's available memory and to optimize several other TCP stack parameters (sketched below).
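
The exact sysctl values used for this tuning are not recorded in this issue; the snippet below is only an illustrative sketch of the kind of settings meant above (larger TCP memory limits, bigger backlogs, a wider ephemeral port range):

    # Illustrative only: expand TCP memory limits and queue sizes.
    sysctl -w net.ipv4.tcp_mem="262144 349525 524288"
    sysctl -w net.ipv4.tcp_rmem="4096 87380 6291456"
    sysctl -w net.ipv4.tcp_wmem="4096 65536 6291456"
    sysctl -w net.core.somaxconn=65535
    sysctl -w net.core.netdev_max_backlog=10000
    sysctl -w net.ipv4.ip_local_port_range="1024 65535"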

Tempesta configuration:

    # Listen on 10G interfaces only.
    listen 192.168.200.150:80;
    listen 192.168.100.150:80;
    # Back end Nginx server runs on the same machine.
    server 127.0.0.1:8080 conns_n=8;
    cache 1;
    cache_fulfill * *;

The tests were run by pointing the ab benchmark at either the Nginx or the Tempesta server. The results were collected for various combinations of the keep-alive option, the number of concurrent connections, and the file size served. The ab benchmark was run in one, two, or three instances simultaneously with the help of the parallel utility.
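
The runs looked roughly like the sketch below; the target file name and the exact ab arguments are illustrative, since the actual scripts are not attached to this issue:

    # Three ab instances driven by GNU parallel against the same URL.
    # -k enables keep-alive; it is dropped for the non-keep-alive runs.
    CONNECTIONS=1024
    parallel -j 3 ::: \
        "ab -k -n 1000000 -c $CONNECTIONS http://192.168.200.150/file-1024 > ab-1.log" \
        "ab -k -n 1000000 -c $CONNECTIONS http://192.168.200.150/file-1024 > ab-2.log" \
        "ab -k -n 1000000 -c $CONNECTIONS http://192.168.200.150/file-1024 > ab-3.log"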

File sizes of 10, 100, 500, 1024, 10240, 51200, and 102400 bytes were chosen for the tests. The files of these sizes were generated by filling them with random data.
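
The generation step can be reproduced with something like the following (the document root and file naming scheme are assumptions):

    # Create one file per test size, filled with random data.
    for size in 10 100 500 1024 10240 51200 102400; do
        head -c $size /dev/urandom > /var/www/html/file-$size
    done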

The following numbers of concurrent connections were tried in the tests: 1, 10, 64, 128, 512, 1024, 2048, 8192, 16384, 20000.

The resulting RPS (requests per second) values were extracted from the ab log files, and gnuplot data files were prepared for each file size and each test. The graphs plotted with gnuplot were the end result of the tests; they illustrate how RPS depends on the number of concurrent connections for Nginx and Tempesta under the same conditions.
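
The extraction amounts to pulling the "Requests per second" line out of each ab log and pairing it with the connection count, roughly as below (the log and data file names are assumptions):

    # Build a gnuplot data file with "<connections> <RPS>" per line.
    for c in 1 10 64 128 512 1024 2048 8192 16384 20000; do
        rps=$(grep 'Requests per second' ab-c$c.log | awk '{print $4}')
        echo "$c $rps"
    done > tempesta-1024.dat

    # Plot RPS against the number of concurrent connections.
    gnuplot -e "set terminal png; set output 'rps-1024.png'; set logscale x; \
                plot 'tempesta-1024.dat' with linespoints title 'Tempesta'"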

The graphs of test results taken at different times, with different parameters and different server tunings, all show a very distinctive pattern: RPS degrades significantly once the number of concurrent connections grows beyond a certain point.

keshonok added this to the 0.5.0 Web Server milestone Jul 21, 2016
krizhanovsky (Contributor) commented Aug 23, 2016

Actually there are several problems, not just the degradation with an increasing number of connections. That case is shown in the figure below (the degradation is circled in red):
tempesta_perf_best

The second problem is a ridiculously small RPS for small file sizes and small numbers of connections, followed by a quite sharp jump at higher connection counts:
tempesta_perf_bad1
tempesta_perf_bad2

Probably the problem is linked with TCP issues like #439, #488 or #583.

The next problem concerns large files. Typically, with large files sendfile(2) transmission dominates over other web-accelerator activities. sendfile() is a well-optimized, purely in-kernel function, so we don't expect Tempesta to significantly outperform Nginx in a single-processor installation (NUMA is a different story, where Tempesta DB shines) with only a few files (all of which fit in OS caches and can be transferred immediately). However, Tempesta DB still must be faster than VFS calls, yet we see performance degradation on large files with many connections, just like in the first case. Here we are even much slower than Nginx:
tempesta_perf_largefile

keshonok (Contributor, Author) commented:

The issue of the small RPS numbers for small file sizes and small numbers of connections that is illustrated by graphs 2 and 3 above is most likely caused by a lack of system resources at the time these particular tests are run.

These graphs are for a Tempesta FW server that runs on the 2Gb RAM machine. The test results are for ab runs without the keep-alive option. These tests run immediately after the previous series of tests finishes, and the last test of that previous series is the heaviest: 20000 concurrent connections, a 102400-byte file size, and the keep-alive option set, which means that all concurrent connections are active at the same time. Under these conditions the system with 2Gb RAM struggles heavily and issues messages about insufficient memory in the TCP/IP stack.

Right after this heaviest test finishes it takes some time for the system to "recover" from the heavy stress on system resources. That's when the following lightweight tests don't show the expected results. If those tests are run independently, the results are as expected. Also, graphs for a Tempesta FW server on the 4Gb RAM machine don't show the pattern above.

krizhanovsky (Contributor) commented Sep 3, 2016

The tests are outdated. I didn't see the sharp performance degradation on 16-byte files with keep-alive connections on the latest master:

nginx_tempesta

I used wrk in the following way to get the results:

    ./wrk -t 1 -c 1 -d 30s http://192.168.100.100:$PORT/
    ./wrk -t 8 -c $CONNECTIONS -d 30s http://192.168.100.100:$PORT/

Here $PORT is the port for Tempesta or Nginx, and $CONNECTIONS is the number of connections (>= 8).

'Connection: close' tests on small 16B files also didn't show the performance anomaly:

nginx_tempesta_close_16b_simple

The benchmark command is

    ./wrk -t 8 -c $CONNECTIONS -d 30s --header 'Connection: close' http://192.168.100.100:80/

Tempesta configuration file:

    # cat etc/tempesta_fw.conf 
    listen 192.168.100.100:80;
    server 127.0.0.1:8080 conns_n=8;
    cache 1;
    cache_fulfill * *;

Nginx configuration:

    # cat /opt/nginx-1.11.3/conf/nginx.conf
    # Don't use gzip module since we don't test very large content files.

    user www-data;
    pid /run/nginx-proxy.pid;

    worker_processes 8;
    worker_rlimit_nofile 100000;

    error_log /opt/nginx-1.11.3/logs/error-proxy.log crit;

    events {
            worker_connections 16384;
            use epoll;
            multi_accept on;
            accept_mutex off;
    }

    http {
            server {
                    listen 9000 backlog=131072 deferred reuseport fastopen=4096;

                    location / {
                            proxy_pass http://127.0.0.1:8080/;
                    }
            }

            access_log off;

            proxy_cache_path /opt/nginx/cache keys_zone=one:10m;
            proxy_cache one;
            proxy_cache_valid any 30d;
            proxy_cache_valid 200 302 10m;

            open_file_cache max=100000 inactive=30d;
            open_file_cache_valid 30d;
            open_file_cache_min_uses 2;
            open_file_cache_errors on;

            # Has no sense for small files, see
            # https://www.rootusers.com/linux-web-server-performance-benchmark-2016-results/
            #sendfile on;
            etag off;

            tcp_nopush on;
            tcp_nodelay on;
            keepalive_timeout 65;
            keepalive_requests 100000000;

            reset_timedout_connection on;
            client_body_timeout 40;
            send_timeout 10;
    }

Tempesta utilizes only about 60% of CPU in softirq at the top load in both tests, while wrk fully utilizes all 8 cores. The same holds for 8 or 32K connections, regardless of keep-alive or closing connections - the bottleneck is wrk, not Tempesta. @keshonok how did you get 190K RPS from ab? It seems we need much stronger hardware for the traffic generator to see the real Tempesta numbers. Also, after some Nginx tuning I reached 95K RPS instead of 60K RPS, so now the benchmarks are fairer.
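
For reference, the softirq share quoted above can be observed with something like the following while the benchmark is running (illustrative, not necessarily the exact tool used):

    # The %soft column shows the per-CPU share of time spent in softirq.
    mpstat -P ALL 1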

In the meantime I also tried

    # ./wrk -t 1 -c 1 -d 30s --header 'Connection: close' http://192.168.100.100:80/

and got 1479 and 2456 RPS for nginx-1.11.3 and Tempesta respectively, i.e. we don't see such low RPS for Tempesta at a low number of connections.

The bad thing is that I caught deadlock #613 when I tried more concurrent connections with Connection: close.

P.S. #613 is fixed.
