Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ubiquity php-pm High concurrency level benchmark #456

Open
jcheron opened this issue Jun 6, 2019 · 45 comments
Open

Ubiquity php-pm High concurrency level benchmark #456

jcheron opened this issue Jun 6, 2019 · 45 comments
Labels

Comments

@jcheron
Copy link

jcheron commented Jun 6, 2019

Hi,
I implemented a PHP-PM adapter for Ubiquity framework, which I test the performance on TechEmpower benchmarks.
For the moment, the results are not conclusive (see Ubiquity and Ubiquity-react).

For the Multiple Queries test, which requires a connection to the database and could have benefited php-pm, I have the following results:

Responses per second for multiple queries test (1, 5, 10, 15, 20), concurrency level of 512:
image

The source code and configuration are available on TechEmpower repository.
The PHP-PM test (currently called Ubiquity-react) uses php-pm as a web server and load balancer.
Dockerfile to build & run is ubiquity-react.dockerfile.

I have the same symptoms as @oozone with Laravel (see #453): poor performance, and warning on some requests:

Script did not return a valid HTTP response. Maybe it has called exit() prematurely?

I also tried to manually test the script with ab:
At concurrency level 5, php-pm is more efficient than php-fpm
PHP-PM
image
Apache
image

If I increase the concurrency level, performance degrades for php-pm:
PHP-PM
image
Apache
image

TechEmpower multiple query tests are run at 512 concurrency...
At this level, my php-pm use generates warnings.
The results are the same for the other tests (Plaintext or Json).

I wonder if I'm doing something wrong somewhere.
The adjustment of the number of workers does not change much in this context.

@andig
Copy link
Contributor

andig commented Jun 6, 2019

Ping @marcj I could imagine that degrading performance at higher concurrency lavels might be due to only having a single incoming process for all connections? What really discomforts me again is the unclear slave not responding errors. Do you have any idea how to improve logging to catch those?

@andig
Copy link
Contributor

andig commented Jun 6, 2019

@jcheron regarding your config you should adjust the number of workers with the level of parallelism. Each worker will only handle a single request at a time. Once you have more requests than the workers can handle the requests will start queuing up at the master and worst case time out before handed to the slave. Ppm log output for your test cases would be helpful, too.

Also- as far as Ubiquit goes- are you really sure that you‘re not starting an entire application instead of just processing a request in https://github.com/phpMv/ubiquity-php-pm/blob/master/src/PHPPM/Ubiquity.php#L55?

@jcheron
Copy link
Author

jcheron commented Jun 6, 2019

@andig
Thank you for your answer

Do you have any idea how to improve logging to catch those?

When I activate the logging at php-pm server starting, all requests return a status of 200 until we get to the warnings.
But I have no additional information.

image

with -vvv:

image

If I increase the total number of requests to be made, I realize that the problems always occur on the last 10 percent.
If I increase the number of workers, (128 for concurrency 256), it doesn't change anything.

are you really sure that you‘re not starting an entire application instead of just processing a request

Yes, the Startup::run method just handle the request.
The framework is started by including the file services.php, which uses a config.php file adapted to php-pm.

@jcheron
Copy link
Author

jcheron commented Jun 6, 2019

I didn't notice anything wrong on the framework side by enabling all the log and error display possibilities.

By removing any reference to the framework in the handle method of the Ubiquity bridge:

public function handle(ServerRequestInterface $request): ResponseInterface {
	return new \React\Http\Response(200, ['Content-Type'=>'text/plain'], 'Hello World!');
}

I still have the same warnings with concurrency 512
vendor/bin/ppm --bridge='\PHPPM\Ubiquity' --bootstrap='\PHPPM\Ubiquity' start --debug 0 --logging 0 --workers 32 --max-requests 1024 --port=8090

ab -n 1000 -c 512 http://127.0.0.1:8090/foo

image

@andig
Copy link
Contributor

andig commented Jun 6, 2019

Do you see any chance of adding logging in the bridge? Does the bridge even receive those stray requests or does the redponse somehow get stuck?

@jcheron
Copy link
Author

jcheron commented Jun 6, 2019

I added logging in the bridge. But stray requests do not go through there.

@andig
Copy link
Contributor

andig commented Jun 8, 2019

Then the next step would be to find out if the slave even receives the request or if it gets stuck in the master. Could you also find out if the slave ever sees an incoming connection in those cases?

@jcheron
Copy link
Author

jcheron commented Jul 8, 2019

Until we find a solution to the high concurrency level problem, I will remove the Ubiquity-react (php-pm) benchmark from the TechEmpower results.

For now, it's bad publicity for php-pm as much as it is for Ubiquity.

@acasademont
Copy link
Contributor

Hi @jcheron the problems were the same when running 512 php-pm slaves? As I can see in your repo, php-pm is only given 64 slaves, no?

@jcheron
Copy link
Author

jcheron commented Jul 11, 2019

@acasademont
I gradually increased the number of workers:
16, 32 and 64, with no impact on performance so far.
I concluded, perhaps hactorily, that the problem was elsewhere.
Initially, I understood that the number of workers had to be correlated to the core number of the processor, and not to a concurrency level that we don't have control in production.

Some of the TechEmpower tests go up to a concurrency level of 16 384 (Plaintext benchmark)...
But I'd be willing to make an attempt at 512.

@acasademont
Copy link
Contributor

@jcheron php-pm behaves exactly the same way as php-fpm, it can only process one request at a time per worker. Therefore, if you use a concurrency level of 512 but you limit the number of workers to 64, you're artificially limiting the number of requests than can be managed concurrently.

In an ideal world you're right, the number of workers should equal the number of processors (it's what you do in nginx, for example), but php is not an async language, any IO call would block the whole process. That's why php-pm decided to go for the safer approach of limiting one concurrent request per worker.

If you have a min_children directive set in the php-fpm config, the same number should be given to php-pm. We might then encounter other problems (for sure!), but at least we're giving php-pm the minimum number of resources it needs :p

@andig
Copy link
Contributor

andig commented Jul 12, 2019

@acasademont we still have the Script did not return a response problem that we've also seen in another, very simple Symfony app benchmark we're seing even without Ubiquity above. I couldn't find what causes those but ssupect that it might also have an impact on avg response times. Any idea what might be causing those?

@acasademont
Copy link
Contributor

Yes, definitely, we should address that problem too, but I wanted to start with a fair comparison first :p I'll try to replicate the issue and see what I can find 👌

@jcheron
Copy link
Author

jcheron commented Jul 31, 2019

@acasademont
I tried to increase the number of workers to 512, but it didn't work.
Travis-ci tests failed.

image

So for now I'm removing Ubiquity with PHP-PM from Techempower benchmarks, until we find a solution.

So I tried it with 16, 32, 64 and 128 workers, without seeing any real impact on performance.

@andig
Copy link
Contributor

andig commented Oct 8, 2019

I've looked some more into this. It is definitely a problem in the core:

bin/ppm start --workers=8 --bridge=StaticBridge --static-directory=web --max-requests=32 -v &
ab -n 1000 -c 38 http://localhost:8080/

ab will fail sooner, the higher the concurrency level. It will also fail sooner the larger the number of already executed requests. With 38 concurrent I do sometimes get a full run on first try, but often not:

❯ ab -n 1000 -c 38 http://localhost:8080/
This is ApacheBench, Version 2.3 <$Revision: 1843412 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
apr_socket_recv: Connection reset by peer (54)
Total of 16 requests completed

With 32 concurrent it will almost always work. If it breaks we'll have these messages in the log:

Script did not return a valid HTTP response. Maybe it has called exit() prematurely?

@andig
Copy link
Contributor

andig commented Oct 8, 2019

More experiments. I've started giving the slaves ids to trace requests across log messages by slave. Running this with only a single worker:

bin/ppm start --workers=1 --bridge=StaticBridge --static-directory=web --max-requests=1 -v #
Starting PHP-PM with 1 workers, using StreamSelectLoop ...
1 workers (starting at 5501) up and ready. Application is ready at http://127.0.0.1:8080/
> slave 1
Restarting worker #5501 because it reached max requests of 1

ab -n 1000 -c 100 http://localhost:8080/
This is ApacheBench, Version 2.3 <$Revision: 1843412 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
apr_socket_recv: Connection reset by peer (54)
Total of 1 requests completed

Test dies after only a single request, but only if we chose a high concurrency level!

@andig
Copy link
Contributor

andig commented Oct 8, 2019

/cc @WyriHaximus might I ask you if you've seen something like the comment above before where reactphp might get stuck on high concurrency levels?

@WyriHaximus
Copy link

@andig only seen this happen with ab but never with wrk/wrk2 or other tools when reaching the limits of your set up.

@acasademont
Copy link
Contributor

Taking a look at this today, let's see what we can find!

@WyriHaximus
Copy link

@acasademont you know where to find me if you do ;)

@andig
Copy link
Contributor

andig commented Dec 27, 2019

Thanks @acasademont. I‘m concentrating on programming an ev charge controller- so many projects, so little time...

@acasademont
Copy link
Contributor

I'm starting from the basics, just a simple "ab -n 2 -c 2 http://127.0.0.1:8080/"

First weird thing I saw: ab seems to open one more connection than needed when concurrency > 1.

albert@AlbertWork:~/Projects/php-pm$[master] bin/ppm start --workers=1 --bridge=StaticBridge --static-directory=web --max-requests=10 -vvv
/Users/albert/Projects/php-pm
+---------------------+----------------------------------------------+
| bridge              | StaticBridge                                 |
| host                | 127.0.0.1                                    |
| port                | 8080                                         |
| workers             | 1                                            |
| app-env             | dev                                          |
| debug               | 0                                            |
| logging             | 1                                            |
| static-directory    | web                                          |
| bootstrap           | PHPPM\Bootstraps\Symfony                     |
| max-requests        | 10                                           |
| max-execution-time  | 30                                           |
| memory-limit        | -1                                           |
| ttl                 | 0                                            |
| populate-server-var | 1                                            |
| socket-path         | .ppm/run/                                    |
| pidfile             | .ppm/ppm.pid                                 |
| reload-timeout      | 30                                           |
| cgi-path            | /usr/local/Cellar/php@7.3/7.3.13/bin/php-cgi |
+---------------------+----------------------------------------------+
Starting PHP-PM with 1 workers, using StreamSelectLoop ...
Start new worker #5501
Worker #5501 registered. Waiting for application bootstrap ...
Worker #5501 ready.
1 workers (starting at 5501) up and ready. Application is ready at http://127.0.0.1:8080/



request incoming 1 tcp://127.0.0.1:63928
1 slaves for request1
took abnormal 0.000 seconds for choosing next free worker
Took abnormal 0.001 seconds for connecting to worker 5501
handle data  for request1
occupied slave 5501 for request 1
handle data GET / HTTP/1.0
Host: 127.0.0.1:8080
User-Agent: ApacheBench/2.3
Accept: */*

 for request1
all data buffered for 1
Current memory usage for worker 5501: 2.00 MB
[27/Dec/2019:18:58:21 +0000] 127.0.0.1 - - "GET / HTTP/1.0" 404 61 "-"
Worker 5501 took abnormal 0.007 seconds for handling a connection
request incoming 2 tcp://127.0.0.1:63929
1 slaves for request2
took abnormal 0.000 seconds for choosing next free worker
Took abnormal 0.000 seconds for connecting to worker 5501
handle data  for request2
occupied slave 5501 for request 2
**request incoming 3 tcp://127.0.0.1:63930**
0 slaves for request3
no slaves for request 3
handle data GET / HTTP/1.0
Host: 127.0.0.1:8080
User-Agent: ApacheBench/2.3
Accept: */*

 for request2
all data buffered for 2
0 slaves for request3
no slaves for request 3
0 slaves for request3
no slaves for request 3
0 slaves for request3
no slaves for request 3
0 slaves for request3
no slaves for request 3
0 slaves for request3
no slaves for request 3
0 slaves for request3
no slaves for request 3
0 slaves for request3
no slaves for request 3
0 slaves for request3
no slaves for request 3
0 slaves for request3
no slaves for request 3
0 slaves for request3
no slaves for request 3
0 slaves for request3
no slaves for request 3
0 slaves for request3
no slaves for request 3
0 slaves for request3
no slaves for request 3
0 slaves for request3
no slaves for request 3
0 slaves for request3
no slaves for request 3
0 slaves for request3
no slaves for request 3
0 slaves for request3
no slaves for request 3
0 slaves for request3
no slaves for request 3
0 slaves for request3
no slaves for request 3
0 slaves for request3
no slaves for request 3
0 slaves for request3
no slaves for request 3
0 slaves for request3
no slaves for request 3
0 slaves for request3
no slaves for request 3
0 slaves for request3
no slaves for request 3
Current memory usage for worker 5501: 2.00 MB
[27/Dec/2019:18:58:21 +0000] 127.0.0.1 - - "GET / HTTP/1.0" 404 61 "-"
0 slaves for request3
no slaves for request 3
Worker 5501 took abnormal 0.022 seconds for handling a connection
1 slaves for request3
took abnormal 0.023 seconds for choosing next free worker
Took abnormal 0.000 seconds for connecting to worker 5501
handle data  for request3
occupied slave 5501 for request 3
Worker 5501 took abnormal 0.000 seconds for handling a connection
**Script did not return a valid HTTP response. Maybe it has called exit() prematurely?**

Things that we could definitely improve:

  • Do not search for a slave until all incoming data has been buffered. It makes no sense to waste resources on the search if this is not an HTTP request after all. This would avoid the misleading Script did not return a valid HTTP response. Maybe it has called exit() prematurely? which has nothing to do with the script response but from the fact that the connection never piped in any data.
  • Leverage the reactphp http server on the master instead of the socket server. This would free us from dealing with this weird stuff, we would get an already buffered Http request and then we would choose a slave to pipe the request to. Right now the master tries to be http agnostic, but it really isn't. I believe we should fully embrace http right on the master and simplify our code.

Thoughts? @andig @WyriHaximus

On the slaves we could also start a simpler http server without the default reactphp middlewares, knowing that the master has already done the hard work for us.

@acasademont
Copy link
Contributor

In any case, to continue with this, we should not emit that Script did not return a valid HTTP response. Maybe it has called exit() prematurely? error when it's the client that has prematurely disconnected.

@acasademont
Copy link
Contributor

I ended up doing some tcpdumps to see why ab was complaining of the connection being reset and the problem seems to be some TCP RST packets emitted by the server that ab doesn't seem able to deal with. This is out of scope of phppm, maybe related to reactphp, maybe related to php streams, can't really tell :(

A simple reactphp server:

<?php

include __DIR__.'/vendor/autoload.php';

$loop = React\EventLoop\Factory::create();

$server = new React\Http\Server(function (Psr\Http\Message\ServerRequestInterface $request) {
    return new React\Http\Response(
        200,
        array('Content-Type' => 'text/plain'),
        "Hello World!\n"
    );
});

$socket = new React\Socket\Server(8080, $loop);
$server->listen($socket);

echo "Server running at http://127.0.0.1:8080\n";

$loop->run();

an ab -n 100 -c 100 http://127.0.0.1:8080/

will yield errors most of the time.

For comparison, I created a nodejs server

const http = require('http')

const requestHandler = (request, response) => {
  response.end('Hello World!')
}

const server = http.createServer(requestHandler)

server.listen(8080, (err) => {
  if (err) {
    return console.log('something bad happened', err)
  }

  console.log('Server running at http://127.0.0.1:8080')
})

and dealt with that concurrency level just fine.

@acasademont
Copy link
Contributor

Attaching the tcpdumps capture for both reactphp and node if that might help.

tcpdump.zip

@acasademont
Copy link
Contributor

Tried it also with another benchmark tool written in go called hey (https://github.com/rakyll/hey). Doesn't miserably fail on the first connection reset like AB, but it still does fail quite often compared to the nodejs server (also much slower, but that's another issue I guess :p)

@WyriHaximus
Copy link

@acasademont I've also been playing around with https://k6.io/ lately, different approach but might be interesting to check out as well as you can use it as a unit test to make sure everything works as intended during different levels of load.

@acasademont
Copy link
Contributor

@WyriHaximus seems like the problem is a very low TCP backlog value by default on PHP streams (32). php-pm should probably set a specific (and configurable) parameter for that...After increasing it to 128 everything works fine!

@WyriHaximus
Copy link

@acasademont awesome!

@andig
Copy link
Contributor

andig commented Jan 2, 2020

See #488 for fix. Confirmation this is solved in HEAD would be great.

@acasademont
Copy link
Contributor

There are other things we could do to improve performance for sure, but at least the test shouldn't miserably fail anymore

@jcheron would be great if you could try again!

@jcheron
Copy link
Author

jcheron commented Jan 2, 2020

Thank you for your work!
I'll keep you informed in this thread.

@andig
Copy link
Contributor

andig commented Jan 3, 2020

@acasademont

Things that we could definitely improve:
Do not search for a slave until all incoming data has been buffered. It makes no sense to waste resources on the search if this is not an HTTP request after all. This would avoid the misleading Script did not return a valid HTTP response. Maybe it has called exit() prematurely? which has nothing to do with the script response but from the fact that the connection never piped in any data.

I wouldn't do that as it would really increase latency. If a client connects we can be fairly certain that it should have a valid request? Seems the "client went away" errors were part of #488.

Leverage the reactphp http server on the master instead of the socket server. This would free us from dealing with this weird stuff, we would get an already buffered Http request and then we would choose a slave to pipe the request to. Right now the master tries to be http agnostic, but it really isn't. I believe we should fully embrace http right on the master and simplify our code.

That sounds like an excellent idea. As far as I remember that server didn't exist yet when I refactored PPM. Now it seems like a good idea to move to the streaming server implementation?

@andig
Copy link
Contributor

andig commented Jan 13, 2020

ping @jcheron any news?

@andig andig removed the blocked label Jan 13, 2020
@jcheron
Copy link
Author

jcheron commented Jan 14, 2020

Sorry, @andig , I didn't forget.
php-pm is still impressive at a reasonable concurrency level (10 to 20).
But I still have the same problems with the concurrency required by the Techempower tests:

Json test with 72 workers on an i7-8750H (6 cores, 12 threads):
image

@andig
Copy link
Contributor

andig commented Jan 14, 2020

It's hard to say from looking at the screenshots as they show different results, but do you notice any difference to the situation before? It seems the number of warnings has reduced but they still exist. Did performance increase in any way at concurrency levels above 30?

@acasademont
Copy link
Contributor

acasademont commented Jan 14, 2020 via email

@jcheron
Copy link
Author

jcheron commented Jan 14, 2020

I think it's better too, there are fewer error messages.

But even if the test passes, the results are not satisfactory:
on the Json test with a concurrency level of 128...

  • Ubiquity-php-pm -> 5417 req/s
  • Ubiquity -> 56900 req/s
  • Ubiquity-swoole -> 108300 req/s

I benchmarked by varying the number of requests, the duration, the concurrency level,
and it seems that the cumulative number of queries in a test is as problematic as the concurrency level.

@acasademont
Copy link
Contributor

I find it hard to understand how the heck you can do 57k req/s on a standard php-fpm installation with apache. Is it possible to have the test reproducer?

@acasademont
Copy link
Contributor

I think that HTTP keep alive might be playing an important role here...

@jcheron
Copy link
Author

jcheron commented Jan 15, 2020

The 57k on my computer is not out of proportion to the runs on TechEmpower.
The test just consists in displaying a json answer:

echo \json_encode ( [ 'message' => 'Hello, World!' ] );

For Ubiquity, you can reproduce in :

For php-pm, I use this docker file:

FROM ubuntu:19.10

ARG DEBIAN_FRONTEND=noninteractive

RUN apt-get update -yqq && apt-get install -yqq software-properties-common > /dev/null
RUN LC_ALL=C.UTF-8 add-apt-repository ppa:ondrej/php
RUN apt-get update -yqq > /dev/null && \
    apt-get install -yqq git unzip php7.4 php7.4-common php7.4-cli php7.4-pgsql php7.4-mbstring php7.4-cgi > /dev/null

RUN apt-get install -yqq composer > /dev/null

RUN ln -s /usr/lib/cgi-bin/php /usr/bin/php7.4-cgi

COPY deploy/conf/php-async.ini /etc/php/7.4/cgi/php.ini

ADD ./ /ubiquity

WORKDIR /ubiquity

RUN composer require phpmv/ubiquity-php-pm:dev-master --quiet

RUN composer install --optimize-autoloader --classmap-authoritative --no-dev --quiet

RUN chmod 777 -R /ubiquity/app/cache/*

RUN chmod 777 -R /ubiquity/.ubiquity/*

RUN chmod 777 /ubiquity/vendor/bin/ppm

CMD /ubiquity/vendor/bin/ppm --bridge='\PHPPM\Ubiquity' --bootstrap='\PHPPM\Ubiquity' start --debug 0 --logging 0 --workers 72 --max-requests 4096 --host=0.0.0.0 --port=8080

I'll try without the Http keep-alive.

@marcj
Copy link
Member

marcj commented Jan 15, 2020

@jcheron what OS do you use?

@jcheron
Copy link
Author

jcheron commented Jan 15, 2020

@marcj Ubuntu 16.04.6 LTS

@marcj
Copy link
Member

marcj commented Jan 15, 2020

TechEmpower runs the tests using PHP-FPM, right? How many CPU cores are they using?

I doubt that PHP-FPM is able to pull those numbers, even with very simple hello world scripts, since even Apache out of the box isn't able to pull those numbers for serving static files. From my experience even nginx has troubles going to 500k req/s serving static files (which is way easier than using cgi protocol and communicate with a PHP interpreter). It seems those are in-php loop tests where something like the http-kernel/react-http abstraction is called without external http servers like apache/nginx involved, might this be the case? If so, the whole comparison is flawed.

@jcheron
Copy link
Author

jcheron commented Jan 15, 2020

What unattainable number are you talking about? I don't get it.
TechEmpower uses 2 environments to perform these tests, and they are not pseudo-tests.
In fpm/fast cgi, with nginx server, php reaches 260k req/s
We're not on 500k, actually.
But with asynchronous php platforms, this threshold is largely exceeded.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants