Connection timeout between master and worker is too high #415
Comments
I agree. In general, PHP-PM should fail faster so that it is able to recover at all in some circumstances. A lot can go wrong in 10 seconds.
It should be long enough to cater for the startup time of a worker, which might be slow?
This is only the connection timeout. A TCP handshake does not take that long and does not depend on the application running inside the worker. And if it did, a worker that is still starting should not be reported as ready.
@Prophet777 what would be the ideal timeout?
I don't know if there is an ideal timeout; it depends on the usage. I think 300ms is a reasonable default: it is only used internally between worker and master, so it would fit most use cases. Making it configurable would be ideal. Also, 300ms is enough for a TCP connection (local ~25ms RTT), so if this deadline is exceeded, something is wrong. To be honest, I don't really know what a good default timeout would be. The mindset is "the faster it fails, the faster we try another worker", and the faster we deliver the response to the client. So it sounds OK to me. Also, if a cascading failure happens, like 3 workers down in a row, it would take at least 900ms before getting a response. That sounds reasonable for a degraded app.
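To make the failover reasoning above concrete, here is a minimal sketch in Python. It is not PHP-PM's code: the names `connect_first_available` and `worst_case_wait_ms` are hypothetical, and the sketch assumes the master tries workers one after another over TCP with the same connect timeout for each.

```python
import socket


def worst_case_wait_ms(failed_workers, timeout_ms):
    """Upper bound on time spent waiting on unresponsive workers, tried in sequence."""
    return failed_workers * timeout_ms


def connect_first_available(addresses, timeout=0.3):
    """Try each (host, port) in order and return (socket, address) for the first
    worker that accepts the connection, or (None, None) if none does.

    `timeout` is the per-worker connect timeout in seconds (hypothetical default
    of 300ms, taken from the suggestion in this thread).
    """
    for address in addresses:
        try:
            return socket.create_connection(address, timeout=timeout), address
        except OSError:
            # Covers both a refused connection and a connect timeout:
            # give up on this worker and move on to the next one.
            continue
    return None, None


if __name__ == "__main__":
    print(worst_case_wait_ms(3, 300))     # 3 dead workers at 300ms each
    print(worst_case_wait_ms(3, 10_000))  # 3 dead workers at 10s each
```

With three unresponsive workers in a row, the bound is 900ms at a 300ms timeout versus 30 seconds at the current 10s value. Note that a worker whose port is closed fails immediately with a refused connection; the timeout only matters when the connection attempt hangs.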
The ideal timeout would be 10s by default, but it MUST be configurable. We need to set the value higher. I don't agree with the hard-coded value; it does not scale.
Currently the timeout between master and worker is hardcoded to 10s. For this kind of "internal connection" that is too high: we don't want to wait 10s to find out whether a worker is available or has "crashed" before moving on to the next one. 300ms is better. The faster it fails, the faster it recovers.
What do you think?