rq jobs go into a bad state when redis hits connection limit #1553
Because RQ uses Redis to store worker and job states, it's expected that RQ jobs will go into an inconsistent state if Redis doesn't work properly. I don't think there's much we can do about this, but I'm open to suggestions. Please reopen the issue if you have any actionable suggestions.
Hi @selwin, there are a few ways this could be fixed:

1. When the job is initialized, it doesn't belong to any registry. The job could be immediately moved into the …, OR
2. We dequeue the job when calling …, OR
3. Introduce a new kind of registry: …

If you think any of the solutions could be applied, could you please re-open this issue? I don't have permission to re-open this issue.
If the issue is about addressing the part where a popped job can get orphaned because it no longer belongs to any queue or registry (rather than "Redis errors cause RQ to fail", which is too broad), I'm open to addressing it. So the TLDR is: option 1 is best if it can be done; otherwise option 3.
1. This would be the most elegant solution, but I'm not sure it's easily achievable given that RQ uses …
2. Not sure how viable this approach is; it would also change RQ in a lot of places, so this would be a huge change.
3. Maybe this can be done; I'm willing to accept a PR for this.
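A minimal sketch of what option 1 amounts to, using a hypothetical in-memory stand-in rather than RQ's actual classes (in real Redis, the pop and the registry insert would need to be made atomic, e.g. via a Lua script or a MULTI/EXEC transaction):

```python
# Hypothetical sketch of option 1: the job is moved into a registry in
# the same indivisible step that removes it from the queue, so there is
# never a window in which the job belongs to neither. The names below
# (atomic_dequeue, started) are illustrative, not RQ's API.

def atomic_dequeue(queue, started_registry):
    """Pop a job id and register it as started in one indivisible step."""
    if not queue:
        return None
    job_id = queue.pop()
    started_registry.add(job_id)  # happens together with the pop
    return job_id

queue = ["job-1"]
started = set()
job_id = atomic_dequeue(queue, started)

# Even if the work horse later fails to get a Redis connection, the job
# is already tracked by a registry, so monitoring can find and requeue it.
assert job_id in started and job_id not in queue
```

The design point is that the job is always reachable from *some* Redis structure, which is exactly the invariant the bug report below shows being violated.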
Event: a message is popped from a non-empty queue.
State: the Redis server's client connection count < max_connections.

Event: a new work horse is forked to execute the job.
State: the Redis server's client connection count = max_connections.

The work horse therefore cannot obtain a Redis connection, so the job is never moved into any registry. For the same reason, executing the failure callback is also not possible.
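The sequence above can be simulated with a hypothetical in-memory model (the class and exception names below are illustrative stand-ins, not RQ's or redis-py's actual code):

```python
# Hypothetical model of the failure sequence: the worker's pop succeeds,
# but the forked work horse cannot get a connection, orphaning the job.

class ConnectionLimitExceeded(Exception):
    """Stands in for the connection error Redis raises at maxclients."""

class FakeRedis:
    def __init__(self, max_connections):
        self.max_connections = max_connections
        self.connections = 0
        self.queue = ["job-1"]                       # the rq queue list
        self.jobs = {"job-1": {"status": "queued"}}  # the rq:job:<id> hash
        self.started_registry = set()                # started-job registry

    def connect(self):
        if self.connections >= self.max_connections:
            raise ConnectionLimitExceeded("max number of clients reached")
        self.connections += 1

# Event 1: the worker pops a job while a connection slot is still free.
server = FakeRedis(max_connections=1)
server.connect()             # the worker takes the last free slot
job_id = server.queue.pop()  # job leaves the queue; status stays "queued"

# Event 2: the forked work horse needs its own connection, but the
# server is now at max_connections, so it can neither register the job
# as started nor run the failure callback.
try:
    server.connect()
except ConnectionLimitExceeded:
    pass

# Resulting state: the job hash persists, but the job is in no queue
# and no registry -- it is orphaned.
assert server.jobs[job_id]["status"] == "queued"
assert job_id not in server.queue
assert job_id not in server.started_registry
```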
At this point the job persists in Redis with status="queued", but it is no longer in the queue. Workers pop the next item in the queue and the whole cycle continues until the queue is empty. The orphaned jobs still persist in Redis and there is no way to process them. They are also invisible to any monitoring systems that watch the RQ registries, since they are absent from all of them.
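Such orphans are detectable: a job whose hash still says status="queued" but whose id appears in no queue and no registry. A real scan would iterate the `rq:job:*` keys with redis-py; the sketch below (function name `find_orphans` is hypothetical) just models the data structures:

```python
# Hypothetical orphan scan over in-memory stand-ins for the Redis data:
# job hashes, queue lists, and registry sets.

def find_orphans(jobs, queues, registries):
    """Return ids of jobs marked queued but tracked by nothing."""
    tracked = set()
    for job_ids in queues.values():
        tracked.update(job_ids)
    for job_ids in registries.values():
        tracked.update(job_ids)
    return [job_id for job_id, meta in jobs.items()
            if meta["status"] == "queued" and job_id not in tracked]

jobs = {"job-1": {"status": "queued"},    # orphaned: in no structure
        "job-2": {"status": "queued"},    # still waiting in a queue
        "job-3": {"status": "started"}}   # tracked by a registry
queues = {"default": ["job-2"]}
registries = {"started": {"job-3"}}

assert find_orphans(jobs, queues, registries) == ["job-1"]
```

A periodic job running this kind of check could requeue or fail the orphans, which is roughly what the proposed "new kind of registry" (option 3) would make unnecessary.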
This problem is likely to happen often when running RQ against a Redis server with a low client connection limit.
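For reference, the client limit is Redis's `maxclients` directive (a real setting; the low value here is only illustrative, to reproduce the conditions above):

```shell
# Start a server with a deliberately low client cap (default is 10000):
redis-server --maxclients 16

# Or lower it on a running server without a restart:
redis-cli CONFIG SET maxclients 16
```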
Traceback that caused me to investigate the issue: