Event subscriber clients won't reconnect / Jobs and events won't get triggered #2187

Closed

xairoo opened this issue Oct 28, 2021 · 8 comments

xairoo commented Oct 28, 2021

Event subscriber clients won't reconnect after a timeout when Redis removes the client based on the tcp-keepalive setting (the default is 300 seconds).

This only affects remote Redis connections; locally it works.

Bull uses 3 redis connections:

  • client
  • subscriber
  • bclient
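
For reference, these three connections correspond to the type values Bull passes to its createClient option; a minimal sketch of supplying them yourself (the host/port values are placeholders):

const Queue = require('bull');
const Redis = require('ioredis');

// Placeholder connection options for this sketch.
const redisOpts = { host: '127.0.0.1', port: 6379 };

const client = new Redis(redisOpts);
const subscriber = new Redis(redisOpts);

const yourQueue = new Queue('your-queue', {
  createClient(type) {
    switch (type) {
      case 'client':
        return client; // normal commands: adding jobs, reading state
      case 'subscriber':
        return subscriber; // pub/sub, used for queue events
      case 'bclient':
        return new Redis(redisOpts); // blocking commands need a fresh connection
      default:
        throw new Error(`Unexpected connection type: ${type}`);
    }
  },
});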

When you run into this problem, 1 or 2 clients (typically 2) will be missing. The missing clients are required for yourQueue.on() and yourQueue.process(), so you won't receive:

  • new jobs via yourQueue.process()
  • progress updates via yourQueue.on()

Reproduce

I'm still testing this and don't have much time right now; I'll update this if there is an easier/better way.

  • Use a remote Redis server
  • Cut your uplink for at least 300 seconds (the tcp-keepalive value) or put your system to sleep/hibernate
    • You can shorten this wait by lowering the tcp-keepalive value on your Redis server (see the sketch below)

Not sure, still testing: it looks like pulling the network cable, disabling the network card, or blocking the port triggers the disconnect event, so the client reconnects automatically. In other words: to reproduce this problem, the disconnect event must not fire.
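
To avoid waiting the full 300 seconds while testing, the keepalive value can be lowered temporarily; a sketch using ioredis (the host and the value 60 are placeholders):

const Redis = require('ioredis');

const redis = new Redis({ host: 'your-remote-redis', port: 6379 });

// Lower tcp-keepalive so idle client connections are dropped sooner.
// CONFIG SET only changes the running server; it is not persisted.
redis
  .call('CONFIG', 'SET', 'tcp-keepalive', '60')
  .then(() => redis.call('CONFIG', 'GET', 'tcp-keepalive'))
  .then((reply) => {
    console.log(reply); // e.g. [ 'tcp-keepalive', '60' ]
    return redis.quit();
  });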

The Redis command CLIENT LIST (or a GUI like RedisInsight) will show you all connected clients.

After start: [screenshot: RedisInsight client list, 2021-10-25 08:37:22]

After close/reconnect: [screenshot: RedisInsight client list, 2021-10-25 08:37:29]
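
To check the same thing from Node instead of a GUI, something like this should work (a sketch; the connection options are placeholders):

const Redis = require('ioredis');

const redis = new Redis({ host: 'your-remote-redis', port: 6379 });

// CLIENT LIST returns one line per connected client
// (addr, age, idle time, last command, ...).
redis.call('CLIENT', 'LIST').then((list) => {
  console.log(list);
  return redis.quit();
});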

This could be related to #1873.

I am still testing it, but it looks like pinging the server from each client re-establishes the connection:

setInterval(function () {
  // Ping every connection Bull holds (client, subscriber, bclient) so an
  // idle connection that was silently dropped gets re-established.
  yourQueue.clients.forEach((client) => {
    client.ping();
  });
}, 10000); // every 10 seconds

manast commented Oct 29, 2021

Can you run the same tests using ioredis directly instead of Bull? For example, create a subscriber connection and check whether it is still alive after the reconnection. If not, then this issue should be reported to the ioredis team.
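
Something along these lines could serve as a minimal test (a sketch; the host and channel name are placeholders):

const Redis = require('ioredis');

const sub = new Redis({ host: 'your-remote-redis', port: 6379 });
const pub = new Redis({ host: 'your-remote-redis', port: 6379 });

sub.subscribe('test-channel');
sub.on('message', (channel, message) => {
  console.log('received', channel, message);
});

// Publish periodically; after cutting and restoring the uplink,
// check whether these messages still reach the subscriber.
setInterval(() => {
  pub.publish('test-channel', String(Date.now()));
}, 5000);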


xairoo commented Oct 29, 2021

This has nothing to do with Bull; ioredis just doesn't reconnect the subscriber.
After such a disconnect, the normal clients also won't reconnect as long as you don't send any commands, but as soon as you send one, they reconnect.

Created a ticket: redis/ioredis#1451

But the ping trick works ;-) Maybe that should be used in Bull; no idea whether ioredis (or even node-redis) will implement this.

Hope this will fix #1873 too.

I don't understand why these options should help:

maxRetriesPerRequest: null,
enableReadyCheck: false,

I think (haven't checked) that maxRetriesPerRequest only affects sending commands like SET and so on (retrying them if they fail).

And enableReadyCheck only waits until the server is ready (i.e. when it has finished loading all data from disk). That shouldn't fix the issue in #1873.
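
(For context, these are ioredis options, which Bull forwards when you pass them under the redis settings; a sketch with placeholder connection values:)

const Queue = require('bull');

const yourQueue = new Queue('your-queue', {
  redis: {
    host: 'your-remote-redis',
    port: 6379,
    maxRetriesPerRequest: null, // null = don't reject queued commands after N reconnect attempts
    enableReadyCheck: false,
  },
});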

In a large production scenario it would be really difficult to check all the connected, or rather the missing, clients (subscribers!) to find out why jobs are stuck.
One step would be checking the jobs in Redis: if they are getting added, then the subscriber must be disconnected. What else could it be?


manast commented Oct 30, 2021

I agree that enableReadyCheck shouldn't matter; however, without it the reconnection will not work. I have tested it extensively.


manast commented Oct 30, 2021

In a large production scenario it would be really difficult to check all the connected, or rather the missing, clients (subscribers!) to find out why jobs are stuck.
One step would be checking the jobs in Redis: if they are getting added, then the subscriber must be disconnected. What else could it be?

In general you should not use the events for anything other than non-critical notifications, since in the case of a disconnect you will lose events, so you cannot rely on them for bookkeeping.


xairoo commented Oct 30, 2021

In general you should not use the events for anything other than non-critical notifications, since in the case of a disconnect you will lose events, so you cannot rely on them for bookkeeping.

Totally. In my (bad) case the Bull worker does its work, but the events (on complete and so on) are received and handled by another instance (socket.io) that sends some data to the user and, most importantly, stores data from the received job in MongoDB.

I have done this to save connections. That wasn't a good idea.

The worker should handle the events directly and store the data in the DB itself, ideally within the processor before the job is marked completed, so DB problems are handled as well.
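
A sketch of that approach, with the actual work and the MongoDB write reduced to placeholder doTheWork()/saveResult() calls:

// Persist the result inside the processor itself, so a failed DB write
// fails the job (and can be retried) instead of relying on a 'completed'
// event that might be missed after a disconnect.
yourQueue.process(async (job) => {
  const result = await doTheWork(job.data); // placeholder for the actual work
  await saveResult(job.id, result);         // placeholder for the MongoDB write
  return result; // only now is the job marked as completed
});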

Sometimes we cannot see the forest for the trees.

Thanks! =)


stale bot commented Dec 29, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the wontfix label Dec 29, 2021
stale bot closed this as completed Jan 5, 2022

alolis commented Apr 27, 2022

In a large production scenario it would be really difficult to check all the connected, or rather the missing, clients (subscribers!) to find out why jobs are stuck.
One step would be checking the jobs in Redis: if they are getting added, then the subscriber must be disconnected. What else could it be?

In general you should not use the events for anything other than non-critical notifications, since in the case of a disconnect you will lose events, so you cannot rely on them for bookkeeping.

I use the events for bookkeeping as well. Much cleaner that way :)

After the change in #1873, is it safe to assume this is not a problem and the events WILL fire after a reconnect, or am I wrong?


manast commented Apr 30, 2022

After the change in #1873, is it safe to assume this is not a problem and the events WILL fire after a reconnect, or am I wrong?

They should. But you can also verify it to be 100% sure :)
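
A quick check could look like this (a sketch; force the disconnect/reconnect manually between setting up the listener and adding the job):

// If the handler still fires for a job added after the reconnect,
// the event subscriber connection survived.
yourQueue.on('global:completed', (jobId) => {
  console.log('completed event received for job', jobId);
});

// Run after Redis has been disconnected and reconnected:
yourQueue.add({ check: 'after-reconnect' });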
