createCluster clients don't handle on('error') correctly #2721

kseth · 2024-03-19T23:53:05Z

Description

We use Redis in cluster mode for sharded pub-sub (we have 3 masters and 3 replicas in a Kubernetes cluster).

We have the following args for the clients:

    const clusterArgs = {
      rootNodes: [
        {
          url: `redis://${REDIS_CLUSTER_PUBSUB_HOST}:${redisPort}`,
        },
      ],
      defaults: {
        username: REDIS_CLUSTER_PUBSUB_NAME,
        password: REDIS_CLUSTER_PUBSUB_PASS,
        socket: {
          reconnectStrategy(retries: number) {
            if (retries >= 10) {
              console.error(
                `lost connection to redis cluster-pubsub cluster: tried ${retries} times`
              );
            } else {
              console.warn(
                `retrying redis cluster-pubsub cluster connection: tried ${retries} times`
              );
            }

            // reconnect after
            return Math.min(retries * 200, 2000);
          },
          connectTimeout: 10000,
          keepAlive: 60000,
        },
      },
    };

and then we create the client(s) like this:

    const client = createCluster(clusterArgs);
    await client.connect();
    client.on('error', (err) => {
      console.error(`[PUB-SUB ERROR]: ${err}`);
    });
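
For reference, the delay arithmetic in the reconnectStrategy above can be tabulated with a small sketch. The function name `reconnectDelayMs` is hypothetical; it only copies the `Math.min` expression from clusterArgs. In node-redis v4 a numeric return value means "retry after this many milliseconds", so as written this strategy never asks the client to give up.

```typescript
// Hypothetical helper: the same arithmetic as the reconnectStrategy in clusterArgs,
// extracted as a pure function so the schedule can be inspected without a cluster.
function reconnectDelayMs(retries: number): number {
  return Math.min(retries * 200, 2000);
}

// Collect the delay for retries 0 through 12.
const schedule: number[] = [];
for (let retries = 0; retries <= 12; retries++) {
  schedule.push(reconnectDelayMs(retries));
}

console.log(schedule.join(", "));
```

The delay grows by 200 ms per attempt and reaches its 2000 ms cap at retries = 10, which is also the count at which the log message switches from console.warn to console.error.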

Sometimes our Redis pub-sub cluster goes down (e.g. for maintenance or when we upgrade to a new version, since we run it in Kubernetes), and we receive the following error:

    Error: Socket closed unexpectedly

We correctly log the error by catching it in the error handler, but the client never seems to retry or reconnect. The only way I can get a reconnect to happen is to keep restarting the process until the connection succeeds.

Also, if the process tries to issue a command, we sometimes get an internal error that kills the process with an uncaught exception in Node, even though I've added the client.on('error') handler above.
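
As background for the symptom above, here is a minimal sketch of the general Node.js rule for 'error' events, using plain EventEmitter instances rather than node-redis. It is not a diagnosis: whether createCluster forwards errors from its per-node clients to the cluster-level listener is exactly what this report questions. The variable names are hypothetical.

```typescript
import { EventEmitter } from "node:events";

// Two independent emitters: only one of them has an 'error' listener.
const withListener = new EventEmitter();
const withoutListener = new EventEmitter();

const handled: string[] = [];
withListener.on("error", (err: Error) => {
  handled.push(err.message);
});

// With a listener attached, the error is delivered to the handler.
withListener.emit("error", new Error("Socket closed unexpectedly"));

// Without a listener, emit('error') throws the error itself. Outside a
// try/catch this would surface as an uncaught exception and end the process.
let thrown: unknown = null;
try {
  withoutListener.emit("error", new Error("Socket closed unexpectedly"));
} catch (err) {
  thrown = err;
}

console.log(handled.length, thrown instanceof Error);
```

A listener on one emitter does not cover any other emitter, so an 'error' event raised on an object that has no listener of its own still throws.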

I followed the findings from #2120 and #2302, but those don't really seem to solve our problems.

What I'd like is to be able to specify a reconnect strategy so that the client keeps retrying (according to the reconnectStrategy) if we lose our TLS connection or fail to talk to a node in the cluster. I'd also like messages to be queued while we're offline instead of an error being thrown that takes down the process.
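
To make the queueing request concrete, here is a rough application-level sketch of a bounded offline buffer. It is not part of node-redis; `OfflineBuffer`, `PendingMessage`, and `maxPending` are hypothetical names, and the drop-oldest policy is an assumption. The idea is that the caller pushes a message when a publish fails or the client is not ready, then drains and re-sends after the connection is back.

```typescript
// Hypothetical message shape for a pub-sub publish.
interface PendingMessage {
  channel: string;
  payload: string;
}

// Hypothetical bounded buffer: keeps at most maxPending messages, dropping the oldest.
class OfflineBuffer {
  private pending: PendingMessage[] = [];
  private readonly maxPending: number;

  constructor(maxPending: number) {
    this.maxPending = maxPending;
  }

  // Buffer a message. If the buffer overflows, remove and return the oldest entry.
  push(message: PendingMessage): PendingMessage | undefined {
    this.pending.push(message);
    return this.pending.length > this.maxPending ? this.pending.shift() : undefined;
  }

  // Hand back everything buffered, oldest first, and clear the buffer.
  drain(): PendingMessage[] {
    const drained = this.pending;
    this.pending = [];
    return drained;
  }

  get size(): number {
    return this.pending.length;
  }
}

// Usage: buffer three messages with room for two, then drain after "reconnecting".
const buffer = new OfflineBuffer(2);
buffer.push({ channel: "events", payload: "1" });
buffer.push({ channel: "events", payload: "2" });
const dropped = buffer.push({ channel: "events", payload: "3" });
const toResend = buffer.drain();

console.log(dropped?.payload, toResend.map((m) => m.payload).join(","));
```

A bounded buffer keeps memory use predictable during a long outage, at the cost of losing the oldest messages; an unbounded queue would make the opposite trade.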

Node.js Version

20.11.1

Redis Server Version

7.0.10

Node Redis Version

4.6.13

Platform

linux

Logs

No response

kseth added the Bug label Mar 19, 2024
