createCluster clients don't handle on('error') correctly #2721

Open
kseth opened this issue Mar 19, 2024 · 0 comments
kseth commented Mar 19, 2024

Description

We use Redis in cluster mode for sharded pub-sub (3 masters and 3 replicas, running in a Kubernetes cluster).

We have the following args for the clients:

    const clusterArgs = {
      rootNodes: [
        {
          url: `redis://${REDIS_CLUSTER_PUBSUB_HOST}:${redisPort}`,
        },
      ],
      defaults: {
        username: REDIS_CLUSTER_PUBSUB_NAME,
        password: REDIS_CLUSTER_PUBSUB_PASS,
        socket: {
          reconnectStrategy(retries: number) {
            if (retries >= 10) {
              console.error(
                `lost connection to redis cluster-pubsub cluster: tried ${retries} times`
              );
            } else {
              console.warn(
                `retrying redis cluster-pubsub cluster connection: tried ${retries} times`
              );
            }

            // reconnect after
            return Math.min(retries * 200, 2000);
          },
          connectTimeout: 10000,
          keepAlive: 60000,
        },
      },
    };
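To make the backoff in `reconnectStrategy` concrete, here is a small sketch that pulls the same formula out into a standalone function and evaluates it for a few sample retry counts. The name `reconnectDelayMs` is made up for illustration and is not part of node-redis. Note that the strategy as written always returns a delay, so it is intended to keep retrying indefinitely and only changes what it logs after 10 attempts.

```typescript
// Same formula as reconnectStrategy above, extracted as a pure function.
// `reconnectDelayMs` is a hypothetical helper name used only in this sketch.
function reconnectDelayMs(retries: number): number {
  // Linear backoff of 200 ms per attempt, capped at 2 seconds.
  return Math.min(retries * 200, 2000);
}

const schedule: number[] = [0, 1, 5, 10, 50].map(reconnectDelayMs);
console.log(schedule); // [0, 200, 1000, 2000, 2000]
```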

and then we create the client(s) like this:

    const client = createCluster(clusterArgs);
    await client.connect();
    client.on('error', (err) => {
      console.error(`[PUB-SUB ERROR]: ${err}`);
    });

Sometimes our Redis pub-sub cluster goes down (e.g. for maintenance or a version upgrade, since we run it in Kubernetes), and we receive the following error:

    Error: Socket closed unexpectedly

The error handler catches and logs the error correctly, but the client never seems to retry or reconnect. The only way I can get a reconnect to happen is to keep restarting the process until the connection succeeds.

Also, if the process tries to issue a command in this state, we sometimes get an internal error that kills the process with a Node.js uncaught exception, even though I've added the client.on('error') handler above.
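For background on how an uncaught exception can occur despite an error handler: in Node.js, emitting `'error'` on an `EventEmitter` that has no `'error'` listener throws the error instead of delivering it. The sketch below shows that behaviour with a plain `EventEmitter` as a stand-in (`fakeClient` is hypothetical and is not the node-redis client). Whether this is what happens inside the cluster client is an assumption, not something established in this report.

```typescript
import { EventEmitter } from "node:events";

// Stand-in object for illustration only; NOT the node-redis cluster client.
const fakeClient = new EventEmitter();

// 1) No 'error' listener yet: emit() throws the error at the call site.
let thrown: unknown = null;
try {
  fakeClient.emit("error", new Error("Socket closed unexpectedly"));
} catch (err) {
  thrown = err;
}

// 2) With a listener attached, the same emit is delivered to the handler.
const logged: string[] = [];
fakeClient.on("error", (err: Error) => {
  logged.push(`[PUB-SUB ERROR]: ${err}`);
});
fakeClient.emit("error", new Error("Socket closed unexpectedly"));
```

One consequence is that any emitter without its own listener can still crash the process, and that a listener attached only after `connect()` resolves cannot see errors emitted before that point.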

I followed the findings from #2120 and #2302, but those don't really seem to solve our problems.

What I'd like is to be able to specify a reconnect strategy so that the client keeps retrying (according to the reconnectStrategy) whenever we lose our TLS connection or fail to talk to a node in the cluster. I'd also like messages to be queued while we're offline instead of throwing an error and taking down the process.
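To illustrate the kind of offline queueing meant here, below is a minimal application-level sketch, not a node-redis feature. `Publisher`, `BufferedPublisher`, `setOnline`, `setOffline` and `maxPending` are all hypothetical names, and the wrapper assumes the caller tells it when the connection comes back. Ordering is best-effort only.

```typescript
// Hypothetical minimal shape of "something that can publish"; not a node-redis type.
interface Publisher {
  publish(channel: string, message: string): Promise<unknown>;
}

interface PendingMessage {
  channel: string;
  message: string;
}

class BufferedPublisher {
  private readonly inner: Publisher;
  private readonly maxPending: number;
  private pending: PendingMessage[] = [];
  private online = false;

  constructor(inner: Publisher, maxPending = 1000) {
    this.inner = inner;
    this.maxPending = maxPending;
  }

  get queued(): number {
    return this.pending.length;
  }

  // Call when the connection is known to be usable again, then drain the buffer.
  async setOnline(): Promise<void> {
    this.online = true;
    while (this.online && this.pending.length > 0) {
      const next = this.pending.shift() as PendingMessage;
      try {
        await this.inner.publish(next.channel, next.message);
      } catch {
        // Put the message back and stop draining until told we are online again.
        this.pending.unshift(next);
        this.online = false;
      }
    }
  }

  setOffline(): void {
    this.online = false;
  }

  // Never rejects: failures are buffered instead of propagating to the caller.
  async publish(channel: string, message: string): Promise<void> {
    if (this.online) {
      try {
        await this.inner.publish(channel, message);
        return;
      } catch {
        this.online = false;
      }
    }
    // Bounded buffer: drop the oldest message when full.
    if (this.pending.length >= this.maxPending) {
      this.pending.shift();
    }
    this.pending.push({ channel, message });
  }
}
```

The bounded buffer is a deliberate choice in this sketch: an unbounded queue during a long outage would trade a crash for unbounded memory growth.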

Node.js Version

20.11.1

Redis Server Version

7.0.10

Node Redis Version

4.6.13

Platform

linux

Logs

No response

@kseth kseth added the Bug label Mar 19, 2024