[QUESTION] different behavior for async and sync cluster #544

rhponczkowski · 2024-01-25T14:01:35Z

In a very simple scenario:
Given:

Redis cluster (6 nodes: 3 masters, 3 replicas)
client Connecting to cluster using AsyncRedisCluster with default options

When:
1.one of redis master nodes goes down
2. replica takes over
3. couple seconds after redis node fail, client send get for key which is on the shard which master just failed

Then:
reds++ tries to connect to node which is down, when it fails Future holds exception for user code: "Failed to connect to Redis ... connection refused, err: 1, errno: 107.

Question: This behavior is different to what can be observed when we use synchronous version of RedisCluster. In synchronous version if connecting to node fails it updates slot mapping and then send request to newly elected master.
Why is that different?

Environment:

OS: Windows
Compiler: Visual Studio 16
hiredis version: 1.2.0
redis-plus-plus version: 1.3.11

sewenew · 2024-02-04T01:47:50Z

Can you show me the minimum code that reproduce your problem?

You can try the following code, if you continue sending command to it, once the new master is elected, AsyncRedisCluster sends command to the new master node.

    AsyncRedisCluster cluster("redis://127.0.0.1:7000");
    while (true) {
        try {
            cout << *(cluster.get("key").get()) << endl;
        } catch (const Error &e) {
            cout << e.what() << endl;
        }
        this_thread::sleep_for(chrono::seconds(1));
    }

It should print the value, and once the master node is down, it prints error info: failed to connect to Redis (127.0.0.1:7001): Connection refused, err: 1, errno: 61, until the new mater is elected, and it print value again:

val
val
...
val
failed to connect to Redis (127.0.0.1:7001): Connection refused, err: 1, errno: 61
failed to connect to Redis (127.0.0.1:7001): Connection refused, err: 1, errno: 61
...
failed to connect to Redis (127.0.0.1:7001): Connection refused, err: 1, errno: 61
val
val

Regards

rhponczkowski · 2024-02-05T09:15:33Z

Sure, here is example code:

    AsyncRedisCluster cluster("redis://127.0.0.1:7000");
    cout << *(cluster.get("key").get()) << endl;    
    this_thread::sleep_for(chrono::seconds(1));
    // here master goes down but we don't interact with redis for now,  for example doing some other work...
    this_thread::sleep_for(chrono::seconds(10)); //during that time new master is elected
    while (true) {
        try {
            cout << *(cluster.get("key").get()) << endl;
            break;
        } catch (const Error &e) {
            cout << e.what() << endl;
        }
        this_thread::sleep_for(chrono::seconds(1));
    }

the output would be:

val
failed to connect to Redis (127.0.0.1:7001): Connection refused, err: 1, errno: 61
val

in this case one request fails despite redis cluster being already in stable state.
And it doesn't when using synchronous api.

sewenew · 2024-02-08T02:46:59Z

Thanks for the info!

Both async and sync API sends command to the old master node, since the client has no idea that a new master has been elected (since master goes down but we don't interact with redis for now, for example doing some other work). Since the old master is down, redis client gets an IoError. However, the difference is that sync API updates node-slot mapping synchronously, and sends (retries) command to the new master, while the async API updates the mapping asynchronously, sets exception info and returns immediately. And that's why you got an error info with async api.

Regards

rhponczkowski · 2024-02-08T12:25:37Z

Such was my assumption. However, the question is if different behavior between synchronous and asynchronous APIs was intentional? In the asynchronous version. we encounter a situation where we receive an error even though everything is fine with the cluster. I guess it is possible to implement retry also in asynchronous version when mapping is updated.
The issue I see is that in the application, I have to assume that when I receive an error, it doesn't necessarily mean that something is actually wrong with cluster, and as a rule, I should try querying several times. is this a common pattern of using Redis?

sewenew · 2024-02-08T13:54:51Z

I have to admit that I didn't realize this problem before your issue. However, so far, it's not an easy job to implement the retry with async interface (you know the async stuff is much more complicated than the sync one). Instead, I might add some more aggressive strategy to update the mapping.

is this a common pattern of using Redis?

Yes, if you get an error, the cluster might be in an unhealthy state, and you might want to retry it in your application or give up the request, and report error message to your end user.

Regards

rhponczkowski changed the title ~~[QUESTION] different behavior for async cluster than sync version~~ [QUESTION] different behavior for async and sync cluster Jan 25, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[QUESTION] different behavior for async and sync cluster #544

[QUESTION] different behavior for async and sync cluster #544

rhponczkowski commented Jan 25, 2024

sewenew commented Feb 4, 2024

rhponczkowski commented Feb 5, 2024

sewenew commented Feb 8, 2024

rhponczkowski commented Feb 8, 2024

sewenew commented Feb 8, 2024

[QUESTION] different behavior for async and sync cluster #544

[QUESTION] different behavior for async and sync cluster #544

Comments

rhponczkowski commented Jan 25, 2024

sewenew commented Feb 4, 2024

rhponczkowski commented Feb 5, 2024

sewenew commented Feb 8, 2024

rhponczkowski commented Feb 8, 2024

sewenew commented Feb 8, 2024