Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[QUESTION] different behavior for async and sync cluster #544

Open
rhponczkowski opened this issue Jan 25, 2024 · 5 comments
Open

[QUESTION] different behavior for async and sync cluster #544

rhponczkowski opened this issue Jan 25, 2024 · 5 comments

Comments

@rhponczkowski
Copy link

In a very simple scenario:
Given:

  1. Redis cluster (6 nodes: 3 masters, 3 replicas)
  2. client Connecting to cluster using AsyncRedisCluster with default options

When:
1.one of redis master nodes goes down
2. replica takes over
3. couple seconds after redis node fail, client send get for key which is on the shard which master just failed

Then:
reds++ tries to connect to node which is down, when it fails Future holds exception for user code: "Failed to connect to Redis ... connection refused, err: 1, errno: 107.

Question: This behavior is different to what can be observed when we use synchronous version of RedisCluster. In synchronous version if connecting to node fails it updates slot mapping and then send request to newly elected master.
Why is that different?

Environment:

  • OS: Windows
  • Compiler: Visual Studio 16
  • hiredis version: 1.2.0
  • redis-plus-plus version: 1.3.11
@rhponczkowski rhponczkowski changed the title [QUESTION] different behavior for async cluster than sync version [QUESTION] different behavior for async and sync cluster Jan 25, 2024
@sewenew
Copy link
Owner

sewenew commented Feb 4, 2024

Can you show me the minimum code that reproduce your problem?

You can try the following code, if you continue sending command to it, once the new master is elected, AsyncRedisCluster sends command to the new master node.

    AsyncRedisCluster cluster("redis://127.0.0.1:7000");
    while (true) {
        try {
            cout << *(cluster.get("key").get()) << endl;
        } catch (const Error &e) {
            cout << e.what() << endl;
        }
        this_thread::sleep_for(chrono::seconds(1));
    }

It should print the value, and once the master node is down, it prints error info: failed to connect to Redis (127.0.0.1:7001): Connection refused, err: 1, errno: 61, until the new mater is elected, and it print value again:

val
val
...
val
failed to connect to Redis (127.0.0.1:7001): Connection refused, err: 1, errno: 61
failed to connect to Redis (127.0.0.1:7001): Connection refused, err: 1, errno: 61
...
failed to connect to Redis (127.0.0.1:7001): Connection refused, err: 1, errno: 61
val
val

Regards

@rhponczkowski
Copy link
Author

Sure, here is example code:

    AsyncRedisCluster cluster("redis://127.0.0.1:7000");
    cout << *(cluster.get("key").get()) << endl;    
    this_thread::sleep_for(chrono::seconds(1));
    // here master goes down but we don't interact with redis for now,  for example doing some other work...
    this_thread::sleep_for(chrono::seconds(10)); //during that time new master is elected
    while (true) {
        try {
            cout << *(cluster.get("key").get()) << endl;
            break;
        } catch (const Error &e) {
            cout << e.what() << endl;
        }
        this_thread::sleep_for(chrono::seconds(1));
    }

the output would be:

val
failed to connect to Redis (127.0.0.1:7001): Connection refused, err: 1, errno: 61
val  

in this case one request fails despite redis cluster being already in stable state.
And it doesn't when using synchronous api.

@sewenew
Copy link
Owner

sewenew commented Feb 8, 2024

Thanks for the info!

Both async and sync API sends command to the old master node, since the client has no idea that a new master has been elected (since master goes down but we don't interact with redis for now, for example doing some other work). Since the old master is down, redis client gets an IoError. However, the difference is that sync API updates node-slot mapping synchronously, and sends (retries) command to the new master, while the async API updates the mapping asynchronously, sets exception info and returns immediately. And that's why you got an error info with async api.

Regards

@rhponczkowski
Copy link
Author

Such was my assumption. However, the question is if different behavior between synchronous and asynchronous APIs was intentional? In the asynchronous version. we encounter a situation where we receive an error even though everything is fine with the cluster. I guess it is possible to implement retry also in asynchronous version when mapping is updated.
The issue I see is that in the application, I have to assume that when I receive an error, it doesn't necessarily mean that something is actually wrong with cluster, and as a rule, I should try querying several times. is this a common pattern of using Redis?

@sewenew
Copy link
Owner

sewenew commented Feb 8, 2024

I have to admit that I didn't realize this problem before your issue. However, so far, it's not an easy job to implement the retry with async interface (you know the async stuff is much more complicated than the sync one). Instead, I might add some more aggressive strategy to update the mapping.

is this a common pattern of using Redis?

Yes, if you get an error, the cluster might be in an unhealthy state, and you might want to retry it in your application or give up the request, and report error message to your end user.

Regards

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants