[QUESTION] Cluster slots polling #561

Open
mike1821 opened this issue Apr 16, 2024 · 6 comments


@mike1821

Hello,

I am using an AsyncRedisCluster object to connect to one of the master nodes (node 0).

    sw::redis::ConnectionOptions opts;
    opts.host = "db-cluster-0-ip";
    opts.port = 6379;

    // Construct the client directly from the connection options.
    sw::redis::AsyncRedisCluster client(opts);

The first time I try to interact with the database, I get the following exception.

    void sample() {
        m_db_client.command<OptionalString>("client", "getname",
            [](std::future<OptionalString> &&fut) {
                try {
                    auto val = fut.get();
                    // CLIENT GETNAME returns nil when no name is set.
                    if (val) {
                        std::cout << *val << std::endl;
                    }
                } catch (const Error &e) {
                    std::cout << "catch failure, message=" << e.what() << std::endl;
                }
            });
    }

    terminate called after throwing an instance of 'sw::redis::Error'
    what():  Slot is out of range: 15342

I know that redis++ initiates a polling thread in order to monitor the cluster sharding. My question is when does this operation start? Is it upon a successful connection or after a command? Or something different.

I assume that the cluster is not ready when the initial connection occurs, and that its setup completes a while later. Once the cluster has been set up correctly, would the library update the cluster map internally?

Could you please propose a way to handle this exception correctly? Currently it is causing a SIGABRT on the client side.

Environment:

OS: Rocky Linux
Compiler: g++ 8.5.0
hiredis version: v1.0.0, master
redis-plus-plus version: 1.3.10

@sewenew
Owner

sewenew commented Apr 18, 2024

> My question is when does this operation start? Is it upon a successful connection or after a command? Or something different.

Once you create an AsyncRedisCluster object, it begins to fetch the slot-node mapping.

> Given that the cluster has been setup correctly after some time, would the library update the cluster map internally?

Yes.

> Could you please propose a way to correctly handle this exception

You should catch the exception.

By the way, I've fixed the slot-uncovered problem, so you can update your code and try again. Check this issue for details.

Regards
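
One way to act on "catch the exception" while the slot mapping is still settling is to wrap each call in a bounded retry. The following is a minimal sketch in standard C++ only. `retry_with_delay` is a hypothetical helper, not part of redis-plus-plus, and in real client code the caught type would presumably be `sw::redis::Error` rather than `std::exception`.

```cpp
#include <chrono>
#include <exception>
#include <stdexcept>
#include <thread>

// Hypothetical helper (not a redis-plus-plus API): call `op` until it stops
// throwing or `max_attempts` is reached. Returns the number of attempts used
// and rethrows the last exception if every attempt fails.
template <typename Op>
int retry_with_delay(Op op, int max_attempts, std::chrono::milliseconds delay) {
    for (int attempt = 1; ; ++attempt) {
        try {
            op();
            return attempt;
        } catch (const std::exception &) {
            if (attempt >= max_attempts) {
                throw;  // give up and let the caller decide what to do
            }
            std::this_thread::sleep_for(delay);  // wait before the next try
        }
    }
}
```

The callable passed as `op` would contain the actual command, so that an exception thrown by the call itself is caught as well. A fixed delay keeps the sketch short; a growing delay may be preferable if the cluster takes long to come up.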

@mike1821
Author

mike1821 commented Apr 22, 2024

Hello,

As I understand it, in the 1.3.10 release this exception was handled internally in the library code and was not rethrown. As a result, it is not something I can handle on the client side.

However, I repeated the test using 1.3.12 and I am still getting an abort signal. The backtrace shows the following:

    #0  0x00007f2c398f0acf in raise () from /lib64/libc.so.6
    #1  0x00007f2c398c3ea5 in abort () from /lib64/libc.so.6
    #2  0x00007f2c3a29109b in __gnu_cxx::__verbose_terminate_handler() [clone .cold.1] () from /lib64/libstdc++.so.6
    #3  0x00007f2c3a29754c in ?? () from /lib64/libstdc++.so.6
    #4  0x00007f2c3a2996a0 in __gnu_cxx::__verbose_terminate_handler() () from /lib64/libstdc++.so.6
    #5  0x000000000208f460 in ?? ()
    #6  0x00007f2c3caed3df in sw::redis::AsyncShardsPool::_get_node (this=<optimized out>, slot=34233488) at /src/sw/redis++/async_shards_pool.cpp:231
    #7  0x00007f2c3caeda35 in sw::redis::AsyncShardsPool::_get_pool (this=0x208f460, slot=15342) at /src/sw/redis++/async_shards_pool.cpp:168
    #8  0x00007f2c3caedc2f in sw::redis::AsyncShardsPool::_fetch (this=0x3bee, slot=15342) at /src/sw/redis++/async_shards_pool.cpp:164
    #9  0x00007f2c3caedcc9 in sw::redis::AsyncShardsPool::fetch (this=0x208f460, key=...) at /src/sw/redis++/async_shards_pool.cpp:67
    #10 0x00007f2c3caebb16 in sw::redis::AsyncRedisCluster::redis (this=0x6def08, hash_tag=..., new_connection=<optimized out>) at /usr/include/c++/8/bits/shared_ptr_base.h:1018

This time I am getting a SlotUncoveredError exception. It seems that slot 15342, which maps to node-2, is not handled properly, probably because the library does not yet know the cluster layout. Please note that after a respawn the operation succeeds without problems.

I noticed a comment in the library code regarding this SlotUncoveredError. Could the fact that an async update is called in this case be related to my error?
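
The backtrace shows the throw inside `AsyncShardsPool::_get_node`, reached through `AsyncShardsPool::fetch` from `AsyncRedisCluster::redis`. That suggests, although the truncated trace does not prove it, that the exception leaves the call itself instead of arriving through the future, in which case a `try` block inside the callback never sees it. The sketch below illustrates that distinction in standard C++ only. `fake_async_command` and `run` are hypothetical stand-ins and do not reproduce the library's implementation.

```cpp
#include <future>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for an async call whose routing step can fail before
// any request is sent. NOT the redis-plus-plus implementation.
template <typename Callback>
void fake_async_command(bool slot_covered, Callback cb) {
    if (!slot_covered) {
        // Thrown synchronously: the callback is never invoked.
        throw std::runtime_error("slot 15342 is uncovered");
    }
    std::promise<std::string> p;
    p.set_value("client-name");
    cb(p.get_future());
}

// Returns where an error was observed: "none", "callback" or "call site".
std::string run(bool slot_covered) {
    std::string seen = "none";
    try {
        fake_async_command(slot_covered, [&seen](std::future<std::string> &&fut) {
            try {
                fut.get();
            } catch (const std::exception &) {
                seen = "callback";  // errors delivered through the future
            }
        });
    } catch (const std::exception &) {
        seen = "call site";  // errors thrown by the call itself
    }
    return seen;
}
```

Under this assumption, wrapping the `command(...)` call in its own `try`/`catch` in addition to the one inside the callback would cover both paths.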

@georgasa

Hello,

Related to this topic, how does the library behave while the Redis cluster setup is still in progress (that is, while hash slots are still being assigned on the Redis side)? Should the library periodically send a CLUSTER INFO command to find out that the cluster setup has finished and the DB is ready for operations, or should the DB clients do so? In the latter case, is the CLUSTER INFO command supported by the library?

Thank you,
Apostolos

@mike1821
Author

mike1821 commented Apr 23, 2024

On the client side, I added a polling "CLUSTER INFO" command (using the generic command interface) that blocks until the cluster setup is complete (i.e. all 16384 slots are assigned).

"CLUSTER INFO: cluster_state:ok\r\ncluster_slots_assigned:16384\r\ncluster_slots_ok:16384\r\ncluster_slots_pfail:0\r\ncluster_slots_fail:0\r\ncluster_known_nodes:3\r\ncluster_size:3\r\ncluster_current_epoch:2\r\ncluster_my_epoch:0\r\ncluster_stats_messages_ping_sent:227\r\ncluster_stats_messages_pong_sent:218\r\ncluster_stats_messages_sent:445\r\ncluster_stats_messages_ping_received:217\r\ncluster_stats_messages_pong_received:227\r\ncluster_stats_messages_meet_received:1\r\ncluster_stats_messages_received:445\r\n"

Again, I see the same exception.

    terminate called after throwing an instance of 'sw::redis::SlotUncoveredError'
    what(): slot 15342 is uncovered

The question is: given that hash slot assignment can take some time to complete, does the library get notified somehow about the final hash slot layout? Can you please help me identify why I might be getting this error?
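
The readiness check described in this comment can be sketched as a small parser over the CLUSTER INFO reply quoted above, which consists of `key:value` lines separated by `\r\n`. This is a sketch in standard C++ only. `parse_cluster_info` and `cluster_ready` are hypothetical names, and the readiness rule (state `ok` and 16384 assigned slots) is an assumption taken from the comment, not a rule defined by redis-plus-plus.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Split a CLUSTER INFO reply ("key:value\r\n" per line) into a map.
std::map<std::string, std::string> parse_cluster_info(const std::string &reply) {
    std::map<std::string, std::string> fields;
    std::size_t pos = 0;
    while (pos < reply.size()) {
        std::size_t end = reply.find("\r\n", pos);
        if (end == std::string::npos) {
            end = reply.size();
        }
        const std::string line = reply.substr(pos, end - pos);
        const std::size_t colon = line.find(':');
        if (colon != std::string::npos) {
            fields[line.substr(0, colon)] = line.substr(colon + 1);
        }
        pos = end + 2;  // skip the "\r\n" separator
    }
    return fields;
}

// Assumed readiness rule: state is "ok" and all 16384 slots are assigned.
bool cluster_ready(const std::string &reply) {
    const auto fields = parse_cluster_info(reply);
    const auto state = fields.find("cluster_state");
    const auto assigned = fields.find("cluster_slots_assigned");
    return state != fields.end() && state->second == "ok" &&
           assigned != fields.end() && assigned->second == "16384";
}
```

The functions expect the raw reply, without a logging prefix such as `CLUSTER INFO: `. As this comment reports, a positive result only describes the node that answered; it does not by itself mean that the client's cached slot-node mapping has been refreshed.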

@sewenew
Owner

sewenew commented Apr 24, 2024

@mike1821 Sorry, but I cannot reproduce your problem. If you catch the exception, your application should not terminate. Please try the following code:

    auto cluster = AsyncRedisCluster("tcp://127.0.0.1:7000");
    while (true) {
        try {
            cout << *(cluster.get("b").get()) << endl;
        } catch (const Error &e) {
            cout << e.what() << endl;
        }
        this_thread::sleep_for(chrono::seconds(1));
    }

I manually removed from the cluster the slot where key b is located. With the latest redis-plus-plus, the above code prints the following messages without terminating:

CLUSTERDOWN Hash slot not served
slot 3300 is uncovered
slot 3300 is uncovered
slot 3300 is uncovered
....
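
The slot number in these messages follows from the key: Redis Cluster assigns a key to slot CRC16(key) mod 16384, using the XMODEM variant of CRC16. A minimal sketch of that mapping in standard C++ is shown here. The function names are illustrative, and hash tags (`{...}` inside a key) are deliberately ignored.

```cpp
#include <cstdint>
#include <string>

// CRC16, XMODEM variant (polynomial 0x1021, initial value 0), computed bitwise.
std::uint16_t crc16_xmodem(const std::string &data) {
    std::uint16_t crc = 0;
    for (unsigned char byte : data) {
        crc ^= static_cast<std::uint16_t>(byte << 8);
        for (int bit = 0; bit < 8; ++bit) {
            if (crc & 0x8000) {
                crc = static_cast<std::uint16_t>((crc << 1) ^ 0x1021);
            } else {
                crc = static_cast<std::uint16_t>(crc << 1);
            }
        }
    }
    return crc;
}

// Slot for a key without a {hash tag}; hash tags are not handled in this sketch.
int key_slot(const std::string &key) {
    return crc16_xmodem(key) % 16384;
}
```

For the key `b` used above, `key_slot("b")` evaluates to 3300, which matches the slot reported as uncovered.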

@sewenew
Owner

sewenew commented Apr 24, 2024

@georgasa So far, redis-plus-plus tries to fetch the slot-node mapping once the AsyncRedisCluster is created. However, it only checks whether it can get the mapping; it does not check whether the mapping covers all slots. Your scenario is an edge case, and I'll take a look at how to fix it.
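
The missing check described here, whether a fetched mapping covers every slot, can be illustrated with a short sketch in standard C++. The representation of the mapping as a list of inclusive `[first, last]` slot ranges and the name `first_uncovered_slot` are assumptions for illustration only; they do not reflect how redis-plus-plus stores its mapping or how a fix would be implemented.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

constexpr int kSlotCount = 16384;  // number of hash slots in a Redis Cluster

// Each entry is an inclusive [first, last] slot range (assumed representation).
// Returns the lowest slot not covered by any range, or -1 if all are covered.
int first_uncovered_slot(std::vector<std::pair<int, int>> ranges) {
    std::sort(ranges.begin(), ranges.end());
    int next = 0;  // lowest slot not yet known to be covered
    for (const auto &range : ranges) {
        if (range.first > next) {
            return next;  // gap before this range starts
        }
        next = std::max(next, range.second + 1);
    }
    return next < kSlotCount ? next : -1;
}
```

With a mapping that omits slot 3300, for example `{{0, 3299}, {3301, 16383}}`, the function returns 3300. A client could treat any result other than -1 as "cluster not ready yet" and fetch the mapping again later.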
