Retrieving Shards information from RedisCluster #260

Open
mellis13 opened this issue Aug 12, 2021 · 5 comments

Comments

@mellis13

For the application I am working on, it is helpful for performance to know the layout of the redis cluster with regards to hash slot distribution between the nodes. Is it possible to retrieve this information from RedisCluster? It seems there aren't any accessor methods for the private member variable ShardsPool _pool. I'd like to avoid a redundant CLUSTER SLOTS command, since one is already issued during RedisCluster object creation and RedisCluster maintains the most up-to-date information on shards. If there isn't a method, would the community be open to a method that returns access to _pool for inspection but not modification?

@sewenew
Owner

sewenew commented Aug 13, 2021

> it is helpful for performance to know the layout of the redis cluster with regards to hash slot distribution between the nodes.

What's your scenario? Can you give an example? redis-plus-plus caches the slot mapping, so when you send a command to Redis Cluster, it sends the command directly to the right node. There won't be any performance penalty.

Even if you get the underlying ShardsPool, it might not help: if the slot mapping changes (which could happen right after you get the ShardsPool), your copy will be out of date and you will need to update it manually.

Regards

@mellis13
Author

Thanks for the quick response. The use case is specific to a Redis module being used. The general idea is that a copy of data is placed on every cluster node to facilitate efficient parallel computation. This placement of copied data requires knowing the cluster slots assigned to each database node. Re-sharding is not a concern in this specific use case.

@sewenew
Owner

sewenew commented Aug 15, 2021

> The general idea is that a copy of data is placed on every cluster node to facilitate efficient parallel computation. This placement of copied data requires knowing the cluster slots assigned to each database node. Re-sharding is not a concern in this specific use case.

If I understand correctly, it seems that you don't even need a Redis Cluster; instead, you need several standalone Redis instances so that you can parallelize the computation. You can build a pool of Redis objects (not RedisCluster) by creating one Redis object for each node, and randomly pick one from the pool for each operation. This also works even if you deploy these nodes as a Redis Cluster.

If you insist on using a RedisCluster and don't need to worry about re-sharding, you can also write your data to each node WITH THE SAME KEY through a Redis object. When you need to operate on the data, you can call RedisCluster::redis("random-hash-tag", false) with a randomly generated hash tag to pick a node at random, and send the command to it.

> If there isn't a method, would the community be open to a method that returns access to _pool for inspection but not modification?

In fact, you can manually create a ShardsPool with its constructor:

ShardsPool(const ConnectionPoolOptions &pool_opts,
                const ConnectionOptions &connection_opts,
                Role role);

The constructor will call CLUSTER SLOTS to get the slot mapping info. Then you can use the ShardsPool::shards method to get the slot-mapping info.
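
Once a slot mapping is in hand, finding the node that owns a given slot is a range lookup. The sketch below does not use redis-plus-plus itself: `SlotRange`, `Node`, `SlotMap`, and `find_node` are hypothetical stand-ins, written on the assumption that the mapping is available as an ordered map from slot range to node. The library's own types may differ.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-ins for illustration; not the library's own types.
struct SlotRange {
    std::uint16_t min;
    std::uint16_t max;
};

// Order ranges by their upper bound so lower_bound can find the covering range.
inline bool operator<(const SlotRange &lhs, const SlotRange &rhs) {
    return lhs.max < rhs.max;
}

struct Node {
    std::string host;
    int port;
};

using SlotMap = std::map<SlotRange, Node>;

// Return the node whose range contains `slot`, or nullptr if no range covers it.
const Node *find_node(const SlotMap &slots, std::uint16_t slot) {
    // First range whose upper bound is >= slot.
    auto it = slots.lower_bound(SlotRange{slot, slot});
    if (it == slots.end() || slot < it->first.min) {
        return nullptr;
    }
    return &it->second;
}
```

As noted in this thread, any such snapshot can go stale after a re-shard, so a lookup like this is only safe when the slot layout is known to be fixed.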

Why isn't there a method to get the underlying node/slot info?

As I mentioned in the comments above, this mapping might change at any time. With such a method, you might get out-of-date info, and it might mislead you.

I'll keep this issue open to see if others have similar requirements for the node-slot mapping.

If you still have any problems with it, feel free to post them.

Regards

@mlaczin

mlaczin commented Mar 21, 2023

I have a use case for this, actually. We're using Redis JSON, and in particular JSON.MGET, which (with redis++) requires us to use the generic command interface. However, general multikey queries will only be routed to the node associated with the first key, so we need to collect keys which we know (via the hash slot) will be on a particular node and send them in one batch.

For example, if we have five keys (1, 2, 3, 4, 5) on two nodes (A, B), where 1,2,3 are on A and 4,5 are on B, then this command will fail:

JSON.MGET 1 2 3 4 5 $

And there will be two nils in the output (corresponding to keys 4 and 5).

Thus we need to send two commands:

JSON.MGET 1 2 3 $
JSON.MGET 4 5 $

Where we compute the hashslots of 1, 2, 3, 4, 5 and divide them properly.

In our case, we have several million keys that need to be divided in this way. Consequently, having access to the hashslot ranges associated to each node would be helpful.
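
The partitioning step described above can be sketched as follows. This is a minimal illustration, not redis-plus-plus code: `NodeSlots`, `key_slot`, and `group_by_node` are hypothetical names, the node names and slot ranges are supplied by the caller, and hash tags are ignored for brevity. The slot function follows the Redis Cluster rule of CRC16 (XMODEM variant) modulo 16384. The sketch only does the bookkeeping and sends no commands.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// CRC16-CCITT (XMODEM): polynomial 0x1021, initial value 0.
std::uint16_t crc16(const std::string &data) {
    std::uint16_t crc = 0;
    for (unsigned char byte : data) {
        crc = static_cast<std::uint16_t>(crc ^ (byte << 8));
        for (int bit = 0; bit < 8; ++bit) {
            if (crc & 0x8000) {
                crc = static_cast<std::uint16_t>((crc << 1) ^ 0x1021);
            } else {
                crc = static_cast<std::uint16_t>(crc << 1);
            }
        }
    }
    return crc;
}

// Hash slot of a key (hash tags are NOT handled in this sketch).
std::uint16_t key_slot(const std::string &key) {
    return crc16(key) % 16384;
}

// Hypothetical description of the slot range served by one node.
struct NodeSlots {
    std::string name;
    std::uint16_t min;
    std::uint16_t max;
};

// Bucket keys by the node that owns their slot; keys with no owner are skipped.
std::map<std::string, std::vector<std::string>>
group_by_node(const std::vector<NodeSlots> &nodes,
              const std::vector<std::string> &keys) {
    std::map<std::string, std::vector<std::string>> groups;
    for (const auto &key : keys) {
        const std::uint16_t slot = key_slot(key);
        for (const auto &node : nodes) {
            if (slot >= node.min && slot <= node.max) {
                groups[node.name].push_back(key);
                break;
            }
        }
    }
    return groups;
}
```

For several million keys, the linear scan over nodes could be replaced with a 16384-entry slot-to-node table built once up front.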

@sewenew
Owner

sewenew commented Mar 23, 2023

@mlaczin If I understand correctly, even if you get the slot range info, you cannot call RedisCluster::command("JSON.MGET", ...) with 1 2 3 if these 3 keys do NOT belong to the same slot, because Redis Cluster requires that all keys in one command belong to the same slot, not merely the same node.

One solution is to use a hash tag to ensure these keys belong to the same slot. Once you do that, you can call RedisCluster::redis("hash-tag", false) to create a Redis object and send the command with it.
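
To see why a hash tag pins keys to one slot, here is a small self-contained sketch of the Redis Cluster key-hashing rule: when a key contains a non-empty `{...}` section, only the text between the first `{` and the next `}` is hashed. The function names `hashed_part` and `key_slot` are hypothetical, not part of redis-plus-plus.

```cpp
#include <cstdint>
#include <string>

// CRC16-CCITT (XMODEM): polynomial 0x1021, initial value 0.
std::uint16_t crc16(const std::string &data) {
    std::uint16_t crc = 0;
    for (unsigned char byte : data) {
        crc = static_cast<std::uint16_t>(crc ^ (byte << 8));
        for (int bit = 0; bit < 8; ++bit) {
            if (crc & 0x8000) {
                crc = static_cast<std::uint16_t>((crc << 1) ^ 0x1021);
            } else {
                crc = static_cast<std::uint16_t>(crc << 1);
            }
        }
    }
    return crc;
}

// The portion of the key that is hashed: the text between the first '{'
// and the first '}' after it, if that text is non-empty; else the whole key.
std::string hashed_part(const std::string &key) {
    const auto open = key.find('{');
    if (open == std::string::npos) {
        return key;
    }
    const auto close = key.find('}', open + 1);
    if (close == std::string::npos || close == open + 1) {
        return key;
    }
    return key.substr(open + 1, close - open - 1);
}

// Hash slot of a key, honoring hash tags.
std::uint16_t key_slot(const std::string &key) {
    return crc16(hashed_part(key)) % 16384;
}
```

With this rule, keys such as `{batch7}:1` and `{batch7}:2` (made-up names) hash only `batch7` and so always share a slot, which is what allows them to appear together in a single multi-key command.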

Regards
