Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add basic Sentinel functionality #836

Merged
merged 27 commits into from
Jul 23, 2023
Merged

Conversation

felipou
Copy link
Contributor

@felipou felipou commented May 5, 2023

Hi! I've created this implementation of some Redis Sentinel logic, which I think would be a nice addition to this crate. I'm going to need something like this anyway for my projects, so I've decided to try and contribute it here first.

This is my first Rust open-source contribution, and I have very little Rust experience, so if anything seems weird please enlighten me!

Before doing this, I've seen that there are two issues about adding sentinel support:
#539
#703

And two attempts to implement this, one in a fork, and one in a separate crate:
main...doyshinda:redis-rs:sentinel
https://github.com/aljazerzen/redis-sentinel-rs/blob/main/src/client.rs

Both are quite old and not quite the way I'd like then to be. Also, I've been wanting to practice writing some Rust code, so I decided to implement something myself, which got me to this PR.

I'm not very confident I got the API structured in a good way, so any suggestions on how to improve it would be greatly appreciated. I tried to come up with something along the lines of the Python client (https://github.com/redis/redis-py) but also trying the follow the recommendations for sentinel clients (https://redis.io/docs/reference/sentinel-clients).

Copy link
Contributor

@nihohit nihohit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very impressive work, well done!
I think that the general design is good, most of the comments are about the implementation and the tests.

F: Fn(&ConnectionInfo) -> RedisResult<T>,
{
let mut last_err = None;
for connection_info in self.sentinels_connection_info.iter() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://redis.io/docs/reference/sentinel-clients/#step-1-connecting-to-the-first-sentinel
we'll try first the Sentinel that was reachable in the previous connection attempt, minimizing latency.

Consider just caching the latest connection, instead of triggering a new connection on every attempt. This is especially meaningful if using TLS, since the handshake can be slow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To do that, I'd have to require a mutable reference, or use interior mutability, right? What's the most common/recommended approach in this case?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interesting question.
In general I would've gone with interior mutability if you believe this method might be called in parallel from multiple sources, and mutable reference if you believe that this method usually would be called once every so often.
If you're going with interior mutability, make sure that you're using a mutex that is safe to hold over an await point, since the mutex might be used in both sync and async methods.

if max == 0 {
0
} else {
let nanos = SystemTime::now()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please use a proper random generator instead. You can do something like this:

.choose(&mut rng)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I should have commented (both in the PR and the code) why I did this. I mentioned this in my reply to you last comment below, but basically I wanted to avoid an extra dependency just for this, since it seems rather simple and harmless enough that any seemingly random number would suffice. Specially if this isn't going to be behind a feature flag. Does that make sense? Or am I being overzealous about adding new dependencies?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

both oorandom, rand and fastrand are used by in redis-rs dependency tree.
IMO, there's no problem with adding a direct dependency on one of them, but I'll give @jaymell the final word on this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we already use rand elsewhere, we can just make it non-optional.

For what it's worth, I think it's also considering making the sentinel functionality itself a feature. It would be easy to do, since the code is so self-contained.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jaymell, @nihohit - any workaround to connect just to master sentinel node using your library? If I specify protocol as redis-sentinel in a connection string, it panics. Would be glad for some ad-hoc solution until this is merged.

#[cfg(feature = "aio")]
async fn async_try_sentinel_masters(
connection_info: &ConnectionInfo,
) -> RedisResult<Vec<HashMap<String, String>>> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why Vec? master should be singular.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is because I'm using the SENTINEL MASTERS command, which may return multiple masters, instead of SENTINEL get-master-addr-by-name master-name. The latter is actually the suggested command in the sentinel client doc, but I the former is what redis-py uses, and I believe it's because only this command returns the master's flags (which are used to identify whether a master is valid).

The return of this method is then used by async_get_first_valid_master to find the master with the desired name (and now I'm thinking that this method name doesn´t make much sense, since there can be only one master with each name).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, so this isn't intuitive. Please document the functions, to clarify the intended flow. alternatively, call async_get_first_valid_master from this function, and return the single master.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried adding some comments and renaming the async_get_first_valid_master function. I was not sure if I should repeat the comments for the async functions, so for now I just reference the non-async function.


fn try_sentinel_masters(
connection_info: &ConnectionInfo,
) -> RedisResult<Vec<HashMap<String, String>>> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same - master is singular

) -> RedisResult<Vec<HashMap<String, String>>> {
let sentinel_client = Client::open(connection_info.clone())?;
let mut conn = sentinel_client.get_connection()?;
let result: Vec<HashMap<String, String>> = sentinel_masters_cmd().query(&mut conn)?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you just return sentinel_masters_cmd().query(&mut conn)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, good point! Fixing it now. I'm doing the same in the async version.

let mut servers = vec![];
let mut folders = vec![];
let mut master_ports = vec![];
let start_port = 7000;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't use a hard-coded port. Find an available port, like here.

Better that you find an available port for each node, too, instead of assuming that you have enough available ports in a row.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, I just did it like this because the cluster tests are like this, and I copied some code from it:

let start_port = 7000;

I'll try and change to use a random port now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was far easier than I initially thought, done!

sleep(Duration::from_millis(50));

let mut sentinel_servers = vec![];
let start_port = 27000;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I'll try to improve here too!

.execute(&mut master_con);

// Read commands would go to the replica nodes
assert_eq!(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this test will be flakey, since it assumes that the replica will sync in time. Instead of testing set & get (here and in the previous test), just send an INFO REPLICATION command and check that the node is indeed a master or replica, and if it is a replica, that it is a replica of the correct master.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, makes sense, I'll change it now!

.execute(&mut master_con);

for _ in 0..20 {
let mut replica_con = sentinel
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same, just test that you get a different name on INFO

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing it now.

.get_connection()
.unwrap();

assert_eq!(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same throughout

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing it now.

@nihohit
Copy link
Contributor

nihohit commented May 7, 2023

@jaymell a couple of questions of code order -

  1. Should this be behind a feature flag?
  2. For future extensibility, maybe the module should be under its own folder - sentinel/client.rs, sentinal/connection.rs ?

@felipou
Copy link
Contributor Author

felipou commented May 16, 2023

Sorry for taking so long, and thanks for the quick review!

@nihohit I'll trying to fix everything you pointed out until the end of the week.

@jaymell a couple of questions of code order -

  1. Should this be behind a feature flag?

I actually thought about that, but I'm not sure what's the usual reasoning process to decide about when to put something behind a feature flag. In this case, I guessed it would be only to optimize compilation time for those that do not need this feature. Since it isn't much code, and doesn't bring any extra dependencies. I thought maybe we don't actually need a feature flag.

But this is also related to something that @nihohit pointed out, and which I should have mentioned in the initial description of the PR (and probably a comment in the code too): I avoided using a proper random method to avoid an extra dependency (rand) just for the sentinel (in the selection of the initial replica). Not sure this makes sense though.

  1. For future extensibility, maybe the module should be under its own folder - sentinel/client.rs, sentinal/connection.rs ?

Makes sense for me. I don't actually think the sentinel code will grow much, but it's too soon to have any confidence in that opinion.

@felipou
Copy link
Contributor Author

felipou commented May 20, 2023

Just pushed 3 commits fixing some of the things pointed out.

Two other things I'll try to do later today: support for TLS connections (to masters and replicas), and random ports (instead of fixed ones) in the tests.

For the other points, I've replied the comments and will wait for the responses before modifying anything.

Copy link
Contributor

@nihohit nihohit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good progress.

info_map
}

fn check_master_role(conn: &mut Connection) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assert_is_connection_to_master

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestion, changed it!

assert_eq!(info_map.get("role"), Some(&"master"));
}

fn check_replica_info_result(info: String, master_client: &Client) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider:
assert_connection_is_replica_of_correct_master(connection_info: String, expected_master_info: &ConnectionInfo)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

.query(&mut replica_con),
Ok(("foo".to_string(), b"bar".to_vec()))
);
check_replica_role_and_master(&mut replica_con, &master_client);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you're checking that all of the connections are replicas of the correct master, but you're not testing the "multiple" part of the test name - you're not verifying that these are connections to separate replicas.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, so I just changed this to check that we actually connect to all replicas. I've also changed the number of replicas because there was only one. In the SentinelClient test I only get the connection (no client, so no address), so I couldn't do the same. But I'm thinking of replacing ConnectionInfo instead of Client as you suggested, this will probably allow me to improve the SentinelClient test too.

.await
.unwrap();

let info_map = parse_replication_info(&info);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not use check_master_role?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hadn't extracted this part as a separate function yet, the check_master_role didn't use async. But while renaming with your suggestions, I've also extracted this part to a separate function.

.query_async(&mut replica_con)
.await;
assert_eq!(mget_result, Ok(("foo".to_string(), b"bar".to_vec())));
async_check_replica_role_and_master(&mut replica_con, &master_client).await;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment as for sync test

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, done!

let port = start_port + (node * (replicas_per_master + 1));
for _ in 0..masters {
let port = get_random_available_port();
sleep(Duration::from_millis(25));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do you sleep after getting the port number? and why don't you sleep after lines 135?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just forgot it there really, I was having an issue with the random ports so I put it there to see if it helped, but it turns out it was something else entirely. I'll do some clean up in those tests, I've also noticed I've left some commented lines in the "spawn" functions which can be removed.

.unwrap();
let mut replica_conn_infos = vec![];

for _ in 0..3 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this a verbatim copy of lines 85 and on? if so, please extract to a function

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, and there is something similar in the async tests, I'll extract both to a function.

@felipou felipou force-pushed the add-sentinel branch 2 times, most recently from 844e7af to 841e197 Compare May 22, 2023 12:35
@nihohit
Copy link
Contributor

nihohit commented May 22, 2023

when referencing other functions, please use the rust linking format
Looking fine. I'm waiting for TLS support and connection caching, and then I'll go through the code again.

@felipou
Copy link
Contributor Author

felipou commented May 24, 2023

when referencing other functions, please use the rust linking format

Nice, didn't know about that, thanks! Guess I have to read that book now 😄

Looking fine. I'm waiting for TLS support and connection caching, and then I'll go through the code again.

I just pushed the TLS support!

About the connection caching, I was about to start but I noticed/remembered that async connections are a different struct. Should I just have two vectors of connections, one for sync and other for async connections? I think I'll try implementing it that way and see how it goes.

@felipou
Copy link
Contributor Author

felipou commented May 25, 2023

Just pushed a new commit with the connection caching!

@felipou
Copy link
Contributor Author

felipou commented May 26, 2023

I've noticed some tests failed, so I pushed some commits to try and fix that.

But one of the problems was that I used the TlsMode enum that is in the cluster module, so it didn't exist in tests without this feature enabled. To fix it in the short term I just duplicated that enum inside the sentinel module, but maybe it's an opportunity to move this up? I think this could be useful in more places (for example, replacing the insecure attribute of the ConnectionAddr::TcpTls type). If not, any suggestions as to how we should specify a TLS connection?

@nihohit
Copy link
Contributor

nihohit commented May 28, 2023

Hi, great work. You found a reasonable way around the way the API is structured today in a way which is consistent with the current API language of the library.
But, I think that the last 2 commits significantly expand the size of this PR, and they contains near-duplication of existing structures, in order to maintain this consistency. This makes me think that maybe I was wrong, and the sentinel should only return a host-port tuple, and let the user handle connections, TLS, and caching.
So, essentially, I think that this PR in it's current state is fine in the broad strokes - I still didn't go line-by-line on the last 2 commits, but the quick read-through looked mostly fine. The question is whether the library wants to provide the minimal API, which is just returning host-port tuples, or to support in-house all the features and behaviors that a user might want.
This kind of strategic choice should be done by @jaymell, not me.

@felipou I do think you've done a good job, and I hope that if we'll decide to minimize the blast zone of this change you won't take it as a critique of your contribution.

@felipou
Copy link
Contributor Author

felipou commented May 28, 2023

@felipou I do think you've done a good job, and I hope that if we'll decide to minimize the blast zone of this change you won't take it as a critique of your contribution.

Not at all, you make a fine point, and the API strategy surely comes first. But thanks for saying that anyway! In any case, this has been a great Rust programming and design exercise for me!

There's just one thing I'd like to point out: in the current state of the code, we do connect to the sentinel's monitored servers anyway, as part of the recommended strategy for sentinel clients, to query for the role of the server (using the ROLE command). If the goal is to minimize the API by removing the extra connection attributes, we'd have to either remove that check, or maybe just decide that we won't support TLS in this case?

Also, one thing I've been thinking since before creating the PR but forgot to mention: maybe this code should be in a new crate? After all, it is fairly independent. I even considered just creating the crate first, since I'm going to need this code anyway, and then just deprecating it if this got merged. But I guess for the most part I just believe this sits well within this crate, and as a user I would prefer it to be more feature complete than having to rely on another crate that maybe won't have the same level of quality and maintainability (this seems to be a common discussion as of late, minimal libs vs large libs).

/// TlsMode indicates use or do not use verification of certification.
/// Check [ConnectionAddr](ConnectionAddr::TcpTls::insecure) for more.
#[derive(Clone, Copy)]
pub enum TlsMode {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that TlsMode should be moved out of cluster instead of copied.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@felipou please expose TlsMode instead of copying it

Copy link
Contributor

@nihohit nihohit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work!

@@ -99,6 +99,12 @@ pub enum ErrorKind {
ExtensionError,
/// Attempt to write to a read-only server
ReadOnly,
/// Requested name not found among masters
MasterNameNotFound,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MasterNameNotFoundBySentinel, to clarify the context. Adjust commend and same for NoValidReplicasFound

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, just updated it!


for _ in 0..number_of_replicas {
let replica_client = sentinel
.replica_rotate_for("master1", Some(node_conn_info))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here and in assert_connect_to_known_replicas you've hardcoded master1, which means that the functions will need to be fixed if they were to target another master.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I guess I ignored this part because I was using just "master1" everywhere. But I've added it as a parameter to this function and all similar ones. I've also extracted the string "master1" to a single variable per test (I've thought about a single global one for the whole file, but thought it was best to just replicate it for each test).

.unwrap();
let mut replica_con = replica_client.get_connection().unwrap();

assert!(replica_conn_infos.contains(&replica_client.get_connection_info().addr));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

notice that this check doesn't check that all connections in replica_conn_infos are represented - you might get the same connection from replica_rotate_for, and the test would still pass.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is related to the next comment, I'll reply below.

replica_conn_infos
}

fn assert_connect_to_known_replicas(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this function seems to repeat the tests in connect_to_all_replicas, and to assume that connect_to_all_replicas is called before it. What's the point of this function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that you've asked, I'm not really sure. I guess I just wanted to make sure that a connecting a bunch of times would always end up in the same replicas found by the function connect_to_all_replicas.

The idea is that connect_to_all_replicas connects N times (the number of replicas), checking that each client is to a different address (a different replica), while also asserting that it is a connection to a replica of the target master, and storing the addresses to be returned.

These addresses are then passed to the assert_connect_to_known_replicas which will just connect a bunch of times to replicas, checking that every connection ends up in one of the known replicas (which were found by the connect_to_all_replicas), while also checking that the replica is of the target master.

So yes, both are quite similar. I'm not sure if it makes sense to keep the assert_connect_to_known_replicas function, if you don't see a point in keeping it, I can just remove it.

}

#[test]
fn test_sentinel_server_down() {
Copy link
Contributor

@nihohit nihohit May 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does the test check, that happens on server down? the intended behavior isn't clear from the test's name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just tests that everything works the same, we'll just skip the non-working sentinel and use the next one in the list. It's exactly the same code as test_sentinel_connect_to_multiple_replicas, but stopping the first sentinel before connecting to the replicas. Not sure why I didn't stop the sentinel before connecting to the master though, now I'm thinking it makes more sense to stop it earlier perhaps?

}

/// Get a list of all masters (using the command SENTINEL MASTERS) from the
/// sentinels (we'll return the result of the first sentinel who responds to that
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove the part in parenthesis, it's an implementation detail.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, done!

/// error of the last attempt.
///
/// For each sentinel, we first check if there is a cached connection, and if not
/// we attemp to connect to it (skipping that sentinel if there is an error during
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

attemp -> attempt

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, done!

}

// We can unwrap here because we know there is at least one connection info.
// Maybe we should have a more specific error for this situation?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO this line can be removed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, done!

}

// We can unwrap here because we know there is at least one connection info.
// Maybe we should have a more specific error for this situation?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO this line can be removed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, done!

}

// async methods
#[cfg(feature = "aio")]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor:
Maybe move this to under the sync implementation, since some of the comments reference functions from there?

Also, please be consistent, and make sure that any function comment that can reference the sync implementation does so. you don't have to add a comment just for the reference, but if you already have one, make sure that the repetition of content is minimal.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved it below, makes sense. But I've also removed the references, and just expanded the comment above, is that ok? I've also kept the doc comments in the public methods, but they're exactly the same as the sync versions. You think in this case we should just reference the sync versions to avoid repetition? I'm not sure what would be the best practice in this case.

@nihohit
Copy link
Contributor

nihohit commented Jun 1, 2023

There's just one thing I'd like to point out: in the current state of the code, we do connect to the sentinel's monitored servers anyway, as part of the recommended strategy for sentinel clients, to query for the role of the server (using the ROLE command). If the goal is to minimize the API by removing the extra connection attributes, we'd have to either remove that check, or maybe just decide that we won't support TLS in this case?

So we could return (host, port) pairs, but we won't know which are primaries and which are replicas? Then I think you've made the right decision.

Also, one thing I've been thinking since before creating the PR but forgot to mention: maybe this code should be in a new crate?

A valid question, but I think it's a question to you. IMO this fits in this crate, as a basic Redis capability, as this is the most popular Rust Redis client. If you want to maintain your own crate, that's up to you.

@felipou
Copy link
Contributor Author

felipou commented Jun 1, 2023

There's just one thing I'd like to point out: in the current state of the code, we do connect to the sentinel's monitored servers anyway, as part of the recommended strategy for sentinel clients, to query for the role of the server (using the ROLE command). If the goal is to minimize the API by removing the extra connection attributes, we'd have to either remove that check, or maybe just decide that we won't support TLS in this case?

So we could return (host, port) pairs, but we won't know which are primaries and which are replicas? Then I think you've made the right decision.

We know from the point of view of the sentinels, because when we want a primary we ask specifically for that, and the same for the replicas (there's a separate command to ask Redis for the replicas). I think the docs recommend that we check the role (with the ROLE command) because there can be some delays when syncing the server's roles with the sentinels. So it is possible that we could ask for a master, and the (host, port) returned is already a replica because the sentinel isn't in sync with that server at that point. But if we're returning just the address, there is always the possibility that server will have its role changed even if we ask for it, because there will always be some delay after getting the address, and before we actually connect to that address.

Thinking about it as I write this, I believe there could be some changes to best guarantee the role of a server when using the SentinelClient. Perhaps we should make the Sentinel just return the address as you suggested, and the SentinelClient can ask for the role after connecting to a server (and before returning the connection).

Also, one thing I've been thinking since before creating the PR but forgot to mention: maybe this code should be in a new crate?

A valid question, but I think it's a question to you. IMO this fits in this crate, as a basic Redis capability, as this is the most popular Rust Redis client. If you want to maintain your own crate, that's up to you.

I certainly do not want to maintain my own crate for this, I much prefer for this code to be in this crate!

@nihohit
Copy link
Contributor

nihohit commented Jun 1, 2023

Thinking about it as I write this, I believe there could be some changes to best guarantee the role of a server when using the SentinelClient. Perhaps we should make the Sentinel just return the address as you suggested, and the SentinelClient can ask for the role after connecting to a server (and before returning the connection).

IMO the Sentinel struct should expose a behavior that matches the Redis best practices. We shouldn't split the correct behavior between multiple structs. So, I think the code is fine as-is.

@felipou
Copy link
Contributor Author

felipou commented Jun 1, 2023

Thinking about it as I write this, I believe there could be some changes to best guarantee the role of a server when using the SentinelClient. Perhaps we should make the Sentinel just return the address as you suggested, and the SentinelClient can ask for the role after connecting to a server (and before returning the connection).

IMO the Sentinel struct should expose a behavior that matches the Redis best practices. We shouldn't split the correct behavior between multiple structs. So, I think the code is fine as-is.

It's just that currently there is this problem where the role of the server can change after we check it, and before the connection is created. So I'm thinking we should actually return the connection that we used to check for the role (I've thought about that before, but not because of this, it just seemed that we were "wasting" that connection). If I recall correctly, that's more in line with the recommended practices, because in this case if the role changes, the connections are dropped.

But changing the Sentinel to return connections makes the SentinelClient almost redundant? Maybe not, but I'm not sure yet.

Copy link
Contributor

@nihohit nihohit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, @jaymell

Copy link
Contributor

@nihohit nihohit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, @jaymell

@jaymell
Copy link
Contributor

jaymell commented Jun 21, 2023

Thanks for reviewing @nihohit. I'll give this a review soon. Please rebase @felipou when able 🙏

@wojciechbator
Copy link

wojciechbator commented Jul 3, 2023

Hey @felipou, fantastic job supporting sentinel. @jaymell Is it possible to release this soon after Felipe rebases and finishes it?

@felipou
Copy link
Contributor Author

felipou commented Jul 4, 2023

Sorry for taking so long, I expect to be able to work on this until the end of the week!

@wojciechbator
Copy link

@felipou good news, fingers crossed!

@felipou
Copy link
Contributor Author

felipou commented Jul 13, 2023

Sorry for the delay, rebase done!

@jaymell
Copy link
Contributor

jaymell commented Jul 14, 2023

@felipou Thanks for the PR! I'm largely going to defer to @nihohit's excellent review here, but a couple concerns:

  1. I think we should go ahead and put the sentinel functionality behind a feature flag, and implement the proper random selection logic as discussed in prior comments. It's consistent w/ how we've handled other functionality like clustered support, so I think it's a good approach here as well.
  2. Can we use aio::MultiplexedConnection and avoid the use of aio::Connection? Unfortunately the latter is a bit of a mess that will hopefully be deprecated/obsolete soon, but as it stands has serious risks (e.g., Corrupt connections #325) and should be avoided when possible.

Improve sentinel tests by checking that we actually connect to all
replicas by comparing the server addresses.
Since TlsMode is behind a feature flag, it would only be available
for the sentinel module when the cluster feature was enabled, but
we want to use the sentinel independently from that. Duplicating is
not ideal, so maybe we'll end up joining these in the root module.
I've also moved the async functions block to below the "main" impl
block.
@felipou
Copy link
Contributor Author

felipou commented Jul 22, 2023

Ok, done!

@jaymell
Copy link
Contributor

jaymell commented Jul 23, 2023

Merging. Thanks again for the great contribution!

@jaymell jaymell merged commit 80a4a38 into redis-rs:main Jul 23, 2023
8 of 10 checks passed
@felipou
Copy link
Contributor Author

felipou commented Jul 24, 2023

It was my pleasure! Thanks a lot for the reviews and suggestions @nihohit and @jaymell, and also for maintaining this project! This sure feels like high quality open-source!

altanozlu pushed a commit to altanozlu/redis-rs that referenced this pull request Aug 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants