Random subsetting with rendezvous hashing LB policy #2

Closed

Conversation

s-matyukevich (Owner)

No description provided.

@villainb-dg left a comment


Missing dot at end of sentence

* The policy receives a single configuration parameter, `subset_size`, which must be set by the user.
* When the LB policy is initialized, it also creates a random 32-byte `salt` string.
* After every resolver update the policy picks a new subset. It does this by implementing the `rendezvous hashing` algorithm:
* Concatenate the string representation of every address in the list with `salt`.

---

I misunderstood this sentence at first: I thought you meant to concatenate all the addresses together and then add the salt. I think making it more explicit would be easier for the reader to understand, e.g.:

> Concatenate salt to each address in the list

Also might be good to specify what string representation to use for an IP address, as multiple exist.

s-matyukevich (Owner, Author)

Updated this bullet point.

> Also might be good to specify what string representation to use for an IP address, as multiple exist.

This might be language-dependent, but we don't care which one is used. It just must be stable for a given client.

---

Got it, so multiple clients using a different string representation is not an issue?

Also, would this be a problem for a client:

```python
>>> import ipaddress
>>> ipaddress.ip_address("fe80::1.2.3.4")
IPv6Address('fe80::102:304')
```

The string representation of this IPv6 address can be in multiple forms. Both are valid and both represent the same 128-bit address, but a common bug is for application code to think they are distinct addresses. I think it's fine for your use case, but I wanted to validate it.


#### Low connection churn during server rollouts

The graphs provided in the previous section show this is the case in practice (we roll out all servers in the middle of every test). Low connection churn during server rollouts is the primary motivation for using rendezvous hashing as the subsetting algorithm: it guarantees that if a single server is added to or removed from the IP address list, every client will update at most one entry in its subset. This is because the hashes of all unaffected servers remain stable, which guarantees that the order of the servers after sorting also remains stable. For example, removing a server that ranks outside a client's subset changes nothing for that client, while removing one inside the subset simply promotes the next server in hash order.

---

It might be worth showing how other implementations perform with regards to connection churn. It could also be useful to explain to the reader why the previous graphs demonstrate low connection churn as it might not be obvious at first sight.

I'm also wondering what the trade-off is in situations where more than one server gets added or removed from the list. Is it still superior to use rendezvous hashing, or would there be a better algorithm? Not disagreeing with the decision here, just trying to cover all the bases.

---

Regarding connection churn, we also get additional chaos across clients due to the fact that we are hashing on IPs rather than some coordinated id, right? That is, if a server we are talking to gets rescheduled to a different pod/cluster we could reshuffle unnecessarily.

s-matyukevich (Owner, Author)

> It might be worth showing how other implementations perform with regards to connection churn. It could also be useful to explain to the reader why the previous graphs demonstrate low connection churn as it might not be obvious at first sight.

Updated the doc to include more details on that.

> I'm also wondering what the trade-off is in situations where more than one server gets added or removed from the list. Is it still superior to use rendezvous hashing, or would there be a better algorithm?

The same logic applies: the order of unaffected servers remains stable, and the number of client-side updates is always less than or equal to the number of updated servers. Also mentioned this in the doc.

s-matyukevich (Owner, Author)

> Regarding connection churn, we also get additional chaos across clients due to the fact that we are hashing on IPs rather than some coordinated id, right? That is, if a server we are talking to gets rescheduled to a different pod/cluster we could reshuffle unnecessarily.

That is correct. gRPC has a consistent hashing LB policy for such cases. @atollena has an in-flight gRFC related to that: grpc#412

---

My understanding of the consistent hashing LB work is that it's solving the traditional problem of mapping a key to a server.

I was referring to the fact that rendezvous hashing in general is stable so long as the objects we hash on are stable. In this case, we are hashing on IPs, which are relatively stable but probably add some degree of churn to our ordering when pods get rescheduled. This isn't a big deal given that we are doing random subsets to begin with, but I just wanted to verify my understanding.

@roanta left a comment


This looks great: very nice write-up with a good amount of detail.

How do these reviews go? Do you need sign-off from a quorum of grpc maintainers to start working on a PR?

* For every resulting entity compute its [MurmurHash3](https://en.wikipedia.org/wiki/MurmurHash) hash, which produces a 128-bit output.
* Sort all entities by hash.
* Pick first `subset_size` values from the list.
* Pass the resulting subset to the child LB policy.
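
To make the steps above concrete, here is a minimal Go sketch of the subsetting function. It assumes the `github.com/spaolacci/murmur3` package for MurmurHash3; the function and variable names are illustrative, not part of the proposal:

```go
package subsetting

import (
	"sort"

	"github.com/spaolacci/murmur3" // assumed MurmurHash3 implementation
)

// pickSubset applies the rendezvous-hashing subsetting described above:
// hash every address concatenated with the per-client salt, sort by hash,
// and keep the first subsetSize entries.
func pickSubset(addrs []string, salt string, subsetSize int) []string {
	type ranked struct {
		addr string
		hash uint64
	}
	rankedAddrs := make([]ranked, 0, len(addrs))
	for _, a := range addrs {
		// Concatenate the address's string representation with the salt
		// and hash the result; the first 64 bits of the 128-bit output
		// are enough to order the entries.
		h1, _ := murmur3.Sum128([]byte(a + salt))
		rankedAddrs = append(rankedAddrs, ranked{addr: a, hash: h1})
	}
	// Sort all entries by hash.
	sort.Slice(rankedAddrs, func(i, j int) bool {
		return rankedAddrs[i].hash < rankedAddrs[j].hash
	})
	// Pick the first subsetSize values.
	if subsetSize > len(rankedAddrs) {
		subsetSize = len(rankedAddrs)
	}
	subset := make([]string, 0, subsetSize)
	for _, r := range rankedAddrs[:subsetSize] {
		subset = append(subset, r.addr)
	}
	return subset
}
```

Because every address is hashed independently, adding or removing one address perturbs at most one position of the sorted prefix, which is the low-churn property discussed in the "Low connection churn during server rollouts" section.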

---

Out of scope for this doc, but I'm curious:

How do LB policies compose? I suspect it depends on the gRPC implementation, but is there a clear abstraction boundary between selection (e.g. heap, p2c, rr) and the load metric (e.g. orca, least loaded, etc.)?

s-matyukevich (Owner, Author)

In Go a balancer is just an instance of the following interface: https://github.com/grpc/grpc-go/blob/master/balancer/balancer.go#L371 If a balancer wants to redirect some work to a child balancer, it should implement this logic itself. There are several ways it can do this, for example:

* The `outlierdetection` LB maintains the list of sub-channels. For every sub-channel it creates a wrapper so it can get notified whenever an error occurs on that sub-channel. Whenever the error rate on a particular sub-channel reaches some threshold, it removes this sub-channel from the list and notifies the child balancer that the address list got updated. From the point of view of the child balancer, this is identical to the situation where a resolver removes the same address because it is no longer available in DNS.
* What I am proposing in this gRFC is a lot simpler: we always redirect all work to the child balancer, but do some filtering whenever the address list gets updated. We don't maintain any state in the parent balancer.

In both cases this logic lives in the balancer itself, and gRPC doesn't enforce how exactly communication between parent and child balancers should happen.
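
A minimal sketch of this composition pattern, using the real grpc-go `balancer` package but with hypothetical names (`subsettingBalancer`, `subsettingBuilder`, the `"random_subsetting"` policy name) and with most interface methods omitted:

```go
package subsetting

import "google.golang.org/grpc/balancer"

// subsettingBalancer delegates all real balancing work to a child policy and
// only filters the address list on resolver updates (hypothetical sketch).
type subsettingBalancer struct {
	child      balancer.Balancer
	salt       string // random salt generated when the policy is initialized
	subsetSize int    // the subset_size configuration parameter
}

type subsettingBuilder struct{}

func (subsettingBuilder) Name() string { return "random_subsetting" } // name is illustrative

func (subsettingBuilder) Build(cc balancer.ClientConn, opts balancer.BuildOptions) balancer.Balancer {
	// Build the child balancer (e.g. round_robin) against the same
	// ClientConn, so all sub-channel management stays in the child.
	return &subsettingBalancer{
		child: balancer.Get("round_robin").Build(cc, opts),
	}
}

// Calls the parent has no opinion about are forwarded verbatim; the
// interesting method, UpdateClientConnState, is sketched in the next section.
func (b *subsettingBalancer) ResolverError(err error) { b.child.ResolverError(err) }
func (b *subsettingBalancer) Close()                  { b.child.Close() }
```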


### Handling Parent/Resolver Updates

When the resolver updates the list of addresses, or the LB config changes, the random subsetting LB will run the subsetting algorithm described above to filter the endpoint list. It will then create a new resolver state with the filtered list of addresses and pass it to the child LB. Attributes and service config from the old resolver state will be copied to the new one.
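
As a rough illustration under the same assumptions as the earlier sketches (`pickSubset`, `salt`, and `subsetSize` are hypothetical names), the update handler could look like this:

```go
package subsetting

import (
	"google.golang.org/grpc/balancer"
	"google.golang.org/grpc/resolver"
)

func (b *subsettingBalancer) UpdateClientConnState(s balancer.ClientConnState) error {
	// Rank every address with the rendezvous hash and keep the top
	// subsetSize entries (pickSubset is the sketch shown earlier).
	addrs := make([]string, len(s.ResolverState.Addresses))
	for i, a := range s.ResolverState.Addresses {
		addrs[i] = a.Addr
	}
	keep := make(map[string]bool)
	for _, a := range pickSubset(addrs, b.salt, b.subsetSize) {
		keep[a] = true
	}

	// Copy the old resolver state so attributes and service config are
	// preserved, then swap in the filtered address list.
	newState := s.ResolverState
	var filtered []resolver.Address
	for _, a := range s.ResolverState.Addresses {
		if keep[a.Addr] {
			filtered = append(filtered, a)
		}
	}
	newState.Addresses = filtered

	// In a real policy the child's portion of the LB config would be
	// extracted here; passing it through unchanged keeps the sketch short.
	return b.child.UpdateClientConnState(balancer.ClientConnState{
		ResolverState:  newState,
		BalancerConfig: s.BalancerConfig,
	})
}
```

Because `newState` starts as a copy of the incoming `resolver.State`, everything other than `Addresses` is carried over untouched.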

---

This effectively means that we keep all local state (e.g. circuit breakers) intact for sessions which haven't changed, right?

s-matyukevich (Owner, Author)

Yes, exactly: we will be reusing the same sub-channel, which contains all the state related to a particular connection.

## Rationale
### Alternatives Considered: Deterministic subsetting

We explored the possibility of using deterministic subsetting in https://github.com/grpc/proposal/pull/383 and got push-back on this for the reasons explained [here](https://github.com/grpc/proposal/pull/383#discussion_r1334587561).

---

lest they have forgotten ;)

@s-matyukevich (Owner, Author)

> How do these reviews go? Do you need sign-off from a quorum of grpc maintainers to start working on a PR?

Yep, something like that. You can find more details here: https://github.com/grpc/proposal/blob/master/README.md#process

@s-matyukevich (Owner, Author)

Closed, as this is already submitted upstream.
