Random subsetting with rendezvous hashing LB policy #2

Closed

Conversation

s-matyukevich (Owner)

No description provided.

@villainb-dg left a comment


Missing dot at end of sentence

* The policy receives a single configuration parameter, `subset_size`, which must be set by the user.
* When the LB policy is initialized, it also creates a random 32-byte `salt` string.
* After every resolver update the policy picks a new subset. It does this by implementing the `rendezvous hashing` algorithm:
* Concatenate the string representation of every address in the list with `salt`.

---

I misunderstood this sentence at first: I thought you meant to concatenate all the addresses together and then add the salt. I think making it more explicit would be easier for the reader to understand, e.g.:

> Concatenate salt to each address in the list

Also might be good to specify what string representation to use for an IP address, as multiple exist.

s-matyukevich (Owner, Author)

Updated this bullet point.

> Also might be good to specify what string representation to use for an IP address, as multiple exist.

This might be language-dependent, but we don't care which one is used. It just must be stable for a given client.

---

Got it, so multiple clients using a different string representation is not an issue?

Also, would this be a problem for a client:

```python
>>> import ipaddress
>>> ipaddress.ip_address("fe80::1.2.3.4")
IPv6Address('fe80::102:304')
```

The string representation of this IPv6 address can be in multiple forms. Both are valid and both represent the same 128-bit address, but a common bug is for application code to think they are distinct addresses. I think it's fine for your use case, but I wanted to validate it.


#### Low connection churn during server rollouts

The graphs provided in the previous section show this is the case in practice (we roll out all servers in the middle of every test). Low connection churn during server rollouts is the primary motivation for using rendezvous hashing as the subsetting algorithm: it guarantees that if a single server is added to or removed from the IP address list, every client will update at most one entry in its subset. This is because the hashes of all unaffected servers remain stable, which guarantees that the order of the servers after sorting also remains stable. For example, removing a server that ranks outside a client's subset changes nothing for that client, while removing one inside the subset simply promotes the next server in hash order.

---

It might be worth showing how other implementations perform with regards to connection churn. It could also be useful to explain to the reader why the previous graphs demonstrate low connection churn as it might not be obvious at first sight.

I'm also wondering what the trade-off is in situations where more than one server gets added or removed from the list. Is it still superior to use rendezvous hashing, or would there be a better algorithm? Not disagreeing with the decision here, just trying to cover all the bases.

---

Regarding connection churn, we also get additional chaos across clients due to the fact that we are hashing on IPs rather than some coordinated id, right? That is, if a server we are talking to gets rescheduled to a different pod/cluster we could reshuffle unnecessarily.

s-matyukevich (Owner, Author)

> It might be worth showing how other implementations perform with regards to connection churn. It could also be useful to explain to the reader why the previous graphs demonstrate low connection churn as it might not be obvious at first sight.

Updated the doc to include more details on that.

> I'm also wondering what the trade-off is in situations where more than one server gets added or removed from the list. Is it still superior to use rendezvous hashing, or would there be a better algorithm?

The same logic applies: the order of unaffected servers remains stable, and the number of client-side updates is always less than or equal to the number of updated servers. Also mentioned this in the doc.

s-matyukevich (Owner, Author)

> Regarding connection churn, we also get additional chaos across clients due to the fact that we are hashing on IPs rather than some coordinated id, right? That is, if a server we are talking to gets rescheduled to a different pod/cluster we could reshuffle unnecessarily.

That is correct. gRPC has a consistent hashing LB policy for such cases. @atollena has an in-flight gRFC related to that: grpc#412

---

My understanding of the consistent hashing LB work is that it's solving the traditional problem of mapping a key to a server.

I was referring to the fact that rendezvous hashing in general is stable so long as the objects we hash on are stable. In this case, we are hashing on IPs, which are relatively stable but probably add some degree of churn to our ordering when pods get rescheduled. This isn't a big deal given that we are doing random subsets to begin with, but I just wanted to verify my understanding.

@roanta left a comment


This looks great: very nice write-up with a good amount of detail.

How do these reviews go? Do you need sign-off from a quorum of grpc maintainers to start working on a PR?

* For every resulting entity compute its [MurmurHash3](https://en.wikipedia.org/wiki/MurmurHash) hash, which produces a 128-bit output.
* Sort all entities by hash.
* Pick first `subset_size` values from the list.
* Pass the resulting subset to the child LB policy.
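
To make the steps above concrete, here is a minimal Go sketch of the subsetting function. It assumes the `github.com/spaolacci/murmur3` package for MurmurHash3; the function and variable names are illustrative, not part of the proposal:

```go
package subsetting

import (
	"sort"

	"github.com/spaolacci/murmur3" // assumed MurmurHash3 implementation
)

// pickSubset applies the rendezvous-hashing subsetting described above:
// hash every address concatenated with the per-client salt, sort by hash,
// and keep the first subsetSize entries.
func pickSubset(addrs []string, salt string, subsetSize int) []string {
	type ranked struct {
		addr string
		hash uint64
	}
	rankedAddrs := make([]ranked, 0, len(addrs))
	for _, a := range addrs {
		// Concatenate the address's string representation with the salt
		// and hash the result; the first 64 bits of the 128-bit output
		// are enough to order the entries.
		h1, _ := murmur3.Sum128([]byte(a + salt))
		rankedAddrs = append(rankedAddrs, ranked{addr: a, hash: h1})
	}
	// Sort all entries by hash.
	sort.Slice(rankedAddrs, func(i, j int) bool {
		return rankedAddrs[i].hash < rankedAddrs[j].hash
	})
	// Pick the first subsetSize values.
	if subsetSize > len(rankedAddrs) {
		subsetSize = len(rankedAddrs)
	}
	subset := make([]string, 0, subsetSize)
	for _, r := range rankedAddrs[:subsetSize] {
		subset = append(subset, r.addr)
	}
	return subset
}
```

Because every address is hashed independently, adding or removing one address perturbs at most one position of the sorted prefix, which is the low-churn property discussed in the "Low connection churn during server rollouts" section.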

---

Out of scope for this doc, but I'm curious:

How do LB policies compose? I suspect it depends on the gRPC implementation, but is there a clear abstraction boundary between selection (e.g. heap, p2c, rr) and the load metric (e.g. orca, least loaded, etc.)?

s-matyukevich (Owner, Author)

In Go a balancer is just an instance of the following interface: https://github.com/grpc/grpc-go/blob/master/balancer/balancer.go#L371 If a balancer wants to redirect some work to a child balancer, it should implement this logic itself. There are several ways it can do this, for example:

* The `outlierdetection` LB maintains the list of sub-channels. For every sub-channel it creates a wrapper so it can get notified whenever an error occurs on that sub-channel. Whenever the error rate on a particular sub-channel reaches some threshold, it removes this sub-channel from the list and notifies the child balancer that the address list got updated. From the point of view of the child balancer, this is identical to the situation where a resolver removes the same address because it is no longer available in DNS.
* What I am proposing in this gRFC is a lot simpler: we always redirect all work to the child balancer, but do some filtering whenever the address list gets updated. We don't maintain any state in the parent balancer.

In both cases this logic lives in the balancer itself, and gRPC doesn't enforce how exactly communication between parent and child balancers should happen.
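
A minimal sketch of this composition pattern, using the real grpc-go `balancer` package but with hypothetical names (`subsettingBalancer`, `subsettingBuilder`, the `"random_subsetting"` policy name) and with most interface methods omitted:

```go
package subsetting

import "google.golang.org/grpc/balancer"

// subsettingBalancer delegates all real balancing work to a child policy and
// only filters the address list on resolver updates (hypothetical sketch).
type subsettingBalancer struct {
	child      balancer.Balancer
	salt       string // random salt generated when the policy is initialized
	subsetSize int    // the subset_size configuration parameter
}

type subsettingBuilder struct{}

func (subsettingBuilder) Name() string { return "random_subsetting" } // name is illustrative

func (subsettingBuilder) Build(cc balancer.ClientConn, opts balancer.BuildOptions) balancer.Balancer {
	// Build the child balancer (e.g. round_robin) against the same
	// ClientConn, so all sub-channel management stays in the child.
	return &subsettingBalancer{
		child: balancer.Get("round_robin").Build(cc, opts),
	}
}

// Calls the parent has no opinion about are forwarded verbatim; the
// interesting method, UpdateClientConnState, is sketched in the next section.
func (b *subsettingBalancer) ResolverError(err error) { b.child.ResolverError(err) }
func (b *subsettingBalancer) Close()                  { b.child.Close() }
```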


### Handling Parent/Resolver Updates

When the resolver updates the list of addresses, or the LB config changes, the random subsetting LB will run the subsetting algorithm described above to filter the endpoint list. It will then create a new resolver state with the filtered list of addresses and pass it to the child LB. Attributes and service config from the old resolver state will be copied to the new one.
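
As a rough illustration under the same assumptions as the earlier sketches (`pickSubset`, `salt`, and `subsetSize` are hypothetical names), the update handler could look like this:

```go
package subsetting

import (
	"google.golang.org/grpc/balancer"
	"google.golang.org/grpc/resolver"
)

func (b *subsettingBalancer) UpdateClientConnState(s balancer.ClientConnState) error {
	// Rank every address with the rendezvous hash and keep the top
	// subsetSize entries (pickSubset is the sketch shown earlier).
	addrs := make([]string, len(s.ResolverState.Addresses))
	for i, a := range s.ResolverState.Addresses {
		addrs[i] = a.Addr
	}
	keep := make(map[string]bool)
	for _, a := range pickSubset(addrs, b.salt, b.subsetSize) {
		keep[a] = true
	}

	// Copy the old resolver state so attributes and service config are
	// preserved, then swap in the filtered address list.
	newState := s.ResolverState
	var filtered []resolver.Address
	for _, a := range s.ResolverState.Addresses {
		if keep[a.Addr] {
			filtered = append(filtered, a)
		}
	}
	newState.Addresses = filtered

	// In a real policy the child's portion of the LB config would be
	// extracted here; passing it through unchanged keeps the sketch short.
	return b.child.UpdateClientConnState(balancer.ClientConnState{
		ResolverState:  newState,
		BalancerConfig: s.BalancerConfig,
	})
}
```

Because `newState` starts as a copy of the incoming `resolver.State`, everything other than `Addresses` is carried over untouched.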

---

This effectively means that we keep all local state (e.g. circuit breakers) intact for sessions which haven't changed, right?

s-matyukevich (Owner, Author)

Yes, exactly: we will be reusing the same sub-channel, which contains all the state related to a particular connection.

## Rationale
### Alternatives Considered: Deterministic subsetting

We explored the possibility of using deterministic subsetting in https://github.com/grpc/proposal/pull/383 and got push-back on this for the reasons explained [here](https://github.com/grpc/proposal/pull/383#discussion_r1334587561).

---

lest they have forgotten ;)

@s-matyukevich (Owner, Author)

> How do these reviews go? Do you need sign-off from a quorum of grpc maintainers to start working on a PR?

Yep, something like that. You can find more details here: https://github.com/grpc/proposal/blob/master/README.md#process

@s-matyukevich (Owner, Author)

Closed, as this is already submitted upstream.
