A76: Improvements to the Ring Hash LB Policy #412

A76: Improvements to the Ring Hash LB Policy
----
* Author(s): @atollena
* Approver: ?
* Status: Draft
* Implemented in: <language, ...>
* Last updated: 2024-01-15
* Discussion at: TODO

## Abstract

This proposal describes two improvements to the `ring_hash` load balancing policy:

1. The ability to use ring hash without xDS, by extending the policy
configuration to define the [request metadata][metadata] to use as the
request hash key.
2. The ability to specify endpoint hash keys explicitly, instead of hashing the
endpoint IP address.

## Background

### Terminology

* The *request hash key*, after being hashed, defines where a given request is
to be placed on the ring in order to find the closest endpoints.
* The *endpoint hash key*, after being hashed, determines the locations of an
endpoint on the ring.

### Using ring hash without xDS by explicitly setting the request hash key

gRPC supports the `ring_hash` load balancing policy for consistent hashing. This
policy currently requires xDS for configuration, because the only way to
configure the hash for a request is the `hash_policy` field of `RouteAction` in
the xDS route configuration. This makes the `ring_hash` policy unusable without
an xDS infrastructure in place.

This proposal extends the configuration of `ring_hash` policy to specify a
metadata key. The value associated with this metadata key will be used as the
request hash key if present. This will make it possible to use `ring_hash` by
configuring it entirely in the [service config][service-config]. If this
configuration is omitted, we will preserve the current behavior of using the xDS
hash policy.

### Using an explicit endpoint hash key

Another limitation of the current `ring_hash` load balancing policy is that it
always hashes the endpoint IP address to place the endpoints on the ring. In
some scenarios, this choice is not ideal: for example, [Kubernetes
StatefulSets](https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/)
offer a way to configure workloads with a sticky identity, such that endpoints
keep their storage and network identity across restarts. However, the IP address
may change across restarts. After a deployment, if all IP addresses have
changed, then a completely new ring has to be constructed, even though it may
have been desirable to keep the ring unchanged based on the StatefulSet
identities, so that each instance stays at the same location on the ring.

Envoy offers a solution to control endpoint hashes independently of IP
addresses. This mechanism uses the `hash_key` value from the `"envoy.lb"`
namespace of the [LbEndpoint.Metadata][LbEndpoint.Metadata] field available in
EDS, instead of the endpoint IP address, as described in [the Envoy
documentation for ring hash][envoy-ringhash]. This proposal adds support for
setting the endpoint hash key explicitly via EDS by reusing the configuration
mechanism implemented in Envoy. To retain the advantage of being able to use
`ring_hash` without xDS, custom gRPC name resolvers will be able to set this
endpoint attribute through the language-specific resolver attribute interface.

### Related Proposals:

This proposal extends the following existing gRFC:

* [gRFC A42: xDS Ring Hash LB Policy][A42]

Reviewer comment (Member):

It's probably also worth referencing A61, which significantly simplified the ring_hash picker logic.

Author comment (@atollena, Apr 2, 2024):

OK, thanks for pointing this out, I wasn't aware of A61's impact on ring hash. I spent some time to understand how that affects the implementation I started in Go. Go implements A62 (pick first sticky transient failure) but not A61, so it cannot take advantage of this without further refactoring the ring hash policy. I'll sync up with the Go team to see what the best path forward is, but I imagine it'd be best to start with implementing at least part of A61 (delegating to pick_first).

Author comment:

After a quick discussion with @dfawley I decided to try to implement this before A61. I think the proposal could be more clear in this case, at least for Go, because there is a difference between immediately attempting to connect (which happens only for subchannels in "idle" state), and queuing a connect for subchannels in transient failure, where a connection attempt will be triggered when the subchannel returns to idle state after the connection backoff.

I implemented the new behavior of scanning forward for a ready subchannel, either when:

  1. the subchannel picked randomly is idle (immediate connection attempt triggered) or,
  2. the subchannel picked randomly is in transient failure (connection attempt queued)

I think it has the desired behaviour of making sure we don't trigger more than one connection attempt on each pick if there is a ready subchannel, not adding latency if there is a ready subchannel, and converging to random. But I also considered only implementing the scanning forward for a ready subchannel when we triggered an immediate connection attempt, which happens when:

  1. the picked subchannel is idle
  2. the picked subchannel is in transient failure, and the second is idle
  3. the picked subchannel is in transient failure, the second is also in transient failure, and we found an idle connection before a ready one when scanning the ring forward.

This ambiguity will disappear when all implementations have A61 implemented. This is planned for this quarter for Go, IIUC, but I imagine there may be reasons for some implementations to want to implement A76 before A61. My question is whether you think it's worth adding a note about it in this gRFC to lift this ambiguity.

Reviewer comment (Member):

My inclination is that this logic is already complicated enough to understand, so I'd prefer to avoid muddying the waters by trying to do this without having first implemented the relevant parts of A61. From first glance, what you describe here seems reasonable, but I think I'd have to stare at it for a lot longer to convince myself it didn't have any potential pitfalls.

@dfawley, is it not feasible to implement just the ring_hash part of A61 before implementing this change?


## Proposal

### Explicitly setting the request hash key

A new string field `request_metadata_key` will be added to the `ring_hash`
policy config. The ring hash policy will be modified as follows:

Upon loading the load balancing config, if the `request_metadata_key` field
contains a value that is not a valid metadata key, then the configuration is
rejected. If `request_metadata_key` refers to binary metadata (suffixed with
`-bin`), the configuration is also rejected.

At pick time:
- If `request_metadata_key` is not empty, and the request metadata has a
non-empty value for this key, then the request hash key will be set to that
value. If the metadata key is associated with multiple values, then the values
are joined with a comma `,` character between each value before hashing.
- If `request_metadata_key` is not empty, and the request has no value or an
empty value associated with this metadata key, then the picker will generate a
random hash for the request. The use of a random hash key effectively sends the
request to a random endpoint.
- If `request_metadata_key` is not set or empty, then the request hash key will
be based on the xDS hash policy in RDS, which keeps the existing LB
configuration for ring hash working as before with xDS.
- If `request_metadata_key` is not set or empty but the xDS configuration does
not provide the hash key, then the picker will generate a random hash for it to
select a random endpoint, which matches the current behavior for xDS.
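
To make the intended usage concrete, below is a minimal Go sketch of
configuring `ring_hash` entirely through the service config and attaching the
request metadata to a call. It assumes the policy keeps the
`ring_hash_experimental` service config name from gRFC A42 and that the JSON
field name for the new field is `requestMetadataKey`; the metadata key
`session-id` and the target are arbitrary examples, not part of this proposal.

```go
package main

import (
	"context"
	"log"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	"google.golang.org/grpc/metadata"
)

func main() {
	// Select ring_hash via the service config and name the request metadata
	// key to hash. "ring_hash_experimental" is the policy name from gRFC A42;
	// "requestMetadataKey" is the assumed JSON form of request_metadata_key.
	const serviceConfig = `{
	  "loadBalancingConfig": [{
	    "ring_hash_experimental": {
	      "requestMetadataKey": "session-id"
	    }
	  }]
	}`

	conn, err := grpc.Dial("dns:///ring-hash-demo.example.com:50051",
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		grpc.WithDefaultServiceConfig(serviceConfig),
	)
	if err != nil {
		log.Fatalf("failed to dial: %v", err)
	}
	defer conn.Close()

	// Requests carrying the same "session-id" value hash to the same endpoint;
	// requests without it are sent to a random endpoint.
	ctx := metadata.AppendToOutgoingContext(context.Background(), "session-id", "user-1234")
	_ = ctx // pass ctx to a generated stub call on conn
}
```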


### Explicitly setting the endpoint hash key

The `ring_hash` policy will be changed such that the hash key used for
determining the locations of each endpoint on the ring will be extracted from a
pre-defined name resolver attribute called `hash_key`. If this attribute is set,
then the endpoint is placed on the ring by hashing its value. If this attribute
is not set or empty, then the endpoint IP address is used (current
behavior). The location of an existing endpoint on the ring changes if its
`hash_key` name resolver attribute changes.
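
As a rough illustration of the non-xDS path, a custom Go resolver could attach
the attribute along the lines of the sketch below. The attribute key type
`hashKey` and the helper `endpointWithHashKey` are hypothetical; the concrete,
language-specific attribute API is left to each implementation.

```go
package customresolver

import (
	"google.golang.org/grpc/attributes"
	"google.golang.org/grpc/resolver"
)

// hashKey is a hypothetical attribute key type; in practice the ring_hash
// implementation would export the key (or a setter helper) to be used here.
type hashKey struct{}

// endpointWithHashKey returns a resolver endpoint whose position on the ring
// would be derived from a stable identity (for example a StatefulSet pod
// name) rather than from its IP address.
func endpointWithHashKey(addr, identity string) resolver.Endpoint {
	return resolver.Endpoint{
		Addresses:  []resolver.Address{{Addr: addr}},
		Attributes: attributes.New(hashKey{}, identity),
	}
}
```

A resolver would then return such endpoints in `resolver.State.Endpoints` as
usual.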

The xDS resolver will be changed so that when converting EDS responses to
resolver endpoints, it will set the `hash_key` name resolver attribute to the

Reviewer comment (Member):

EDS responses are handled in a load balancing policy, so I'm not sure if "name resolver attribute" is the right terminology here.

Author comment (@atollena, Jan 22, 2024):

OK, I interpreted your comment as two things:

  1. It's not the xDS resolver, but the cluster resolver LB policy that does this, so I replaced the term.
  2. "name resolver attributes" are not really a thing. I replaced this with "Endpoint attributes", which is metadata attached to each endpoint, used for example to store locality IDs and weight, and which IIUC are language specific, but all languages have this feature. I don't know if this is clear enough for cross-language interpretations -- please let me know.

1 seems to be in motion with A74, which removes the cluster resolver LB policy in favor of the "xDS resolver".

value of the [LbEndpoint.Metadata][LbEndpoint.Metadata] `envoy.lb` `hash_key`
field, as described in [Envoy's documentation for the ring hash load
balancer][envoy-ringhash].
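
For illustration, extracting that value with the generated Go bindings could
look like the sketch below; this is a non-normative example, and the plumbing
that feeds each `LbEndpoint` into the policy is omitted.

```go
package xdshash

import (
	endpointv3 "github.com/envoyproxy/go-control-plane/envoy/config/endpoint/v3"
)

// endpointHashKey returns the "envoy.lb" hash_key of an LbEndpoint, or the
// empty string if it is absent, in which case the endpoint IP address keeps
// being hashed as it is today.
func endpointHashKey(ep *endpointv3.LbEndpoint) string {
	lbMetadata := ep.GetMetadata().GetFilterMetadata()["envoy.lb"]
	if lbMetadata == nil {
		return ""
	}
	return lbMetadata.GetFields()["hash_key"].GetStringValue()
}
```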

### LB Policy Config Changes

After the addition of this field, the `ring_hash` LB policy config will be:

    message RingHashLoadBalancingConfig {

Reviewer comment (Member):

To get protobuf syntax highlighting, instead of indenting the code, use a fenced code block, with the language name proto.

Author comment:

done.

      // A client-side cap will limit these values. If either of these values
      // are greater than the client-side cap, they will be treated as the
      // client-side cap. The default cap is 4096.
      uint64 min_ring_size = 1; // Optional, defaults to 1024.
      uint64 max_ring_size = 2; // Optional, defaults to 4096, max is 8M.

      string request_metadata_key = 3; // Optional, defaults to the empty string.

Reviewer comment (Member):

In the context of a Protobuf 3 message, an unset string field is the same as an empty string value, so stating that the field "defaults to the empty string" is redundant with the default Protobuf behavior. In addition, "defaults to the empty string" seems to imply that the empty string value will be used in some way, but earlier it says that "not set" and "empty" are treated the same. I think it would be clearer to say something like "Optional, unused if unset".

Author comment (@atollena, Jan 22, 2024):

Yes, replaced with "Optional, unused if unset.". I also removed mentions of "not set" in the implementation section, since there is no way to distinguish not set from empty.

    }


### Temporary environment variable protection

Explicitly setting the request hash key cannot lead to problems with existing
deployments, because the new behavior requires setting a load balancing policy
configuration field that did not exist before. Therefore, it is not gated behind
an environment variable.

The second behavior change will be enabled by the
`GRPC_EXPERIMENTAL_XDS_RING_HASH_ENDPOINT_HASH_KEY` environment variable. This
will protect from the case where an xDS control plane is already setting the
`LbEndpoint.Metadata` `envoy.lb` `hash_key` field, in which case deploying this
new behavior would churn all endpoint hash keys. This environment variable will

Reviewer comment (Member):

Deploying a new version of gRPC will clear all ring_hash state anyway, so I don't understand how this environment variable would do anything to prevent churn.

Author comment:

Deploying with the environment variable unset will keep the current behavior of hashing the IP address instead of the new behavior of using the value from LbEndpoint.Metadata. As a result, the locations of existing endpoints on the ring will be guaranteed to be the same as before, and requests will continue being routed to the same endpoint. It will churn connections, if that's what you mean, but not endpoint locations on the ring. Suddenly changing all locations on the ring may have unintended consequences, removing the locality that ring hash may have been used for in the first place.

I'm thinking of a better phrasing, and the best I could think of was to replace "churn" with "change". Let me know if this is still unclear.

be removed once the feature has proven stable.

## Rationale

We originally proposed using language-specific interfaces to set the request
hash key. The advantage would have been that the request hash key would not have
to be exposed through gRPC outgoing metadata. However, this would have required
defining language-specific APIs, which would have increased the complexity of
this change.

We also discussed the option of exposing all `LbEndpoint.metadata` from EDS
through name resolver attributes, instead of only extracting the specific
`hash_key` attribute, so as to make them available to custom LB policies. We
decided to only extract `hash_key` to limit the scope of this gRFC.

## Implementation

[A description of the steps in the implementation, who will do them, and when.
If a particular language is going to get the implementation first, this section
should list the proposed order.]

Will provide an implementation in Go.

## Open issues (if applicable)

N/A

[A42]: A42-xds-ring-hash-lb-policy.md
[envoy-ringhash]: https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/upstream/load_balancing/load_balancers#ring-hash
[metadata]: https://grpc.io/docs/what-is-grpc/core-concepts/#metadata
[service-config]: https://github.com/grpc/grpc/blob/master/doc/service_config.md
[LbEndpoint.Metadata]: https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/endpoint/v3/endpoint_components.proto#envoy-v3-api-field-config-endpoint-v3-lbendpoint-metadata