
Support using the ring hash balancer without xDS #33356

Open · atollena opened this issue Jun 6, 2023 · 9 comments

Labels: disposition/help wanted (Maintainers do not have enough resources to allocate to this at the moment. Help is appreciated!), kind/enhancement, lang/core, priority/P2

@atollena commented Jun 6, 2023

It is currently not possible to use the ring_hash balancer without xDS. As far as I can tell (from the Go code and the A42 gRFC), there are no strong ties between xDS and the ring_hash balancer: the only missing piece for using it directly via service config is that there is no public interface to set the hash in the request context (or equivalent, e.g. CallConfig in C-core and CallOption in Java).

So the main thing to define would be the interface to set the hash on a per-request basis, across languages. That may warrant a separate gRFC.

The alternative for gRPC users who want consistent hashing is either to build an xDS infrastructure, or to copy-paste the ring_hash balancer code just to expose a way to set the hash, neither of which is ideal.
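To make the missing piece concrete, here is a minimal sketch of the kind of public interface being asked for. Everything in it is hypothetical: SetRequestHash and the context key are invented for illustration and do not exist in grpc-go today.

```go
// Hypothetical sketch only: a public API for attaching a ring_hash request
// hash to an RPC, analogous to what the xDS config selector does internally.
package ringhashdemo

import "context"

type requestHashKey struct{}

// SetRequestHash returns a context carrying the hash that a ring_hash
// picker could consult when choosing an endpoint for the RPC.
func SetRequestHash(ctx context.Context, hash uint64) context.Context {
	return context.WithValue(ctx, requestHashKey{}, hash)
}

// RequestHash extracts the hash, if any, from the context.
func RequestHash(ctx context.Context) (uint64, bool) {
	h, ok := ctx.Value(requestHashKey{}).(uint64)
	return h, ok
}
```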

@veblush (Contributor) commented Jun 6, 2023

Mark, would you take a look at this?

@markdroth (Member)

I've discussed this briefly with @ejona86 and @dfawley, and I think we'd be open to changing the ring_hash policy itself to provide a way to compute the hash for a request. We don't think it really makes sense to have a separate mechanism to define the hash in gRPC the way that it's done in xDS.

I don't think we have time to work on this right now, but if you want to put together a gRFC and are willing to contribute an implementation, we'd be open to reviewing it.

@markdroth added the disposition/help wanted label and removed the untriaged label on Jun 6, 2023
@atollena (Author) commented Jan 15, 2024

@markdroth I have a question related to this. A lot of our consistent hashing use cases make use of additional address metadata, instead of only the endpoint IP address as currently supported by gRPC. Envoy supports customizing the host hash key with the hash_key entry in the envoy.lb endpoint metadata (see the ring hash section of the Envoy documentation). This is useful to minimize churn in cases where the logical replicas of a service do not change but their IP addresses may, such as with Kubernetes StatefulSets.
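For reference, this is roughly how a Go control plane could attach that metadata using go-control-plane types. This is a sketch; the address, port, and hash key values are illustrative.

```go
// Sketch: attaching the envoy.lb hash_key metadata to an EDS LbEndpoint,
// so that the host hash is derived from a stable logical identity (e.g. a
// StatefulSet pod name) rather than the pod IP.
package main

import (
	core "github.com/envoyproxy/go-control-plane/envoy/config/core/v3"
	endpointv3 "github.com/envoyproxy/go-control-plane/envoy/config/endpoint/v3"
	"google.golang.org/protobuf/types/known/structpb"
)

func lbEndpointWithHashKey(addr string, port uint32, hashKey string) (*endpointv3.LbEndpoint, error) {
	md, err := structpb.NewStruct(map[string]any{"hash_key": hashKey})
	if err != nil {
		return nil, err
	}
	return &endpointv3.LbEndpoint{
		HostIdentifier: &endpointv3.LbEndpoint_Endpoint{
			Endpoint: &endpointv3.Endpoint{
				Address: &core.Address{
					Address: &core.Address_SocketAddress{
						SocketAddress: &core.SocketAddress{
							Address:       addr,
							PortSpecifier: &core.SocketAddress_PortValue{PortValue: port},
						},
					},
				},
			},
		},
		// Envoy's ring hash LB reads the host hash key from this entry.
		Metadata: &core.Metadata{
			FilterMetadata: map[string]*structpb.Struct{"envoy.lb": md},
		},
	}, nil
}
```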

So in order for the ring hash balancer to be truly useful to us, we would need:

  • A way to set the request hash programmatically instead of via RDS, as originally described in this issue
  • A way to override the endpoint hash key in non-xDS cases, by setting an endpoint attribute from a custom resolver (see the sketch after this comment)
  • Ideally, also a way to set the hash key via the envoy.lb hash_key metadata field (or another LbEndpoint metadata field) when using xDS

I'm thinking that all of those may make sense to bundle into a single gRFC. Do you have a preference for the granularity of gRFCs? I see that they tend to bundle multiple related but somewhat independent features together.
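A sketch of the second bullet above, assuming a hypothetical attribute key (the real key, if any, would be defined by the gRFC):

```go
// Sketch: a custom resolver attaching a hash key to each address via a
// balancer-visible attribute. The attribute key is invented here.
package hashkeyresolver

import (
	"google.golang.org/grpc/resolver"
)

// hashKeyAttrKey is a hypothetical attribute key for the endpoint hash key.
type hashKeyAttrKey struct{}

// addressWithHashKey returns the address annotated with a stable hash key
// (e.g. a StatefulSet pod name) that a ring_hash policy could consult
// instead of the IP address.
func addressWithHashKey(addr resolver.Address, key string) resolver.Address {
	addr.BalancerAttributes = addr.BalancerAttributes.WithValue(hashKeyAttrKey{}, key)
	return addr
}
```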

@markdroth (Member)

Bundling all of this into a single gRFC seems fine.

I think the mechanism for setting the endpoint's hash key will be fairly straightforward; we can basically just define a resolver attribute for that. And I also don't see any problem with supporting an endpoint metadata key in EDS to set that attribute.

However, I am a little concerned about the complexity involved in exposing an interface to programmatically set the request's hash. When we discussed this earlier, we were thinking that this would probably just be something simple like "use a specific request header", which could just be built into the ring_hash policy. Exposing an API to allow applications to do this programmatically is a much broader API surface, and before going down that road, I'd like to be convinced that there's no simpler way to do this. Is there a way that we can handle your use-case without that complexity?

@atollena (Author)

> However, I am a little concerned about the complexity involved in exposing an interface to programmatically set the request's hash. ...

I think your suggestion of extracting the key from a header covers our use case, which is to set the hash key programmatically in the application; that can be done through the metadata interface available in all languages. We could either use a predefined grpc-prefixed metadata key for the hash, or make the header name configurable in the ring hash policy config. This is very simple and I think it would cover most use cases.
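To illustrate, here is what that could look like from the application side in Go. The ring_hash_experimental policy name matches grpc-go's current registration, but the requestHashHeader field is an assumption about what the gRFC might define:

```go
// Sketch, not a working configuration: "requestHashHeader" is a placeholder
// for whatever field name the gRFC settles on for the ring_hash config.
package main

import (
	"context"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	"google.golang.org/grpc/metadata"
)

// Service config selecting ring_hash and telling it which request header
// to hash, instead of getting the hash from xDS route configuration.
const serviceConfig = `{
  "loadBalancingConfig": [{
    "ring_hash_experimental": {"requestHashHeader": "x-affinity-key"}
  }]
}`

func main() {
	conn, err := grpc.NewClient("dns:///my-service:50051",
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		grpc.WithDefaultServiceConfig(serviceConfig))
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	// All RPCs carrying the same header value hash to the same endpoint.
	ctx := metadata.AppendToOutgoingContext(context.Background(),
		"x-affinity-key", "user-1234")
	_ = ctx // pass ctx to the generated client stub when issuing RPCs
}
```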

We may also want to support hashing on the io.grpc.channel.id filter_state key when using service config; if it's useful for xDS users, it's probably also useful for others. I don't have a use case for this, though.

@markdroth (Member)

Sounds like we're aligned on using a header for the request hash. I'd be fine with adding a field to the ring_hash config to configure the header name to use.

The io.grpc.channel.id filter state thing was added for a fairly ugly internal use-case that we'd like to eventually eliminate, so I'd prefer not to duplicate this in the ring_hash config.

I'm looking forward to seeing the gRFC for this!

@atollena (Author) commented Jan 17, 2024

One last question regarding EDS LbMetadata. There are two options:

  1. Allow a single attribute to be "settable" from resolvers but only "gettable" from the ring hash load balancer (e.g. envoy.lb hash_key), such that it cannot be used from custom LB policies.
  2. Make all envoy.lb metadata entries available through resolver endpoint attributes, so that LB policies that want to use this feature can access them.

Option 1 minimizes the amount of information exposed, reducing the surface on which a future change in the details would be a breaking change. Option 2 allows custom LB policies to take EDS endpoint metadata into consideration when picking an endpoint.

I am asking this because we have Envoy-based features that make use of this metadata through Envoy's LB subsets, and we would eventually like to support them in gRPC+xDS (note that Envoy LB subsets, somewhat confusingly, don't have much to do with subsetting as proposed and discussed in the open A68). Today there is no way to even experiment with a custom LB policy that does this, because, at least in Go, LbEndpoint metadata is not extracted from EDS responses. Option 2 would be a good first step towards supporting LB policies that use EDS endpoint metadata, and perhaps eventually supporting LB subset metadata. I initiated a discussion on this a few months ago in this grpc-io mailing list thread, which provides more context.

I would prefer option 2, since we have a use case for it.

I realize that this issue is about removing the dependency on xDS, and now I'm talking about adding xDS-specific features. This is because we are currently working on supporting gRPC in our xDS control plane, but all existing usage relies on a custom resolver, so we have both use cases.

@markdroth (Member)

We've discussed this a bit internally, and our consensus is that the Envoy-style subsetting (as opposed to the kind of subsetting discussed in A68) won't really work for gRPC, at least not without a tremendous amount of overhead.

The reason Envoy-style subsetting works for Envoy is that Envoy is designed to completely decouple the LB policy from connection management: for each request, the LB policy is passed a list of endpoints to pick from, the pick is done independently of the current connectivity state of the endpoints, and then after the pick is done, it will look for a connection to the chosen endpoint, creating a new one if needed. Because the list of endpoints to pick from is an argument to the pick request, the subsetting policy can trivially filter the list of endpoints to pick from on each request.

However, in gRPC, the LB policy itself manages endpoint connections, which allows the LB policy to never pick an endpoint for which a connection has not already been established, thus minimizing request latency. But this means that the pick request does not pass down the list of endpoints to choose from; the LB policy is expected to already know the list of endpoints it is choosing from, which it is given whenever the resolver returns a new result. This means that in order to implement Envoy-style subsetting, the subsetting policy would basically have to create a duplicate child policy for each subset. In most gRPC implementations, this would result in a lot of duplicate connections to the same endpoint, and it would also use a lot of memory. We believe that the extra overhead here would be prohibitive in the general case.
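For concreteness, grpc-go's picker API reflects this design. Abridged from the real google.golang.org/grpc/balancer package, the pick call receives only per-RPC information, with no endpoint list to filter:

```go
// Abridged from google.golang.org/grpc/balancer. The policy must already
// hold the set of ready connections it is choosing among; Pick cannot be
// handed a per-request endpoint list the way Envoy's LB policies are.
type Picker interface {
	Pick(info PickInfo) (PickResult, error)
}

// PickInfo carries only RPC-level information.
type PickInfo struct {
	FullMethodName string          // e.g. "/helloworld.Greeter/SayHello"
	Ctx            context.Context // the RPC's context
}
```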

For the use-case described in the grpc-io thread that you started, I think that you will need to stick with the property that each subset is a separate xDS cluster, so that the problem can be modeled as finding a way to dynamically choose the cluster to use for a given request. There are some options we can explore for that, and we may even have a canned solution that you can use, but let's discuss that separately.

Getting back to the ring_hash issue, I think it's fine to define a general-purpose resolver attribute for the hash key that can be used by any hash-based LB policy. I don't think we need to restrict it to be gettable only from the ring_hash policy; although we don't currently have any other policies that would use this attribute, I could see it possibly being useful for other hash-based policies in the future (either in a custom policy or some future gRPC-supported policy). For xDS, we can set that resolver attribute from the EDS envoy.lb metadata hash_key entry, but other custom resolvers could set the attribute from whatever source they want.

@atollena (Author)

I opened grpc/proposal#412 for the changes we discussed.

I will follow up regarding LB metadata, but ideally not in this issue.
