Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove NameResolver.Factory #7133

Open
ejona86 opened this issue Jun 16, 2020 · 26 comments
Open

Remove NameResolver.Factory #7133

ejona86 opened this issue Jun 16, 2020 · 26 comments
Assignees
Milestone

Comments

@ejona86
Copy link
Member

ejona86 commented Jun 16, 2020

NameResolver.Factory is old and was meant to be replaced by NameResolverProvider. I've been slowly trying to remove it, especially ManagedChannelBuilder.nameResolverFactory(). I see evidence new users are using it, so this issue is to be a gathering point for why it is being removed and making sure there are alternatives to the current usages.

Since NameResolverRegistry was added in v1.21, most users shouldn't need the factory. A small number of users may need a "default name resolver override". This would be similar to defaultLoadBalancingPolicy() which was added when we removed loadBalancerFactory(). However, the NameResolver API needs tweaks to support that, where each NameResolver would be for a particular scheme and the NameResolverRegistry would select the appropriate NameResolver (instead of the current "call all the name resolvers in order until one understands the URI").

NameResolver.Factory is used internally in grpc. I don't care much about that, since it does little harm, but I don't expect NameResolver.Factory to ever become stable and it should be removed. It just may go slowly due to priorities.

@ejona86 ejona86 added this to the Next milestone Jun 16, 2020
@ejona86
Copy link
Member Author

ejona86 commented Jun 16, 2020

Removing ManagedChannelBuilder.nameResolverFactory() is a blocker for #3085.

@lburgazzoli
Copy link

I'm trying to find how to replace the use of nameResolverFactory in my code here. It looks like one should use NameResolverRegistry.register to register a custom resolver but it does not seems to work or at least I was unable to make it working.

Is there any migration documentation/example/hint about how to replace the deprecate method with a proper solution ?

@ejona86
Copy link
Member Author

ejona86 commented Aug 26, 2020

There's no migration guide, since the process is just "make a normal name resolver." That would be one that uses a target string to look up addresses.

You'd start with using the ServiceLoader mechanism described in the NameResolverProvider documentation. If you can't use it because you need some global configuration injected, then you'd use NameResolverRegistry.register(). From there the name resolver needs to get selected for your specific channel via forTarget("etcd:///") (or etcd:).

It looks like you are passing in per-channel information via the factory constructor. That should use the target uri. People abusing nameResolverFactory() to pass in channel-specific configuration is a major reason to mark it deprecated.

I don't see much point to passing the etcd hosts as URIs. Yes, that was necessary for etcd itself for compatibility, but it is strange for a gRPC name resolver. I'd probably just pass the host:post for each server, say, separated by commas. But it's not clear why there are pluggable URIResolverLoaders; is that for using things other than DNS to resolve names?

I'd maybe use a target string akin to: etcd://authority/host1:port,host2:port. That's a different meaning for authority than normal, so you could also do something like etcd:///authority/host1:port,host2:port or etcd:///host1:port,host2:port?authority=$authority.

If you really love your URIs, then you could do something more like etcd:http://host1:port,http://host2:port and pull it out with uri.getSchemeSpecificPart(). URIs allow a lot of things. At the end of the day the objective is to get the per-channel configuration into a single string.

@lburgazzoli
Copy link

lburgazzoli commented Aug 26, 2020

That implementation is quite old so I don't remember all the reason why it is what it is :) I guess some of the original design goal were to hide internals as much as possible so that we have the URIResolverLoaders. Another reason was that there's no support for dns+srv resolvers in grpc-java (at least #2109 is still not resolved) but I think I can now use the ServiceLoader mechanism to implement that one too.

@lburgazzoli
Copy link

@ejona86 I got something working but it is quite ugly as etcd itself as well as any example or other client including the go one use a list of URIs. Wonder having grpc-java supporting something like forTargets("a","b", ...) not an option right ?

@ejona86
Copy link
Member Author

ejona86 commented Aug 27, 2020

@lburgazzoli, the URL list is probably not a good enough reason, since we know everything except the hostname will be thrown away. In my mind you are free to convert the URLs to a target string with a utility and "clean them up" in the process by stripping the other stuff.

@lburgazzoli
Copy link

lburgazzoli commented Aug 28, 2020

I'm certainly not advocating to support URLs by default but in mind mind something like:

forTargets("http://host:port", "dns+srv:///_etcd._tcp.acme.com", "dns:///etcd.acme.com:8080");

Then under the hoods, gtpc-java could use the same mechanics as today but applied to multiple target ad finally aggregate the result (this is in fact what my custom name resolver was doing). In my case then I'd only have to provide a NameResolver for the http and dns+srv.

However I've implemented what you suggested by having some logic that deals with encoding/deconding from a a list of URLs to a single target entry so what I'm suggesting is not required by since there's a concept of load balancing in grpc-java, I think having an easy way to define multiple targets would be nice.

@ejona86
Copy link
Member Author

ejona86 commented Aug 28, 2020

I think having an easy way to define multiple targets would be nice.

As a general purpose mechanism it initially seems generic and useful. But it ends up hard-coding a critical part of service discovery as part of the client, which we would consider a Bad Idea. Basically, primary/secondary nodes are hard-coded. For communicating with etcd itself, that's not as strange because you have to bootstrap yourself somehow. But for communicating with an actual backend that would be very weak.

I'm noticing now that etcd may actually be a poor name to use for this resolver. Normally the etcd resolver would look up results in etcd, not connect to etcd.

For communicating with a low-level service like etcd, I'd normally expect either the (not-supported-in-grpc-java) ip: scheme or dns or dns+srv. ip: supports multiple ip addresses separated by comma. (It's not supported because there's not been any need; generally when an ip is used it's just a single IP and dns: works fine. C core only has it because they used it in their unit tests.) You'd use channelBuilder.overrideAuthority() to allow TLS. Both dns and dns+srv support returning multiple IPs and are centrally configurable. So the only reason I can think of (other than "historical reasons") that you'd prefer to list multiple hosts separately is so they can vary in their authority, but that is pretty weak.

Now, I completely understand you're doing things "the etcd way" and that made sense to etcd given how it evolved. It's just that the difficulties experienced don't seem inherent in any way, so I'd be prone to let this etcd-resolver-that-is-rarely-seen be a bit uglier.

I'll also note that it would be possible to gather the etcd configuration for a set of endpoints and give it a name and use that name for your etcd: target string. So the configuration would be out-of-band to the target string. A layer of indirection and more global state, but depending on the specific use-case it could make sense.

@lburgazzoli
Copy link

@ejona86 thx for your advice

@tommyulfsparre
Copy link

We have a custom NameResolverProvider that can be configured with a custom domain. To give an example: doing new MyNameResolverProvider("example.com") and using forTarget("my-resolver://foobar") will cause the NameResolver to resolves address with the configured custom domain foobar.example.com.

Instantiate and register our NameResolverProvider into the global registry does not prevent any library code from also doing the same and registering with a higher priority and with a custom domain.

Is there a way for us to not have to depend on the global registry?

@ejona86
Copy link
Member Author

ejona86 commented Sep 18, 2020

@tommyulfsparre, can you explain why you don't just use forTarget("my-resolver://foobar.example.com")? There is clearly a split in configuration occurring, but it would be useful to know to what end.

@tommyulfsparre
Copy link

tommyulfsparre commented Sep 18, 2020

@ejona86 we actual support both but using the fqdn uses a different scheme.

The reason (iirc) that we wanted to match our old pre-gRPC discovery names which didn't encode any locality. Our application framework will detect the region (or if you are running locally) its running in and configures the NameResolverProvider accordingly.
Currently this is achieved with ManagedChannelBuilder.nameResolverFactory().

@ejona86
Copy link
Member Author

ejona86 commented Sep 18, 2020

@tommyulfsparre, is the application or framework calling nameResolverFactory() today? For example, the application may call ManagedChannelBuilder.forTarget("my-resolver://foobar") and then pass the builder to the framework so that it can modify it. Or nameResolverFactory could be injected into the application and the application calls nameResolverFactory(). I'm assuming the application is calling forTarget("my-resolver://foobar") today.

@tommyulfsparre
Copy link

tommyulfsparre commented Sep 18, 2020

@ejona86 the framework is calling nameResolverFactory().

The framework injects a GrpcChannelFactory that returns a ManagedChannelBuilder<?> with some options already set, nameResolverFactory() among others.

So a common application usage is:

final ManagedChannel channel = GrpcChannelFactory.forTarget("my-resolver://foobar").build()

@fredsuvn
Copy link

fredsuvn commented May 9, 2021

So the only reason I can think of (other than "historical reasons") that you'd prefer to list multiple hosts separately is so they can vary in their authority, but that is pretty weak.

Did this means in current version (1.37.0), different address with their different authority is not supported? @ejona86

@ejona86
Copy link
Member Author

ejona86 commented May 10, 2021

So the only reason I can think of (other than "historical reasons") that you'd prefer to list multiple hosts separately is so they can vary in their authority, but that is pretty weak.

Did this means in current version (1.37.0), different address with their different authority is not supported?

A NameResolver is able to vary the authority per IP address group via EquivalentAddressGroup.ATTR_AUTHORITY_OVERRIDE. It is intended for cases where the name resolver can securely (e.g., using HTTPS) determine the authority and the name resolver itself is trusted. But there's no "easy" API on the higher level surface (e.g. Channel) for doing anything like that.

@ejona86
Copy link
Member Author

ejona86 commented Sep 27, 2021

As brought up in #8543, when we remove nameResolverFactory() we should update the NameResover.newNameResolver() javadoc to guarantee that the target's scheme will match that of NameResolverProvider.getScheme()

@ptriller
Copy link

ptriller commented Jul 28, 2022

We are using Zookeeper as service registry for our services, and with authentication, and cluster configuration the Global registration of the NameResolver.Factory makes it very awkward. Is there a best practice way to do that without a local NameResolver factory ?

@sanjaypujare
Copy link
Contributor

We are using Zookeeper as service registry for our services, and with authentication, and cluster configuration the Global registration of the NameResolver.Factory makes it very awkward. Is there a best practice way to do that without a local NameResolver factory ?

Not sure I completely understand the concern - since you say "Global registration ... makes it very awkward" and also "Is there a best practice way to do that without a local NameResolver factory". So the use of global and local is confusing.
And this issue talks about replacing the factory with the providers registry.

What would be an ideal solution for you?

@ejona86
Copy link
Member Author

ejona86 commented Jul 28, 2022

"authentication" shouldn't be a problem. Worst-case use NameResolverRegistry.register() in order to inject auth information into the resolver and then register it with gRPC. The "cluster configuration" part is unclear. What is that talking about? Is that just additional configuration like auth, or is it something that changes per-channel?

@ptriller
Copy link

ptriller commented Jul 28, 2022

We are using Zookeeper as service registry for our services, and with authentication, and cluster configuration the Global registration of the NameResolver.Factory makes it very awkward. Is there a best practice way to do that without a local NameResolver factory ?

Not sure I completely understand the concern - since you say "Global registration ... makes it very awkward" and also "Is there a best practice way to do that without a local NameResolver factory". So the use of global and local is confusing. And this issue talks about replacing the factory with the providers registry.

What would be an ideal solution for you?

Well, the Zookeeper connection is very much a local resource for a subset of gRPC services I use, and the NameResolverRegistry by its current design a global resource. I was wondering if there was a solution where I would not have to publish it in a VM scoped Registry. To make things even more awkward I have two Zookeeper ensembles I need to connect to for different services. It felt unnecessarily convoluted as it would be pretty easy with the deprecated nameResolverFactory method. But if that is how the target architecture looks like, I will find a suitable workaround. Although I must admit I do not understand, nor agree with that design decision.

@ejona86
Copy link
Member Author

ejona86 commented Aug 5, 2022

To make things even more awkward I have two Zookeeper ensembles I need to connect to for different services.

This would normally be handled by using the authority section of a target (e.g., zookeeper://zookeeper1/service and zookeeper://zookeeper2/service).

It felt unnecessarily convoluted as it would be pretty easy with the deprecated nameResolverFactory method.

There's two separate things going on. We don't want "I'm using two different zookeeper servers" to be a reason to avoid using the registry. The registry should handle that case. We can maybe do something to make that easier, but it is something supported for similar situations (e.g., DNS, xDS).

"a local resource for a subset of gRPC services" on the other hand is something that may not fit nicely into the registry model and I'd like to understand that better. Is this like "Zookeeper FOO is used for most services, but services from an acquisition/another company/etc use Zookeeper BAR"? Or is it more, "Service A has a special sharding/whatever and has its own Zookeeper to manage itself"?

It felt unnecessarily convoluted as it would be pretty easy with the deprecated nameResolverFactory method.

Yes and no. It was very easy to workaround the name resolver system by not using the target string at all. It seemed there were many NRs that couldn't be used in the global registry. The override wasn't really meant for that purpose; it was more for the purpose, "let me change the default from DNS to my own," which it does a poor job at but was quick and easy and gave us time to work on more pressing matters while we also gain more experience with the system.

There's multiple reasons for the changes. Much of it is to align the various languages. The entire purpose of using the target URI is we wanted something powerful enough you could easily pass a string on the command line to a program and it contact the appropriate service. If NR is hard-coded, that stops working and brings other challenges.

But we also want to support inprocess, unix, and binder name resolvers. This would allow using other transport types with stable API. But that requires a negotiation and choosing a transport based on the target string and name resolver before constructing the builder (see #9076). We're working around things for now, but essentially there's more situations that the nameResolverFactory() is incompatible and it appears to rarely be used for its intended purpose. Now we want to learn how people have been using it to factor those into the design.

@ejona86
Copy link
Member Author

ejona86 commented Aug 8, 2023

Looking through OSS references: https://sourcegraph.com/search?q=context:global+.nameResolverFactory++-file:io/grpc/&patternType=standard&case=yes&sm=0&groupBy=repo

There's a reasonable number in repos with more than 1000 stars.

A surprising number are setting DnsNameResolverProvider, which 1) is generally the default, and 2) is in the internal package, so apparently those users favor YOLO instead of filing an issue. I suspect that's a workaround for a ServiceLoader issue, or some serious cargo culting.

@jdcormie
Copy link
Member

The loss of ManagedChannelBuilder#nameResolverFactory would be problematic for PackageManager-based service discovery on Android. Such an NRP needs an instance of PackageManager to do its job, but this object isn't globally accessible -- it needs to be passed to a NRP as a constructor argument, which kind of precludes SPI.

That leaves the NameResolverRegistry method, but the ManagedChannelBuilder#nameResolverRegistry method isn't public. The global default registry isn't great because our code is an SDK/library meant to be installed in 3p apps. We don't really want to change how names are resolved for other gRPC-using code in the hosting process. Imagine a pathological case where two libraries each have their own different idea of how to resolve target URIs that start with intent: or android-app: and the 'winner' w.r.t the global default registry depends on non-deterministic initialization order of the two libraries.

Can the ManagedChannelBuilder#nameResolverRegistry method be made public to replace nameResolverFactory()?

@ejona86
Copy link
Member Author

ejona86 commented Dec 11, 2023

it needs to be passed to a NRP as a constructor argument, which kind of precludes SPI.

SPI isn't a requirement. NameResolverRegistry.getDefaultRegistry().register(provider) is intended for the cases you need to inject values.

The global default registry isn't great because our code is an SDK/library meant to be installed in 3p apps.

That's the more important situation.

Imagine a pathological case where two libraries each have their own different idea of how to resolve target URIs that start with intent: or android-app:

Well, sure. But if you are going to need some custom resolver that is just for your library, then use a more unique scheme: my-lib-resolver:.

w.r.t the global default registry depends on non-deterministic initialization order of the two libraries.

We've taken care that this not the case. We don't use registration order at all. If two resolvers for the same scheme have the same priority, we order based on class name.

@jdcormie
Copy link
Member

jdcormie commented Mar 8, 2024

Well, sure. But if you are going to need some custom resolver that is just for your library, then use a more unique scheme: my-lib-resolver:.

Thanks for the answer! I'd actually like to standardize the URI scheme then write and publish an official NameResolver for it for many android apps and libraries to use. Within a single app/process, there could be multiple independent bodies of code (e.g. sdks) using it. Each body of code needs to ensure for itself that the NRP is installed on its channels, providing its instance of the Application context (a single android:process can host multiple android.app.Applications).

At the root I guess my complaint is that non-public visibility of ManagedChannelBuilder#nameResolverRegistry forces me to use a global variable, and take on all the complications that follow from that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

8 participants