[Enhancement]: Allow setting inter-broker advertised address to cluster-ip #9551

Open
ventsislav-georgiev opened this issue Jan 12, 2024 · 6 comments

@ventsislav-georgiev

Related problem

We are using Strimzi on GKE with Cloud DNS and occasionally run into Cloud DNS not propagating DNS records for headless services.

The issue prevents the Cluster Operator and Entity Operator from communicating with the brokers: we constantly get java.net.UnknownHostException for the hostname.subdomain.namespace.svc requests.

The above issue is largely out of our hands, so what we would like to do instead is stop relying on headless services. Is it possible to set up the inter-broker (REPLICATION:9091) address with a ClusterIP service instead of relying on the Pod's FQDN?

Suggested solution

Using GenericKafkaListenerConfigurationBroker with type: cluster-ip and broker.advertisedHost set to the service does exactly what we want for the 9092/9094 communication.

However, we cannot set it for the 9091 inter-broker communication.
It would be great if we could use the same approach there.

Alternatives

No response

Additional context

No response

@ventsislav-georgiev
Author

@scholzj Is there any "hacky" workaround to modify the end result of the broker's /opt/kafka/custom-config/server.config?

##########
... 
# Common listener configuration
##########
...
advertised.listeners=...
...

We need this for a test environment where we create and destroy Kafka clusters many times as part of CI.
So we are fine with using a non-production approach to bypass the issues with headless services in Cloud DNS.

@scholzj
Member

scholzj commented Jan 24, 2024

I think this is not just about the advertised hosts. You would also need to create the services in the right way, etc. I expect that this will end up in the "needs proposal" state after it is triaged. Until then, you would need to fork the code and manage it yourself.

TBH, I'm not sure I understand the issue. The way Strimzi works does not rely on any special DNS features, just the standard Kubernetes DNS patterns for addressing pods. Why do you expect the DNS names for a cluster IP service to work any better than the DNS resolution for the pod DNS names?

@ventsislav-georgiev
Author

It comes down to how Cloud DNS works: it replaces kube-dns/CoreDNS, and all DNS resolution is done by GKE's metadata server. The issue we experience is that headless services are sometimes not registered in the DNS server (the DNS records are missing). For Cluster IP services it works properly and the A record is immediately created in Cloud DNS. The failure is sporadic and is probably related to how often we create and destroy such namespaces with services.
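To tell a missing record apart from a slow one in CI, a test job could poll the resolver before starting the clients. The following is only a rough Python sketch of that idea, not anything Strimzi provides: the helper names `resolves` and `wait_for_dns` and the timeout values are made up for illustration, and it assumes the check runs inside a pod that uses the same cluster DNS.

```python
import socket
import time


def resolves(hostname: str) -> bool:
    """Return True if the system resolver returns at least one address."""
    try:
        return len(socket.getaddrinfo(hostname, None)) > 0
    except socket.gaierror:
        # Python's counterpart of java.net.UnknownHostException.
        return False


def wait_for_dns(hostname: str, timeout_s: float = 60.0, interval_s: float = 2.0) -> bool:
    """Poll until the hostname resolves or the (hypothetical) timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        if resolves(hostname):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Calling `wait_for_dns` once with the pod's headless-service name and once with the ClusterIP service name would show which of the two records is missing at that moment. It only diagnoses the problem and does not work around it.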

For the tests we are using a single broker that also acts as the controller in KRaft mode. So we can configure the cluster operator to create the Cluster IP service for that broker, and then we just need to redirect all network requests to use it.

For example for Kafka resource named: strimzi
and KafkaNodePool named: dual-role
in namespace named: temp-ns-xxx
we currently have the following server.config:

advertised.listeners=REPLICATION-9091://strimzi-dual-role-0.strimzi-kafka-brokers.temp-ns-xxx.svc:9091,PLAINSASL-9094://strimzi-dual-role-0.strimzi-kafka-brokers.temp-ns-xxx.svc:9094

Setting the listener configuration type to cluster-ip will create a service of type ClusterIP for the broker, and we need to update advertised.listeners to use that service instead:

advertised.listeners=REPLICATION-9091://strimzi-kafka-broker.temp-ns-xxx.svc:9091,PLAINSASL-9094://strimzi-kafka-broker.temp-ns-xxx.svc:9094
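The change we are after is purely a host substitution in each listener entry. As an illustration only, here is a small Python sketch that derives the second value from the first. The function name `rewrite_advertised_listeners` is hypothetical, and this is not how Strimzi generates the file; it just shows the intended transformation on the two values above.

```python
def rewrite_advertised_listeners(value: str, new_host: str) -> str:
    """Replace the host of every entry in an advertised.listeners value.

    Each entry has the form NAME://host:port; the listener name and the
    port are kept, only the host is swapped for new_host.
    """
    rewritten = []
    for entry in value.split(","):
        name, _, address = entry.partition("://")
        _old_host, _, port = address.rpartition(":")
        rewritten.append(f"{name}://{new_host}:{port}")
    return ",".join(rewritten)


current = (
    "REPLICATION-9091://strimzi-dual-role-0.strimzi-kafka-brokers.temp-ns-xxx.svc:9091,"
    "PLAINSASL-9094://strimzi-dual-role-0.strimzi-kafka-brokers.temp-ns-xxx.svc:9094"
)
wanted = rewrite_advertised_listeners(current, "strimzi-kafka-broker.temp-ns-xxx.svc")
print(f"advertised.listeners={wanted}")
```

This only works for our single-broker test setup, where one service name can stand in for the one pod.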

@ventsislav-georgiev
Author

The issue has nothing to do with Kubernetes and Strimzi. We are just a bit out of options. Seems like it is reported here without progress: https://www.googlecloudcommunity.com/gc/Google-Kubernetes-Engine-GKE/GKE-autopilot-DNS-not-resolving/m-p/634344

@scholzj
Member

scholzj commented Jan 24, 2024

> The issue has nothing to do with Kubernetes and Strimzi. We are just a bit out of options. Seems like it is reported here without progress: https://www.googlecloudcommunity.com/gc/Google-Kubernetes-Engine-GKE/GKE-autopilot-DNS-not-resolving/m-p/634344

Well, resolving the various DNS names is core to Kubernetes from my point of view. So if that does not work properly, it is quite hard to deal with it.

To be honest, implementing something like this would be quite a major change, and I'm not sure we want to maintain and test functionality like that for years to come just to work around some Google issues.

@scholzj
Member

scholzj commented Jan 25, 2024

Triaged on the Strimzi Community call on 25.1.2024: There are some concerns about this:

  • This would be a lot of effort to implement and also maintain as an alternative path
  • The motivation seems to be questionable given it just seems to be a bug / limitation in one particular product

Should this be implemented, these things would need to be clarified in a proposal.
