Strange behavior when using NodePort with clustermesh #24692
Comments
That doesn't sound good!
Hello, I have exactly the same problem. OS version: Ubuntu 22.04
Cilium has been installed in a standard way:
I deployed the rebel-base pod on each cluster, exposed through a Service of type NodePort listening on port 30030.
Outside the cluster (my workstation), it works fine:
The Cilium sysdump is attached: cilium-sysdump-20230404-162849.zip. Thank you in advance for your help.
I just stumbled upon the same issue, which occurs only when Cilium's KPR NodePort handling is enabled (otherwise cross-cluster NodePort services are not supported).
I've had a fresh look at this issue, and my understanding is that it relates to how NodePorts are handled for socket load balancing (the issue does not manifest when targeting the NodePort from outside the cluster).
More specifically, when the dst port is in the NodePort range, we check whether the dst address belongs to a local or remote node [1], and in that case, we look up the service with the wildcard (0.0.0.0) address [2]. Hence, if the given NodePort service exists in the local cluster, we will always target one of the local backends, even if the target address is that of a remote node.
One challenge that I see about fixing this is that AFAIK at the moment we do not have any way to tell in the datapath whether a given node belongs to the local or a remote cluster. Even knowing that, though, it is not 100% clear to me what the correct approach would be. My intuition is that a tradeoff could be to not perform the load balancing decision locally when the destination is a node in the remote cluster, since a NodePort service for that port on the remote cluster might not exist, or might have different backends (hence matching the behavior when reaching that IP:port from outside the cluster).
One final note is that this issue also affects the clustermesh-apiserver NodePort service with the default Helm values configuration (since the NodePort is set to a fixed value there), when
/cc @cilium/sig-datapath
[1]: https://github.com/cilium/cilium/blob/9c06b258463ab5629d16d1ce6b9ecaa5b1e391d7/bpf/bpf_sock.c#L198-L201
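To make the lookup described above easier to follow, here is a minimal Python model of it. This is only a sketch of the behavior as explained in this comment, not the actual BPF code: the node IPs, the `NODE_CLUSTER` and `SERVICES` tables, and both function names are hypothetical. In particular, the per-node cluster label used by the second function is exactly the information the comment says the datapath does not currently have.

```python
from ipaddress import ip_address

NODEPORT_MIN, NODEPORT_MAX = 30000, 32767  # default Kubernetes NodePort range
WILDCARD = ip_address("0.0.0.0")
LOCAL_CLUSTER = "dc1"

# Hypothetical view from a DC1 node: every node IP known through Cluster Mesh.
# The cluster label is an assumption; the comment notes the datapath lacks it today.
NODE_CLUSTER = {
    ip_address("10.1.0.11"): "dc1",  # node01-dc1 (local cluster)
    ip_address("10.2.0.11"): "dc2",  # node01-dc2 (remote cluster)
}

# Hypothetical local service map: only the local NodePort service is present.
SERVICES = {(WILDCARD, 30030): ["rebel-base-dc1"]}


def sock_lb_current(dst, port):
    """Model of the described behavior: any known node IP matches the wildcard entry."""
    dst = ip_address(dst)
    if NODEPORT_MIN <= port <= NODEPORT_MAX and dst in NODE_CLUSTER:
        backends = SERVICES.get((WILDCARD, port))
        if backends:
            return backends[0]  # always a local backend
    return None  # no translation: the connection goes to dst unchanged


def sock_lb_skip_remote(dst, port):
    """Model of the suggested tradeoff: no local decision for remote-cluster nodes."""
    dst = ip_address(dst)
    if NODE_CLUSTER.get(dst) != LOCAL_CLUSTER:
        return None
    return sock_lb_current(dst, port)
```

In this model, `sock_lb_current("10.2.0.11", 30030)` picks the DC1 backend even though the destination is a DC2 node, which matches the reported symptom, while `sock_lb_skip_remote` leaves that connection untouched so that it reaches the remote node as it would from outside the cluster.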
Cilium is currently affected by a known bug (cilium#24692) when NodePorts are handled by the KPR implementation, which occurs when the same NodePort is used both in the local and the remote cluster. This causes all traffic targeting that NodePort to be redirected to a local backend, regardless of whether the destination node belongs to the local or the remote cluster. This affects also the clustermesh-apiserver NodePort service, which is configured by default with a fixed port. Hence, let's add a warning message to the corresponding values file setting. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
[ upstream commit 9e83a6f ] Cilium is currently affected by a known bug (cilium#24692) when NodePorts are handled by the KPR implementation, which occurs when the same NodePort is used both in the local and the remote cluster. This causes all traffic targeting that NodePort to be redirected to a local backend, regardless of whether the destination node belongs to the local or the remote cluster. This affects also the clustermesh-apiserver NodePort service, which is configured by default with a fixed port. Hence, let's add a warning message to the corresponding values file setting. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com>
[ upstream commit 9e83a6f ] Cilium is currently affected by a known bug (#24692) when NodePorts are handled by the KPR implementation, which occurs when the same NodePort is used both in the local and the remote cluster. This causes all traffic targeting that NodePort to be redirected to a local backend, regardless of whether the destination node belongs to the local or the remote cluster. This affects also the clustermesh-apiserver NodePort service, which is configured by default with a fixed port. Hence, let's add a warning message to the corresponding values file setting. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com> Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
This issue has been automatically marked as stale because it has not
Is there an existing issue for this?
#21261
What happened?
Based on slack discussion: https://cilium.slack.com/archives/C53TG4J4R/p1680172555618329
Not sure if this is an issue or normal behavior. We installed two clusters using tunnel mode (let's call them DC1 and DC2) with Cluster Mesh enabled and running. No global services are configured. Each cluster has a Service of type NodePort listening on port 30030. From a DC1 node or pod, when trying to reach DC2 using node01-dc2.lab.it:30030, we always get a response from DC1, as if the DC1 node were intercepting the request. We do not see this behavior when Cluster Mesh is not running.
For example, with Rebel Base:
curl node01-dc1.lab.it:30030
{"Galaxy": "Alderaan", "Cluster": "Cluster-DC1"}
curl node01-dc2.lab.it:30030
{"Galaxy": "Alderaan", "Cluster": "Cluster-DC1"}
What we think it should be:
curl node01-dc1.lab.it:30030
{"Galaxy": "Alderaan", "Cluster": "Cluster-DC1"}
curl node01-dc2.lab.it:30030
{"Galaxy": "Alderaan", "Cluster": "Cluster-DC2"}
Outside the cluster, normal behavior is observed when reaching DC1 or DC2.
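The comparison above can be automated with a small check on the response bodies. This is a sketch only: the helper names are made up for illustration, and the bodies are the outputs quoted in this report rather than live requests.

```python
import json


def served_by(body):
    """Return the cluster name reported in a Rebel Base response body."""
    return json.loads(body)["Cluster"]


def misrouted(results):
    """Given {target: (expected_cluster, body)}, list targets answered by another cluster."""
    return [
        target
        for target, (expected, body) in results.items()
        if served_by(body) != expected
    ]


# Response bodies observed from a DC1 node, as quoted in this report.
observed = {
    "node01-dc1.lab.it:30030": (
        "Cluster-DC1",
        '{"Galaxy": "Alderaan", "Cluster": "Cluster-DC1"}',
    ),
    "node01-dc2.lab.it:30030": (
        "Cluster-DC2",
        '{"Galaxy": "Alderaan", "Cluster": "Cluster-DC1"}',
    ),
}

print(misrouted(observed))
```

With the observed bodies, only the DC2 target is flagged, since its response reports Cluster-DC1.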
Cilium Version
Client: 1.13.0 c9723a8 2023-02-15T14:18:31+01:00 go version go1.19.6 linux/amd64
Daemon: 1.13.0 c9723a8 2023-02-15T14:18:31+01:00 go version go1.19.6 linux/amd64
Kernel Version
Linux 5.10.0-20-amd64 #1 SMP Debian 5.10.158-2 (2022-12-13) x86_64 x86_64 x86_64 GNU/Linux
Kubernetes Version
Server Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.1", GitCommit:"8f94681cd294aa8cfd3407b8191f6c70214973a4", GitTreeState:"clean", BuildDate:"2023-01-18T15:51:25Z", GoVersion:"go1.19.5", Compiler:"gc", Platform:"linux/amd64"}
Sysdump
cilium-sysdump-20230404-145836.zip
Relevant log output
No response
Anything else?
No response