Submariner doesn't work with Calico CNI #2121

Closed
zcz-1020 opened this issue Nov 12, 2022 · 20 comments
Labels: bug, Calico, datapath, help wanted, support

Comments

@zcz-1020

zcz-1020 commented Nov 12, 2022

I recently built a k8s cluster on several virtual machines using the Calico CNI and ran into the same problem as #407. In cross-cluster scenarios, only pods on the gateway nodes can communicate with each other; pods on non-gateway nodes cannot reach the remote cluster over the data plane. I followed the instructions at https://submariner.io/operations/deployment/calico/ to create the IPPools, but it did not help, even though the subctl diagnose command reports that everything is correct. Traffic is transmitted from the vx-submariner interface of the other nodes in the cluster to the vx-submariner interface of the gateway node, but the packets are not then sent to the vxlan-tunnel interface of the gateway node. As a result, a tunnel cannot be established with the remote cluster.

The environment version is as follows:
k8s: 1.21.0
calico: 3.21
submariner: 0.13.0

@sridhargaddam sridhargaddam added the help wanted and datapath labels Nov 14, 2022
@dfarrell07 dfarrell07 added the bug and Calico labels Nov 15, 2022
@dfarrell07 dfarrell07 added this to Backlog in Backlog via automation Nov 15, 2022
@zcz-1020
Author

We spent some time tracking down the problem and found that packets are dropped in the FORWARD chain of the filter table. The default policy of the FORWARD chain is DROP, and because the request packets do not match any rule, they are discarded. After changing the default policy of the FORWARD chain to ACCEPT, or adding rules that the packets match, access works normally.
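
A minimal sketch of the check and workaround described in this comment, for anyone who wants to reproduce it. The remote CIDRs are hypothetical placeholders, not values from this environment, and the script only prints the iptables commands so they can be reviewed before being run as root. Rules added this way do not persist across reboots.

```shell
#!/usr/bin/env bash
# Sketch only: prints iptables commands for review instead of running them.
# The CIDRs are hypothetical placeholders for the REMOTE cluster's pod and
# service networks; replace them with your own values.
REMOTE_CIDRS="${REMOTE_CIDRS:-10.244.0.0/16 10.96.0.0/12}"

# To inspect the current state on a node (as root):
#   iptables -S FORWARD | head -n 1   # prints "-P FORWARD DROP" if the policy is DROP
#   iptables -L FORWARD -v -n         # per-rule packet counters

# Emit one ACCEPT rule per direction for each remote CIDR.
forward_accept_rules() {
  local cidr
  for cidr in "$@"; do
    echo "iptables -I FORWARD -s ${cidr} -j ACCEPT"
    echo "iptables -I FORWARD -d ${cidr} -j ACCEPT"
  done
}

# Word splitting on REMOTE_CIDRS is intentional here.
forward_accept_rules $REMOTE_CIDRS

# Broader alternative mentioned in the comment (affects all forwarded traffic):
#   iptables -P FORWARD ACCEPT
```

Scoping the rules to the remote CIDRs is the narrower of the two options; changing the chain policy to ACCEPT opens forwarding for all traffic on the node.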

@sridhargaddam
Member

@zcz-1020 we recently validated Calico with Submariner and noticed some issues when Calico is deployed with IPIP overlay tunnels. When we changed the Calico configuration to VXLAN overlay tunnels and configured the required Submariner IPPools as documented ("Calico supports different types of overlay networking. Currently, Submariner is validated only when Calico is deployed with VXLAN encapsulation."), everything worked fine. Did you deploy Calico with IPIP tunnels or VXLAN tunnels?

@ii2day

ii2day commented Feb 16, 2023

> @zcz-1020 we recently validated Calico with Submariner and noticed some issues when Calico is deployed with IPIP overlay tunnels. When we changed the Calico configuration to VXLAN overlay tunnels and configured the required Submariner IPPools as documented ("Calico supports different types of overlay networking. Currently, Submariner is validated only when Calico is deployed with VXLAN encapsulation."), everything worked fine. Did you deploy Calico with IPIP tunnels or VXLAN tunnels?

Hi, I have the same problem. I confirmed that my Calico is in VXLAN mode; my Calico version is 3.24.0 and my Submariner cableDriver is vxlan.

@sridhargaddam
Member

> > @zcz-1020 we recently validated Calico with Submariner and noticed some issues when Calico is deployed with IPIP overlay tunnels. When we changed the Calico configuration to VXLAN overlay tunnels and configured the required Submariner IPPools as documented ("Calico supports different types of overlay networking. Currently, Submariner is validated only when Calico is deployed with VXLAN encapsulation."), everything worked fine. Did you deploy Calico with IPIP tunnels or VXLAN tunnels?
>
> Hi, I have the same problem. I confirmed that my Calico is in VXLAN mode; my Calico version is 3.24.0 and my Submariner cableDriver is vxlan.

We recently enhanced subctl to validate the Calico overlay mode and the IPPools that Submariner requires. Can you try running subctl diagnose cni using the latest devel build, which you can download as shown below?

curl -Ls https://get.submariner.io | VERSION=devel bash

@zcz-1020
Author

zcz-1020 commented Feb 25, 2023

@sridhargaddam I'm using IPIP mode, but as I understand it, the problem has nothing to do with IPIP versus VXLAN mode. Please focus on the finding I described above.
We found that packets are dropped in the FORWARD chain of the filter table. The default policy of the FORWARD chain is DROP, and because the request packets do not match any rule, they are discarded. After changing the default policy of the FORWARD chain to ACCEPT, or adding rules that the packets match, access works normally.

@zcz-1020
Author

@sridhargaddam I have another question. We have recently been testing Calico in BGP mode, and we find that pods in different clusters are also unreachable from each other over the data plane. Have you verified this mode before?

@sridhargaddam
Member

> @sridhargaddam I'm using IPIP mode, but as I understand it, the problem has nothing to do with IPIP versus VXLAN mode. Please focus on the finding I described above. We found that packets are dropped in the FORWARD chain of the filter table. The default policy of the FORWARD chain is DROP, and because the request packets do not match any rule, they are discarded. After changing the default policy of the FORWARD chain to ACCEPT, or adding rules that the packets match, access works normally.

I'm not sure whether the above observation applies to the gateway node or to a non-gateway node, but either way, once the IPPools for the remote clusters are configured (as documented), the remote cluster packets should not be dropped AFAICT. In any case, we currently do not validate Calico in our CI pipeline, and this needs some investigation.

CC: @yboaron JFYI

@sridhargaddam
Member

> @sridhargaddam I have another question. We have recently been testing Calico in BGP mode, and we find that pods in different clusters are also unreachable from each other over the data plane. Have you verified this mode before?

We verified Calico with the VXLAN overlay, in which BGP is disabled.

@yboaron
Contributor

yboaron commented Mar 1, 2023

@zcz-1020 Can you share the content of Calico's default ippool from your environment?

@yboaron yboaron self-assigned this Mar 2, 2023
@zcz-1020
Author

zcz-1020 commented Mar 3, 2023

@yboaron Sorry, the environment is gone. My IPPools were configured according to the linked instructions. The cidr in the spec of each IPPool is set to the pod_cidr and the service_cidr of the remote cluster.
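
For reference, the kind of IPPool described here can be sketched as follows. The pool names and CIDRs are hypothetical, and the natOutgoing and disabled values are assumptions modelled on the pattern in Submariner's Calico guide, so check them against https://submariner.io/operations/deployment/calico/ before applying anything. The script only prints manifests.

```shell
#!/usr/bin/env bash
# Sketch only: prints Calico IPPool manifests for a remote cluster's CIDRs.
# Names and CIDRs are hypothetical; natOutgoing/disabled are assumptions to
# verify against the Submariner Calico guide.

render_ippool() {
  # $1 = pool name, $2 = remote CIDR
  cat <<EOF
---
apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: $1
spec:
  cidr: $2
  natOutgoing: false
  disabled: true
EOF
}

# One pool for the remote pod CIDR and one for the remote service CIDR.
render_ippool remote-pod-cidr 10.244.0.0/16
render_ippool remote-svc-cidr 10.96.0.0/12

# To apply, pipe the output to calicoctl, for example (script name is
# hypothetical):
#   ./render-ippools.sh | calicoctl apply -f -
```

Marking the pools as disabled is meant to keep Calico from allocating local pod addresses out of the remote ranges while still treating those ranges as known networks.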

@yboaron
Contributor

yboaron commented Mar 5, 2023

> @yboaron Sorry, the environment is gone. My IPPools were configured according to the linked instructions. The cidr in the spec of each IPPool is set to the pod_cidr and the service_cidr of the remote cluster.

I meant the content of the default IPPool (named 'default-ipv4-ippool'), not the IPPools created to allow Submariner to work with the Calico CNI.

If you have an environment available, please do the following on both clusters:
A. Update the default IPPool spec to one of [1] or [2].
B. Restart all submariner-routeagent pods by running command [3].

Then run the Submariner connectivity test.

Let us know if you still have issues.

[1]
ipipMode: Always
vxlanMode: Never
[2]
ipipMode: Never
vxlanMode: Always
[3]
kubectl delete pod -n submariner-operator -l app=submariner-routeagent
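
Steps A and B above could be scripted roughly as below. This is a sketch, not a tested procedure: the helper name default_ippool_patch is made up for illustration, and calicoctl patch is one assumed way of applying the change to the default IPPool.

```shell
#!/usr/bin/env bash
# Sketch only: builds the JSON merge patch for option [1] (ipip) or [2] (vxlan).
# default_ippool_patch is a hypothetical helper, not part of any tool.

default_ippool_patch() {
  case "$1" in
    ipip)  echo '{"spec":{"ipipMode":"Always","vxlanMode":"Never"}}' ;;
    vxlan) echo '{"spec":{"ipipMode":"Never","vxlanMode":"Always"}}' ;;
    *)     echo "usage: default_ippool_patch ipip|vxlan" >&2; return 1 ;;
  esac
}

# Print the patch for option [2] so it can be reviewed first.
default_ippool_patch vxlan

# Then, on each cluster (assumes calicoctl is installed and configured):
#   calicoctl patch ippool default-ipv4-ippool -p "$(default_ippool_patch vxlan)"
#   kubectl delete pod -n submariner-operator -l app=submariner-routeagent
```

Both options set exactly one encapsulation mode to Always and the other to Never, which is the property the two suggested specs have in common.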

@leberknecht

Maybe a little bit off-topic, but maybe also a little bit on-topic: I have an old GKE cluster (I think we started it with 1.13), and I'm pretty sure we installed Calico manually back then to use NetworkPolicy resources. It has since been upgraded over time to 1.24, and the gcloud console UI shows "Calico Kubernetes Network policy | Enabled".

But subctl diagnose cni gives:

 ⚠ Checking Submariner support for the CNI network plugin
 ⚠ Submariner could not detect the CNI network plugin and is using ("generic") plugin. It may or may not work.
 ✓ Trying to detect the Calico ConfigMap 

It seems odd that it finds a ConfigMap but can't detect the plugin. Just out of curiosity, which ConfigMap does it look for here?

@sridhargaddam
Member

@leberknecht the network plugin discovery code looks for a ConfigMap named "calico-config".

For more details, you can see the corresponding code here
https://github.com/submariner-io/submariner-operator/blob/0a98c9c95d5bfb10d3a7fcb0507f741397160c23/pkg/discovery/network/calico.go#L43

@stale

stale bot commented Sep 17, 2023

This issue has been automatically marked as stale because it has not had activity for 60 days. It will be closed if no further activity occurs. Please make a comment if this issue/pr is still valid. Thank you for your contributions.

@stale stale bot added the wontfix label Sep 17, 2023
@sridhargaddam
Member

@leberknecht the network plugin discovery code looks for a ConfigMap named "calico-config".

For more details, you can see the corresponding code here https://github.com/submariner-io/submariner-operator/blob/0a98c9c95d5bfb10d3a7fcb0507f741397160c23/pkg/discovery/network/calico.go#L43

Also, we recently enhanced the discovery code for Calico CNI - submariner-io/submariner-operator#2769

@stale stale bot removed the wontfix label Sep 26, 2023
@sridhargaddam
Member

@leberknecht did you get a chance to try out the suggestion from @yboaron here - #2121 (comment)?

@cangyin

cangyin commented Jan 10, 2024

I encountered the same problem: cross-cluster communication works only on gateway nodes, not on non-gateway nodes.

@cangyin

cangyin commented Jan 11, 2024

> We spent some time tracking down the problem and found that packets are dropped in the FORWARD chain of the filter table. The default policy of the FORWARD chain is DROP, and because the request packets do not match any rule, they are discarded. After changing the default policy of the FORWARD chain to ACCEPT, or adding rules that the packets match, access works normally.

How did you fix the iptables rules? Could you describe the steps in detail? Thanks!

Contributor

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale label May 11, 2024
@dfarrell07 dfarrell07 removed the stale label May 14, 2024
@yboaron
Contributor

yboaron commented May 19, 2024

Starting with release v0.16.0, Submariner automatically creates the necessary Calico IPPools for remote cluster connectivity when the Calico API Server is installed in the cluster. Calico CNI detection now also searches for calico-node CNI pods when the calico-config ConfigMap is not detected.
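
A rough manual equivalent of the two detection steps mentioned here, for checking what a given cluster exposes. This is not Submariner's actual discovery code; the kube-system namespace and the k8s-app=calico-node label are assumptions based on common Calico installs and may differ in your cluster.

```shell
#!/usr/bin/env bash
# Sketch only: mimics "look for the calico-config ConfigMap, then fall back to
# calico-node pods". Namespace and label selector are assumptions.

detect_calico() {
  if kubectl get configmap calico-config -n kube-system >/dev/null 2>&1; then
    echo "calico-config ConfigMap"
  elif [ -n "$(kubectl get pods -A -l k8s-app=calico-node -o name 2>/dev/null)" ]; then
    echo "calico-node pods"
  else
    echo "not detected"
  fi
}

detect_calico
```

If this prints "not detected" on a cluster that does run Calico, the namespace or label in the sketch probably does not match that install.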

Please try installing Submariner with Calico [1] again using the latest version of Submariner, and let us know if you encounter any problems.

I'm going to close this issue.

[1]
https://submariner.io/operations/deployment/calico/

@yboaron yboaron closed this as completed May 19, 2024

7 participants