wireguard.cali interface does not have IPv4 address #8772

Open

geotransformer opened this issue Apr 28, 2024 · 7 comments

Comments
@geotransformer

geotransformer commented Apr 28, 2024

======= WireGuard interface has no IP address on one node of a 3-node k8s cluster =======
ubuntu@k8s-node3:~$ ifconfig wireguard.cali
wireguard.cali: flags=209<UP,POINTOPOINT,RUNNING,NOARP> mtu 1440
unspec 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00 txqueuelen 1000 (UNSPEC)
RX packets 299392534 bytes 133667595056 (133.6 GB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 263705618 bytes 55068382752 (55.0 GB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

========= kubeadm-based k8s cluster =========
ubuntu@k8s-node3:~$ kubectl get nodes
NAME        STATUS   ROLES           AGE   VERSION
k8s-node1   Ready    control-plane   40h   v1.26.5
k8s-node2   Ready    control-plane   39h   v1.26.5
k8s-node3   Ready    control-plane   39h   v1.26.5

========= pods scheduled
ubuntu@k8s-node3:~$ kubectl get pods -A | wc -l
362
ubuntu@k8s-node3:~$ kubectl get pods -A -owide | grep k8s-node3 | wc -l
107

========= pod subnet =============
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: v1.26.5
certificatesDir: /data/kubernetes/pki
networking:
  serviceSubnet: 10.152.4.0/23
  podSubnet: 10.152.2.0/23
apiServer:

Expected Behavior

ubuntu@k8s-node1:~$ ifconfig wireguard.cali
wireguard.cali: flags=209<UP,POINTOPOINT,RUNNING,NOARP> mtu 1440
inet 10.152.2.66 netmask 255.255.255.255 destination 10.152.2.66

Current Behavior

ubuntu@k8s-node3:~$ ifconfig wireguard.cali
wireguard.cali: flags=209<UP,POINTOPOINT,RUNNING,NOARP> mtu 1440
unspec 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00 txqueuelen 1000 (UNSPEC)
RX packets 299392534 bytes 133667595056 (133.6 GB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 263705618 bytes 55068382752 (55.0 GB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
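
The missing address can be double-checked on the affected node with iproute2 and wireguard-tools (a minimal sketch; assumes the wg CLI is installed and Calico's default WireGuard interface name):

# Show any IPv4 address assigned to the Calico WireGuard device
ip -4 addr show dev wireguard.cali

# Show the WireGuard device state (peers, handshakes); requires wireguard-tools
sudo wg show wireguard.cali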

Possible Solution

Steps to Reproduce (for bugs)

  1. Upgrade k8s and the OS in a rolling fashion, one node at a time (see the sketch below)
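
A rough sketch of the per-node upgrade cycle (node name and kubeadm flags are illustrative placeholders, not the exact commands used here):

# On a control-plane node: take the target node out of service
kubectl cordon k8s-node3
kubectl drain k8s-node3 --ignore-daemonsets --delete-emptydir-data
kubectl delete node k8s-node3

# On the node itself: upgrade the OS, then rejoin the cluster
# (token and hash are placeholders; a real join uses a freshly minted token)
sudo kubeadm join <control-plane-endpoint> --token <token> \
    --discovery-token-ca-cert-hash sha256:<hash> --control-plane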

Context

Your Environment

  • Calico version: calicoctl v3.24
  • Orchestrator version (e.g. kubernetes, mesos, rkt): kubernetes
  • Operating System and version: Ubuntu 20.04.6 LTS
  • Link to your project (optional):
@geotransformer
Author

geotransformer commented Apr 28, 2024

Calico node pod logs for the impacted node
ubuntu@k8s-node3:~$ kubectl get pods -A -owide | grep k8s-node3 | grep calico
kube-system calico-node-zbdf8 1/1 Running 1 40h 10.152.1.252 k8s-node3

ubuntu@k8s-node3:~$ date
Sun 28 Apr 2024 12:29:14 PM UTC

ubuntu@k8s-node3:~$ kubectl logs -n kube-system calico-node-zbdf8 | grep -i guard

2024-04-28 12:28:48.242 [INFO][84] felix/int_dataplane.go 1946: Received *proto.WireguardEndpointUpdate update from calculation graph msg=hostname:"k8s-node3" public_key:"vFqdz4DFUYlAaGzN4O3p7vkFfoxNr+aIY94e48lZ+mQ="

2024-04-28 12:28:51.607 [INFO][84] felix/int_dataplane.go 1946: Received *proto.WireguardEndpointUpdate update from calculation graph msg=hostname:"k8s-node2" public_key:"+Ek2nxBsI60WEYfoMdQmAFUZFllR4dzB2yS80yjMDFQ=" interface_ipv4_addr:"10.152.2.131"
2024-04-28 12:28:54.351 [INFO][84] felix/int_dataplane.go 1946: Received *proto.WireguardEndpointUpdate update from calculation graph msg=hostname:"k8s-node1" public_key:"iVS+PMScWh65pQS2yr0jcV9oPgsd3UbM/SwodOpB8nQ=" interface_ipv4_addr:"10.152.2.66"
2024-04-28 12:28:58.429 [INFO][84] felix/int_dataplane.go 1946: Received *proto.WireguardEndpointUpdate update from calculation graph msg=hostname:"k8s-node3" public_key:"vFqdz4DFUYlAaGzN4O3p7vkFfoxNr+aIY94e48lZ+mQ="
2024-04-28 12:29:01.926 [INFO][84] felix/int_dataplane.go 1946: Received *proto.WireguardEndpointUpdate update from calculation graph msg=hostname:"k8s-node2" public_key:"+Ek2nxBsI60WEYfoMdQmAFUZFllR4dzB2yS80yjMDFQ=" interface_ipv4_addr:"10.152.2.131"
2024-04-28 12:29:04.502 [INFO][84] felix/int_dataplane.go 1946: Received *proto.WireguardEndpointUpdate update from calculation graph msg=hostname:"k8s-node1" public_key:"iVS+PMScWh65pQS2yr0jcV9oPgsd3UbM/SwodOpB8nQ=" interface_ipv4_addr:"10.152.2.66"
2024-04-28 12:29:08.555 [INFO][84] felix/int_dataplane.go 1946: Received *proto.WireguardEndpointUpdate update from calculation graph msg=hostname:"k8s-node3" public_key:"vFqdz4DFUYlAaGzN4O3p7vkFfoxNr+aIY94e48lZ+mQ="
2024-04-28 12:29:12.032 [INFO][84] felix/int_dataplane.go 1946: Received *proto.WireguardEndpointUpdate update from calculation graph msg=hostname:"k8s-node2" public_key:"+Ek2nxBsI60WEYfoMdQmAFUZFllR4dzB2yS80yjMDFQ=" interface_ipv4_addr:"10.152.2.131"

@tomastigera
Contributor

Is the problem on a single node only or across all the nodes?

> Upgrade k8s and the OS in a rolling fashion, one node at a time

Could you state in the description what got you to this state? You had a working cluster and then you upgraded k8s and the OS? Is this the first node updated? Is there an incompatibility in wg between the old nodes and the new nodes?

@geotransformer
Author

geotransformer commented Apr 30, 2024


> Is the problem on a single node only or across all the nodes?
>
> Upgrade k8s and the OS in a rolling fashion, one node at a time
>
> Could you state in the description what got you to this state? You had a working cluster and then you upgraded k8s and the OS? Is this the first node updated? Is there an incompatibility in wg between the old nodes and the new nodes?

1> K8s was upgraded from 1.25 to 1.26; Calico was not changed and stayed at version 3.24. The impacted node is not always the same one: sometimes node2, sometimes node3. The issue was observed 3~4 times out of 100 upgrades.

2> For the k8s upgrade, the node is cordoned, drained, and removed from the k8s cluster. The OS is then upgraded, and the node is joined back to the cluster with kubeadm.

3> If the address is removed manually with ip addr del xxx dev wireguard.cali, the IP can be restored by Calico itself (see the sketch after the capture below). Wondering why, in the following scenario, it cannot recover on its own.
2024-04-28 12:28:48.242 [INFO][84] felix/int_dataplane.go 1946: Received *proto.WireguardEndpointUpdate update from calculation graph msg=hostname:"k8s-node3" public_key:"vFqdz4DFUYlAaGzN4O3p7vkFfoxNr+aIY94e48lZ+mQ="

2024-04-28 12:28:51.607 [INFO][84] felix/int_dataplane.go 1946: Received *proto.WireguardEndpointUpdate update from calculation graph msg=hostname:"k8s-node2" public_key:"+Ek2nxBsI60WEYfoMdQmAFUZFllR4dzB2yS80yjMDFQ=" interface_ipv4_addr:"10.152.2.131"

=========== the following is a capture when trying to reproduce the issue ===========

node1" public_key:"oB8moC5Qw4tbVnyvjRlEi3abHkpU5k8YCalNqAy49ik=" interface_ipv4_addr:"10.28.2.133"
2024-04-29 22:13:53.354 [INFO][1258] felix/int_dataplane.go 1680: Received *proto.WireguardEndpointUpdate update from calculation graph msg=hostname:"test-node1" public_key:"oB8moC5Qw4tbVnyvjRlEi3abHkpU5k8YCalNqAy49ik=" interface_ipv4_addr:"10.28.2.133"

2024-04-29 22:13:58.731 [INFO][1258] felix/int_dataplane.go 1680: Received *proto.WireguardEndpointUpdate update from calculation graph msg=hostname:"test-node1" public_key:"oB8moC5Qw4tbVnyvjRlEi3abHkpU5k8YCalNqAy49ik=" interface_ipv4_addr:"10.28.2.133"

2024-04-29 22:17:53.521 [INFO][1258] felix/int_dataplane.go 1680: Received *proto.WireguardEndpointRemove update from calculation graph msg=hostname:"test-node1"

2024-04-29 22:18:07.680 [INFO][1258] felix/int_dataplane.go 1680: Received *proto.WireguardEndpointUpdate update from calculation graph msg=hostname:"test-node1" public_key:"d5eb99Gp3YQYrXeBEWf7P0+QTF7Uof4g3s5dwkwONzU="

2024-04-29 22:18:07.733 [INFO][1258] felix/int_dataplane.go 1680: Received *proto.WireguardEndpointUpdate update from calculation graph msg=hostname:"test-node1" public_key:"d5eb99Gp3YQYrXeBEWf7P0+QTF7Uof4g3s5dwkwONzU=" interface_ipv4_addr:"10.28.2.136"
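
For reference, the self-recovery check described in 3> above looks roughly like this (a sketch; <tunnel-ip> stands for whatever address felix assigned on that node):

# Note the current tunnel address, then remove it
ip -4 addr show dev wireguard.cali
sudo ip addr del <tunnel-ip>/32 dev wireguard.cali

# felix should notice the drift and re-add the address shortly
watch -n 1 ip -4 addr show dev wireguard.cali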

@tomastigera
Contributor

OK so the issue is isolated to individual nodes. Could you share full logs from such a node? One possible issue is compatibility with k8s 1.26: Calico 3.24 is not really supported anymore, so you might need to upgrade.

@coutinhop
Contributor

@geotransformer may I also ask you to enable debug logging in felix? Set logSeverityScreen to Debug in the default FelixConfiguration: https://docs.tigera.io/calico/latest/operations/troubleshoot/component-logs#configure-felix-log-level
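
For example, following the linked doc (a sketch; assumes calicoctl is configured against this cluster):

# Raise felix logging to Debug in the default FelixConfiguration
calicoctl patch felixConfiguration default --patch '{"spec":{"logSeverityScreen":"Debug"}}'

# Revert to Info once the failure has been captured
calicoctl patch felixConfiguration default --patch '{"spec":{"logSeverityScreen":"Info"}}'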

@geotransformer
Author

geotransformer commented May 1, 2024

> OK so the issue is isolated to individual nodes. Could you share full logs from such a node? One possible issue is compatibility with k8s 1.26: Calico 3.24 is not really supported anymore, so you might need to upgrade.

We observed the same issue on Calico 3.27.

One thing we would like to share first: the pod subnet configured in this 3-node cluster is /23, and we use the default Kubernetes/Calico config, so one node cannot get a /24 CIDR. Kubernetes complained that no CIDR is available for node3. Since, in Calico, I believe IPAM manages the IP blocks and allocation itself, this warning/error message does not seem to be a critical issue.
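
To make the sizing concrete: a /23 pod subnet only contains two /24s, so with the default /24 node CIDR mask the third node gets no CIDR from kube-controller-manager; Calico IPAM instead carves its own blocks (/26 by default) out of the pool. The per-node block allocations can be inspected directly (a sketch; assumes calicoctl access):

# Pool usage and per-node block allocations managed by Calico IPAM
calicoctl ipam show
calicoctl ipam show --show-blocks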

Also, in our 3-node deployment we have 360+ pods and need 310+ pod IPs. During the upgrade, nodes are cordoned and drained, and the pods are then recreated on the node. @coutinhop is there some race condition in IP recycling and reuse for Calico interfaces such as wireguard.cali?
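
One way to narrow that down would be to compare what Calico has recorded for the node against what is actually on the interface (a sketch; the exact field names under the v3 Node resource are worth double-checking for your version):

# The WireGuard details Calico has stored for the node
calicoctl get node k8s-node3 -o yaml | grep -iA3 wireguard

# The address actually present on the interface
ip -4 addr show dev wireguard.cali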

@geotransformer
Author

> @geotransformer may I also ask you to enable debug logging in felix? Set logSeverityScreen to Debug in the default FelixConfiguration: https://docs.tigera.io/calico/latest/operations/troubleshoot/component-logs#configure-felix-log-level

Yes, we will try to enable this in our automated testing.
