Control plane load balancer creation loop with multiple instances #784

Open
RomLecat opened this issue Mar 10, 2024 · 17 comments
Comments

@RomLecat

Describe the bug
When multiple instances of kube-vip (in my case, one for IPv4 and another for IPv6) run to provide control-plane load balancing, the load balancer is removed and re-created in a loop.

To Reproduce

Run multiple instances of kube-vip with different names, different VIPs, and the same interface.

values.yaml:

kube-vip:
  image:
    repository: ghcr.io/romlecat/kube-vip
    tag: "0.7.1-fix781"
  nameOverride: "kube-vip-v6"
  config:
    address: <vip>
  env:
    vip_arp: "true"
    vip_interface: eth0
    vip_port: "6443"
    lb_enable: "true"
    cp_enable: "true"
    svc_enable: "false"
    vip_cidr: "128"
    prometheus_server: ":2113"
  tolerations:
    - effect: NoSchedule
      key: node-role.kubernetes.io/master
    - effect: NoSchedule
      key: node-role.kubernetes.io/control-plane
      operator: Exists
    - effect: NoExecute
      key: node-role.kubernetes.io/master
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
            - key: node-role.kubernetes.io/master
              operator: Exists
          - matchExpressions:
            - key: node-role.kubernetes.io/control-plane
              operator: Exists

Logs:

time="2024-03-10T11:10:24Z" level=error msg="Error querying backends file does not exist"
time="2024-03-10T11:10:24Z" level=warning msg="load balancer for API server already exists, attempting to remove and re-create"
time="2024-03-10T11:10:24Z" level=info msg="Created Load-Balancer services on [<ipv6-public-prefix-2>:2::b002:ffff:6443]"
time="2024-03-10T11:10:24Z" level=info msg="Broadcasting NDP update for <ipv6-public-prefix-2>:2::b002:ffff (<mac>) via eth0"
time="2024-03-10T11:10:24Z" level=info msg="ndp: &{false false true <ipv6-public-prefix-2>:2::b002:ffff [0xc0006eea00]}"
time="2024-03-10T11:10:27Z" level=info msg="Broadcasting NDP update for <ipv6-public-prefix-2>:2::b002:ffff (<mac>) via eth0"
time="2024-03-10T11:10:27Z" level=info msg="ndp: &{false false true <ipv6-public-prefix-2>:2::b002:ffff [0xc000051100]}"
time="2024-03-10T11:10:30Z" level=info msg="Broadcasting NDP update for <ipv6-public-prefix-2>:2::b002:ffff (<mac>) via eth0"
time="2024-03-10T11:10:30Z" level=info msg="ndp: &{false false true <ipv6-public-prefix-2>:2::b002:ffff [0xc000051da0]}"
time="2024-03-10T11:10:33Z" level=info msg="Broadcasting NDP update for <ipv6-public-prefix-2>:2::b002:ffff (<mac>) via eth0"
time="2024-03-10T11:10:33Z" level=info msg="ndp: &{false false true <ipv6-public-prefix-2>:2::b002:ffff [0xc0002a4bc0]}"
time="2024-03-10T11:10:34Z" level=error msg="Error querying backends file does not exist"
time="2024-03-10T11:10:34Z" level=warning msg="load balancer for API server already exists, attempting to remove and re-create"
time="2024-03-10T11:10:34Z" level=info msg="Created Load-Balancer services on [<ipv6-public-prefix-2>:2::b002:ffff:6443]"
time="2024-03-10T11:10:34Z" level=error msg="Error querying backends file does not exist"
time="2024-03-10T11:10:34Z" level=warning msg="load balancer for API server already exists, attempting to remove and re-create"
time="2024-03-10T11:10:34Z" level=info msg="Created Load-Balancer services on [<ipv6-public-prefix-2>:2::b002:ffff:6443]"
time="2024-03-10T11:10:34Z" level=error msg="Error querying backends file does not exist"
time="2024-03-10T11:10:34Z" level=warning msg="load balancer for API server already exists, attempting to remove and re-create"
time="2024-03-10T11:10:34Z" level=info msg="Created Load-Balancer services on [<ipv6-public-prefix-2>:2::b002:ffff:6443]"

Expected behavior
The load balancer should remain stable instead of being removed and re-created in a loop.

Environment (please complete the following information):

Thanks

@lubronzhan
Contributor

When multiple instances of kube-vip (in my case, one for IPv4 and another for IPv6) run to provide control-plane load balancing, the load balancer is removed and re-created in a loop.

Hi, do you mean that if you deploy just a single kube-vip instance with IPv6, there is no issue? Does it only happen when you deploy two kube-vip instances?

@RomLecat
Author

My apologies, I thought it was, but after disabling the IPv4 instance, the issue still occurs, so it doesn't look like it's due to having multiple instances.

@lubronzhan
Contributor

lubronzhan commented Mar 11, 2024

Ok let me check if I can reproduce this

@lubronzhan
Contributor

I can't reproduce this issue in my env

time="2024-03-12T01:13:50Z" level=info msg="Broadcasting NDP update for fd01:3:4:2826:0:a:cccc:a955 (00:50:56:95:a8:13) via eth0"
time="2024-03-12T01:13:50Z" level=info msg="ndp: &{false false true fd01:3:4:2826:0:a:cccc:a955 [0xc000118960]}"
time="2024-03-12T01:13:53Z" level=info msg="Broadcasting NDP update for fd01:3:4:2826:0:a:cccc:a955 (00:50:56:95:a8:13) via eth0"
time="2024-03-12T01:13:53Z" level=info msg="ndp: &{false false true fd01:3:4:2826:0:a:cccc:a955 [0xc000214800]}"
time="2024-03-12T01:13:56Z" level=info msg="Broadcasting NDP update for fd01:3:4:2826:0:a:cccc:a955 (00:50:56:95:a8:13) via eth0"
time="2024-03-12T01:13:56Z" level=info msg="ndp: &{false false true fd01:3:4:2826:0:a:cccc:a955 [0xc0001190e0]}"
time="2024-03-12T01:13:59Z" level=info msg="Broadcasting NDP update for fd01:3:4:2826:0:a:cccc:a955 (00:50:56:95:a8:13) via eth0"
time="2024-03-12T01:13:59Z" level=info msg="ndp: &{false false true fd01:3:4:2826:0:a:cccc:a955 [0xc0001196e0]}"
time="2024-03-12T01:14:02Z" level=info msg="Broadcasting NDP update for fd01:3:4:2826:0:a:cccc:a955 (00:50:56:95:a8:13) via eth0"
time="2024-03-12T01:14:02Z" level=info msg="ndp: &{false false true fd01:3:4:2826:0:a:cccc:a955 [0xc0007988a0]}"
time="2024-03-12T01:14:05Z" level=info msg="Broadcasting NDP update for fd01:3:4:2826:0:a:cccc:a955 (00:50:56:95:a8:13) via eth0"
time="2024-03-12T01:14:05Z" level=info msg="ndp: &{false false true fd01:3:4:2826:0:a:cccc:a955 [0xc0001199e0]}"

But my kube-proxy is in iptables mode.

Looking at your log, it looks like something is conflicting with kube-vip. Since you deployed a single kube-vip instance and still observe it, I assume it's something else. From your description in the previous PR, before you used the dev build that has the fix, the ipvsadm -L -n output you mentioned had something like below

TCP  [<ipv6-public-prefix-2>:2::b002:ffff]:30151 rr
  -> [<ipv6-public-prefix-1>:700::66d:75b7]:443 Masq    1      0          0
  -> [<ipv6-public-prefix-1>:700::44c3:c4f1]:443 Masq    1      0          0

You mentioned <ipv6-public-prefix-2>:2::b002:ffff is your control plane VIP, and kube-vip wasn't able to create the IPVS rule before you had the fix, so this rule was probably created by kube-proxy. Further evidence is the large port number 30151, which falls in the default NodePort range (30000-32767).

But you also passed the config below to your kube-proxy settings; could you check that the <ipv6-net> CIDR correctly includes the VIP?

'--kube-proxy-arg=ipvs-exclude-cidrs=<ipv4-net>/23,<ipv6-net>/64' \

One verification would be to disable IPVS for your kube-proxy and see if the issue still persists.
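For example, on a k3s setup like this one, forcing kube-proxy back to iptables mode could look roughly like the sketch below, reusing the same pass-through flag pattern as above (the exact invocation is an assumption, not taken from this issue):

# assumed k3s pass-through of the standard kube-proxy --proxy-mode flag
'--kube-proxy-arg=proxy-mode=iptables' \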

I think kube-proxy is incorrectly using the VIP address as the node IP. This could happen because, for IPv6, the order of addresses on the interface is reversed. The workaround we used is to pass node-ip to kubelet. You can verify that with kubectl describe no and check your node IPs; it probably lists both the node IP and the VIP. So another way to fix it is to pass node-ip to kubelet, if this is really the root cause.
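For reference, a minimal sketch of that check and workaround, assuming a k3s-managed kubelet as in this issue (the node name and addresses below are placeholders):

# check which addresses the node advertises; the VIP should not appear here
kubectl describe node <node-name> | grep -A 4 'Addresses:'

# assumed k3s pass-through flag: pin the node IPs explicitly instead of
# auto-detecting them from the interface (kubelet accepts a comma-separated
# IPv4,IPv6 pair for dual-stack)
'--kubelet-arg=node-ip=<node-ipv4>,<node-ipv6>' \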

@lubronzhan
Contributor

lubronzhan commented Mar 12, 2024

Interesting, the env I tested above was a pure IPv6 env; now I'm able to reproduce this inside an environment where the node has both IPv4 and IPv6 dual-stack IPs on the interface

time="2024-03-12T01:34:05Z" level=error msg="Error querying backends file does not exist"
time="2024-03-12T01:34:05Z" level=warning msg="load balancer for API server already exists, attempting to remove and re-create"
time="2024-03-12T01:34:05Z" level=info msg="Created Load-Balancer services on [fd01:3:8:208:250:56ff:fe85:1c2b:6443]"
time="2024-03-12T01:34:06Z" level=info msg="Broadcasting NDP update for fd01:3:8:208:250:56ff:fe85:1c2b (00:50:56:85:1c:2a) via eth0"
time="2024-03-12T01:34:06Z" level=info msg="ndp: &{false false true fd01:3:8:208:250:56ff:fe85:1c2b [0xc0002a7be0]}"
time="2024-03-12T01:34:08Z" level=error msg="Error querying backends file does not exist"
time="2024-03-12T01:34:08Z" level=warning msg="load balancer for API server already exists, attempting to remove and re-create"
time="2024-03-12T01:34:08Z" level=info msg="Created Load-Balancer services on [fd01:3:8:208:250:56ff:fe85:1c2b:6443]"
time="2024-03-12T01:34:09Z" level=info msg="Broadcasting NDP update for fd01:3:8:208:250:56ff:fe85:1c2b (00:50:56:85:1c:2a) via eth0"
time="2024-03-12T01:34:09Z" level=info msg="ndp: &{false false true fd01:3:8:208:250:56ff:fe85:1c2b [0xc000051840]}"
time="2024-03-12T01:34:12Z" level=info msg="Broadcasting NDP update for fd01:3:8:208:250:56ff:fe85:1c2b (00:50:56:85:1c:2a) via eth0"
time="2024-03-12T01:34:12Z" level=info msg="ndp: &{false false true fd01:3:8:208:250:56ff:fe85:1c2b [0xc0002509a0]}"

I might have triggered some other weird issue though, since my node only has an IPv4 IP in this environment while I'm trying to advertise an IPv6 address

Addresses:
  Hostname:    wl-antrea-controlplane-cms7d-t84hw
  InternalIP:  10.180.204.60
  ExternalIP:  10.180.204.60

The frequency of your error logs still indicates kube-proxy is causing the conflict.

@lubronzhan
Contributor

lubronzhan commented Mar 12, 2024

Other than the suggestion I have here #784 (comment), you can try:

  1. Stop kube-vip.
  2. Make sure kube-proxy is still running.
  3. Add an IPVS rule for your control plane VIP and see whether it disappears immediately or ever gets added. For example:

ipvsadm -A -t [<ipv6-public-prefix-2>:2::b002:ffff]:6443   <--- change to your VIP
ipvsadm -a -t [<ipv6-public-prefix-2>:2::b002:ffff]:6443 -r [random ip]:6443 -m

This way we can rule out whether kube-proxy is causing the issue.
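(For reference, a simple way to watch whether something keeps deleting the entry, assuming watch and ipvsadm are available on the node:)

# refresh every second and show the VIP entry plus its backends, if any
watch -n 1 'ipvsadm -Ln | grep -A 2 ffff'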

@RomLecat
Author

internal-ip looks correct in the VIP holder:

kubectl describe node k8s-master-1.internal.lct1 | grep internal-ip
                    k3s.io/internal-ip: 10.9.2.241,<ipv6-public-prefix-2>:2::b002:1

InternalIP is 10.9.2.241, because I set:

'--kubelet-arg=node-ip=0.0.0.0' \

The IPv6 prefix in ipvs-exclude-cidrs is correct, but I'll try to manually change the IPVS rules as you described to see if it changes anything.

@RomLecat
Author

RomLecat commented Mar 12, 2024

I did ipvsadm -A -t [<ipv6-public-prefix-2>:2::b002:ffff]:6443 and it was not altered:

ipvsadm -A -t [<ipv6-public-prefix-2>:2::b002:ffff]:6443
ipvsadm -a -t [<ipv6-public-prefix-2>:2::b002:ffff]:6443 -r [2607:f8b0:4004:c19::5e]:6443 -m
ipvsadm -Ln | grep ffff
TCP  [<ipv6-public-prefix-2>:2::b002:ffff]:6443 wlc

@lubronzhan
Contributor

Could you grep with -A 3? I want to see if the forwarding rule is still there.

Also, I thought kube-proxy had created other ffff rules before; for example, are these not found?

TCP  [<ipv6-public-prefix-2>:2::b002:ffff]:31662 rr
  -> [<ipv6-public-prefix-1>:700::66d:75af]:80 Masq    1      0          0
  -> [<ipv6-public-prefix-1>:700::44c3:c4ca]:80 Masq    1      0          0

@RomLecat
Author

RomLecat commented Mar 12, 2024

Here's the full output, so you have all information:

root@k8s-master-3:~# ipvsadm -Ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  <ipv4-public-prefix>.20:3129 rr
  -> 10.8.19.203:3129             Masq    1      0          0
TCP  <ipv4-public-prefix>.23:80 rr
  -> 10.8.3.90:80                 Masq    1      0          0
  -> 10.8.19.253:80               Masq    1      0          0
TCP  <ipv4-public-prefix>.23:443 rr
  -> 10.8.3.90:443                Masq    1      0          0
  -> 10.8.19.253:443              Masq    1      0          0
TCP  10.8.23.18:30151 rr
  -> 10.8.3.90:443                Masq    1      0          0
  -> 10.8.19.253:443              Masq    1      0          0
TCP  10.8.23.18:30394 rr
  -> 10.8.19.203:3129             Masq    1      0          0
TCP  10.8.23.18:31662 rr
  -> 10.8.3.199:80                Masq    1      0          0
  -> 10.8.19.9:80                 Masq    1      0          0
TCP  10.8.23.18:31851 rr
  -> 10.8.4.4:6980                Masq    1      0          0
TCP  10.8.23.18:31970 rr
  -> 10.8.3.90:80                 Masq    1      0          0
  -> 10.8.19.253:80               Masq    1      0          0
TCP  10.8.23.18:32524 rr
  -> 10.8.3.199:443               Masq    1      0          0
  -> 10.8.19.9:443                Masq    1      0          0
TCP  10.8.128.1:443 rr
  -> 10.9.2.241:6443              Masq    1      1          0
  -> 10.9.2.242:6443              Masq    1      1          0
  -> 10.9.2.243:6443              Masq    1      1          0
TCP  10.8.128.10:53 rr
  -> 10.8.4.13:53                 Masq    1      0          0
TCP  10.8.128.10:9153 rr
  -> 10.8.4.13:9153               Masq    1      0          0
TCP  10.8.132.200:9696 rr
  -> 10.8.4.33:9696               Masq    1      0          0
TCP  10.8.133.128:443 rr
  -> 10.8.4.70:5443               Masq    1      0          0
  -> 10.8.19.200:5443             Masq    1      0          0
TCP  10.8.133.183:5800 rr
  -> 10.8.19.203:5800             Masq    1      0          0
TCP  10.8.135.15:6767 rr
  -> 10.8.3.157:6767              Masq    1      0          0
TCP  10.8.138.21:8080 rr
  -> 10.8.19.60:8080              Masq    1      0          0
TCP  10.8.139.43:80 rr
  -> 10.8.3.212:80                Masq    1      0          0
TCP  10.8.143.135:7979 rr
  -> 10.8.4.129:7979              Masq    1      0          0
TCP  10.8.145.129:8181 rr
  -> 10.8.3.193:8181              Masq    1      0          0
TCP  10.8.147.60:443 rr
  -> 10.8.19.62:10250             Masq    1      0          0
TCP  10.8.151.237:8600 rr
  -> 10.8.19.156:8600             Masq    1      0          0
TCP  10.8.153.60:4848 rr
  -> 10.8.3.236:4848              Masq    1      0          0
TCP  10.8.158.104:443 rr
  -> 10.8.4.55:9443               Masq    1      0          0
TCP  10.8.159.11:3003 rr
  -> 10.8.3.75:3003               Masq    1      0          0
TCP  10.8.159.18:8080 rr
  -> 10.8.3.42:8080               Masq    1      0          0
TCP  10.8.160.176:5432 rr
  -> 10.8.3.73:5432               Masq    1      0          0
TCP  10.8.160.188:443 rr
  -> 10.8.3.90:8443               Masq    1      0          0
  -> 10.8.19.253:8443             Masq    1      0          0
TCP  10.8.161.146:80 rr
  -> 10.8.4.72:80                 Masq    1      0          0
  -> 10.8.19.73:80                Masq    1      0          0
TCP  10.8.163.210:8096 rr
  -> 10.8.3.6:8096                Masq    1      0          0
TCP  10.8.164.78:7011 rr
  -> 10.8.3.225:7011              Masq    1      0          0
TCP  10.8.164.194:5473 rr
  -> 10.9.2.241:5473              Masq    1      0          0
  -> 10.9.2.242:5473              Masq    1      0          0
  -> 10.9.2.243:5473              Masq    1      0          0
TCP  10.8.165.165:8989 rr
  -> 10.8.3.9:8989                Masq    1      0          0
TCP  10.8.166.104:80 rr
  -> 10.8.19.183:8080             Masq    1      0          0
TCP  10.8.166.104:443 rr
  -> 10.8.19.183:8080             Masq    1      0          0
TCP  10.8.167.7:3129 rr
  -> 10.8.19.203:3129             Masq    1      0          0
TCP  10.8.169.41:6379 rr
  -> 10.8.19.251:6379             Masq    1      0          0
TCP  10.8.170.46:80 rr
  -> 10.8.3.90:80                 Masq    1      0          0
  -> 10.8.19.253:80               Masq    1      0          0
TCP  10.8.170.46:443 rr
  -> 10.8.3.90:443                Masq    1      0          0
  -> 10.8.19.253:443              Masq    1      0          0
TCP  10.8.173.224:7000 rr
  -> 10.8.4.135:7000              Masq    1      0          0
TCP  10.8.174.24:10050 rr
  -> 10.9.2.241:10050             Masq    1      0          0
  -> 10.9.2.242:10050             Masq    1      0          0
  -> 10.9.2.243:10050             Masq    1      0          0
  -> 10.9.2.246:10050             Masq    1      0          0
  -> 10.9.2.247:10050             Masq    1      0          0
  -> 10.9.2.248:10050             Masq    1      0          0
TCP  10.8.176.96:8080 rr
  -> 10.8.19.142:8080             Masq    1      0          0
TCP  10.8.178.75:8083 rr
  -> 10.8.3.237:8083              Masq    1      0          0
TCP  10.8.181.72:443 rr
  -> 10.8.3.199:8443              Masq    1      0          0
  -> 10.8.19.9:8443               Masq    1      0          0
TCP  10.8.182.57:9402 rr
  -> 10.8.4.164:9402              Masq    1      0          0
TCP  10.8.184.114:6980 rr
  -> 10.8.4.4:6980                Masq    1      0          0
TCP  10.8.191.122:7979 rr
  -> 10.8.3.246:7979              Masq    1      0          0
TCP  10.8.203.237:6379 rr
  -> 10.8.3.82:6379               Masq    1      0          0
TCP  10.8.204.7:5432 rr
  -> 10.8.19.198:5432             Masq    1      0          0
TCP  10.8.212.115:8080 rr
  -> 10.8.3.66:8080               Masq    1      0          0
TCP  10.8.214.216:5432 rr
  -> 10.8.3.183:5432              Masq    1      0          0
TCP  10.8.221.153:8081 rr
  -> 10.8.4.166:8081              Masq    1      0          0
TCP  10.8.222.3:9000 rr
  -> 10.8.4.120:9000              Masq    1      0          0
TCP  10.8.222.21:3001 rr
  -> 10.8.3.166:3001              Masq    1      0          0
TCP  10.8.225.147:6666 rr
  -> 10.8.3.24:6666               Masq    1      0          0
TCP  10.8.229.253:8080 rr
  -> 10.8.4.18:8080               Masq    1      0          0
TCP  10.8.238.187:80 rr
  -> 10.8.3.199:80                Masq    1      0          0
  -> 10.8.19.9:80                 Masq    1      0          0
TCP  10.8.238.187:443 rr
  -> 10.8.3.199:443               Masq    1      0          0
  -> 10.8.19.9:443                Masq    1      0          0
TCP  10.8.240.108:7878 rr
  -> 10.8.3.18:7878               Masq    1      0          0
TCP  10.8.241.20:8191 rr
  -> 10.8.3.46:8191               Masq    1      0          0
TCP  10.8.246.95:443 rr
  -> 10.8.19.241:10250            Masq    1      0          0
TCP  10.8.247.59:5556 rr
  -> 10.8.19.28:5556              Masq    1      0          0
TCP  10.8.247.59:5557 rr
  -> 10.8.19.28:5557              Masq    1      0          0
TCP  10.9.1.111:6980 rr
  -> 10.8.4.4:6980                Masq    1      0          0
TCP  10.9.1.112:80 rr
  -> 10.8.3.199:80                Masq    1      0          0
  -> 10.8.19.9:80                 Masq    1      0          0
TCP  10.9.1.112:443 rr
  -> 10.8.3.199:443               Masq    1      0          0
  -> 10.8.19.9:443                Masq    1      0          0
TCP  10.9.2.243:30151 rr
  -> 10.8.3.90:443                Masq    1      0          0
  -> 10.8.19.253:443              Masq    1      0          0
TCP  10.9.2.243:30394 rr
  -> 10.8.19.203:3129             Masq    1      0          0
TCP  10.9.2.243:31662 rr
  -> 10.8.3.199:80                Masq    1      0          0
  -> 10.8.19.9:80                 Masq    1      0          0
TCP  10.9.2.243:31851 rr
  -> 10.8.4.4:6980                Masq    1      0          0
TCP  10.9.2.243:31970 rr
  -> 10.8.3.90:80                 Masq    1      0          0
  -> 10.8.19.253:80               Masq    1      0          0
TCP  10.9.2.243:32524 rr
  -> 10.8.3.199:443               Masq    1      0          0
  -> 10.8.19.9:443                Masq    1      0          0
TCP  10.9.2.254:6443 rr
  -> 10.9.2.241:6443              Local   1      0          0
  -> 10.9.2.242:6443              Local   1      0          0
  -> 10.9.2.243:6443              Local   1      0          0
TCP  10.9.2.254:30151 rr
  -> 10.8.3.90:443                Masq    1      0          0
  -> 10.8.19.253:443              Masq    1      0          0
TCP  10.9.2.254:30394 rr
  -> 10.8.19.203:3129             Masq    1      0          0
TCP  10.9.2.254:31662 rr
  -> 10.8.3.199:80                Masq    1      0          0
  -> 10.8.19.9:80                 Masq    1      0          0
TCP  10.9.2.254:31851 rr
  -> 10.8.4.4:6980                Masq    1      0          0
TCP  10.9.2.254:31970 rr
  -> 10.8.3.90:80                 Masq    1      0          0
  -> 10.8.19.253:80               Masq    1      0          0
TCP  10.9.2.254:32524 rr
  -> 10.8.3.199:443               Masq    1      0          0
  -> 10.8.19.9:443                Masq    1      0          0
UDP  10.8.128.10:53 rr
  -> 10.8.4.13:53                 Masq    1      0          0
TCP  [<ipv6-public-prefix-1>:700::9902:5d2]:30151 rr
  -> [<ipv6-public-prefix-1>:700::66d:75b7]:443 Masq    1      0          0
  -> [<ipv6-public-prefix-1>:700::44c3:c4f1]:443 Masq    1      0          0
TCP  [<ipv6-public-prefix-1>:700::9902:5d2]:30394 rr
  -> [<ipv6-public-prefix-1>:700::44c3:c4f7]:3129 Masq    1      0          0
TCP  [<ipv6-public-prefix-1>:700::9902:5d2]:31662 rr
  -> [<ipv6-public-prefix-1>:700::66d:75af]:80 Masq    1      0          0
  -> [<ipv6-public-prefix-1>:700::44c3:c4ca]:80 Masq    1      0          0
TCP  [<ipv6-public-prefix-1>:700::9902:5d2]:31851 rr
  -> [<ipv6-public-prefix-1>:700::9c31:c102]:6980 Masq    1      0          0
TCP  [<ipv6-public-prefix-1>:700::9902:5d2]:31970 rr
  -> [<ipv6-public-prefix-1>:700::66d:75b7]:80 Masq    1      0          0
  -> [<ipv6-public-prefix-1>:700::44c3:c4f1]:80 Masq    1      0          0
TCP  [<ipv6-public-prefix-1>:700::9902:5d2]:32524 rr
  -> [<ipv6-public-prefix-1>:700::66d:75af]:443 Masq    1      0          0
  -> [<ipv6-public-prefix-1>:700::44c3:c4ca]:443 Masq    1      0          0
TCP  [<ipv6-public-prefix-1>:701::11d9]:6980 rr
  -> [<ipv6-public-prefix-1>:700::9c31:c102]:6980 Masq    1      0          0
TCP  [<ipv6-public-prefix-1>:701::3d82]:80 rr
  -> [<ipv6-public-prefix-1>:700::66d:75b7]:80 Masq    1      0          0
  -> [<ipv6-public-prefix-1>:700::44c3:c4f1]:80 Masq    1      0          0
TCP  [<ipv6-public-prefix-1>:701::3d82]:443 rr
  -> [<ipv6-public-prefix-1>:700::66d:75b7]:443 Masq    1      0          0
  -> [<ipv6-public-prefix-1>:700::44c3:c4f1]:443 Masq    1      0          0
TCP  [<ipv6-public-prefix-1>:701::8ee0]:80 rr
  -> [<ipv6-public-prefix-1>:700::66d:75af]:80 Masq    1      0          0
  -> [<ipv6-public-prefix-1>:700::44c3:c4ca]:80 Masq    1      0          0
TCP  [<ipv6-public-prefix-1>:701::8ee0]:443 rr
  -> [<ipv6-public-prefix-1>:700::66d:75af]:443 Masq    1      0          0
  -> [<ipv6-public-prefix-1>:700::44c3:c4ca]:443 Masq    1      0          0
TCP  [<ipv6-public-prefix-1>:701::d91b]:3129 rr
  -> [<ipv6-public-prefix-1>:700::44c3:c4f7]:3129 Masq    1      0          0
TCP  [<ipv6-public-prefix-1>:811::cafe:0]:3129 rr
  -> [<ipv6-public-prefix-1>:700::44c3:c4f7]:3129 Masq    1      0          0
TCP  [<ipv6-public-prefix-1>:811::cafe:3]:80 rr
  -> [<ipv6-public-prefix-1>:700::66d:75b7]:80 Masq    1      0          0
  -> [<ipv6-public-prefix-1>:700::44c3:c4f1]:80 Masq    1      0          0
TCP  [<ipv6-public-prefix-1>:811::cafe:3]:443 rr
  -> [<ipv6-public-prefix-1>:700::66d:75b7]:443 Masq    1      0          0
  -> [<ipv6-public-prefix-1>:700::44c3:c4f1]:443 Masq    1      0          0
TCP  [<ipv6-public-prefix-1>:916::cafe:0]:6980 rr
  -> [<ipv6-public-prefix-1>:700::9c31:c102]:6980 Masq    1      0          0
TCP  [<ipv6-public-prefix-1>:916::cafe:1]:80 rr
  -> [<ipv6-public-prefix-1>:700::66d:75af]:80 Masq    1      0          0
  -> [<ipv6-public-prefix-1>:700::44c3:c4ca]:80 Masq    1      0          0
TCP  [<ipv6-public-prefix-1>:916::cafe:1]:443 rr
  -> [<ipv6-public-prefix-1>:700::66d:75af]:443 Masq    1      0          0
  -> [<ipv6-public-prefix-1>:700::44c3:c4ca]:443 Masq    1      0          0
TCP  [<ipv6-public-prefix-2>:2::b002:3]:30151 rr
  -> [<ipv6-public-prefix-1>:700::66d:75b7]:443 Masq    1      0          0
  -> [<ipv6-public-prefix-1>:700::44c3:c4f1]:443 Masq    1      0          0
TCP  [<ipv6-public-prefix-2>:2::b002:3]:30394 rr
  -> [<ipv6-public-prefix-1>:700::44c3:c4f7]:3129 Masq    1      0          0
TCP  [<ipv6-public-prefix-2>:2::b002:3]:31662 rr
  -> [<ipv6-public-prefix-1>:700::66d:75af]:80 Masq    1      0          0
  -> [<ipv6-public-prefix-1>:700::44c3:c4ca]:80 Masq    1      0          0
TCP  [<ipv6-public-prefix-2>:2::b002:3]:31851 rr
  -> [<ipv6-public-prefix-1>:700::9c31:c102]:6980 Masq    1      0          0
TCP  [<ipv6-public-prefix-2>:2::b002:3]:31970 rr
  -> [<ipv6-public-prefix-1>:700::66d:75b7]:80 Masq    1      0          0
  -> [<ipv6-public-prefix-1>:700::44c3:c4f1]:80 Masq    1      0          0
TCP  [<ipv6-public-prefix-2>:2::b002:3]:32524 rr
  -> [<ipv6-public-prefix-1>:700::66d:75af]:443 Masq    1      0          0
  -> [<ipv6-public-prefix-1>:700::44c3:c4ca]:443 Masq    1      0          0
TCP  [<ipv6-public-prefix-2>:2::b002:ffff]:6443 rr
TCP  [<ipv6-public-prefix-2>:2::b002:ffff]:30151 rr
  -> [<ipv6-public-prefix-1>:700::66d:75b7]:443 Masq    1      0          0
  -> [<ipv6-public-prefix-1>:700::44c3:c4f1]:443 Masq    1      0          0
TCP  [<ipv6-public-prefix-2>:2::b002:ffff]:30394 rr
  -> [<ipv6-public-prefix-1>:700::44c3:c4f7]:3129 Masq    1      0          0
TCP  [<ipv6-public-prefix-2>:2::b002:ffff]:31662 rr
  -> [<ipv6-public-prefix-1>:700::66d:75af]:80 Masq    1      0          0
  -> [<ipv6-public-prefix-1>:700::44c3:c4ca]:80 Masq    1      0          0
TCP  [<ipv6-public-prefix-2>:2::b002:ffff]:31851 rr
  -> [<ipv6-public-prefix-1>:700::9c31:c102]:6980 Masq    1      0          0
TCP  [<ipv6-public-prefix-2>:2::b002:ffff]:31970 rr
  -> [<ipv6-public-prefix-1>:700::66d:75b7]:80 Masq    1      0          0
  -> [<ipv6-public-prefix-1>:700::44c3:c4f1]:80 Masq    1      0          0
TCP  [<ipv6-public-prefix-2>:2::b002:ffff]:32524 rr
  -> [<ipv6-public-prefix-1>:700::66d:75af]:443 Masq    1      0          0
  -> [<ipv6-public-prefix-1>:700::44c3:c4ca]:443 Masq    1      0          0

Those are still there; I just grepped on ffff earlier, so they were filtered out of that output.

@lubronzhan
Contributor

Yeah, you see, only the entry is added, but there's no forwarding rule here.
This is what I see from my env if I execute

ipvsadm -A -t [fd01:3:8:208:250:56ff:fe85:1c2c]:6443
ipvsadm -a -t [fd01:3:8:208:250:56ff:fe85:1c2c]:6443 -r [fd01:3:8:208:250:56ff:fe85:1c2]:6443 -m

You see there is an extra line with ->; that's the result of the second command above.

root@wl-antrea-controlplane-cms7d-t84hw:/home/capv# ipvsadm -Ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  [fd01:3:8:208:250:56ff:fe85:1c2b]:6443 rr
TCP  [fd01:3:8:208:250:56ff:fe85:1c2c]:6443 wlc
  -> [fd01:3:8:208:250:56ff:fe85:1c2]:6443 Masq    1      0          0

From your output, it looks like it was removed, so probably kube-proxy is doing that.

Now you can try:

  1. Stop kube-proxy.
  2. Run ipvsadm -a -t [<ipv6-public-prefix-2>:2::b002:ffff]:6443 -r [2607:f8b0:4004:c19::5e]:6443 -m again.
  3. See if you get the same rule that I see: ipvsadm -Ln | grep -A 3 "ffff]:6443"

@RomLecat
Author

RomLecat commented Mar 12, 2024

My bad, I forgot to add the rule.
The rule stays here if I add it:

root@k8s-master-3:~# ipvsadm -a -t [<ipv6-prefix-2>:2::b002:ffff]:6443 -r [2607:f8b0:4004:c19::5e]:6443 -m
root@k8s-master-3:~# ipvsadm -Ln | grep ffff -A3
TCP  [<ipv6-prefix-2>:2::b002:ffff]:6443 rr
  -> [2607:f8b0:4004:c19::5e]:6443 Masq    1      0          0
[...]

@lubronzhan
Contributor

lubronzhan commented Mar 12, 2024

Hmm, how did that scheduler change from wlc to rr? Did you add the virtual service with -s rr?

TCP  [<ipv6-prefix-2>:2::b002:ffff]:6443 rr <-----

I saw that when you added it earlier it was wlc. It shouldn't matter though.

If it's not modified by kube-proxy, could you paste the kube-proxy log here? I want to see if there are any errors in it.

I'll also try a dual-stack env with the node address set to an IPv6 address.

@lubronzhan
Contributor

Hmm, I can't reproduce this with my dual-stack env, where the node has both IPv4 and IPv6 addresses. Do you observe the issue if you turn on kube-vip while turning off kube-proxy, or while running kube-proxy in iptables mode?
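(For reference, a sketch of turning kube-proxy off entirely on a k3s server for this test; the flag is an assumption about the k3s version in use:)

# assumed k3s server flag that skips deploying kube-proxy
'--disable-kube-proxy' \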

@RomLecat
Author

RomLecat commented Mar 13, 2024

Hmm, how did that scheduler change from wlc to rr? Did you add the virtual service with -s rr?

TCP  [<ipv6-prefix-2>:2::b002:ffff]:6443 rr <-----

I saw that when you added it earlier it was wlc. It shouldn't matter though.

I did not stop kube-vip for that test, hence the scheduler was rr. I tried again with kube-proxy stopped, and the result is the same (the entry stays as-is and kube-proxy does not touch it).

If it's not modified by kube-proxy, could you paste the kube-proxy log here? I want to see if there are any errors in it.

I'll also try a dual-stack env with the node address set to an IPv6 address.

k3s logs everything under its own daemon, so it's a bit hard to extract the kube-proxy logs, but grepping for warnings or errors doesn't show anything useful or kube-proxy related.
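(For reference, one way to pull proxy-related warnings and errors out of the k3s journal, assuming a systemd-managed k3s install:)

# show recent proxy/ipvs lines that look like warnings or errors
journalctl -u k3s --since "1 hour ago" | grep -iE 'proxy|ipvs' | grep -iE 'error|warn'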

Hmm, I can't reproduce this with my dual-stack env, where the node has both IPv4 and IPv6 addresses. Do you observe the issue if you turn on kube-vip while turning off kube-proxy, or while running kube-proxy in iptables mode?

Which part were you not able to reproduce? The service creation looping part?

@lubronzhan
Contributor

lubronzhan commented Mar 14, 2024

The service creation looping part

Yeah, I don't see that in my env. But my kube-proxy is using iptables mode, and I'm using kubeadm to bootstrap.

and the result is the same (the entry stays as-is and kube-proxy does not touch it).

Does kube-vip report errors even if you turn off kube-proxy?

@RomLecat
Author

I just tried disabling kube-proxy entirely, but the same issue still occurs.
