Describe the bug
In ARP mode, kube-vip does not remove the load balancer IP address from the node's network interface when the corresponding LB-type Service object is deleted.
To Reproduce
1. Provision a K3s cluster with k3sup.
2. Create a blog Deployment with the nginx image.
3. Expose the Deployment as an LB-type Service with a static load balancer IP:
$ kubectl expose deployment blog --port 80 --type=LoadBalancer --overrides='{"metadata": {"annotations": {"kube-vip.io/loadbalancerIPs": "192.168.100.201"}}}'
Check the logs of the kube-vip Pod:
$ kubectl -n kube-system logs kube-vip-ds-68ldv
time="2024-04-29T08:28:59Z" level=info msg="Starting kube-vip.io [v0.8.0]"
time="2024-04-29T08:28:59Z" level=info msg="namespace [kube-system], Mode: [ARP], Features(s): Control Plane:[true], Services:[true]"
time="2024-04-29T08:28:59Z" level=info msg="prometheus HTTP server started"
time="2024-04-29T08:28:59Z" level=info msg="Using node name [k3s-03]"
time="2024-04-29T08:28:59Z" level=info msg="Starting Kube-vip Manager with the ARP engine"
time="2024-04-29T08:28:59Z" level=info msg="beginning services leadership, namespace [kube-system], lock name [plndr-svcs-lock], id [k3s-03]"
I0429 08:28:59.926436 1 leaderelection.go:250] attempting to acquire leader lease kube-system/plndr-svcs-lock...
time="2024-04-29T08:28:59Z" level=info msg="Beginning cluster membership, namespace [kube-system], lock name [plndr-cp-lock], id [k3s-03]"
I0429 08:28:59.926733 1 leaderelection.go:250] attempting to acquire leader lease kube-system/plndr-cp-lock...
time="2024-04-29T08:28:59Z" level=info msg="Node [k3s-02] is assuming leadership of the cluster"
time="2024-04-29T08:28:59Z" level=info msg="new leader elected: k3s-02"
I0429 08:29:01.419796 1 leaderelection.go:260] successfully acquired lease kube-system/plndr-cp-lock
time="2024-04-29T08:29:01Z" level=info msg="Node [k3s-03] is assuming leadership of the cluster"
time="2024-04-29T08:29:01Z" level=info msg="Gratuitous Arp broadcast will repeat every 3 seconds for [192.168.100.100/enp1s0]"
I0429 08:29:05.139854 1 leaderelection.go:260] successfully acquired lease kube-system/plndr-svcs-lock
time="2024-04-29T08:29:05Z" level=info msg="(svcs) starting services watcher for all namespaces"
time="2024-04-29T08:31:25Z" level=info msg="(svcs) adding VIP [192.168.100.201] via enp1s0 for [default/blog]"
time="2024-04-29T08:31:25Z" level=info msg="[service] synchronised in 15ms"
time="2024-04-29T08:31:25Z" level=warning msg="(svcs) already found existing address [192.168.100.201] on adapter [enp1s0]"
time="2024-04-29T08:31:28Z" level=warning msg="Re-applying the VIP configuration [192.168.100.201] to the interface [enp1s0]"
Find out where the leader kube-vip Pod is and SSH into the node to check whether the load balancer IP address is actually assigned to the network interface that kube-vip is operating on:
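The leader can be identified from the Lease object kube-vip uses for leader election (a sketch, assuming the lock name shown in the logs above lives in kube-system):

```shell
# Print the current holder of the kube-vip services lock; the holder ID
# matches the node name, so this tells us which node to SSH into.
# (Lock name plndr-svcs-lock is taken from the log output above.)
kubectl -n kube-system get lease plndr-svcs-lock -o jsonpath='{.spec.holderIdentity}'
```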
$ ip addr show enp1s0
2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether 96:27:05:1e:11:d0 brd ff:ff:ff:ff:ff:ff
inet 192.168.100.140/24 brd 192.168.100.255 scope global dynamic enp1s0
valid_lft 31355738sec preferred_lft 31355738sec
inet 192.168.100.100/32 scope global enp1s0
valid_lft forever preferred_lft forever
inet 192.168.100.201/32 scope global enp1s0
valid_lft forever preferred_lft forever
inet6 fe80::9427:5ff:fe1e:11d0/64 scope link
valid_lft forever preferred_lft forever
Delete the LB-type Service:
$ kubectl delete svc blog
Only a single log line is generated by the kube-vip leader Pod:
time="2024-04-29T08:34:46Z" level=info msg="(svcs) [default/blog] has been deleted"
Go back to the node and observe that the IP address is still there:
$ ip addr show enp1s0
2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether 96:27:05:1e:11:d0 brd ff:ff:ff:ff:ff:ff
inet 192.168.100.140/24 brd 192.168.100.255 scope global dynamic enp1s0
valid_lft 31355589sec preferred_lft 31355589sec
inet 192.168.100.100/32 scope global enp1s0
valid_lft forever preferred_lft forever
inet 192.168.100.201/32 scope global enp1s0
valid_lft forever preferred_lft forever
inet6 fe80::9427:5ff:fe1e:11d0/64 scope link
valid_lft forever preferred_lft forever
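As a stopgap, the leaked address can be removed from the interface by hand (a sketch using the address and interface from this reproduction; requires root on the node):

```shell
# Manually delete the stale Service VIP that kube-vip left behind.
# 192.168.100.201 and enp1s0 are the values from the output above.
sudo ip addr del 192.168.100.201/32 dev enp1s0
```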
Expected behavior
The load balancer IP address should be removed from the network interface of the node after the corresponding LB-type Service object is deleted.
Environment (please complete the following information):
Kube-vip.yaml
Additional context
This was first found in a Harvester deployment (an RKE2-based cluster) while validating the load balancer functionality during an upgrade of kube-vip from v0.6.0 to v0.8.0 (details were recorded in harvester/harvester#5682), but the same issue was also spotted on a newly created K3s cluster, as described in the reproduction section. I hope this helps reduce the noise.
I enabled debug-level logging and saw the same Service object being reconciled again and again:
time="2024-04-30T06:43:26Z" level=debug msg="(svcs) [blog] has been added/modified with addresses [[192.168.100.201]]"
time="2024-04-30T06:43:26Z" level=debug msg="[STARTING] Service Sync"
time="2024-04-30T06:43:26Z" level=debug msg="init enable service security: false"
time="2024-04-30T06:43:26Z" level=info msg="(svcs) adding VIP [192.168.100.201] via enp1s0 for [default/blog]"
time="2024-04-30T06:43:26Z" level=debug msg="(svcs) will update [default/blog]"
time="2024-04-30T06:43:26Z" level=debug msg="(svcs) broadcasting ARP update for 192.168.100.201 via enp1s0, every 3000ms"
time="2024-04-30T06:43:26Z" level=info msg="[service] synchronised in 15ms"
time="2024-04-30T06:43:26Z" level=warning msg="(svcs) already found existing address [192.168.100.201] on adapter [enp1s0]"
time="2024-04-30T06:43:26Z" level=debug msg="(svcs) [blog] has been added/modified with addresses [[192.168.100.201]]"
time="2024-04-30T06:43:26Z" level=debug msg="[STARTING] Service Sync"
time="2024-04-30T06:43:26Z" level=debug msg="isDHCP: false, newServiceAddress: 192.168.100.201"
time="2024-04-30T06:43:26Z" level=debug msg="(svcs) [blog] has been added/modified with addresses [[192.168.100.201]]"
time="2024-04-30T06:43:26Z" level=debug msg="[STARTING] Service Sync"
time="2024-04-30T06:43:26Z" level=debug msg="isDHCP: false, newServiceAddress: 192.168.100.201"
time="2024-04-30T06:43:29Z" level=warning msg="Re-applying the VIP configuration [192.168.100.201] to the interface [enp1s0]"
It seems that the Service's UID was not put into the activeService map, which is why this peculiar behavior occurred.
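Consistent with that, the node keeps answering for the stale VIP. From another host on the same L2 segment the leak can be observed directly (a sketch; had the address been released, the probe would get no reply):

```shell
# Probe the supposedly deleted VIP from a different machine on the LAN.
ping -c 3 192.168.100.201
# Inspect the ARP entry the probe created; it still resolves to the node's MAC.
arp -n 192.168.100.201
```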