[BUG] Duplicate IPs and empty IPPools #393

Open
OmniBerkshire opened this issue Dec 5, 2023 · 3 comments

OmniBerkshire commented Dec 5, 2023

Describe the bug
We are frequently ending up with pods being assigned the same IP, which results in many dueling ARP entries, and much consternation and gnashing of teeth.

We are using the Kubernetes datastore for keeping track of allocations, not the etcd datastore.

A symptom of this is that the IPPool CRDs frequently end up empty as well.

Expected behavior
Each pod always has a unique IP address and the IPPool CRDs accurately reflect IP allocation.

To Reproduce
We have not managed to reproduce this behavior in a controlled environment. I've tried many different things to make it happen, and none of them have worked. :-(

Environment:

  • Whereabouts version: v0.6.1
  • Kubernetes version (use kubectl version): 1.26.1
  • Network-attachment-definition: Proprietary
  • Whereabouts configuration (on the host):
{
  "datastore": "kubernetes",
  "kubernetes": {
    "kubeconfig": "/etc/cni/net.d/whereabouts.d/whereabouts.kubeconfig"
  },
  "reconciler_cron_expression": "30 4 * * *"
}
  • OS (e.g. from /etc/os-release): Talos
  • Kernel (e.g. uname -a): Linux talos-5i3-rhe 5.15.102-talos #1 SMP Mon Mar 13 18:10:38 UTC 2023 x86_64 Linux
  • Others: N/A

Additional info / context
Pod or node churn seems to be a factor, but we haven't been able to catch the problem 'in the act' yet.


wwcd commented Dec 7, 2023

I had the same issue when a node powered down.

In this scenario, the pod is recreated on another node, and the same podref then shows up twice in the IPPool. When the reconciler next runs, there is a chance that the newly created pod's address is the one that gets reclaimed, which leads to the same address being allocated twice.

The associated code is as follows:

func getPodRefsServedByWhereabouts(ipPools []storage.IPPool) map[string]void {
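A quick note on that signature: void is presumably just an empty struct, so map[string]void acts as a set of podrefs. The following is only a simplified sketch under that assumption, with made-up helpers rather than the actual Whereabouts types; it shows why two reservations that share a podref collapse into a single member of such a set.

package main

import "fmt"

// void is assumed to be an empty struct; map[string]void then behaves as a set
// that stores nothing per key.
type void struct{}

// podRefSet collects one entry per podref. Two reservations that share a podref,
// as in the power-off scenario above, collapse into a single set member.
func podRefSet(podRefs []string) map[string]void {
    set := map[string]void{}
    for _, ref := range podRefs {
        set[ref] = void{}
    }
    return set
}

func main() {
    // Both the leaked and the live reservation point at xxx/pod-1.
    fmt.Println(podRefSet([]string{"xxx/pod-1", "xxx/pod-1"})) // map[xxx/pod-1:{}]
}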


wwcd commented Dec 8, 2023

My analysis of the problematic code is as follows:

  1. Assume the node is powered off and the pod is recreated on another node; the IPPool then contains the following data:
     "18278": <-- reservation left behind by the pod on the powered-off node; the address is leaked
       id: 051a974db72a38d2c226aebbf46f596c62c95e2c527e7dae51d6df52d252bb51
       podref: xxx/pod-1
     "18279": <-- newly created pod; freshly allocated address
       id: b2a431a89ae18c3132eb3708306a901ea5999da64f10f0a23f14ede25602c01c
       podref: xxx/pod-1
  2. The reconciler then decides what to reclaim. Because the list returned by pool.Allocations() is built from a map, the order of its members is not fixed: it may be [18278, 18279] or [18279, 18278]. Assume this step returns [18278, 18279]. isPodAlive returns false for 18278, because that address no longer matches the live pod's current address, so xxx/pod-1 is marked for deallocation:
func (rl *ReconcileLooper) findOrphanedIPsPerPool(ipPools []storage.IPPool) error {
    for _, pool := range ipPools {
        orphanIP := OrphanedIPReservations{
            Pool: pool,
        }
        for _, ipReservation := range pool.Allocations() { // the order of the returned list is not fixed
            logging.Debugf("the IP reservation: %s", ipReservation)
            if ipReservation.PodRef == "" {
                _ = logging.Errorf("pod ref missing for Allocations: %s", ipReservation)
                continue
            }
            if !rl.isPodAlive(ipReservation.PodRef, ipReservation.IP.String()) {
                logging.Debugf("pod ref %s is not listed in the live pods list", ipReservation.PodRef)
                orphanIP.Allocations = append(orphanIP.Allocations, ipReservation)
            }
        }
        if len(orphanIP.Allocations) > 0 {
            rl.orphanedIPs = append(rl.orphanedIPs, orphanIP)
        }
    }

    return nil
}
  3. When the deallocation is actually executed, orphanedIP.Pool.Allocations() is called again and, because the order is not fixed, may now return [18279, 18278]; allocate.IterateForDeallocation then matches by podref and reclaims 18279, the address the live pod is currently using (see the sketch after the code below):
func (rl ReconcileLooper) ReconcileIPPools(ctx context.Context) ([]net.IP, error) {
    matchByPodRef := func(reservations []types.IPReservation, podRef string) int {
        foundidx := -1
        for idx, v := range reservations {
            if v.PodRef == podRef {
                return idx
            }
        }
        return foundidx
    }

    var err error
    var totalCleanedUpIps []net.IP
    for _, orphanedIP := range rl.orphanedIPs {
        currentIPReservations := orphanedIP.Pool.Allocations() // the order of the returned list is not fixed
        podRefsToDeallocate := findOutPodRefsToDeallocateIPsFrom(orphanedIP)
        var deallocatedIP net.IP
        for _, podRef := range podRefsToDeallocate {
            currentIPReservations, deallocatedIP, err = allocate.IterateForDeallocation(currentIPReservations, podRef, matchByPodRef) // may cause the correct address to be reclaimed
            if err != nil {
                return nil, err
            }
        }

        logging.Debugf("Going to update the reserve list to: %+v", currentIPReservations)
        if err := orphanedIP.Pool.Update(ctx, currentIPReservations); err != nil {
            logging.Errorf("failed to update the reservation list: %v", err)
            continue
        }
        totalCleanedUpIps = append(totalCleanedUpIps, deallocatedIP)
    }

    return totalCleanedUpIps, nil
}
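The two pieces can be condensed into a minimal, self-contained Go sketch. This is not Whereabouts code: IPReservation, allocations and deallocateByPodRef are simplified stand-ins for types.IPReservation, pool.Allocations() and the matchByPodRef/IterateForDeallocation pair, and the IP addresses are example values. It only illustrates that a map-backed allocation list has no fixed order, so deallocating by podref alone can remove the reservation the live pod is actually using.

package main

import (
    "fmt"
    "net"
)

// IPReservation is a simplified stand-in for the Whereabouts types.IPReservation.
type IPReservation struct {
    IP     net.IP
    PodRef string
}

// allocations mimics pool.Allocations(): the pool keeps reservations in a map
// keyed by offset, and the list is produced by ranging over that map, so the
// order of the result is unspecified and can differ from call to call.
func allocations(pool map[string]IPReservation) []IPReservation {
    var out []IPReservation
    for _, r := range pool {
        out = append(out, r)
    }
    return out
}

// deallocateByPodRef mimics the matchByPodRef + IterateForDeallocation step: it
// removes the FIRST reservation whose PodRef matches, whichever entry that
// happens to be in the unordered list, and returns the freed IP.
func deallocateByPodRef(reservations []IPReservation, podRef string) ([]IPReservation, net.IP) {
    for i, r := range reservations {
        if r.PodRef == podRef {
            return append(reservations[:i:i], reservations[i+1:]...), r.IP
        }
    }
    return reservations, nil
}

func main() {
    // The state from step 1: the leaked reservation ("18278") and the live
    // pod's reservation ("18279") share the same podref.
    pool := map[string]IPReservation{
        "18278": {IP: net.ParseIP("10.129.0.18"), PodRef: "xxx/pod-1"}, // leaked (node powered off)
        "18279": {IP: net.ParseIP("10.129.0.19"), PodRef: "xxx/pod-1"}, // in use by the recreated pod
    }

    // Step 3: the reconciler fetches the list again and deallocates by podref.
    // Depending on the map iteration order, the freed address is sometimes the
    // leaked 10.129.0.18 and sometimes the in-use 10.129.0.19.
    remaining, freed := deallocateByPodRef(allocations(pool), "xxx/pod-1")
    fmt.Printf("freed %s, remaining %+v\n", freed, remaining)
}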

OmniBerkshire (Author) commented

This looks interesting, but I don't understand enough about what's going on here to have a strong opinion on whether this is the cause. It does suggest some ideas for reproducing the issue, though.

I'd also like to mention that we follow the standard practice of running a DaemonSet that installs the CNI and keeps a reconciler/garbage collector running.

We are using Multus to give each pod several interfaces. The primary interface uses the Flannel CNI and works fine. The other two use a macvlan CNI with IPs assigned by Whereabouts. And yes, we have verified that the ranges the IPs are assigned from don't overlap. We consistently use the same NetworkAttachmentDefinitions, so we don't really have a way to end up with lots of ranges anyway.

We are also seeing a number of odd log messages, though it's not clear whether any particular one is related. One example:

E1211 15:41:37.094624 24 leaderelection.go:330] error retrieving resource lock /whereabouts: an empty namespace may not be set when a resource name is provided
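That particular error usually means the leader-election resource lock was given a name ("whereabouts") but an empty namespace: client-go describes a lock as "<namespace>/<name>", which is why the log shows "/whereabouts". The sketch below is not the Whereabouts reconciler's actual wiring, only an illustration of how a client-go resource lock carries both values; the kubeconfig path is taken from the configuration above, and "kube-system" is just a placeholder namespace.

package main

import (
    "fmt"

    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/tools/clientcmd"
    "k8s.io/client-go/tools/leaderelection/resourcelock"
)

func main() {
    // Build a clientset from the same kubeconfig Whereabouts uses on the host.
    cfg, err := clientcmd.BuildConfigFromFlags("", "/etc/cni/net.d/whereabouts.d/whereabouts.kubeconfig")
    if err != nil {
        panic(err)
    }
    clientset, err := kubernetes.NewForConfig(cfg)
    if err != nil {
        panic(err)
    }

    // A leases lock is identified as "<namespace>/<name>"; if the namespace is
    // empty, it is described as "/whereabouts", which matches the error above.
    // "kube-system" here is only a placeholder value.
    lock, err := resourcelock.New(
        resourcelock.LeasesResourceLock,
        "kube-system", // must not be empty when a resource name is provided
        "whereabouts",
        clientset.CoreV1(),
        clientset.CoordinationV1(),
        resourcelock.ResourceLockConfig{Identity: "example-holder"},
    )
    if err != nil {
        panic(err)
    }
    fmt.Println("lock:", lock.Describe())
}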
