Delayed NLRI withdrawal for ingress-nginx endpoint removal #32487

Closed
2 of 3 tasks
bewing opened this issue May 11, 2024 · 3 comments · Fixed by #32536
Labels: area/bgp; kind/bug (This is a bug in the Cilium logic.); kind/community-report (This was reported by a user in the Cilium community, e.g. via Slack.)

bewing commented May 11, 2024

Is there an existing issue for this?

  • I have searched the existing issues

What happened?

The NLRI for the LoadBalancer IP of the ingress-nginx-controller Service (externalTrafficPolicy: Local) is not withdrawn when the Endpoint disappears. The withdrawal appears to be delayed until the pod is completely gone.

Cilium Version

v1.15.4

Kernel Version

5.15.0-56-generic

Kubernetes Version

v1.29.2

Regression

No response

Sysdump

cilium-sysdump-20240511-191215.zip

Relevant log output

No response

Anything else?

To reproduce:

Files for this can be viewed at https://github.com/bewing/cilium-issue-32487

Clone repo with config files:

$ git clone https://github.com/bewing/cilium-issue-32487; cd cilium-issue-32487

Create a single-node k8s cluster with kind and deploy Cilium:

$ kind create cluster --config kind.yaml
Creating cluster "kind" ...
 ✓ Ensuring node image (kindest/node:v1.29.2) 🖼
 ✓ Preparing nodes 📦
 ✓ Writing configuration 📜
 ✓ Starting control-plane 🕹️
 ✓ Installing StorageClass 💾
Set kubectl context to "kind-kind"
You can now use your cluster with:

kubectl cluster-info --context kind-kind

Have a nice day! 👋
$ helm upgrade --install --namespace kube-system --repo https://helm.cilium.io cilium cilium --values cilium.yaml
Release "cilium" does not exist. Installing it now.
NAME: cilium
LAST DEPLOYED: Sat May 11 18:30:35 2024
NAMESPACE: kube-system
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
You have successfully installed Cilium with Hubble.

Your release version is 1.15.4.

For any further help, visit https://docs.cilium.io/en/v1.15/gettinghelp

Once the operator is Ready, check the IP assigned to the node and edit frr.conf and peer.yaml to match. Then apply the BGP-specific config, start an FRR instance, and confirm that the peering is established:

$ kubectl get nodes -o wide
NAME                 STATUS   ROLES           AGE    VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                         KERNEL-VERSION      CONTAINER-RUNTIME
kind-control-plane   Ready    control-plane   102s   v1.29.2   172.19.0.2    <none>        Debian GNU/Linux 12 (bookworm)   5.15.0-56-generic   containerd://1.7.13

$ kubectl get deployments --namespace kube-system cilium-operator
NAME              READY   UP-TO-DATE   AVAILABLE   AGE
cilium-operator   1/1     1            1           2m50s

$ vim frr.conf peer.yaml
< update IPs to reflect subnet assigned by Docker to the kind network >

$ kubectl apply -f peer.yaml -f lb.yaml
ciliumbgppeeringpolicy.cilium.io/kind-control-plane created
ciliumloadbalancerippool.cilium.io/default created

$ docker run --name frr -d --rm --network kind --init --privileged -v $(pwd)/frr.conf:/etc/frr/frr.conf -v $(pwd)/frr.daemons:/etc/frr/daemons frrouting/frr:latest

$ docker exec -it frr vtysh -c "show ip bgp sum"
% Can't open configuration file /etc/frr/vtysh.conf due to 'No such file or directory'.

IPv4 Unicast Summary (VRF default):
BGP router identifier 172.19.0.3, local AS number 65530 vrf-id 0
BGP table version 1
RIB entries 1, using 192 bytes of memory
Peers 1, using 717 KiB of memory

Neighbor                       V         AS   MsgRcvd   MsgSent   TblVer  InQ OutQ  Up/Down State/PfxRcd   PfxSnt Desc
kind-control-plane(172.19.0.2) 4      65530        39        37        0    0    0 00:01:45            1        0 N/A

Total number of neighbors 1

Install ingress-nginx, confirm BGP announcement of the assigned LoadBalancer IP:

$ helm upgrade --install ingress-nginx ingress-nginx --repo https://kubernetes.github.io/ingress-nginx --namespace kube-system --values ingress.yaml
Release "ingress-nginx" does not exist. Installing it now.
NAME: ingress-nginx
LAST DEPLOYED: Sat May 11 18:39:58 2024
NAMESPACE: kube-system
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
The ingress-nginx controller has been installed.
It may take a few minutes for the load balancer IP to be available.
You can watch the status by running 'kubectl get service --namespace kube-system ingress-nginx-controller --output wide --watch'

$ docker exec -it frr vtysh -c "show ip bgp"
% Can't open configuration file /etc/frr/vtysh.conf due to 'No such file or directory'.
BGP table version is 2, local router ID is 172.19.0.3, vrf id 0
Default local pref 100, local AS 65530
Status codes:  s suppressed, d damped, h history, * valid, > best, = multipath,
               i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes:  i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

   Network          Next Hop            Metric LocPrf Weight Path
*>i100.64.0.0/24    172.19.0.2(kind-control-plane)
                                                  100      0 i
*>i100.66.0.1/32    172.19.0.2(kind-control-plane)
                                                  100      0 i

Displayed  2 routes and 2 total paths

Scale the controller to 0 replicas and confirm that the endpoint is deleted. Note that the route has not been withdrawn and persists for multiple seconds:

$ kubectl --namespace kube-system get endpoints ingress-nginx-controller
NAME                       ENDPOINTS                          AGE
ingress-nginx-controller   100.64.0.205:443,100.64.0.205:80   2m9s

$ kubectl --namespace kube-system scale deploy ingress-nginx-controller --replicas=0; kubectl --namespace kube-system get endpoints ingress-nginx-controller; sleep 5; docker exec -it frr vtysh -c "show ip bgp"
deployment.apps/ingress-nginx-controller scaled
NAME                       ENDPOINTS   AGE
ingress-nginx-controller   <none>      35m
% Can't open configuration file /etc/frr/vtysh.conf due to 'No such file or directory'.
BGP table version is 4, local router ID is 172.19.0.3, vrf id 0
Default local pref 100, local AS 65530
Status codes:  s suppressed, d damped, h history, * valid, > best, = multipath,
               i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes:  i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

   Network          Next Hop            Metric LocPrf Weight Path
*>i100.64.0.0/24    172.19.0.2(kind-control-plane)
                                                  100      0 i
*>i100.66.0.1/32    172.19.0.2(kind-control-plane)
                                                  100      0 i

Displayed  2 routes and 2 total paths

Once the ingress-nginx-controller pod finishes Terminating and goes away, the route is finally withdrawn:

$ docker exec -it frr vtysh -c "show ip bgp"
% Can't open configuration file /etc/frr/vtysh.conf due to 'No such file or directory'.
BGP table version is 3, local router ID is 172.19.0.3, vrf id 0
Default local pref 100, local AS 65530
Status codes:  s suppressed, d damped, h history, * valid, > best, = multipath,
               i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes:  i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

   Network          Next Hop            Metric LocPrf Weight Path
*>i100.64.0.0/24    172.19.0.2(kind-control-plane)
                                                  100      0 i

Displayed  1 routes and 1 total paths

While the controller is in the Terminating state and the route has not yet been withdrawn, clients that are directed to the controller get connection refused.

Cilium Users Document

  • Are you a user of Cilium? Please add yourself to the Users doc

Code of Conduct

  • I agree to follow this project's Code of Conduct
bewing added the kind/bug, kind/community-report, and needs/triage labels May 11, 2024
bewing added a commit to bewing/cilium-issue-32487 that referenced this issue May 11, 2024
harsimran-pabla commented

Thanks @bewing for reporting this issue.

I suspect that the BGP control plane does not inspect endpoint state, only endpoint presence, when deciding whether to withdraw the route.

Can you test this theory by modifying the graceful termination period (terminationGracePeriodSeconds) of ingress-nginx-controller? I think it is 30s by default, which would match the delay you are seeing for route withdrawal. If that is the case, we will have to look at modifying the BGP endpoint selection code.

bewing commented May 13, 2024

After modifying the ingress controller deployment to set terminationGracePeriodSeconds to 0, scaling down the controller makes the pod go away immediately, and the route disappears immediately as well.

$ kubectl --namespace kube-system scale deploy ingress-nginx-controller --replicas=0; docker exec -it frr vtysh -c "show ip bgp"
deployment.apps/ingress-nginx-controller scaled
% Can't open configuration file /etc/frr/vtysh.conf due to 'No such file or directory'.
BGP table version is 9, local router ID is 172.19.0.3, vrf id 0
Default local pref 100, local AS 65530
Status codes:  s suppressed, d damped, h history, * valid, > best, = multipath,
               i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes:  i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

   Network          Next Hop            Metric LocPrf Weight Path
*>i100.64.0.0/24    172.19.0.2(kind-control-plane)
                                                  100      0 i

Displayed  1 routes and 1 total paths

harsimran-pabla added the area/bgp label and removed the needs/triage label May 13, 2024
harsimran-pabla commented

I think we should check that the backend is not in the terminating state here. Thanks for confirming that it is related to the pod termination delay.
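
To make the difference concrete, here is a minimal sketch of the two selection rules being discussed. It is not Cilium's actual code: the `backend` type and both function names are hypothetical, and the model assumes that with externalTrafficPolicy: Local the route should be advertised only while the node has at least one usable local backend.

```go
package main

import "fmt"

// backend is a hypothetical, simplified view of one Service backend as seen
// from a node. It is an illustration only, not a Cilium type.
type backend struct {
	addr        string
	local       bool // the backend pod runs on this node
	terminating bool // the pod is shutting down but has not gone away yet
}

// advertiseByPresence models the suspected current behavior: the route is
// kept as long as any local backend exists, whatever its state.
func advertiseByPresence(backends []backend) bool {
	for _, b := range backends {
		if b.local {
			return true
		}
	}
	return false
}

// advertiseByState models the proposed behavior: terminating backends are
// skipped, so the route is withdrawn as soon as the last local backend
// starts terminating.
func advertiseByState(backends []backend) bool {
	for _, b := range backends {
		if b.local && !b.terminating {
			return true
		}
	}
	return false
}

func main() {
	// State right after scaling the deployment to 0 replicas: the pod still
	// exists but is terminating.
	duringScaleDown := []backend{{addr: "100.64.0.205", local: true, terminating: true}}

	fmt.Println("presence only:", advertiseByPresence(duringScaleDown))
	fmt.Println("state aware:", advertiseByState(duringScaleDown))
}
```

With the presence-only rule the route stays advertised for the whole grace period, which matches the behavior in the reproduction above. With the state-aware rule it would be withdrawn as soon as the pod enters Terminating.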
