kube-apiserver IP addresses disappear from bpf ipcache 10 minutes after restarting cilium-agent #24502

Closed
svanschie opened this issue Mar 21, 2023 · 51 comments · Fixed by #24875
Labels
kind/bug This is a bug in the Cilium logic. kind/community-report This was reported by a user in the Cilium community, eg via Slack. kind/regression This functionality worked fine before, but was broken in a newer release of Cilium. needs/triage This issue requires triaging to establish severity and next steps. priority/high This is considered vital to an upcoming release. sig/agent Cilium agent related. sig/policy Impacts whether traffic is allowed or denied based on user-defined policies.

Comments

@svanschie

Is there an existing issue for this?

  • I have searched the existing issues

What happened?

When I restart the cilium agent, everything works fine for 10 minutes. After that, connectivity to the kube-apiserver breaks. I suspect this happens because the IP addresses of the API server are no longer in the ipcache. They appear to be removed from the cache by the functionality behind the --identity-restore-grace-period flag, which is set to its default of 10 minutes.

The IP addresses of my API server are:

  • 10.102.3.28
  • 10.102.3.42

After a restart, they both show up twice in the output of cilium identity list:

16777219   cidr:10.102.3.28/32
           reserved:world
16777220   cidr:10.102.3.42/32
           reserved:world
16777222   cidr:10.102.3.42/32
           reserved:kube-apiserver
           reserved:world
16777223   cidr:10.102.3.28/32
           reserved:kube-apiserver
           reserved:world

The corresponding entries in cilium bpf ipcache list:

10.102.3.28/32                  identity=16777223 encryptkey=0 tunnelendpoint=0.0.0.0 nodeid=0
10.102.3.42/32                  identity=16777222 encryptkey=0 tunnelendpoint=0.0.0.0 nodeid=0

After 10 minutes, identities 16777219 and 16777220 are removed:

2023-03-21T12:48:07.411157839Z level=debug msg="UpdateIdentities: Deleting identity" identity=16777219 labels="[cidr:0.0.0.0/0 cidr:0.0.0.0/1 cidr:0.0.0.0/2 cidr:0.0.0.0/3 cidr:0.0.0.0/4 cidr:10.0.0.0/7 cidr:10.0.0.0/8 cidr:10.0.0.0/9 cidr:10.100.0.0/14 cidr:10.102.0.0/15 cidr:10.102.0.0/16 cidr:10.102.0.0/17 cidr:10.102.0.0/18 cidr:10.102.0.0/19 cidr:10.102.0.0/20 cidr:10.102.0.0/21 cidr:10.102.0.0/22 cidr:10.102.2.0/23 cidr:10.102.3.0/24 cidr:10.102.3.0/25 cidr:10.102.3.0/26 cidr:10.102.3.0/27 cidr:10.102.3.16/28 cidr:10.102.3.24/29 cidr:10.102.3.28/30 cidr:10.102.3.28/31 cidr:10.102.3.28/32 cidr:10.64.0.0/10 cidr:10.96.0.0/11 cidr:10.96.0.0/12 cidr:10.96.0.0/13 cidr:8.0.0.0/5 cidr:8.0.0.0/6 reserved:world]" subsys=policy
2023-03-21T12:48:07.411168359Z level=debug msg="UpdateIdentities: Deleting identity" identity=16777220 labels="[cidr:0.0.0.0/0 cidr:0.0.0.0/1 cidr:0.0.0.0/2 cidr:0.0.0.0/3 cidr:0.0.0.0/4 cidr:10.0.0.0/7 cidr:10.0.0.0/8 cidr:10.0.0.0/9 cidr:10.100.0.0/14 cidr:10.102.0.0/15 cidr:10.102.0.0/16 cidr:10.102.0.0/17 cidr:10.102.0.0/18 cidr:10.102.0.0/19 cidr:10.102.0.0/20 cidr:10.102.0.0/21 cidr:10.102.0.0/22 cidr:10.102.2.0/23 cidr:10.102.3.0/24 cidr:10.102.3.0/25 cidr:10.102.3.0/26 cidr:10.102.3.32/27 cidr:10.102.3.32/28 cidr:10.102.3.40/29 cidr:10.102.3.40/30 cidr:10.102.3.42/31 cidr:10.102.3.42/32 cidr:10.64.0.0/10 cidr:10.96.0.0/11 cidr:10.96.0.0/12 cidr:10.96.0.0/13 cidr:8.0.0.0/5 cidr:8.0.0.0/6 reserved:world]" subsys=policy

And entries from ipcache for the API server IP addresses are also removed:

2023-03-21T12:48:07.411224640Z level=debug msg="Deleting IP from ipcache layer" ipAddr=10.102.3.28/32 subsys=ipcache
2023-03-21T12:48:07.411231711Z level=debug msg="Daemon notified of IP-Identity cache state change" identity="{16777223 generated false}" ipAddr="{10.102.3.28 ffffffff}" modification=Delete subsys=datapath-ipcache
2023-03-21T12:48:07.411240471Z level=debug msg="Deleting IP from ipcache layer" ipAddr=10.102.3.42/32 subsys=ipcache
2023-03-21T12:48:07.411242771Z level=debug msg="Daemon notified of IP-Identity cache state change" identity="{16777222 generated false}" ipAddr="{10.102.3.42 ffffffff}" modification=Delete subsys=datapath-ipcache

No new entries are added to the ipcache for the API server IP addresses.

When allowing outbound traffic to port 443 anywhere, it works fine, but with a policy that allows traffic to the cluster (or kube-apiserver) entities specifically, the traffic is dropped:

apiVersion: cilium.io/v2
kind: CiliumClusterwideNetworkPolicy
metadata:
  name: default-allow-cluster-only
spec:
  endpointSelector: {}
  egress:
    - toEntities:
        - cluster

Example of a dropped request:

{
  "flow": {
    "time": "2023-03-21T12:58:09.767376425Z",
    "verdict": "DROPPED",
    "drop_reason": 133,
    "ethernet": {
      "source": "4a:7c:ea:1f:d8:40",
      "destination": "c2:ef:b0:d0:23:9a"
    },
    "IP": {
      "source": "10.102.4.59",
      "destination": "10.102.3.28",
      "ipVersion": "IPv4"
    },
    "l4": {
      "TCP": {
        "source_port": 59884,
        "destination_port": 443,
        "flags": {
          "SYN": true
        }
      }
    },
    "source": {
      "ID": 156,
      "identity": 55508,
      "namespace": "external-dns",
      "labels": [
        "k8s:app.kubernetes.io/instance=external-dns",
        "k8s:app.kubernetes.io/name=external-dns",
        "k8s:io.cilium.k8s.namespace.labels.kubernetes.io/metadata.name=external-dns",
        "k8s:io.cilium.k8s.policy.cluster=default",
        "k8s:io.cilium.k8s.policy.serviceaccount=external-dns",
        "k8s:io.kubernetes.pod.namespace=external-dns"
      ],
      "pod_name": "external-dns-659779cd66-jt4ck",
      "workloads": [
        {
          "name": "external-dns",
          "kind": "Deployment"
        }
      ]
    },
    "destination": {
      "identity": 2,
      "labels": [
        "reserved:world"
      ]
    },
    "Type": "L3_L4",
    "node_name": "ip-10-102-5-6.eu-west-1.compute.internal",
    "event_type": {
      "type": 1,
      "sub_type": 133
    },
    "traffic_direction": "EGRESS",
    "drop_reason_desc": "POLICY_DENIED",
    "Summary": "TCP Flags: SYN"
  },
  "node_name": "ip-10-102-5-6.eu-west-1.compute.internal",
  "time": "2023-03-21T12:58:09.767376425Z"
}

Cilium Version

1.13.1

Kernel Version

5.15.79

Kubernetes Version

v1.24.8-eks-ffeb93d

Sysdump

No response

Relevant log output

2023-03-21T12:38:01.379721686Z level=debug msg="Resolving identity" identityLabels="cidr:10.102.3.28/32,reserved:world" subsys=identity-cache
2023-03-21T12:38:01.379730526Z level=debug msg="Reallocated restored CIDR identity: 16777219" subsys=identity-cache
2023-03-21T12:38:01.379806618Z level=debug msg="Resolving identity" identityLabels="cidr:10.102.3.42/32,reserved:world" subsys=identity-cache
2023-03-21T12:38:01.379815458Z level=debug msg="Reallocated restored CIDR identity: 16777220" subsys=identity-cache
2023-03-21T12:38:01.381765715Z level=debug msg="UpdateIdentities: Adding a new identity" identity=16777219 labels="[cidr:0.0.0.0/0 cidr:0.0.0.0/1 cidr:0.0.0.0/2 cidr:0.0.0.0/3 cidr:0.0.0.0/4 cidr:10.0.0.0/7 cidr:10.0.0.0/8 cidr:10.0.0.0/9 cidr:10.100.0.0/14 cidr:10.102.0.0/15 cidr:10.102.0.0/16 cidr:10.102.0.0/17 cidr:10.102.0.0/18 cidr:10.102.0.0/19 cidr:10.102.0.0/20 cidr:10.102.0.0/21 cidr:10.102.0.0/22 cidr:10.102.2.0/23 cidr:10.102.3.0/24 cidr:10.102.3.0/25 cidr:10.102.3.0/26 cidr:10.102.3.0/27 cidr:10.102.3.16/28 cidr:10.102.3.24/29 cidr:10.102.3.28/30 cidr:10.102.3.28/31 cidr:10.102.3.28/32 cidr:10.64.0.0/10 cidr:10.96.0.0/11 cidr:10.96.0.0/12 cidr:10.96.0.0/13 cidr:8.0.0.0/5 cidr:8.0.0.0/6 reserved:world]" subsys=policy
2023-03-21T12:38:01.382198643Z level=debug msg="UpdateIdentities: Adding a new identity" identity=16777220 labels="[cidr:0.0.0.0/0 cidr:0.0.0.0/1 cidr:0.0.0.0/2 cidr:0.0.0.0/3 cidr:0.0.0.0/4 cidr:10.0.0.0/7 cidr:10.0.0.0/8 cidr:10.0.0.0/9 cidr:10.100.0.0/14 cidr:10.102.0.0/15 cidr:10.102.0.0/16 cidr:10.102.0.0/17 cidr:10.102.0.0/18 cidr:10.102.0.0/19 cidr:10.102.0.0/20 cidr:10.102.0.0/21 cidr:10.102.0.0/22 cidr:10.102.2.0/23 cidr:10.102.3.0/24 cidr:10.102.3.0/25 cidr:10.102.3.0/26 cidr:10.102.3.32/27 cidr:10.102.3.32/28 cidr:10.102.3.40/29 cidr:10.102.3.40/30 cidr:10.102.3.42/31 cidr:10.102.3.42/32 cidr:10.64.0.0/10 cidr:10.96.0.0/11 cidr:10.96.0.0/12 cidr:10.96.0.0/13 cidr:8.0.0.0/5 cidr:8.0.0.0/6 reserved:world]" subsys=policy
2023-03-21T12:38:01.388316588Z level=debug msg="Upserting IP into ipcache layer" identity="{16777220 generated false}" ipAddr=10.102.3.42/32 key=0 subsys=ipcache
2023-03-21T12:38:01.388350159Z level=debug msg="Daemon notified of IP-Identity cache state change" identity="{16777220 generated false}" ipAddr="{10.102.3.42 ffffffff}" modification=Upsert subsys=datapath-ipcache
2023-03-21T12:38:01.388976990Z level=debug msg="Upserting IP into ipcache layer" identity="{16777219 generated false}" ipAddr=10.102.3.28/32 key=0 subsys=ipcache
2023-03-21T12:38:01.388994991Z level=debug msg="Daemon notified of IP-Identity cache state change" identity="{16777219 generated false}" ipAddr="{10.102.3.28 ffffffff}" modification=Upsert subsys=datapath-ipcache
2023-03-21T12:38:01.389687974Z level=debug msg="Restoring backend" backendID=3 backendPreferred=false backendState=0 l3n4Addr="10.102.3.28:443" subsys=service
2023-03-21T12:38:01.389712834Z level=debug msg="Restoring backend" backendID=4 backendPreferred=false backendState=0 l3n4Addr="10.102.3.42:443" subsys=service
2023-03-21T12:38:02.522004786Z level=debug msg="Kubernetes service definition changed" action=service-updated endpoints="10.102.3.28:443/TCP,10.102.3.42:443/TCP" k8sNamespace=default k8sSvcName=kubernetes old-service=nil service="frontends:[172.16.0.1]/ports=[https]/selector=map[]" subsys=k8s-watcher
2023-03-21T12:38:02.522043487Z level=debug msg="Upserting service" backends="[10.102.3.28:443 10.102.3.42:443]" l7LBFrontendPorts="[]" l7LBProxyPort=0 loadBalancerSourceRanges="[]" serviceIP="{172.16.0.1 {TCP 443} 0}" serviceName=kubernetes serviceNamespace=default sessionAffinity=false sessionAffinityTimeout=0 subsys=service svcExtTrafficPolicy=Cluster svcHealthCheckNodePort=0 svcIntTrafficPolicy=Cluster svcType=ClusterIP
2023-03-21T12:38:02.522087977Z level=debug msg="Acquired service ID" backends="[10.102.3.28:443 10.102.3.42:443]" l7LBFrontendPorts="[]" l7LBProxyPort=0 loadBalancerSourceRanges="[]" serviceID=4 serviceIP="{172.16.0.1 {TCP 443} 0}" serviceName=kubernetes serviceNamespace=default sessionAffinity=false sessionAffinityTimeout=0 subsys=service svcExtTrafficPolicy=Cluster svcHealthCheckNodePort=0 svcIntTrafficPolicy=Cluster svcType=ClusterIP
2023-03-21T12:38:02.537470738Z level=info msg="Policy Add Request" ciliumNetworkPolicy="[&{EndpointSelector:{\"matchLabels\":{\"any:app\":\"datadog-cluster-agent\",\"k8s:io.kubernetes.pod.namespace\":\"datadog\"}} NodeSelector:{} Ingress:[] IngressDeny:[] Egress:[{EgressCommonRule:{ToEndpoints:[] ToRequires:[] ToCIDR:[169.254.169.254/32] ToCIDRSet:[] ToEntities:[] ToServices:[] ToGroups:[] aggregatedSelectors:[{LabelSelector:0xc00051b7e0 requirements:0xc0016edea8 cachedLabelSelectorString:&LabelSelector{MatchLabels:map[string]string{cidr.169.254.169.254/32: ,},MatchExpressions:[]LabelSelectorRequirement{},}}]} ToPorts:[{Ports:[{Port:80 Protocol:TCP}] TerminatingTLS:<nil> OriginatingTLS:<nil> ServerNames:[] Listener:<nil> Rules:<nil>}] ToFQDNs:[] ICMPs:[] Auth:<nil>}] EgressDeny:[] Labels:[k8s:io.cilium.k8s.policy.derived-from=CiliumNetworkPolicy k8s:io.cilium.k8s.policy.name=datadog-cluster-agent k8s:io.cilium.k8s.policy.namespace=datadog k8s:io.cilium.k8s.policy.uid=8c470cd3-6f44-493f-9e7b-0587a55a859d] Description:Egress to metadata server},\n&{EndpointSelector:{\"matchLabels\":{\"any:app\":\"datadog-cluster-agent\",\"k8s:io.kubernetes.pod.namespace\":\"datadog\"}} NodeSelector:{} Ingress:[] IngressDeny:[] Egress:[{EgressCommonRule:{ToEndpoints:[{\"matchLabels\":{\"k8s:io.kubernetes.pod.namespace\":\"kube-system\",\"k8s:k8s-app\":\"kube-dns\"}}] ToRequires:[] ToCIDR: ToCIDRSet:[] ToEntities:[] ToServices:[] ToGroups:[] aggregatedSelectors:[]} ToPorts:[{Ports:[{Port:53 Protocol:ANY}] TerminatingTLS:<nil> OriginatingTLS:<nil> ServerNames:[] Listener:<nil> Rules:0xc0001e53b0}] ToFQDNs:[] ICMPs:[] Auth:<nil>}] EgressDeny:[] Labels:[k8s:io.cilium.k8s.policy.derived-from=CiliumNetworkPolicy k8s:io.cilium.k8s.policy.name=datadog-cluster-agent k8s:io.cilium.k8s.policy.namespace=datadog k8s:io.cilium.k8s.policy.uid=8c470cd3-6f44-493f-9e7b-0587a55a859d] Description:Egress to DNS},\n&{EndpointSelector:{\"matchLabels\":{\"any:app\":\"datadog-cluster-agent\",\"k8s:io.kubernetes.pod.namespace\":\"datadog\"}} NodeSelector:{} Ingress:[] IngressDeny:[] Egress:[{EgressCommonRule:{ToEndpoints:[] ToRequires:[] ToCIDR: ToCIDRSet:[] ToEntities:[] ToServices:[] ToGroups:[] aggregatedSelectors:[{LabelSelector:0xc00051b9e0 requirements:0xc00050cdb0 cachedLabelSelectorString:&LabelSelector{MatchLabels:map[string]string{reserved.none: ,},MatchExpressions:[]LabelSelectorRequirement{},}}]} ToPorts:[{Ports:[{Port:443 Protocol:TCP}] TerminatingTLS:<nil> OriginatingTLS:<nil> ServerNames:[] Listener:<nil> Rules:<nil>}] ToFQDNs:[{MatchName:app.datadoghq.eu MatchPattern:} {MatchName: MatchPattern:*-app.agent.datadoghq.eu} {MatchName:orchestrator.datadoghq.eu MatchPattern:}] ICMPs:[] Auth:<nil>}] EgressDeny:[] Labels:[k8s:io.cilium.k8s.policy.derived-from=CiliumNetworkPolicy k8s:io.cilium.k8s.policy.name=datadog-cluster-agent k8s:io.cilium.k8s.policy.namespace=datadog k8s:io.cilium.k8s.policy.uid=8c470cd3-6f44-493f-9e7b-0587a55a859d] Description:Egress to Datadog intake},\n&{EndpointSelector:{\"matchLabels\":{\"any:app\":\"datadog-cluster-agent\",\"k8s:io.kubernetes.pod.namespace\":\"datadog\"}} NodeSelector:{} Ingress:[] IngressDeny:[] Egress:[{EgressCommonRule:{ToEndpoints:[] ToRequires:[] ToCIDR: ToCIDRSet:[10.102.3.42/32 10.102.3.28/32] ToEntities:[] ToServices:[{K8sServiceSelector:<nil> K8sService:0xc00051aee0}] ToGroups:[] aggregatedSelectors:[{LabelSelector:0xc00051b8a0 requirements:0xc0016edf08 cachedLabelSelectorString:&LabelSelector{MatchLabels:map[string]string{cidr.10.102.3.42/32: 
,},MatchExpressions:[]LabelSelectorRequirement{},}} {LabelSelector:0xc00051b900 requirements:0xc0016edf68 cachedLabelSelectorString:&LabelSelector{MatchLabels:map[string]string{cidr.10.102.3.28/32: ,},MatchExpressions:[]LabelSelectorRequirement{},}}]} ToPorts:[] ToFQDNs:[] ICMPs:[] Auth:<nil>} {EgressCommonRule:{ToEndpoints:[] ToRequires:[] ToCIDR: ToCIDRSet:[] ToEntities:[host remote-node] ToServices:[] ToGroups:[] aggregatedSelectors:[{LabelSelector:0xc00051b940 requirements:0xc00050cbd0 cachedLabelSelectorString:&LabelSelector{MatchLabels:map[string]string{reserved.host: ,},MatchExpressions:[]LabelSelectorRequirement{},}} {LabelSelector:0xc00051b9a0 requirements:0xc00050ccf0 cachedLabelSelectorString:&LabelSelector{MatchLabels:map[string]string{reserved.remote-node: ,},MatchExpressions:[]LabelSelectorRequirement{},}}]} ToPorts:[{Ports:[{Port:443 Protocol:TCP}] TerminatingTLS:<nil> OriginatingTLS:<nil> ServerNames:[] Listener:<nil> Rules:<nil>}] ToFQDNs:[] ICMPs:[] Auth:<nil>}] EgressDeny:[] Labels:[k8s:io.cilium.k8s.policy.derived-from=CiliumNetworkPolicy k8s:io.cilium.k8s.policy.name=datadog-cluster-agent k8s:io.cilium.k8s.policy.namespace=datadog k8s:io.cilium.k8s.policy.uid=8c470cd3-6f44-493f-9e7b-0587a55a859d] Description:Egress to Kube API server},\n&{EndpointSelector:{\"matchLabels\":{\"any:app\":\"datadog-cluster-agent\",\"k8s:io.kubernetes.pod.namespace\":\"datadog\"}} NodeSelector:{} Ingress:[{IngressCommonRule:{FromEndpoints:[{\"matchLabels\":{\"any:app\":\"datadog\",\"k8s:io.kubernetes.pod.namespace\":\"datadog\"}}] FromRequires:[] FromCIDR: FromCIDRSet:[] FromEntities:[] aggregatedSelectors:[]} ToPorts:[{Ports:[{Port:5005 Protocol:TCP}] TerminatingTLS:<nil> OriginatingTLS:<nil> ServerNames:[] Listener:<nil> Rules:<nil>}] ICMPs:[] Auth:<nil>}] IngressDeny:[] Egress:[] EgressDeny:[] Labels:[k8s:io.cilium.k8s.policy.derived-from=CiliumNetworkPolicy k8s:io.cilium.k8s.policy.name=datadog-cluster-agent k8s:io.cilium.k8s.policy.namespace=datadog k8s:io.cilium.k8s.policy.uid=8c470cd3-6f44-493f-9e7b-0587a55a859d] Description:Ingress from cluster agent},\n&{EndpointSelector:{\"matchLabels\":{\"any:app\":\"datadog-cluster-agent\",\"k8s:io.kubernetes.pod.namespace\":\"datadog\"}} NodeSelector:{} Ingress:[] IngressDeny:[] Egress:[{EgressCommonRule:{ToEndpoints:[{\"matchLabels\":{\"any:app\":\"datadog\",\"k8s:io.kubernetes.pod.namespace\":\"datadog\"}}] ToRequires:[] ToCIDR: ToCIDRSet:[] ToEntities:[] ToServices:[] ToGroups:[] aggregatedSelectors:[]} ToPorts:[{Ports:[{Port:5005 Protocol:TCP}] TerminatingTLS:<nil> OriginatingTLS:<nil> ServerNames:[] Listener:<nil> Rules:<nil>}] ToFQDNs:[] ICMPs:[] Auth:<nil>}] EgressDeny:[] Labels:[k8s:io.cilium.k8s.policy.derived-from=CiliumNetworkPolicy k8s:io.cilium.k8s.policy.name=datadog-cluster-agent k8s:io.cilium.k8s.policy.namespace=datadog k8s:io.cilium.k8s.policy.uid=8c470cd3-6f44-493f-9e7b-0587a55a859d] Description:Egress to cluster agent},\n&{EndpointSelector:{\"matchLabels\":{\"any:app\":\"datadog-cluster-agent\",\"k8s:io.kubernetes.pod.namespace\":\"datadog\"}} NodeSelector:{} Ingress:[{IngressCommonRule:{FromEndpoints:[{\"matchLabels\":{\"any:app\":\"datadog\",\"k8s:io.kubernetes.pod.namespace\":\"datadog\"}}] FromRequires:[] FromCIDR: FromCIDRSet:[] FromEntities:[] aggregatedSelectors:[]} ToPorts:[{Ports:[{Port:5000 Protocol:TCP} {Port:5005 Protocol:TCP}] TerminatingTLS:<nil> OriginatingTLS:<nil> ServerNames:[] Listener:<nil> Rules:<nil>}] ICMPs:[] Auth:<nil>}] IngressDeny:[] Egress:[] EgressDeny:[] 
Labels:[k8s:io.cilium.k8s.policy.derived-from=CiliumNetworkPolicy k8s:io.cilium.k8s.policy.name=datadog-cluster-agent k8s:io.cilium.k8s.policy.namespace=datadog k8s:io.cilium.k8s.policy.uid=8c470cd3-6f44-493f-9e7b-0587a55a859d] Description:Ingress from agent}]" policyAddRequest=7740f4e2-df0f-4753-ab7a-2fd6ba8613fd subsys=daemon
2023-03-21T12:38:02.537482319Z level=debug msg="Policy imported via API, found CIDR prefixes..." policyAddRequest=7740f4e2-df0f-4753-ab7a-2fd6ba8613fd prefixes="[169.254.169.254/32 10.102.3.42/32 10.102.3.28/32]" subsys=daemon
2023-03-21T12:38:02.537488979Z level=debug msg="Resolving identity" identityLabels="cidr:10.102.3.42/32,reserved:kube-apiserver,reserved:world" subsys=identity-cache
2023-03-21T12:38:02.537491849Z level=debug msg="Resolving identity" identityLabels="cidr:10.102.3.28/32,reserved:kube-apiserver,reserved:world" subsys=identity-cache
2023-03-21T12:38:02.537501589Z level=debug msg="Upserting IP into ipcache layer" identity="{16777222 generated false}" ipAddr=10.102.3.42/32 key=0 subsys=ipcache
2023-03-21T12:38:02.537505019Z level=debug msg="Daemon notified of IP-Identity cache state change" identity="{16777222 generated false}" ipAddr="{10.102.3.42 ffffffff}" modification=Upsert subsys=datapath-ipcache
2023-03-21T12:38:02.537511709Z level=debug msg="Upserting IP into ipcache layer" identity="{16777223 generated false}" ipAddr=10.102.3.28/32 key=0 subsys=ipcache
2023-03-21T12:38:02.537513599Z level=debug msg="Daemon notified of IP-Identity cache state change" identity="{16777223 generated false}" ipAddr="{10.102.3.28 ffffffff}" modification=Upsert subsys=datapath-ipcache
2023-03-21T12:38:02.537518019Z level=debug msg="UpdateIdentities: Adding a new identity" identity=16777222 labels="[cidr:0.0.0.0/0 cidr:0.0.0.0/1 cidr:0.0.0.0/2 cidr:0.0.0.0/3 cidr:0.0.0.0/4 cidr:10.0.0.0/7 cidr:10.0.0.0/8 cidr:10.0.0.0/9 cidr:10.100.0.0/14 cidr:10.102.0.0/15 cidr:10.102.0.0/16 cidr:10.102.0.0/17 cidr:10.102.0.0/18 cidr:10.102.0.0/19 cidr:10.102.0.0/20 cidr:10.102.0.0/21 cidr:10.102.0.0/22 cidr:10.102.2.0/23 cidr:10.102.3.0/24 cidr:10.102.3.0/25 cidr:10.102.3.0/26 cidr:10.102.3.32/27 cidr:10.102.3.32/28 cidr:10.102.3.40/29 cidr:10.102.3.40/30 cidr:10.102.3.42/31 cidr:10.102.3.42/32 cidr:10.64.0.0/10 cidr:10.96.0.0/11 cidr:10.96.0.0/12 cidr:10.96.0.0/13 cidr:8.0.0.0/5 cidr:8.0.0.0/6 reserved:kube-apiserver reserved:world]" subsys=policy
2023-03-21T12:38:02.537519549Z level=debug msg="UpdateIdentities: Adding a new identity" identity=16777223 labels="[cidr:0.0.0.0/0 cidr:0.0.0.0/1 cidr:0.0.0.0/2 cidr:0.0.0.0/3 cidr:0.0.0.0/4 cidr:10.0.0.0/7 cidr:10.0.0.0/8 cidr:10.0.0.0/9 cidr:10.100.0.0/14 cidr:10.102.0.0/15 cidr:10.102.0.0/16 cidr:10.102.0.0/17 cidr:10.102.0.0/18 cidr:10.102.0.0/19 cidr:10.102.0.0/20 cidr:10.102.0.0/21 cidr:10.102.0.0/22 cidr:10.102.2.0/23 cidr:10.102.3.0/24 cidr:10.102.3.0/25 cidr:10.102.3.0/26 cidr:10.102.3.0/27 cidr:10.102.3.16/28 cidr:10.102.3.24/29 cidr:10.102.3.28/30 cidr:10.102.3.28/31 cidr:10.102.3.28/32 cidr:10.64.0.0/10 cidr:10.96.0.0/11 cidr:10.96.0.0/12 cidr:10.96.0.0/13 cidr:8.0.0.0/5 cidr:8.0.0.0/6 reserved:kube-apiserver reserved:world]" subsys=policy
2023-03-21T12:38:03.283568317Z level=debug msg="ToMapState: Allowed remote IDs" EndpointSelector="&LabelSelector{MatchLabels:map[string]string{reserved.kube-apiserver: ,},MatchExpressions:[]LabelSelectorRequirement{},}" policyID="[kube-apiserver 16777222 16777223]" port=0 portName= protocol=0 subsys=policy trafficDirection=Egress
2023-03-21T12:38:03.293190788Z level=debug msg="ToMapState: Allowed remote IDs" EndpointSelector="&LabelSelector{MatchLabels:map[string]string{reserved.kube-apiserver: ,},MatchExpressions:[]LabelSelectorRequirement{},}" policyID="[kube-apiserver 16777222 16777223]" port=0 portName= protocol=0 subsys=policy trafficDirection=Egress
2023-03-21T12:38:03.315919687Z level=debug msg="Injecting CIDR labels for prefix" cidr=10.102.3.28/32 labels="reserved:kube-apiserver" subsys=ipcache
2023-03-21T12:38:03.316049329Z level=debug msg="Resolving identity" identityLabels="cidr:10.102.3.28/32,reserved:kube-apiserver,reserved:world" subsys=identity-cache
2023-03-21T12:38:03.316098860Z level=debug msg="Injecting CIDR labels for prefix" cidr=10.102.3.42/32 labels="reserved:kube-apiserver" subsys=ipcache
2023-03-21T12:38:03.316174742Z level=debug msg="Resolving identity" identityLabels="cidr:10.102.3.42/32,reserved:kube-apiserver,reserved:world" subsys=identity-cache
2023-03-21T12:38:04.206800560Z level=debug msg="stale identity observed" identity=16777219 ipAddr=10.102.3.28 oldIdentity=16777223 subsys=hubble
2023-03-21T12:38:04.208662435Z level=debug msg="stale identity observed" identity=16777219 ipAddr=10.102.3.28 oldIdentity=16777223 subsys=hubble
2023-03-21T12:38:05.299388052Z level=debug msg="stale identity observed" identity=16777219 ipAddr=10.102.3.28 oldIdentity=16777223 subsys=hubble
2023-03-21T12:38:05.299391722Z level=debug msg="stale identity observed" identity=16777219 ipAddr=10.102.3.28 oldIdentity=16777223 subsys=hubble
2023-03-21T12:38:05.299399042Z level=debug msg="stale identity observed" identity=16777219 ipAddr=10.102.3.28 oldIdentity=16777223 subsys=hubble
2023-03-21T12:38:05.299402912Z level=debug msg="stale identity observed" identity=16777219 ipAddr=10.102.3.28 oldIdentity=16777223 subsys=hubble
2023-03-21T12:38:05.304862465Z level=debug msg="stale identity observed" identity=16777219 ipAddr=10.102.3.28 oldIdentity=16777223 subsys=hubble
2023-03-21T12:38:05.304876056Z level=debug msg="stale identity observed" identity=16777219 ipAddr=10.102.3.28 oldIdentity=16777223 subsys=hubble
2023-03-21T12:38:05.341891184Z level=debug msg="stale identity observed" identity=16777219 ipAddr=10.102.3.28 oldIdentity=16777223 subsys=hubble
2023-03-21T12:38:05.341915855Z level=debug msg="stale identity observed" identity=16777219 ipAddr=10.102.3.28 oldIdentity=16777223 subsys=hubble
2023-03-21T12:38:06.284295199Z level=debug msg="ToMapState: Allowed remote IDs" EndpointSelector="&LabelSelector{MatchLabels:map[string]string{reserved.kube-apiserver: ,},MatchExpressions:[]LabelSelectorRequirement{},}" policyID="[kube-apiserver 16777222 16777223]" port=0 portName= protocol=0 subsys=policy trafficDirection=Egress
2023-03-21T12:38:06.298362225Z level=debug msg="ToMapState: Allowed remote IDs" EndpointSelector="&LabelSelector{MatchLabels:map[string]string{reserved.kube-apiserver: ,},MatchExpressions:[]LabelSelectorRequirement{},}" policyID="[kube-apiserver 16777222 16777223]" port=0 portName= protocol=0 subsys=policy trafficDirection=Egress
2023-03-21T12:38:06.356223898Z level=debug msg="ToMapState: Allowed remote IDs" EndpointSelector="&LabelSelector{MatchLabels:map[string]string{reserved.kube-apiserver: ,},MatchExpressions:[]LabelSelectorRequirement{},}" policyID="[kube-apiserver 16777222 16777223]" port=0 portName= protocol=0 subsys=policy trafficDirection=Egress
2023-03-21T12:38:06.371419664Z level=debug msg="ToMapState: Allowed remote IDs" EndpointSelector="&LabelSelector{MatchLabels:map[string]string{reserved.kube-apiserver: ,},MatchExpressions:[]LabelSelectorRequirement{},}" policyID="[kube-apiserver 16777222 16777223]" port=0 portName= protocol=0 subsys=policy trafficDirection=Egress
2023-03-21T12:38:06.421476170Z level=debug msg="stale identity observed" identity=16777219 ipAddr=10.102.3.28 oldIdentity=16777223 subsys=hubble
2023-03-21T12:38:06.422089272Z level=debug msg="stale identity observed" identity=16777219 ipAddr=10.102.3.28 oldIdentity=16777223 subsys=hubble
2023-03-21T12:38:06.431103502Z level=debug msg="ToMapState: Allowed remote IDs" EndpointSelector="&LabelSelector{MatchLabels:map[string]string{reserved.kube-apiserver: ,},MatchExpressions:[]LabelSelectorRequirement{},}" policyID="[kube-apiserver 16777222 16777223]" port=0 portName= protocol=0 subsys=policy trafficDirection=Egress
2023-03-21T12:38:06.434927684Z level=debug msg="stale identity observed" identity=16777219 ipAddr=10.102.3.28 oldIdentity=16777223 subsys=hubble
2023-03-21T12:38:06.451099640Z level=debug msg="ToMapState: Allowed remote IDs" EndpointSelector="&LabelSelector{MatchLabels:map[string]string{reserved.kube-apiserver: ,},MatchExpressions:[]LabelSelectorRequirement{},}" policyID="[kube-apiserver 16777222 16777223]" port=0 portName= protocol=0 subsys=policy trafficDirection=Egress
2023-03-21T12:38:06.558145501Z level=debug msg="ToMapState: Allowed remote IDs" EndpointSelector="&LabelSelector{MatchLabels:map[string]string{reserved.kube-apiserver: ,},MatchExpressions:[]LabelSelectorRequirement{},}" policyID="[kube-apiserver 16777222 16777223]" port=0 portName= protocol=0 subsys=policy trafficDirection=Egress
2023-03-21T12:38:06.614780770Z level=debug msg="stale identity observed" identity=16777219 ipAddr=10.102.3.28 oldIdentity=16777223 subsys=hubble
2023-03-21T12:38:06.635244646Z level=debug msg="ToMapState: Allowed remote IDs" EndpointSelector="&LabelSelector{MatchLabels:map[string]string{reserved.kube-apiserver: ,},MatchExpressions:[]LabelSelectorRequirement{},}" policyID="[kube-apiserver 16777222 16777223]" port=0 portName= protocol=0 subsys=policy trafficDirection=Egress
2023-03-21T12:38:06.708572662Z level=debug msg="ToMapState: Allowed remote IDs" EndpointSelector="&LabelSelector{MatchLabels:map[string]string{reserved.kube-apiserver: ,},MatchExpressions:[]LabelSelectorRequirement{},}" policyID="[kube-apiserver 16777222 16777223]" port=0 portName= protocol=0 subsys=policy trafficDirection=Egress
2023-03-21T12:38:06.780494170Z level=debug msg="ToMapState: Allowed remote IDs" EndpointSelector="&LabelSelector{MatchLabels:map[string]string{reserved.kube-apiserver: ,},MatchExpressions:[]LabelSelectorRequirement{},}" policyID="[kube-apiserver 16777222 16777223]" port=0 portName= protocol=0 subsys=policy trafficDirection=Egress
2023-03-21T12:38:06.887803976Z level=debug msg="ToMapState: Allowed remote IDs" EndpointSelector="&LabelSelector{MatchLabels:map[string]string{reserved.kube-apiserver: ,},MatchExpressions:[]LabelSelectorRequirement{},}" policyID="[kube-apiserver 16777222 16777223]" port=0 portName= protocol=0 subsys=policy trafficDirection=Egress
2023-03-21T12:38:06.914983419Z level=debug msg="stale identity observed" identity=16777219 ipAddr=10.102.3.28 oldIdentity=16777223 subsys=hubble
2023-03-21T12:38:06.960803665Z level=debug msg="ToMapState: Allowed remote IDs" EndpointSelector="&LabelSelector{MatchLabels:map[string]string{reserved.kube-apiserver: ,},MatchExpressions:[]LabelSelectorRequirement{},}" policyID="[kube-apiserver 16777222 16777223]" port=0 portName= protocol=0 subsys=policy trafficDirection=Egress
2023-03-21T12:48:07.411157839Z level=debug msg="UpdateIdentities: Deleting identity" identity=16777219 labels="[cidr:0.0.0.0/0 cidr:0.0.0.0/1 cidr:0.0.0.0/2 cidr:0.0.0.0/3 cidr:0.0.0.0/4 cidr:10.0.0.0/7 cidr:10.0.0.0/8 cidr:10.0.0.0/9 cidr:10.100.0.0/14 cidr:10.102.0.0/15 cidr:10.102.0.0/16 cidr:10.102.0.0/17 cidr:10.102.0.0/18 cidr:10.102.0.0/19 cidr:10.102.0.0/20 cidr:10.102.0.0/21 cidr:10.102.0.0/22 cidr:10.102.2.0/23 cidr:10.102.3.0/24 cidr:10.102.3.0/25 cidr:10.102.3.0/26 cidr:10.102.3.0/27 cidr:10.102.3.16/28 cidr:10.102.3.24/29 cidr:10.102.3.28/30 cidr:10.102.3.28/31 cidr:10.102.3.28/32 cidr:10.64.0.0/10 cidr:10.96.0.0/11 cidr:10.96.0.0/12 cidr:10.96.0.0/13 cidr:8.0.0.0/5 cidr:8.0.0.0/6 reserved:world]" subsys=policy
2023-03-21T12:48:07.411168359Z level=debug msg="UpdateIdentities: Deleting identity" identity=16777220 labels="[cidr:0.0.0.0/0 cidr:0.0.0.0/1 cidr:0.0.0.0/2 cidr:0.0.0.0/3 cidr:0.0.0.0/4 cidr:10.0.0.0/7 cidr:10.0.0.0/8 cidr:10.0.0.0/9 cidr:10.100.0.0/14 cidr:10.102.0.0/15 cidr:10.102.0.0/16 cidr:10.102.0.0/17 cidr:10.102.0.0/18 cidr:10.102.0.0/19 cidr:10.102.0.0/20 cidr:10.102.0.0/21 cidr:10.102.0.0/22 cidr:10.102.2.0/23 cidr:10.102.3.0/24 cidr:10.102.3.0/25 cidr:10.102.3.0/26 cidr:10.102.3.32/27 cidr:10.102.3.32/28 cidr:10.102.3.40/29 cidr:10.102.3.40/30 cidr:10.102.3.42/31 cidr:10.102.3.42/32 cidr:10.64.0.0/10 cidr:10.96.0.0/11 cidr:10.96.0.0/12 cidr:10.96.0.0/13 cidr:8.0.0.0/5 cidr:8.0.0.0/6 reserved:world]" subsys=policy
2023-03-21T12:48:07.411172319Z level=debug msg="identities selected by L4Filter updated" EndpointSelector="&LabelSelector{MatchLabels:map[string]string{},MatchExpressions:[]LabelSelectorRequirement{},}" policyID.Added="[]" policyID.Deleted="[16777219 16777220]" subsys=policy
2023-03-21T12:48:07.411175119Z level=debug msg="identities selected by L4Filter updated" EndpointSelector="&LabelSelector{MatchLabels:map[string]string{},MatchExpressions:[]LabelSelectorRequirement{},}" policyID.Added="[]" policyID.Deleted="[16777219 16777220]" subsys=policy
2023-03-21T12:48:07.411177510Z level=debug msg="identities selected by L4Filter updated" EndpointSelector="&LabelSelector{MatchLabels:map[string]string{},MatchExpressions:[]LabelSelectorRequirement{},}" policyID.Added="[]" policyID.Deleted="[16777219 16777220]" subsys=policy
2023-03-21T12:48:07.411224640Z level=debug msg="Deleting IP from ipcache layer" ipAddr=10.102.3.28/32 subsys=ipcache
2023-03-21T12:48:07.411231711Z level=debug msg="Daemon notified of IP-Identity cache state change" identity="{16777223 generated false}" ipAddr="{10.102.3.28 ffffffff}" modification=Delete subsys=datapath-ipcache
2023-03-21T12:48:07.411240471Z level=debug msg="Deleting IP from ipcache layer" ipAddr=10.102.3.42/32 subsys=ipcache
2023-03-21T12:48:07.411242771Z level=debug msg="Daemon notified of IP-Identity cache state change" identity="{16777222 generated false}" ipAddr="{10.102.3.42 ffffffff}" modification=Delete subsys=datapath-ipcache

Anything else?

When it's in a "broken" state and I restart the cilium agent, it works again. After this restart it also keeps working beyond the 10 minutes. If the cilium agent is restarted again after that, it breaks after 10 minutes again.

I was unable to reproduce the issue with Cilium v1.12.7. I compared the debug logging between v1.12.7 and v1.13.1 for what happened after 10 minutes. The primary differences are as follows:

v1.12.7:

2023-03-21T12:30:05.711083450Z level=debug msg="Deleting CIDR shadowed by endpoint IP" ipAddr=10.102.3.28/32 subsys=ipcache
2023-03-21T12:30:05.711091211Z level=debug msg="Deleting IP from ipcache layer" ipAddr=10.102.3.28/32 subsys=ipcache
2023-03-21T12:30:05.711129211Z level=debug msg="Deleting CIDR shadowed by endpoint IP" ipAddr=10.102.3.42/32 subsys=ipcache
2023-03-21T12:30:05.711188562Z level=debug msg="Deleting IP from ipcache layer" ipAddr=10.102.3.42/32 subsys=ipcache

v1.13.1:

2023-03-21T12:48:07.411224640Z level=debug msg="Deleting IP from ipcache layer" ipAddr=10.102.3.28/32 subsys=ipcache
2023-03-21T12:48:07.411231711Z level=debug msg="Daemon notified of IP-Identity cache state change" identity="{16777223 generated false}" ipAddr="{10.102.3.28 ffffffff}" modification=Delete subsys=datapath-ipcache
2023-03-21T12:48:07.411240471Z level=debug msg="Deleting IP from ipcache layer" ipAddr=10.102.3.42/32 subsys=ipcache
2023-03-21T12:48:07.411242771Z level=debug msg="Daemon notified of IP-Identity cache state change" identity="{16777222 generated false}" ipAddr="{10.102.3.42 ffffffff}" modification=Delete subsys=datapath-ipcache

Code of Conduct

  • I agree to follow this project's Code of Conduct
@svanschie svanschie added kind/bug This is a bug in the Cilium logic. kind/community-report This was reported by a user in the Cilium community, eg via Slack. needs/triage This issue requires triaging to establish severity and next steps. labels Mar 21, 2023
@joestringer
Member

Hi @svanschie , thanks for the report.

I'm wondering, in the case that doesn't work, does Cilium successfully sync with kube-apiserver on startup? Also, do you see any other matches for the kube-apiserver IP addresses in the logs beyond the output that you've shared?

The intent behind the grace period logic is to restore information about IPs (such as kube-apiserver IPs) from the previous cilium-agent run, and give enough time for cilium-agent to sync with the kube-apiserver to pick up the new kube-apiserver IPs. If that synchronization step doesn't occur, I could see how these IPs end up getting deleted from the ipcache later on.
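As a rough sketch of that restore-then-expire pattern (hypothetical names and timings, not the agent's actual code), the idea is that restored entries only survive if some live source, such as the kube-apiserver sync, refreshes them before the grace period expires:

package main

import (
	"fmt"
	"time"
)

// restoredEntry stands in for an ipcache entry carried over from the
// previous agent run.
type restoredEntry struct {
	cidr      string
	refreshed bool // set when a live source (e.g. the k8s sync) re-upserts it
}

func main() {
	gracePeriod := 2 * time.Second // stands in for --identity-restore-grace-period (default 10m)

	restored := map[string]*restoredEntry{
		"10.102.3.28/32": {cidr: "10.102.3.28/32"},
		"10.102.3.42/32": {cidr: "10.102.3.42/32"},
	}

	// Simulate the kube-apiserver sync refreshing only one of the entries.
	restored["10.102.3.28/32"].refreshed = true

	time.Sleep(gracePeriod)

	// After the grace period, entries that were never refreshed are released.
	for cidr, e := range restored {
		if e.refreshed {
			fmt.Println("keeping refreshed entry:", cidr)
		} else {
			fmt.Println("releasing stale restored entry:", cidr)
		}
	}
}

If that refresh never arrives (or does not land on the same entry), the release at the end is the kind of deletion seen in the logs above.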

@youngnick youngnick added the sig/agent Cilium agent related. label Mar 23, 2023
@svanschie
Author

Hi @joestringer,

I'm not entirely sure what to look for in the logs with regard to syncing with the kube-apiserver; hopefully this helps (there are lots of log lines in between these entries, but these seemed most relevant):

2023-03-21T12:38:01.256878256Z level=info msg="Establishing connection to apiserver" host="https://172.16.0.1:443" subsys=k8s-client
2023-03-21T12:38:01.276496647Z level=info msg="Connected to apiserver" subsys=k8s-client
2023-03-21T12:38:02.398189727Z level=info msg="All Cilium CRDs have been found and are available" subsys=k8s
2023-03-21T12:38:02.618491487Z level=debug msg="cache synced" kubernetesResource="core/v1::Namespace" subsys=k8s
2023-03-21T12:38:02.618619330Z level=debug msg="cache synced" kubernetesResource="core/v1::Service" subsys=k8s
2023-03-21T12:38:02.618628430Z level=debug msg="cache synced" kubernetesResource="core/v1::Pods" subsys=k8s
2023-03-21T12:38:02.618858444Z level=debug msg="cache synced" kubernetesResource="discovery/v1::EndpointSlice" subsys=k8s

The logs I provided include all matches for the API server IPs; this is based on logs for the first 20 minutes after a restart of the Cilium agent.

@joestringer joestringer added needs/triage This issue requires triaging to establish severity and next steps. and removed needs/triage This issue requires triaging to establish severity and next steps. labels Mar 27, 2023
@aanm aanm added the kind/regression This functionality worked fine before, but was broken in a newer release of Cilium. label Mar 27, 2023
@voltagebots

Hey guys, I'm having exactly the same issue... everything was working great, then suddenly traffic from all endpoints that require access to the kube-apiserver started getting dropped. Is there a workaround?

@gandro gandro assigned gandro and squeed and unassigned gandro Apr 4, 2023
@christarazi christarazi added sig/policy Impacts whether traffic is allowed or denied based on user-defined policies. priority/high This is considered vital to an upcoming release. labels Apr 4, 2023
@squeed
Contributor

squeed commented Apr 5, 2023

@voltagebots is this also on EKS?

@giovannirco

giovannirco commented Apr 5, 2023

Working with @voltagebots here; yes, EKS 1.23 in our case. What other info can I grab for you? I can confirm that after rolling back to 1.12.8 I don't see this issue anymore.

@squeed
Contributor

squeed commented Apr 5, 2023

I tried to naively reproduce this on EKS with images from master, with no luck; even after 30 minutes, the identities were still there.

I'll try and reproduce with 1.13.1 now.

@squeed squeed added the release-blocker/1.13 This issue will prevent the release of the next version of Cilium. label Apr 5, 2023
@purwandi

purwandi commented Apr 5, 2023

Hello, I have this problem too. I have an egress network policy to the kube-apiserver; on 1.13.0 it was still OK, but after upgrading to 1.13.1 the kube-apiserver IP address disappears from bpf after 10-20 minutes. After I rolled back to 1.13.0 I don't see the issue anymore. The upgrade was via Helm chart, btw.

@jscaltreto

jscaltreto commented Apr 5, 2023

I'm seeing a very similar issue. I rolled back to 1.13.0, but the problem persists.

# cilium version
Client: 1.13.0 c9723a8d 2023-02-15T14:18:31+01:00 go version go1.19.6 linux/amd64
Daemon: 1.13.0 c9723a8d 2023-02-15T14:18:31+01:00 go version go1.19.6 linux/amd64

In my case I see drops such as (10.21.96.226 is the IP of the Kube API server):

Apr  5 19:04:03.227: logging/fluent-bit-vkcnf:34478 (ID:1993) <> 10.21.96.226:443 (ID:16777333) Policy denied DROPPED (TCP Flags: SYN)

The identity does not show the kube-apiserver label:

# cilium identity get 16777333
ID         LABELS
16777333   cidr:10.21.96.0/24
           reserved:world

If I list Identities, I do see an identity with the correct IP and labels:

# cilium identity list
...
16777368   cidr:10.21.96.226/32
           reserved:kube-apiserver
           reserved:world
...

But the ipcache has that IP mapped to the 16777333 identity:

# cilium bpf ipcache get 10.21.96.226
10.21.96.226 maps to identity identity=16777333 encryptkey=0 tunnelendpoint=0.0.0.0 nodeid=0

As with OP, restarting Cilium fixes it for a while, but it comes back after about 20 minutes.

I was seeing the exact same behavior with 1.13.1, and rolled back to 1.13.0 but it persists. I will rollback to 1.12.8 to see if that resolves it, but (assuming this is the same issue) it does appear to be present in both 1.13.1 and 1.13.0.

Edited to add: This is an EKS cluster

Update: Was unable to reproduce in 1.12.8.

@squeed
Contributor

squeed commented Apr 6, 2023

I was, again, not able to reproduce this by merely creating a cluster. I find it curious that in all of these cases, there is some kind of duplication around identities. Specifically, the IPs of the apiservers show up in multiple identities.

Out of curiosity, are you doing anything explicit that would cause this to happen? Manually referencing those IPs in a CiliumNetworkPolicy?

Anything else to help me reproduce?

@voltagebots

So in my case, the drops weren't immediate. We had Cilium (IIRC v1.13.0) running in our env for at least a month, and as we kept deploying more rules to cover our namespaces, we suddenly saw the drops (precisely on the 30th). After 2-3 days we started seeing these drops in our other envs. Not sure exactly what changed; even whitelisting the kube-apiserver IPs seems not to have worked. However, we had to roll back to 1.12.8 to restore the functionality of our envs.

For now, we intend to deploy 1.13.0/1.13.1 on our test env and try to reproduce the issue. Will get back to you once done.

@jscaltreto

In my case there are other policies in the cluster which reference the same CIDRs (/24) that the API server is in. These are the private subnets of the VPC.

I don't know if this would impact anything, but it's worth noting that we're using cluster-pool IPAM.

@squeed
Contributor

squeed commented Apr 6, 2023

One observation: restarting the cilium agent allocates a new numerical identity for the apiserver IPs. This is surprising to me.

@squeed
Contributor

squeed commented Apr 6, 2023

So, garbage collection relies on a specific identity not existing to keep from deleting the CIDR identity.

This strikes me as incredibly fragile. Am I missing something, @christarazi?

@christarazi
Member

christarazi commented Apr 6, 2023

I think the logic is correct there, but it assumes that the identities are associated with the expected labels. The association of labels with identities seems to be the mismatch here.

Let's say you have a policy which allows to 1.1.1.1/32. This will create a CIDR identity for 1.1.1.1. CIDR identities are always associated with the following labels: [cidr-labels (i.e. cidr:1.1.1.1/32, ...) + reserved:world].

So in the end, you have 1.1.1.1 -> [cidr-labels (i.e. cidr:1.1.1.1/32, ...) + reserved:world]. Let's say that's identity 99.

When the kube-apiserver logic kicks in and notices that 1.1.1.1 is an IP of a kube-apiserver, it associates reserved:kube-apiserver with that IP. So what should happen is 1.1.1.1 -> [cidr-labels (i.e. cidr:1.1.1.1/32, ...) + reserved:world + reserved:kube-apiserver]. Now we have a new identity, let's say 100.

For the reference counting to be done properly, identity 99 should have been deallocated. It seems like what's happening in this bug is that we are not only deallocating the old identity (99), but somehow also the new one. My suspicion is that it has something to do with how the CIDR restoration logic (i.e. the "Reallocated restored CIDR identity" log msg) interacts with the ipcache / identity allocation / labels.
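To make the reference counting concrete, here is a minimal, hypothetical sketch (not Cilium's actual allocator) of the bookkeeping I'm describing: identities are keyed by the full label set, so adding reserved:kube-apiserver yields a new identity, and only the old one should be released.

package main

import (
	"fmt"
	"sort"
	"strings"
)

// key builds a canonical key from a label set, mirroring the idea that a
// numeric identity is determined by its full set of labels.
func key(labels []string) string {
	s := append([]string(nil), labels...)
	sort.Strings(s)
	return strings.Join(s, ",")
}

// allocator is a toy reference-counted identity allocator keyed by label set.
type allocator struct {
	next int
	ids  map[string]int // label-set key -> numeric identity
	refs map[int]int    // numeric identity -> reference count
}

func (a *allocator) allocate(labels []string) int {
	k := key(labels)
	id, ok := a.ids[k]
	if !ok {
		a.next++
		id = a.next
		a.ids[k] = id
	}
	a.refs[id]++
	return id
}

func (a *allocator) release(id int) {
	a.refs[id]--
	if a.refs[id] == 0 {
		fmt.Println("identity", id, "fully released")
	}
}

func main() {
	a := &allocator{next: 98, ids: map[string]int{}, refs: map[int]int{}}

	// A policy referencing 1.1.1.1/32 creates a plain CIDR identity ("99").
	oldID := a.allocate([]string{"cidr:1.1.1.1/32", "reserved:world"})

	// kube-apiserver detection adds a label, which is a different label set
	// and therefore a different numeric identity ("100").
	newID := a.allocate([]string{"cidr:1.1.1.1/32", "reserved:world", "reserved:kube-apiserver"})

	fmt.Println("old:", oldID, "new:", newID) // old: 99 new: 100

	// For the bookkeeping to balance, only the old identity should be
	// released here; the bug behaves as if the new one goes away too.
	a.release(oldID)
}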

@squeed
Contributor

squeed commented Apr 6, 2023

So, on startup, the basic flow is

  1. Dump the ipcache
  2. Create identities for all IPs found in the ipcache without reserved:kube-apiserver label
  3. Process apiserver endpoints, inserting new identities and deleting the old ones
  4. Delete any dumped cidrs

Step 4 doesn't delete the kube-apiserver identities because it only deletes identities that have cidr labels. Specifically, it looks for identities with a complete set of cidr labels and no others.

It seems like, when this bug occurs, step 3 is somehow broken in that it creates new identities for the apiserver, but doesn't delete the restored ones. The question: why?
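To make the step-4 filter concrete, a rough, hypothetical sketch of the check (not the actual implementation): only identities whose labels are all cidr:* plus reserved:world count as restored CIDR identities and get swept, which is why the kube-apiserver identities escape it.

package main

import (
	"fmt"
	"strings"
)

// isRestoredCIDRIdentity stands in for the step-4 check described above:
// every label must be either a cidr:* label or reserved:world.
func isRestoredCIDRIdentity(labels []string) bool {
	for _, l := range labels {
		if strings.HasPrefix(l, "cidr:") || l == "reserved:world" {
			continue
		}
		return false
	}
	return true
}

func main() {
	restored := []string{"cidr:10.102.3.28/32", "reserved:world"}
	apiserver := []string{"cidr:10.102.3.28/32", "reserved:world", "reserved:kube-apiserver"}

	fmt.Println(isRestoredCIDRIdentity(restored))  // true  -> swept after the grace period
	fmt.Println(isRestoredCIDRIdentity(apiserver)) // false -> kept
}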

@squeed
Contributor

squeed commented Apr 11, 2023

My initial idea was to track restored IDs, rather than restored CIDRs, and release those upon grace-period cleanup. However, that's not going to work right now.

This is because ReleaseCIDRIdentitiesByID actually does an ID->CIDR lookup, then enqueues a list of CIDRs to clean up.

So, either we need to change how the CIDR release mechanism works, or think of a new approach.
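Roughly, and very much simplified (hypothetical types, not the real code), the indirection I mean looks like this: a release request keyed by numeric ID only selects a prefix, and the eventual cleanup operates on whatever identity currently owns that prefix in the ipcache.

package main

import "fmt"

// cache is a toy model of the two mappings involved in the release path.
type cache struct {
	idToCIDR map[int]string // numeric identity -> prefix it was allocated for
	cidrToID map[string]int // prefix -> identity currently owning it in the ipcache
}

// releaseByID mirrors the ID -> CIDR -> cleanup indirection: the ID only
// selects a prefix, and the cleanup then targets the prefix's current owner.
func (c *cache) releaseByID(id int) {
	cidr, ok := c.idToCIDR[id]
	if !ok {
		return
	}
	current := c.cidrToID[cidr]
	fmt.Printf("asked to release %d, but cleanup targets %s, currently owned by %d\n",
		id, cidr, current)
}

func main() {
	c := &cache{
		idToCIDR: map[int]string{
			16777219: "10.102.3.28/32", // restored identity from the previous run
			16777223: "10.102.3.28/32", // new kube-apiserver identity for the same prefix
		},
		cidrToID: map[string]int{"10.102.3.28/32": 16777223},
	}
	c.releaseByID(16777219)
}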

@n0rad

n0rad commented Apr 17, 2023

I think this problem, which affects me too, may be exposing another one.

For me, when it occurs, Flux can no longer reconcile because of a problem affecting all services.

✗ Kustomization reconciliation failed: Service/network-cluster/nginx-ingress-controller dry-run failed, error: failed to create manager for existing fields: failed to convert new object (network-cluster/nginx-ingress-controller; /v1, Kind=Service) to smd typed: .status.conditions: duplicate entries for key [type="io.cilium/lb-ipam-request-satisfied"]

I have to remove all IP pools to get around this issue's bug, and remove all services to get around the one I'm talking about.

@squeed
Contributor

squeed commented Apr 18, 2023

@n0rad that seems - at first glance - like a different issue. Would you mind opening a separate bug?

@squeed
Contributor

squeed commented Apr 18, 2023

@schaffino @dctrwatson @svanschie I have a proposed fix for this. Would you be willing to try a test image and see if this resolves this issue for you?

If so, please reach out to me on the Cilium slack, or via email. Thanks!

squeed added a commit to squeed/cilium that referenced this issue Apr 18, 2023
InjectLabels is one of the functions responsible for synchronizing the
ipcache metadata store and ip store. As such, it shouldn't shortcut when
the numeric identity is the same, but the source is different; this
means that an update to the ipcache isn't complete.

This can happen, for example, when there are two identities for the same
IP, which can happen on daemon restart whenever a CIDR is referenced.

Fixes: cilium#24502
Signed-off-by: Casey Callendrello <cdc@isovalent.com>
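To illustrate the shortcut the commit message refers to, a heavily simplified, hypothetical sketch (not the real InjectLabels code): returning early because only the numeric identity matches leaves the rest of the update, including the changed source, unapplied.

package main

import "fmt"

// entry is a toy ipcache record: a numeric identity plus the source that
// supplied it (for example restored state vs. the kube-apiserver watcher).
type entry struct {
	numericID int
	source    string
}

// upsert sketches the shortcut: bailing out because the numeric identity is
// unchanged skips the rest of the update and leaves the source stale.
func upsert(existing, incoming entry, fixed bool) entry {
	if existing.numericID == incoming.numericID {
		if !fixed {
			return existing // buggy shortcut: "same ID, nothing to do"
		}
		if existing.source == incoming.source {
			return existing // genuinely nothing to do
		}
	}
	return incoming // complete the update so metadata and ipcache stay in sync
}

func main() {
	old := entry{numericID: 16777223, source: "restored"}
	upd := entry{numericID: 16777223, source: "kube-apiserver"}

	fmt.Println("buggy:", upsert(old, upd, false)) // keeps the stale record
	fmt.Println("fixed:", upsert(old, upd, true))  // picks up the new source
}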
@schaffino

@squeed
This looks promising. I've tested just manually editing the agent deployment at runtime in a couple dev clusters to use the quay.io/cilium/cilium-dev:v1.13.2-with-24875 image. That triggers the restart of all agents and would usually cause us problems with several services after the 10 minute window. I'll give it a test with a few clusters running as an upgrade from 1.13.1 tomorrow and see if any problems arise.

@dctrwatson
Contributor

My tests are still failing with the same issue with quay.io/cilium/cilium-dev:v1.13.2-with-24875 (policy denial)

@joestringer
Member

@dctrwatson looking back over your initial report, I saw this comment:

When a node is having this issue, it doesn't happen until a network policy with unrelated egress rules for the pod is deleted. Then the pod can't make connections the apiserver.

This comment seems to indicate that policy add/delete plays a role in the issue in your case.

My tests are still failing with the same issue with quay.io/cilium/cilium-dev:v1.13.2-with-24875 (policy denial)

Do you have a writeup somewhere around exactly how this test works? Do you think we could reproduce it by just following some instructions or running a script?

@dctrwatson
Contributor

dctrwatson commented Apr 19, 2023

This comment seems to indicate that policy add/delete plays a role in the issue in your case.

I can split off into a separate issue if that would help.

Do you have a writeup somewhere around exactly how this test works? Do you think we could reproduce it by just following some instructions or running a script?

It's an internal e2e test suite. The gist is, it's adding/removing a handful of "services" (usually 6 pods each) in parallel, each with their own network policy with ingress+egress rules that only allow access to pods in the same service. There is a global CCNP that allows egress to 10/8, cluster, and world, though. When our test suite cleans up a service, it deletes the service-specific network policy before the pods are terminated, and that's when some of the pods have an issue accessing one or both of the apiservers. It's usually only a couple of nodes (out of ~6) that it happens on, which is why not all pods are affected.

squeed added a commit to squeed/cilium that referenced this issue Apr 20, 2023
InjectLabels is one of the functions responsible for synchronizing the
ipcache metadata store and ip store. As such, it shouldn't shortcut when
the numeric identity is the same, but the source is different; this
means that an update to the ipcache isn't complete.

This can happen, for example, when there are two identities for the same
IP, which can happen on daemon restart whenever a CIDR is referenced.

Fixes: cilium#24502
Signed-off-by: Casey Callendrello <cdc@isovalent.com>
@squeed
Contributor

squeed commented Apr 20, 2023

@dctrwatson interesting. At first glance, that seems like a separate issue. When the bug happens, do all pods (with an applicable policy) on the node in question lose access to the apiserver? If you trigger it, can you confirm that the apiserver's IP is missing from cilium bpf ipcache list?

@giovannirco

I have tried installing v1.13.2 and within 10 minutes I started seeing a lot of unexpected drops, so I tried the quay.io/cilium/cilium-dev:v1.13.2-with-24875 image and indeed it fixed these unwanted drops.

(screenshot: graph of policy drop counts returning to the expected baseline after deploying the patched image)

From the screenshot above it is possible to see that after deploying the image with the fix at 08:20 UTC the pattern of drops came back to the expected normal, and it's been like that for ~11h. I will keep monitoring and if I spot something new I will update here.

@dctrwatson
Contributor

@dctrwatson interesting. At first glance, that seems like a separate issue. When the bug happens, do all pods (with an applicable policy) on the node in question lose access to the apiserver?

No, only pods that have their specific network policy deleted are affected. Other pods on the node continue to work fine until then.

If you trigger it, can you confirm that the apiserver's IP is missing from cilium bpf ipcache list?

When I checked affected nodes, both apiserver IPs were there. That's why I wasn't sure if this was the same issue. It just manifested similarly.

joestringer pushed a commit that referenced this issue Apr 21, 2023
InjectLabels is one of the functions responsible for synchronizing the
ipcache metadata store and ip store. As such, it shouldn't shortcut when
the numeric identity is the same, but the source is different; this
means that an update to the ipcache isn't complete.

This can happen, for example, when there are two identities for the same
IP, which can happen on daemon restart whenever a CIDR is referenced.

Fixes: #24502
Signed-off-by: Casey Callendrello <cdc@isovalent.com>
squeed added a commit to squeed/cilium that referenced this issue Apr 24, 2023
[ upstream commit 8d3a498 ]

InjectLabels is one of the functions responsible for synchronizing the
ipcache metadata store and ip store. As such, it shouldn't shortcut when
the numeric identity is the same, but the source is different; this
means that an update to the ipcache isn't complete.

This can happen, for example, when there are two identities for the same
IP, which can happen on daemon restart whenever a CIDR is referenced.

Fixes: cilium#24502
Signed-off-by: Casey Callendrello <cdc@isovalent.com>
michi-covalent pushed a commit that referenced this issue Apr 25, 2023
[ upstream commit 8d3a498 ]

InjectLabels is one of the functions responsible for synchronizing the
ipcache metadata store and ip store. As such, it shouldn't shortcut when
the numeric identity is the same, but the source is different; this
means that an update to the ipcache isn't complete.

This can happen, for example, when there are two identities for the same
IP, which can happen on daemon restart whenever a CIDR is referenced.

Fixes: #24502
Signed-off-by: Casey Callendrello <cdc@isovalent.com>
@schaffino

@squeed This looks promising. I've tested just manually editing the agent deployment at runtime in a couple dev clusters to use the quay.io/cilium/cilium-dev:v1.13.2-with-24875 image. That triggers the restart of all agents and would usually cause us problems with several services after the 10 minute window. I'll give it a test with a few clusters running as an upgrade from 1.13.1 tomorrow and see if any problems arise.

Apologies for the slow follow-up. We've now done the v1.13.2 upgrade on a number of v1.13.0 and v1.13.1 clusters, setting the agent image specifically to quay.io/cilium/cilium-dev:v1.13.2-with-24875, and are no longer facing this issue.

@schaffino

schaffino commented Apr 27, 2023

@squeed
Have some further feedback on this, as it may well affect anyone who finds this thread, or might be a problem in someone else's 1.13.1 to 1.13.x upgrade if not fixed. After my early success I started rolling this out further to prod with the patched image.

This batch of about 30 clusters was all 1.13.1 to 1.13.2 upgrades. Unfortunately this doesn't seem completely solved, specifically in the case of an upgrade. About a third of the clusters exhibited one or more nodes with pods having issues communicating with the k8s master after the upgrade, and it also seemed to happen within less than the previous 10-minute window. Post-upgrade, agent restarts don't seem to be problematic or cause any outages that we have seen yet. I used to be able to consistently trigger the problem by manually restarting an agent; this is no longer the case, so it looks fixed going forward.

Now that things are upgraded and the nodes with k8s connectivity issues have been restarted, everything is gravy. But I wouldn't say the upgrade was smooth. Additionally, there also seemed to be more significant network disruption with this rollout than I've previously experienced: services exposed by external load balancers were getting 502s for periods of up to a couple of minutes during upgrades, when previously this seemed to only be a few seconds, if noticeable at all.

For context, this is on GKE, with all clusters updated to 1.25.7-gke.1000 prior to the upgrade.

@squeed
Contributor

squeed commented Apr 27, 2023

@schaffino thanks for the update. Of the pods that had issues, did they have unconstrained network access, or was it restricted by (Cilium)NetworkPolicy?

Would you be able to get a sysdump off a node experiencing this issue? Feel free to give it to me privately.

Do you use KubeProxyReplacement?

@jerryguowei

@dctrwatson
I am suffering from the same issue as you described when using v1.13.2, but I have not found a way to reproduce it.
Just wondering if you have the same issue with v1.12.x?

@dctrwatson
Contributor

I created a separate issue for what I'm seeing: #25172

@dctrwatson
I am suffering from the same issue as you described when using v1.13.2, but I have not found a way to reproduce it.
Just wondering if you have the same issue with v1.12.x?

Not seeing this issue on v1.12.x up through v1.12.9

@schaffino

@squeed
All pods that had issues had Cilium ingress and egress policies applied, plus the following to allow access to the Kubernetes API. The CIDR value was only added to help debug the issue:

# Allow egress to the kubernetes API
  - toServices:
    - k8sService:
        serviceName: kubernetes
        namespace: default
  - toCIDR:
      - {{ .Values.private_kube_endpoint }}/32
    toPorts:
      - ports:
        - port: "443"
          protocol: TCP

I have noticed that some other users use a policy like the following. But that wouldn't make any difference, would it, given the above has been working previously?

  - toEntities:
    - kube-apiserver

We don't use KubeProxyReplacement. We use pretty much the default GKE config, except that ipv4NativeRoutingCIDR is a wider range than just the cluster's.

I feel like this must be something specific to Cilium on GKE, because after a number of weeks of not having any problems, this has started randomly occurring on fresh Cilium installs (with the patched agent) on newly deployed clusters. The GKE version has bumped a couple of times since.

Symptoms are the same: no connectivity to the kube master. The fix is also the same: restart the agent on the problem node. I'll try and get a sysdump for you. Would you prefer I open a new issue?

@schaffino

I found that modifying my network policy to include the following fixed it for me, without an agent restart:

  - toEntities:
      - kube-apiserver
    toPorts:
      - ports:
          - port: "443"
            protocol: TCP

Also, @squeed pointed me at #25241, which I think should fully fix this.

jrajahalme pushed a commit to jrajahalme/cilium that referenced this issue May 22, 2023
[ upstream commit 8d3a498 ]

InjectLabels is one of the functions responsible for synchronizing the
ipcache metadata store and ip store. As such, it shouldn't shortcut when
the numeric identity is the same, but the source is different; this
means that an update to the ipcache isn't complete.

This can happen, for example, when there are two identities for the same
IP, which can happen on daemon restart whenever a CIDR is referenced.

Fixes: cilium#24502
Signed-off-by: Casey Callendrello <cdc@isovalent.com>
Signed-off-by: Michi Mutsuzaki <michi@isovalent.com>