
eBPF dataplane support for host postrouting masquerade to external services #8701

Open
brentmjohnson opened this issue Apr 8, 2024 · 2 comments
Labels
area/bpf eBPF Dataplane issues kind/enhancement

Comments

@brentmjohnson

A simplified scenario involving multi-interface nodes that works in iptables mode but not in eBPF mode: there doesn't seem to be a way to allow SNAT / MASQUERADE of traffic that originates from the 192.168.6.0/24 subnet, enters a Calico node, and is destined for external services in the node's 10.0.0.0/24 subnet.

[diagram: network topology showing the wg0 tunnel subnet 192.168.6.0/24 and the node subnet 10.0.0.0/24]

In iptables mode this can be accomplished with something like this:

iptables-nft -A FORWARD -i wg0 -j ACCEPT
iptables-nft -A FORWARD -o wg0 -j ACCEPT
iptables-nft -t nat -A POSTROUTING -o eth0 -j MASQUERADE

Expected Behavior

When running the eBPF dataplane, there should be a mechanism to define postrouting masquerade rules for external destinations.

Current Behavior

ping 10.0.0.2 (k8s-lb) from 192.168.6.1 (tunnel gateway for internal clients)

ping 10.0.0.2
PING 10.0.0.2 (10.0.0.2) 56(84) bytes of data.
64 bytes from 10.0.0.2: icmp_req=1 ttl=63 time=2.85 ms
^C
--- 10.0.0.2 ping statistics ---
2 packets transmitted, 1 received, 50% packet loss, time 1002ms
rtt min/avg/max/mdev = 2.858/2.858/2.858/0.000 ms

tcpdump on 192.168.6.1

IP 192.168.6.1 > 10.0.0.2: ICMP echo request, id 20348, seq 1, length 64
IP 10.0.0.2 > 192.168.6.1: ICMP echo reply, id 20348, seq 1, length 64
IP 192.168.6.1 > 10.0.0.2: ICMP echo request, id 20348, seq 2, length 64

tcpdump on k8s-control-2 (10.0.0.5)

wg0   In  IP 192.168.6.1 > k8s-lb: ICMP echo request, id 20348, seq 1, length 64
eth0  Out IP k8s-control-2 > k8s-lb: ICMP echo request, id 20348, seq 1, length 64
eth0  In  IP k8s-lb > k8s-control-2: ICMP echo reply, id 20348, seq 1, length 64
wg0   Out IP k8s-lb > 192.168.6.1: ICMP echo reply, id 20348, seq 1, length 64
wg0   In  IP 192.168.6.1 > k8s-lb: ICMP echo request, id 20348, seq 2, length 64
eth0  Out IP 192.168.6.1 > k8s-lb: ICMP echo request, id 20348, seq 2, length 64

tcpdump on k8s-lb (10.0.0.2)

eth0  In  IP k8s-control-2 > k8s-lb: ICMP echo request, id 20348, seq 1, length 64
eth0  Out IP k8s-lb > k8s-control-2: ICMP echo reply, id 20348, seq 1, length 64
eth0  In  IP 192.168.6.1 > k8s-lb: ICMP echo request, id 20348, seq 2, length 64
eth0  Out IP k8s-lb > 192.168.6.1: ICMP echo reply, id 20348, seq 2, length 64

Oddly enough, the first ping gets a response, but the second and subsequent pings are not SNATed / MASQUERADEd at the k8s-control-2 node for the k8s-lb (10.0.0.2) destination, so k8s-lb cannot route the replies back to the original source IP.
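For triage, the eBPF dataplane's own state can be inspected on the affected node. This is a diagnostic sketch, assuming the calico-node binary is reachable via kubectl exec; the pod-selection step uses the k8s-app label from the DaemonSet above:

```shell
# Pick the calico-node pod running on k8s-control-2:
POD=$(kubectl get pods -n calico-system -l k8s-app=calico-node \
  --field-selector spec.nodeName=k8s-control-2 -o name)

# Dump the eBPF conntrack table and look for the 192.168.6.1 -> 10.0.0.2 flow:
kubectl exec -n calico-system "$POD" -- calico-node -bpf conntrack dump | grep 10.0.0.2

# Dump the BPF NAT maps for comparison:
kubectl exec -n calico-system "$POD" -- calico-node -bpf nat dump
```

If the first echo request creates a conntrack entry but no matching NAT entry, that would be consistent with only the initial packet being masqueraded.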

I have only tried this with IPv4 while waiting for #8636, but presumably IPv6 deployments share the same issue.

Possible Solution

None identified in eBPF mode.

Steps to Reproduce (for bugs)

  1. provision a multi-node Kubernetes cluster, or a single-node cluster with access to a separate host on the same subnet
  2. deploy calico with eBPF dataplane mode

calico-node DaemonSet:

kind: DaemonSet
apiVersion: apps/v1
metadata:
  name: calico-node
  namespace: calico-system
  labels:
    k8s-app: calico-node
spec:
  selector:
    matchLabels:
      k8s-app: calico-node
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
  template:
    metadata:
      labels:
        k8s-app: calico-node
    spec:
      nodeSelector:
        kubernetes.io/os: linux
      hostNetwork: true
      tolerations:
        - effect: NoSchedule
          operator: Exists
        - key: CriticalAddonsOnly
          operator: Exists
        - effect: NoExecute
          operator: Exists
      serviceAccountName: calico-node
      terminationGracePeriodSeconds: 0
      priorityClassName: system-node-critical
      initContainers:
        - name: upgrade-ipam
          image: docker.io/calico/cni:v3.27.2
          imagePullPolicy: IfNotPresent
          command: ["/opt/cni/bin/calico-ipam", "-upgrade"]
          envFrom:
          - configMapRef:
              name: kubernetes-services-endpoint
              optional: true
          env:
            - name: KUBERNETES_SERVICE_HOST
              valueFrom:
                configMapKeyRef:
                  name: calico-config
                  key: kubernetes_service_host
            - name: KUBERNETES_SERVICE_PORT
              valueFrom:
                configMapKeyRef:
                  name: calico-config
                  key: kubernetes_service_port
            - name: KUBERNETES_NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            - name: CALICO_NETWORKING_BACKEND
              valueFrom:
                configMapKeyRef:
                  name: calico-config
                  key: calico_backend
          volumeMounts:
            - mountPath: /var/lib/cni/networks
              name: host-local-net-dir
            - mountPath: /host/opt/cni/bin
              name: cni-bin-dir
          securityContext:
            privileged: true
        - name: install-cni
          image: docker.io/calico/cni:v3.27.2
          imagePullPolicy: IfNotPresent
          command: ["/opt/cni/bin/install"]
          envFrom:
          - configMapRef:
              name: kubernetes-services-endpoint
              optional: true
          env:
            - name: KUBERNETES_SERVICE_HOST
              valueFrom:
                configMapKeyRef:
                  name: calico-config
                  key: kubernetes_service_host
            - name: KUBERNETES_SERVICE_PORT
              valueFrom:
                configMapKeyRef:
                  name: calico-config
                  key: kubernetes_service_port
            - name: CNI_CONF_NAME
              value: "10-calico.conflist"
            - name: CNI_NETWORK_CONFIG
              valueFrom:
                configMapKeyRef:
                  name: calico-config
                  key: cni_network_config
            - name: KUBERNETES_NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            - name: CNI_MTU
              valueFrom:
                configMapKeyRef:
                  name: calico-config
                  key: veth_mtu
            - name: SLEEP
              value: "false"
          volumeMounts:
            - mountPath: /host/opt/cni/bin
              name: cni-bin-dir
            - mountPath: /host/etc/cni/net.d
              name: cni-net-dir
          securityContext:
            privileged: true
        - name: "mount-bpffs"
          image: docker.io/calico/node:v3.27.2
          imagePullPolicy: IfNotPresent
          command: ["calico-node", "-init", "-best-effort"]
          volumeMounts:
            - mountPath: /sys/fs
              name: sys-fs
              mountPropagation: Bidirectional
            - mountPath: /var/run/calico
              name: var-run-calico
              mountPropagation: Bidirectional
            - mountPath: /nodeproc
              name: nodeproc
              readOnly: true
          securityContext:
            privileged: true
      containers:
        - name: calico-node
          image: docker.io/calico/node:v3.27.2
          imagePullPolicy: IfNotPresent
          envFrom:
          - configMapRef:
              name: kubernetes-services-endpoint
              optional: true
          env:
            - name: KUBERNETES_SERVICE_HOST
              valueFrom:
                configMapKeyRef:
                  name: calico-config
                  key: kubernetes_service_host
            - name: KUBERNETES_SERVICE_PORT
              valueFrom:
                configMapKeyRef:
                  name: calico-config
                  key: kubernetes_service_port
            - name: FELIX_BPFENABLED
              value: "true"
            - name: FELIX_BPFCONNECTTIMELOADBALANCING
              value: "Disabled"
            - name: FELIX_BPFHOSTNETWORKEDNATWITHOUTCTLB
              value: "Enabled"
            - name: FELIX_BPFEXTERNALSERVICEMODE
              value: "DSR"
            - name: FELIX_BPFKUBEPROXYIPTABLESCLEANUPENABLED
              value: "false"
            - name: FELIX_BPFL3IFACEPATTERN
              value: "wg0"
            - name: DATASTORE_TYPE
              value: "kubernetes"
            - name: WAIT_FOR_DATASTORE
              value: "true"
            - name: NODENAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            - name: CALICO_NETWORKING_BACKEND
              valueFrom:
                configMapKeyRef:
                  name: calico-config
                  key: calico_backend
            - name: CLUSTER_TYPE
              value: "k8s,bgp"
            - name: IP
              value: "autodetect"
            - name: IP4_AUTODETECTION_METHOD
              value: "kubernetes-internal-ip"
            - name: IP6
              value: "none"
            - name: IP6_AUTODETECTION_METHOD
              value: "kubernetes-internal-ip"
            - name: CALICO_IPV4POOL_IPIP
              value: "Never"
            - name: CALICO_IPV4POOL_VXLAN
              value: "Never"
            - name: CALICO_IPV4POOL_BLOCK_SIZE
              value: "26"
            - name: CALICO_IPV4POOL_NAT_OUTGOING
              value: "true"
            - name: CALICO_IPV4POOL_NODE_SELECTOR
              value: all()
            - name: CALICO_IPV4POOL_DISABLE_BGP_EXPORT
              value: "false"
            - name: FELIX_IPINIPMTU
              valueFrom:
                configMapKeyRef:
                  name: calico-config
                  key: veth_mtu
            - name: FELIX_VXLANMTU
              valueFrom:
                configMapKeyRef:
                  name: calico-config
                  key: veth_mtu
            - name: FELIX_WIREGUARDMTU
              valueFrom:
                configMapKeyRef:
                  name: calico-config
                  key: veth_mtu
            - name: NO_DEFAULT_POOLS
              value: "true"
            - name: CALICO_DISABLE_FILE_LOGGING
              value: "true"
            - name: FELIX_DEFAULTENDPOINTTOHOSTACTION
              value: "ACCEPT"
            - name: FELIX_IPV6SUPPORT
              value: "false"
            - name: FELIX_POLICYSYNCPATHPREFIX
              value: /var/run/nodeagent
            - name: FELIX_PROMETHEUSMETRICSENABLED
              value: "true"
            - name: FELIX_SERVICELOOPPREVENTION
              value: "Disabled"
            - name: FELIX_HEALTHENABLED
              value: "true"
            - name: FELIX_HEALTHPORT
              value: "9099"
            - name: CALICO_ROUTER_ID
              value: hash
          securityContext:
            privileged: true
          resources:
            requests:
              cpu: 250m
          lifecycle:
            preStop:
              exec:
                command:
                - /bin/calico-node
                - -shutdown
          livenessProbe:
            exec:
              command:
              - /bin/calico-node
              - -felix-live
              - -bird-live
            periodSeconds: 10
            initialDelaySeconds: 10
            failureThreshold: 6
            timeoutSeconds: 10
          readinessProbe:
            exec:
              command:
              - /bin/calico-node
              - -felix-ready
              - -bird-ready
            periodSeconds: 10
            timeoutSeconds: 10
          volumeMounts:
            - mountPath: /host/etc/cni/net.d
              name: cni-net-dir
              readOnly: false
            - mountPath: /lib/modules
              name: lib-modules
              readOnly: true
            - mountPath: /run/xtables.lock
              name: xtables-lock
              readOnly: false
            - mountPath: /var/run/calico
              name: var-run-calico
              readOnly: false
            - mountPath: /var/lib/calico
              name: var-lib-calico
              readOnly: false
            - name: policysync
              mountPath: /var/run/nodeagent
            - name: bpffs
              mountPath: /sys/fs/bpf
            - name: cni-log-dir
              mountPath: /var/log/calico/cni
              readOnly: true
      volumes:
        - name: lib-modules
          hostPath:
            path: /lib/modules
        - name: var-run-calico
          hostPath:
            path: /var/run/calico
        - name: var-lib-calico
          hostPath:
            path: /var/lib/calico
        - name: xtables-lock
          hostPath:
            path: /run/xtables.lock
            type: FileOrCreate
        - name: sys-fs
          hostPath:
            path: /sys/fs/
            type: DirectoryOrCreate
        - name: bpffs
          hostPath:
            path: /sys/fs/bpf
            type: Directory
        - name: nodeproc
          hostPath:
            path: /proc
        - name: cni-bin-dir
          hostPath:
            path: /opt/cni/bin
        - name: cni-net-dir
          hostPath:
            path: /etc/cni/net.d
        - name: cni-log-dir
          hostPath:
            path: /var/log/calico/cni
        - name: host-local-net-dir
          hostPath:
            path: /var/lib/cni/networks
        - name: policysync
          hostPath:
            type: DirectoryOrCreate
            path: /var/run/nodeagent

calico IPPool:

apiVersion: crd.projectcalico.org/v1
kind: IPPool
metadata:
  name: ipv4-ippool
spec:
  blockSize: 26
  cidr: 10.244.0.0/12
  disableBGPExport: false
  ipipMode: Never
  natOutgoing: true
  nodeSelector: all()
  vxlanMode: Never

  3. enable the wireguard tunnel / secondary interface on the calico node
  4. across the wireguard tunnel / secondary interface, attempt to ping another node / external host on the cluster subnet (outside of the cluster service CIDR)
  5. observe that SNAT / MASQUERADE is not applied consistently to the traffic
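The wireguard tunnel used in the steps above can be sketched as follows; the keys and the node-side tunnel address are placeholders (assumptions, not taken from the issue):

```shell
# Minimal wg0 configuration on the calico node (k8s-control-2, 10.0.0.5);
# <...> values are placeholders.
cat > /etc/wireguard/wg0.conf <<'EOF'
[Interface]
Address = 192.168.6.2/24
PrivateKey = <node-private-key>
ListenPort = 51820

[Peer]
# The tunnel gateway for internal clients (192.168.6.1):
PublicKey = <gateway-public-key>
AllowedIPs = 192.168.6.0/24
EOF
wg-quick up wg0

# From 192.168.6.1, ping a host on the cluster subnet that is outside
# the service CIDR, e.g. k8s-lb:
ping -c 4 10.0.0.2
```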

Context

There does not appear to be a way to define host-level postrouting rules for traffic whose destination is not a cluster service.

Your Environment

Calico Cluster Version: v3.27.2
Calico Cluster Type: k8s,bgp,kubeadm
Orchestrator version: Kubernetes v1.29.2
Operating System and version: Ubuntu 22.04.4 LTS

@tomastigera
Contributor

Since you have listed wg0 as a device to be managed by Calico eBPF, you have essentially opted out of using iptables for it. The core principle of the eBPF dataplane is that packets mostly travel from device to device without going through the Linux network stack, and thus bypass iptables as well.

            - name: FELIX_BPFL3IFACEPATTERN
              value: "wg0"
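One way to see the effect of that setting is to list the eBPF programs Calico attaches to wg0. A diagnostic sketch, assuming bpftool and iproute2 are installed on the node; traffic handled by these tc programs bypasses the netfilter POSTROUTING hook:

```shell
# List eBPF programs attached to network devices; on calico-managed
# interfaces they appear as tc ingress/egress entries:
bpftool net show

# The same view for wg0 via tc:
tc filter show dev wg0 ingress
tc filter show dev wg0 egress
```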

At the moment, disappointingly, I do not think there is a simple solution to your problem. If you removed wg0 from the list of devices managed by Calico, it would work for incoming traffic, but most likely not for the returning echo replies. And you would not get policies on wg0.

Perhaps you could make k8s-control-2 a gateway for 192.168.6.0/24, or make 192.168.6.1 part of your 10.x.x.x network and do the MASQUERADE there?
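The second suggestion could look roughly like this on a non-Calico gateway host that terminates the tunnel and has a leg in 10.0.0.0/24 (a sketch; the interface names follow the issue, but the gateway host itself is an assumption):

```shell
# Masquerade tunnel traffic before it ever reaches a calico-managed
# node, so the cluster only sees 10.x source addresses:
sysctl -w net.ipv4.ip_forward=1
iptables-nft -A FORWARD -i wg0 -o eth0 -j ACCEPT
iptables-nft -A FORWARD -i eth0 -o wg0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
iptables-nft -t nat -A POSTROUTING -s 192.168.6.0/24 -o eth0 -j MASQUERADE
```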

@brentmjohnson
Author

Yeah @tomastigera, that is roughly what I expected. Those are good suggestions, aside from hosting the destination side of the tunnel on non-Calico nodes.

What about calico-node -bpf nat set? Is there currently a mechanism for user-defined, persistent BPF NAT rules?

Excited to see how eBPF support evolves across the ecosystem. Thanks for all of the hard work / support for the community!
