Network problem on node pool workloads-1 with access to the attached network #1309

Closed
BigBrather opened this issue May 14, 2024 · 3 comments

Comments

@BigBrather

BigBrather commented May 14, 2024

/kind bug

What steps did you take and what happened:

I have the following configuration for the control-plane node pool:

---
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: hz-k8s-capi-fsn1-cw-dev
  namespace: default
spec:
  clusterNetwork:
    pods:
      cidrBlocks:
      - 10.244.0.0/16
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: KubeadmControlPlane
    name: hz-k8s-capi-fsn1-cw-dev-control-plane
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: HetznerCluster
    name: hz-k8s-capi-fsn1-cw-dev

---
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineHealthCheck
metadata:
  name: hz-k8s-capi-fsn1-cw-dev-control-plane-unhealthy-5m
  namespace: default
spec:
  clusterName: hz-k8s-capi-fsn1-cw-dev
  maxUnhealthy: 100%
  nodeStartupTimeout: 15m
  remediationTemplate:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: HCloudRemediationTemplate
    name: control-plane-remediation-request
  selector:
    matchLabels:
      cluster.x-k8s.io/control-plane: ""
  unhealthyConditions:
  - status: Unknown
    timeout: 180s
    type: Ready
  - status: "False"
    timeout: 180s
    type: Ready

---
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: KubeadmControlPlane
metadata:
  name: hz-k8s-capi-fsn1-cw-dev-control-plane
  namespace: default
spec:
  kubeadmConfigSpec:
    clusterConfiguration:
      apiServer:
        extraArgs:
          authorization-mode: Node,RBAC
          client-ca-file: /etc/kubernetes/pki/ca.crt
          cloud-provider: external
          default-not-ready-toleration-seconds: "45"
          default-unreachable-toleration-seconds: "45"
          enable-aggregator-routing: "true"
          enable-bootstrap-token-auth: "true"
          encryption-provider-config: /etc/kubernetes/encryption-provider.yaml
          etcd-cafile: /etc/kubernetes/pki/etcd/ca.crt
          etcd-certfile: /etc/kubernetes/pki/etcd/server.crt
          etcd-keyfile: /etc/kubernetes/pki/etcd/server.key
          kubelet-client-certificate: /etc/kubernetes/pki/apiserver-kubelet-client.crt
          kubelet-client-key: /etc/kubernetes/pki/apiserver-kubelet-client.key
          kubelet-preferred-address-types: ExternalIP,Hostname,InternalDNS,ExternalDNS
          profiling: "false"
          proxy-client-cert-file: /etc/kubernetes/pki/front-proxy-client.crt
          proxy-client-key-file: /etc/kubernetes/pki/front-proxy-client.key
          requestheader-allowed-names: front-proxy-client
          requestheader-client-ca-file: /etc/kubernetes/pki/front-proxy-ca.crt
          requestheader-extra-headers-prefix: X-Remote-Extra-
          requestheader-group-headers: X-Remote-Group
          requestheader-username-headers: X-Remote-User
          service-account-key-file: /etc/kubernetes/pki/sa.pub
          service-account-lookup: "true"
          tls-cert-file: /etc/kubernetes/pki/apiserver.crt
          tls-cipher-suites: TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_RSA_WITH_AES_256_GCM_SHA384,TLS_RSA_WITH_AES_128_GCM_SHA256
          tls-private-key-file: /etc/kubernetes/pki/apiserver.key
        extraVolumes:
        - hostPath: /etc/kubernetes/encryption-provider.yaml
          mountPath: /etc/kubernetes/encryption-provider.yaml
          name: encryption-provider
      controllerManager:
        extraArgs:
          allocate-node-cidrs: "true"
          authentication-kubeconfig: /etc/kubernetes/controller-manager.conf
          authorization-kubeconfig: /etc/kubernetes/controller-manager.conf
          bind-address: 0.0.0.0
          cloud-provider: external
          cluster-signing-cert-file: /etc/kubernetes/pki/ca.crt
          cluster-signing-duration: 6h0m0s
          cluster-signing-key-file: /etc/kubernetes/pki/ca.key
          kubeconfig: /etc/kubernetes/controller-manager.conf
          profiling: "false"
          requestheader-client-ca-file: /etc/kubernetes/pki/front-proxy-ca.crt
          root-ca-file: /etc/kubernetes/pki/ca.crt
          secure-port: "10257"
          service-account-private-key-file: /etc/kubernetes/pki/sa.key
          terminated-pod-gc-threshold: "10"
          use-service-account-credentials: "true"
      etcd:
        local:
          dataDir: /var/lib/etcd
          extraArgs:
            auto-tls: "false"
            cert-file: /etc/kubernetes/pki/etcd/server.crt
            cipher-suites: TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_RSA_WITH_AES_256_GCM_SHA384,TLS_RSA_WITH_AES_128_GCM_SHA256
            client-cert-auth: "true"
            key-file: /etc/kubernetes/pki/etcd/server.key
            peer-auto-tls: "false"
            peer-client-cert-auth: "true"
            trusted-ca-file: /etc/kubernetes/pki/etcd/ca.crt
      scheduler:
        extraArgs:
          bind-address: 0.0.0.0
          kubeconfig: /etc/kubernetes/scheduler.conf
          profiling: "false"
          secure-port: "10259"
    files:
    - content: |
        apiVersion: apiserver.config.k8s.io/v1
        kind: EncryptionConfiguration
        resources:
          - resources:
            - secrets
            providers:
            - aescbc:
                keys:
                - name: key1
                  secret: 8d7iAcg3/NwN9aijhtEXj5kL2NOHIgokGFjbIBfL6X0=
            - identity: {}
      owner: root:root
      path: /etc/kubernetes/encryption-provider.yaml
      permissions: "0600"
    - content: |
        net.ipv4.conf.lxc*.rp_filter = 0
      owner: root:root
      path: /etc/sysctl.d/99-cilium.conf
      permissions: "0744"
    - content: |
        overlay
        br_netfilter
      owner: root:root
      path: /etc/modules-load.d/crio.conf
      permissions: "0744"
    - content: |
        net.bridge.bridge-nf-call-iptables  = 1
        net.bridge.bridge-nf-call-ip6tables = 1
        net.ipv4.ip_forward                 = 1
      owner: root:root
      path: /etc/sysctl.d/99-kubernetes-cri.conf
      permissions: "0744"
    - content: |
        vm.overcommit_memory=1
        kernel.panic=10
        kernel.panic_on_oops=1
      owner: root:root
      path: /etc/sysctl.d/99-kubelet.conf
      permissions: "0744"
    - content: |
        nameserver 1.1.1.1
        nameserver 1.0.0.1
        nameserver 2606:4700:4700::1111
      owner: root:root
      path: /etc/kubernetes/resolv.conf
      permissions: "0744"
    initConfiguration:
      nodeRegistration:
        kubeletExtraArgs:
          anonymous-auth: "false"
          authentication-token-webhook: "true"
          authorization-mode: Webhook
          cloud-provider: external
          event-qps: "5"
          kubeconfig: /etc/kubernetes/kubelet.conf
          max-pods: "120"
          read-only-port: "0"
          resolv-conf: /etc/kubernetes/resolv.conf
          rotate-server-certificates: "true"
          tls-cipher-suites: TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_RSA_WITH_AES_256_GCM_SHA384,TLS_RSA_WITH_AES_128_GCM_SHA256
    joinConfiguration:
      nodeRegistration:
        kubeletExtraArgs:
          anonymous-auth: "false"
          authentication-token-webhook: "true"
          authorization-mode: Webhook
          cloud-provider: external
          event-qps: "5"
          kubeconfig: /etc/kubernetes/kubelet.conf
          max-pods: "120"
          read-only-port: "0"
          resolv-conf: /etc/kubernetes/resolv.conf
          rotate-server-certificates: "true"
          tls-cipher-suites: TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_RSA_WITH_AES_256_GCM_SHA384,TLS_RSA_WITH_AES_128_GCM_SHA256
    preKubeadmCommands:
    - set -x
    - export CONTAINERD=1.7.14
    - export KUBERNETES_VERSION=$(echo 1.28.7 | sed 's/^v//')
    - export TRIMMED_KUBERNETES_VERSION=$(echo 1.28.7 | sed 's/^v//' | awk -F . '{print
      $1 "." $2}')
    - ARCH=amd64
    - if [ "$(uname -m)" = "aarch64" ]; then ARCH=arm64; fi
    - localectl set-locale LANG=en_US.UTF-8
    - localectl set-locale LANGUAGE=en_US.UTF-8
    - apt-get update -y
    - apt-get -y install at jq unzip wget socat mtr logrotate apt-transport-https
    - sed -i '/swap/d' /etc/fstab
    - swapoff -a
    - modprobe overlay && modprobe br_netfilter && sysctl --system
    - wget https://github.com/containerd/containerd/releases/download/v$CONTAINERD/cri-containerd-cni-$CONTAINERD-linux-$ARCH.tar.gz
    - wget https://github.com/containerd/containerd/releases/download/v$CONTAINERD/cri-containerd-cni-$CONTAINERD-linux-$ARCH.tar.gz.sha256sum
    - sha256sum --check cri-containerd-cni-$CONTAINERD-linux-$ARCH.tar.gz.sha256sum
    - tar --no-overwrite-dir -C / -xzf cri-containerd-cni-$CONTAINERD-linux-$ARCH.tar.gz
    - rm -f cri-containerd-cni-$CONTAINERD-linux-$ARCH.tar.gz cri-containerd-cni-$CONTAINERD-linux-$ARCH.tar.gz.sha256sum
    - chmod -R 644 /etc/cni && chown -R root:root /etc/cni
    - mkdir -p /etc/containerd
    - containerd config default > /etc/containerd/config.toml
    - sed -i  "s/SystemdCgroup = false/SystemdCgroup = true/" /etc/containerd/config.toml
    - systemctl daemon-reload && systemctl enable containerd && systemctl start containerd
    - mkdir -p /etc/apt/keyrings/
    - curl -fsSL https://pkgs.k8s.io/core:/stable:/v$TRIMMED_KUBERNETES_VERSION/deb/Release.key
      | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
    - echo "deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v$TRIMMED_KUBERNETES_VERSION/deb/
      /" | sudo tee /etc/apt/sources.list.d/kubernetes.list
    - apt-get update
    - apt-get install -y kubelet=$KUBERNETES_VERSION-1.1 kubeadm=$KUBERNETES_VERSION-1.1
      kubectl=$KUBERNETES_VERSION-1.1  bash-completion && apt-mark hold kubelet kubectl
      kubeadm && systemctl enable kubelet
    - kubeadm config images pull --kubernetes-version $KUBERNETES_VERSION
    - echo 'source <(kubectl completion bash)' >>/root/.bashrc
    - echo 'export KUBECONFIG=/etc/kubernetes/admin.conf' >>/root/.bashrc
    - apt-get -y autoremove && apt-get -y clean all
  machineTemplate:
    infrastructureRef:
      apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
      kind: HCloudMachineTemplate
      name: hz-k8s-capi-fsn1-cw-dev-control-plane
  replicas: 3
  version: 1.28.7

---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: HCloudMachineTemplate
metadata:
  name: hz-k8s-capi-fsn1-cw-dev-control-plane
  namespace: default
spec:
  template:
    spec:
      imageName: ubuntu-22.04
      placementGroupName: control-plane
      type: cpx31

---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: HCloudRemediationTemplate
metadata:
  name: control-plane-remediation-request
  namespace: default
spec:
  template:
    spec:
      strategy:
        retryLimit: 1
        timeout: 180s
        type: Reboot

---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: HCloudRemediationTemplate
metadata:
  name: worker-remediation-request
  namespace: default
spec:
  template:
    spec:
      strategy:
        retryLimit: 1
        timeout: 180s
        type: Reboot

---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: HetznerCluster
metadata:
  name: hz-k8s-capi-fsn1-cw-dev
  namespace: default
spec:
  controlPlaneEndpoint:
    host: ""
    port: 443
  controlPlaneLoadBalancer:
    region: fsn1
  controlPlaneRegions:
  - fsn1
  - nbg1
  - hel1
  hcloudNetwork:
    enabled: true
    cidrBlock: "10.0.0.0/16"
    subnetCidrBlock: "10.0.0.0/24"
    networkZone: "eu-central"
  hcloudPlacementGroups:
  - name: control-plane
    type: spread
  - name: services
    type: spread
  - name: monitoring
    type: spread
  - name: workloads-1
    type: spread
  - name: egress-1
    type: spread
  hetznerSecretRef:
    key:
      hcloudToken: hcloud
      hetznerRobotPassword: robot-password
      hetznerRobotUser: robot-user
    name: hetzner
  sshKeys:
    hcloud:
    - name: user1
    - name: user2

The network of my K8s cluster, which was created using CAPI, is 10.0.0.0/16, but all the resources that I need to reach from pods are in another network, 10.81.0.0/16.

So I decided on the following configuration for the node pool workloads-1:

---
apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
kind: KubeadmConfigTemplate
metadata:
  name: hz-k8s-capi-fsn1-cw-dev-workloads-1
  namespace: default
spec:
  template:
    spec:
      files:
      - content: |
          net.ipv4.conf.lxc*.rp_filter = 0
        owner: root:root
        path: /etc/sysctl.d/99-cilium.conf
        permissions: "0744"
      - content: |
          overlay
          br_netfilter
        owner: root:root
        path: /etc/modules-load.d/crio.conf
        permissions: "0744"
      - content: |
          net.bridge.bridge-nf-call-iptables  = 1
          net.bridge.bridge-nf-call-ip6tables = 1
          net.ipv4.ip_forward                 = 1
        owner: root:root
        path: /etc/sysctl.d/99-kubernetes-cri.conf
        permissions: "0744"
      - content: |
          vm.overcommit_memory=1
          kernel.panic=10
          kernel.panic_on_oops=1
        owner: root:root
        path: /etc/sysctl.d/99-kubelet.conf
        permissions: "0744"
      - content: |
          nameserver 1.1.1.1
          nameserver 1.0.0.1
          nameserver 2606:4700:4700::1111
        owner: root:root
        path: /etc/kubernetes/resolv.conf
        permissions: "0744"
      joinConfiguration:
        nodeRegistration:
          kubeletExtraArgs:
            anonymous-auth: "false"
            authentication-token-webhook: "true"
            authorization-mode: Webhook
            cloud-provider: external
            event-qps: "5"
            kubeconfig: /etc/kubernetes/kubelet.conf
            max-pods: "220"
            read-only-port: "0"
            resolv-conf: /etc/kubernetes/resolv.conf
            rotate-server-certificates: "true"
            tls-cipher-suites: TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_RSA_WITH_AES_256_GCM_SHA384,TLS_RSA_WITH_AES_128_GCM_SHA256
      preKubeadmCommands:
      - set -x
      - grep VERSION= /etc/os-release; uname -a
      - export CONTAINERD=1.7.14
      - export KUBERNETES_VERSION=$(echo 1.28.7 | sed 's/^v//')
      - export TRIMMED_KUBERNETES_VERSION=$(echo 1.28.7 | sed 's/^v//' | awk -F .
        '{print $1 "." $2}')
      - ARCH=amd64
      - if [ "$(uname -m)" = "aarch64" ]; then ARCH=arm64; fi
      - localectl set-locale LANG=en_US.UTF-8
      - localectl set-locale LANGUAGE=en_US.UTF-8
      - apt-get update -y
      - apt-get -y install at jq unzip wget socat mtr logrotate apt-transport-https hcloud-cli nfs-common
      - export IHN=$(cat /etc/hostname)
      - export HCLOUD_TOKEN="*********************************************************************"
      - export HCLOUD_CONTEXT="hz-k8s-capi-fsn1-cw-dev"
      - |
        while true; do
          if hcloud server attach-to-network $IHN --network main-network; then
            echo "hcloud command executed successfully. Continuing with the next commands."
            break
          else
            echo "hcloud command failed. Retrying in 10 seconds..."
            sleep 10
          fi
        done
      - sed -i '/swap/d' /etc/fstab
      - swapoff -a
      - modprobe overlay && modprobe br_netfilter && sysctl --system
      - wget https://github.com/containerd/containerd/releases/download/v$CONTAINERD/cri-containerd-cni-$CONTAINERD-linux-$ARCH.tar.gz
      - wget https://github.com/containerd/containerd/releases/download/v$CONTAINERD/cri-containerd-cni-$CONTAINERD-linux-$ARCH.tar.gz.sha256sum
      - sha256sum --check cri-containerd-cni-$CONTAINERD-linux-$ARCH.tar.gz.sha256sum
      - tar --no-overwrite-dir -C / -xzf cri-containerd-cni-$CONTAINERD-linux-$ARCH.tar.gz
      - rm -f cri-containerd-cni-$CONTAINERD-linux-$ARCH.tar.gz cri-containerd-cni-$CONTAINERD-linux-$ARCH.tar.gz.sha256sum
      - chmod -R 644 /etc/cni && chown -R root:root /etc/cni
      - mkdir -p /etc/containerd
      - containerd config default > /etc/containerd/config.toml
      - sed -i  "s/SystemdCgroup = false/SystemdCgroup = true/" /etc/containerd/config.toml
      - systemctl daemon-reload && systemctl enable containerd && systemctl start
        containerd
      - mkdir -p /etc/apt/keyrings/
      - curl -fsSL https://pkgs.k8s.io/core:/stable:/v$TRIMMED_KUBERNETES_VERSION/deb/Release.key
        | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
      - echo "deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v$TRIMMED_KUBERNETES_VERSION/deb/
        /" | sudo tee /etc/apt/sources.list.d/kubernetes.list
      - apt-get update
      - apt-get install -y kubelet=$KUBERNETES_VERSION-1.1 kubeadm=$KUBERNETES_VERSION-1.1
        kubectl=$KUBERNETES_VERSION-1.1  bash-completion && apt-mark hold kubelet
        kubectl kubeadm && systemctl enable kubelet
      - kubeadm config images pull --kubernetes-version $KUBERNETES_VERSION
      - echo 'source <(kubectl completion bash)' >>/root/.bashrc
      - echo 'export KUBECONFIG=/etc/kubernetes/admin.conf' >>/root/.bashrc
      - apt-get -y autoremove && apt-get -y clean all

---
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  labels:
    nodepool: hz-k8s-capi-fsn1-cw-dev-workloads-1
  name: hz-k8s-capi-fsn1-cw-dev-workloads-1
  namespace: default
spec:
  clusterName: hz-k8s-capi-fsn1-cw-dev
  replicas: 1
  selector:
    matchLabels: null
  template:
    metadata:
      labels:
        nodepool: hz-k8s-capi-fsn1-cw-dev-workloads-1
        node-role.kubernetes.io/nodepool: workloads-1
    spec:
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfigTemplate
          name: hz-k8s-capi-fsn1-cw-dev-workloads-1
      clusterName: hz-k8s-capi-fsn1-cw-dev
      failureDomain: fsn1
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: HCloudMachineTemplate
        name: hz-k8s-capi-fsn1-cw-dev-workloads-1
      version: 1.28.7

---
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineHealthCheck
metadata:
  name: hz-k8s-capi-fsn1-cw-dev-workloads-1-unhealthy-5m
  namespace: default
spec:
  clusterName: hz-k8s-capi-fsn1-cw-dev
  maxUnhealthy: 100%
  nodeStartupTimeout: 10m
  remediationTemplate:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: HCloudRemediationTemplate
    name: worker-remediation-request
  selector:
    matchLabels:
      nodepool: hz-k8s-capi-fsn1-cw-dev-workloads-1
  unhealthyConditions:
  - status: Unknown
    timeout: 180s
    type: Ready
  - status: "False"
    timeout: 180s
    type: Ready

---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: HCloudMachineTemplate
metadata:
  name: hz-k8s-capi-fsn1-cw-dev-workloads-1
  namespace: default
spec:
  template:
    spec:
      imageName: ubuntu-22.04
      placementGroupName: workloads-1
      type: cpx41

So, after deploying the node pool workloads-1, I attach the additional network 10.81.0.0/16, which contains all the services the pods need to reach, and in this configuration everything works correctly for me. But sometimes this 10.81.0.0/16 network drops off the workloads-1 nodes, and the only way to restore it is to delete the server and re-create it using CAPI. Previously these network failures happened rarely, but now they are common.
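
For reference, the attachment can be inspected and re-applied from the hcloud side roughly like this (only a sketch: the server name is an example, and main-network is the network name used in the preKubeadmCommands above):

# Sketch: check whether the additional network is still attached to the node, and re-attach it if missing.
export HCLOUD_TOKEN="***"                                 # token elided
SERVER="hz-k8s-capi-fsn1-cw-dev-workloads-1-example"      # example server name
hcloud server describe "$SERVER" -o json | jq '.private_net'
hcloud server attach-to-network "$SERVER" --network main-network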

What did you expect to happen:

I expected that after these steps I would get stable access to the 10.81.0.0/16 network, but in practice the network constantly drops: for various reasons ping stops working or the interface goes down.

Anything else you would like to add:

I would also like to understand whether it is valid to do what I did above in these configurations, i.e. attach an additional network to workloads-1.

I'm also interested in whether I can use the following configuration for the control plane:

---
...
spec:
  controlPlaneEndpoint:
    host: ""
    port: 443
  controlPlaneLoadBalancer:
    region: fsn1
  controlPlaneRegions:
  - fsn1
  - nbg1
  - hel1
  hcloudNetwork:
    enabled: true
    cidrBlock: "10.81.0.0/16"
    subnetCidrBlock: "10.81.0.0/24"
    networkZone: "eu-central"
  hcloudPlacementGroups:
  - name: control-plane
    type: spread
  - name: services
    type: spread
  - name: monitoring
    type: spread
  - name: workloads-1
    type: spread
  - name: egress-1
    type: spread
  hetznerSecretRef:
    key:
      hcloudToken: hcloud
      hetznerRobotPassword: robot-password
      hetznerRobotUser: robot-user
    name: hetzner
  sshKeys:
    hcloud:
    - name: user1
    - name: user2

That is, in this configuration I specified the two parameters cidrBlock: "10.81.0.0/16" and subnetCidrBlock: "10.81.0.0/24" for a network that already exists in Hetzner Cloud.

Is it possible to add an existing network to a K8s cluster deployed using CAPI in this way, or is it necessary to specify a resource ID?

Perhaps such a configuration, with an existing network added, would help solve my problem.
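
For reference, the existing network in Hetzner Cloud can be inspected like this to confirm its ID, CIDR and subnets before referencing it from hcloudNetwork (a sketch; the network name is an example):

# Sketch: look up the existing network and print its ID, CIDR and subnets.
hcloud network list
hcloud network describe main-network -o json | jq '{id: .id, ip_range: .ip_range, subnets: .subnets}'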

Environment:

  • cluster-api-provider-hetzner version: v1.0.0-beta.33
  • Kubernetes version: (use kubectl version): v1.28.7
  • OS (e.g. from /etc/os-release): ubuntu-22.04
@batistein
Contributor

@BigBrather without going too much into detail, I can only tell you from my experience: we used hcloud networks in the beginning, three years ago, when we started our managed Kubernetes on Hetzner, but we had so many problems that we stopped using them. Since then I try them from time to time, but I still run into those kinds of problems.

There are so many great solutions out there that use the zero-trust principle, and with that approach everything in your infrastructure becomes much more secure, because you always have to think about it from a security perspective. For example, we do not use a Hetzner firewall; we only use the Cilium host firewall, which makes management much easier: a single pane of declarative configuration, without the problem of misconfiguration or external issues. For internal traffic we use things like mTLS where appropriate, so with the right tools you also get workload attestation and much more visibility.
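
To give a rough idea of the host-firewall approach, a minimal policy looks something like this (only a sketch; it assumes Cilium is installed with hostFirewall enabled, and the node label and port are just examples):

apiVersion: cilium.io/v2
kind: CiliumClusterwideNetworkPolicy
metadata:
  name: host-fw-example
spec:
  description: "Example host policy: allow in-cluster traffic and SSH to labelled nodes"
  nodeSelector:
    matchLabels:
      node-access: ssh
  ingress:
  - fromEntities:
    - cluster
  - toPorts:
    - ports:
      - port: "22"
        protocol: TCP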

@BigBrather
Author

@batistein Thanks for your reply.

We abandoned the private network and will observe how CAPI works in this version. Hopefully this solves our network problem.

The point about the firewall settings on the Hetzner Cloud side is also clear to me.

@batistein
Contributor

Ok, then I will close this issue. ;)
