
Matters Needing Attention

Problems encountered during Kubernetes deployment can be resolved as described below:

Overlay Driver Initialization

Because the CRI uses overlay as its default storage driver, the underlying filesystem (e.g. xfs) must support d_type, so the disk has to be formatted accordingly before it can be used:

$ mkfs.xfs -f -n ftype=1 /dev/xxx
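To check whether an existing xfs filesystem already has d_type enabled, one option (the mount point below is only an example) is:

$ xfs_info /var/lib/docker | grep ftype    # ftype=1 means d_type is supported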

Services Unreachable via NodePort

If this happens, adjust the node port addresses setting (nodePortAddresses) in kube-proxy, or change it through Crane's kube_proxy_node_port_addresses variable.
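For reference, a sketch of the corresponding kube-proxy setting (assumption: kube-proxy runs as a DaemonSet with a kubeadm-style ConfigMap named kube-proxy in kube-system; adapt to how Crane deploys it):

$ kubectl -n kube-system edit configmap kube-proxy
# in the KubeProxyConfiguration section:
#   nodePortAddresses: []                    # empty list = listen on all node addresses
#   nodePortAddresses: ["10.200.0.0/16"]     # or restrict to specific CIDRs
$ kubectl -n kube-system rollout restart daemonset kube-proxy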

Do Not Upgrade pause

Do not update the pause image on a cluster that is in active use; doing so restarts every Pod.

Switching Calico from IPIP to BGP

Calico defaults to IPIP encapsulation. To switch to BGP, set Crane's calico_type to Off and the change takes effect.

On a cluster that is already in use, however, the change only takes effect after the tunl0 interface is removed, and there is a risk of network conflicts during the switch.
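A rough sketch of doing this on a live cluster (assumptions: calicoctl is available and the pool is named default-ipv4-ippool; removing the ipip module drops the tunl0 interface and should be done node by node, expecting brief disruption):

$ calicoctl patch ippool default-ipv4-ippool --patch '{"spec": {"ipipMode": "Never"}}'
$ modprobe -r ipip    # run on each node to remove tunl0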

CNI/ClusterIP Changes

For a CNI change, modify CALICO_IPV4POOL_CIDR in Calico and restart all Pods for it to take effect; alternatively, skip the restart, since the existing iptables rules keep working and Pods pick up the new range on their next update.

To change the ClusterIP range, update the service-cluster-ip-range parameter on every kube-apiserver and then restart the apiservers for the change to take effect.
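A sketch of that apiserver change (assumptions: a kubeadm-style static-pod manifest and a range change from 10.96.0.0/12 to 10.9.0.0/16; adjust the path if the apiserver runs as a systemd-managed binary — for a static pod the kubelet restarts the apiserver automatically once the manifest changes):

$ sed -i 's#--service-cluster-ip-range=10.96.0.0/12#--service-cluster-ip-range=10.9.0.0/16#' /etc/kubernetes/manifests/kube-apiserver.yaml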

After the ClusterIP change, the DNS address must be updated as well: first change the ClusterIP of the DNS Service, then update every kubelet's configuration as follows:

$ sed -i 's/10.96.0.10/10.9.0.10/g' /etc/systemd/system/kubelet.service.d/10-kubelet.conf
$ sed -i 's/10.96.0.10/10.9.0.10/g' /var/lib/kubelet/config.yaml
$ systemctl daemon-reload
$ systemctl restart kubelet

Once these commands have run, the change is complete.

Jenkins Build 403 Error

After Jenkins is deployed and exposed through an Ingress, builds may fail with a 403 error; switch Calico to BGP mode to resolve it.

CoreDNS Error

When deploying CoreDNS, the following error is reported:

$ docker logs -f f028ba22151a
plugin/forward: no nameservers found

Add a nameserver entry to the host's /etc/resolv.conf to fix it.
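For example (the resolver address is only a placeholder; use one reachable from the node, then recreate the CoreDNS container so it re-reads the file):

$ echo "nameserver 114.114.114.114" >> /etc/resolv.conf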

Ansible in Docker

When Ansible runs in Docker on one of the machines being deployed and offline installation is used, two issues must be avoided:

  1. Do not run it on the first node listed in nodes.

  2. After running make local_save_image, be sure to delete the current image; otherwise some commands will fail to execute.

Ignore this if you are not using offline installation.

Crane checks whether the image already exists on the current machine; if it does, the image data is not loaded locally, which can leave executables missing on the first deployment.

apiServer

If the apiserver log contains output like the following:

I0722 20:27:25.279953       1 log.go:172] http: TLS handshake error from 10.200.1.127:47606: read tcp 10.200.1.127:5443->10.200.1.127:47606: read: connection reset by peer
I0722 20:27:43.031841       1 log.go:172] http: TLS handshake error from 10.200.1.127:47718: read tcp 10.200.1.127:5443->10.200.1.127:47718: read: connection reset by peer

Ignore it. These messages appear because the kubelet's health checks cannot complete HTTPS verification; you can reproduce the request with nc to observe the same behavior.
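For example, the same short-lived connection can be simulated with nc, using the address and port from the log above:

$ nc -vz 10.200.1.127 5443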

hostname

Do not use uppercase letters in master node hostnames; otherwise node registration fails when kubelet.conf is generated.
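If a node already has an uppercase hostname, rename it before joining, e.g. (the name below is only an example):

$ hostnamectl set-hostname master-01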

Istio-init Bug

When using Istio, the istio-init sidecar may exit abnormally: (Istio 1.7.1)

$ docker logs -f a19061ee1977
Environment:
------------
ENVOY_PORT=
INBOUND_CAPTURE_PORT=
ISTIO_INBOUND_INTERCEPTION_MODE=
ISTIO_INBOUND_TPROXY_MARK=
ISTIO_INBOUND_TPROXY_ROUTE_TABLE=
ISTIO_INBOUND_PORTS=
ISTIO_OUTBOUND_PORTS=
ISTIO_LOCAL_EXCLUDE_PORTS=
ISTIO_SERVICE_CIDR=
ISTIO_SERVICE_EXCLUDE_CIDR=

Variables:
----------
PROXY_PORT=15001
PROXY_INBOUND_CAPTURE_PORT=15006
PROXY_TUNNEL_PORT=15008
PROXY_UID=1337
PROXY_GID=1337
INBOUND_INTERCEPTION_MODE=REDIRECT
INBOUND_TPROXY_MARK=1337
INBOUND_TPROXY_ROUTE_TABLE=133
INBOUND_PORTS_INCLUDE=*
INBOUND_PORTS_EXCLUDE=15090,15021,15020
OUTBOUND_IP_RANGES_INCLUDE=*
OUTBOUND_IP_RANGES_EXCLUDE=
OUTBOUND_PORTS_INCLUDE=
OUTBOUND_PORTS_EXCLUDE=
KUBEVIRT_INTERFACES=
ENABLE_INBOUND_IPV6=false

Writing following contents to rules file:  /tmp/iptables-rules-1605843213716566540.txt699852508
* nat
-N ISTIO_INBOUND
-N ISTIO_REDIRECT
-N ISTIO_IN_REDIRECT
-N ISTIO_OUTPUT
-A ISTIO_INBOUND -p tcp --dport 15008 -j RETURN
-A ISTIO_REDIRECT -p tcp -j REDIRECT --to-ports 15001
-A ISTIO_IN_REDIRECT -p tcp -j REDIRECT --to-ports 15006
-A PREROUTING -p tcp -j ISTIO_INBOUND
-A ISTIO_INBOUND -p tcp --dport 22 -j RETURN
-A ISTIO_INBOUND -p tcp --dport 15090 -j RETURN
-A ISTIO_INBOUND -p tcp --dport 15021 -j RETURN
-A ISTIO_INBOUND -p tcp --dport 15020 -j RETURN
-A ISTIO_INBOUND -p tcp -j ISTIO_IN_REDIRECT
-A OUTPUT -p tcp -j ISTIO_OUTPUT
-A ISTIO_OUTPUT -o lo -s 127.0.0.6/32 -j RETURN
-A ISTIO_OUTPUT -o lo ! -d 127.0.0.1/32 -m owner --uid-owner 1337 -j ISTIO_IN_REDIRECT
-A ISTIO_OUTPUT -o lo -m owner ! --uid-owner 1337 -j RETURN
-A ISTIO_OUTPUT -m owner --uid-owner 1337 -j RETURN
-A ISTIO_OUTPUT -o lo ! -d 127.0.0.1/32 -m owner --gid-owner 1337 -j ISTIO_IN_REDIRECT
-A ISTIO_OUTPUT -o lo -m owner ! --gid-owner 1337 -j RETURN
-A ISTIO_OUTPUT -m owner --gid-owner 1337 -j RETURN
-A ISTIO_OUTPUT -d 127.0.0.1/32 -j RETURN
-A ISTIO_OUTPUT -j ISTIO_REDIRECT
COMMIT

iptables-restore --noflush /tmp/iptables-rules-1605843213716566540.txt699852508
iptables-restore: line 2 failed
iptables-save
# Generated by iptables-save v1.6.1 on Fri Nov 20 03:33:33 2020
*raw
:PREROUTING ACCEPT [1005743:938837910]
:OUTPUT ACCEPT [947488:1329385422]
COMMIT
# Completed on Fri Nov 20 03:33:33 2020
# Generated by iptables-save v1.6.1 on Fri Nov 20 03:33:33 2020
*mangle
:PREROUTING ACCEPT [1005743:938837910]
:INPUT ACCEPT [1005743:938837910]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [947488:1329385422]
:POSTROUTING ACCEPT [947488:1329385422]
COMMIT
# Completed on Fri Nov 20 03:33:33 2020
# Generated by iptables-save v1.6.1 on Fri Nov 20 03:33:33 2020
*nat
:PREROUTING ACCEPT [42106:2526360]
:INPUT ACCEPT [42109:2526540]
:OUTPUT ACCEPT [52112:4596284]
:POSTROUTING ACCEPT [75582:6004484]
:ISTIO_INBOUND - [0:0]
:ISTIO_IN_REDIRECT - [0:0]
:ISTIO_OUTPUT - [0:0]
:ISTIO_REDIRECT - [0:0]
-A PREROUTING -p tcp -j ISTIO_INBOUND
-A OUTPUT -p tcp -j ISTIO_OUTPUT
-A ISTIO_INBOUND -p tcp -m tcp --dport 15008 -j RETURN
-A ISTIO_INBOUND -p tcp -m tcp --dport 22 -j RETURN
-A ISTIO_INBOUND -p tcp -m tcp --dport 15090 -j RETURN
-A ISTIO_INBOUND -p tcp -m tcp --dport 15021 -j RETURN
-A ISTIO_INBOUND -p tcp -m tcp --dport 15020 -j RETURN
-A ISTIO_INBOUND -p tcp -j ISTIO_IN_REDIRECT
-A ISTIO_IN_REDIRECT -p tcp -j REDIRECT --to-ports 15006
-A ISTIO_OUTPUT -s 127.0.0.6/32 -o lo -j RETURN
-A ISTIO_OUTPUT ! -d 127.0.0.1/32 -o lo -m owner --uid-owner 1337 -j ISTIO_IN_REDIRECT
-A ISTIO_OUTPUT -o lo -m owner ! --uid-owner 1337 -j RETURN
-A ISTIO_OUTPUT -m owner --uid-owner 1337 -j RETURN
-A ISTIO_OUTPUT ! -d 127.0.0.1/32 -o lo -m owner --gid-owner 1337 -j ISTIO_IN_REDIRECT
-A ISTIO_OUTPUT -o lo -m owner ! --gid-owner 1337 -j RETURN
-A ISTIO_OUTPUT -m owner --gid-owner 1337 -j RETURN
-A ISTIO_OUTPUT -d 127.0.0.1/32 -j RETURN
-A ISTIO_OUTPUT -j ISTIO_REDIRECT
-A ISTIO_REDIRECT -p tcp -j REDIRECT --to-ports 15001
COMMIT
# Completed on Fri Nov 20 03:33:33 2020
# Generated by iptables-save v1.6.1 on Fri Nov 20 03:33:33 2020
*filter
:INPUT ACCEPT [1005743:938837910]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [947488:1329385422]
COMMIT
# Completed on Fri Nov 20 03:33:33 2020
panic: exit status 1

goroutine 1 [running]:
istio.io/istio/tools/istio-iptables/pkg/dependencies.(*RealDependencies).RunOrFail(0x51d3150, 0x2ec4603, 0x10, 0xc00071c1e0, 0x2, 0x2)
	istio.io/istio/tools/istio-iptables/pkg/dependencies/implementation.go:44 +0x96
istio.io/istio/tools/istio-iptables/pkg/cmd.(*IptablesConfigurator).executeIptablesRestoreCommand(0xc00161fd68, 0x7ff59cc9e501, 0x0, 0x0)
	istio.io/istio/tools/istio-iptables/pkg/cmd/run.go:544 +0x387
istio.io/istio/tools/istio-iptables/pkg/cmd.(*IptablesConfigurator).executeCommands(0xc00161fd68)
	istio.io/istio/tools/istio-iptables/pkg/cmd/run.go:551 +0x45
istio.io/istio/tools/istio-iptables/pkg/cmd.(*IptablesConfigurator).run(0xc00161fd68)
	istio.io/istio/tools/istio-iptables/pkg/cmd/run.go:489 +0x2d78
istio.io/istio/tools/istio-iptables/pkg/cmd.glob..func1(0x517a840, 0xc0005cee00, 0x0, 0x10)
	istio.io/istio/tools/istio-iptables/pkg/cmd/root.go:66 +0x148
github.com/spf13/cobra.(*Command).execute(0x517a840, 0xc0005ced00, 0x10, 0x10, 0x517a840, 0xc0005ced00)
	github.com/spf13/cobra@v1.0.0/command.go:846 +0x29d
github.com/spf13/cobra.(*Command).ExecuteC(0x517aae0, 0x0, 0x0, 0x0)
	github.com/spf13/cobra@v1.0.0/command.go:950 +0x349
github.com/spf13/cobra.(*Command).Execute(...)
	github.com/spf13/cobra@v1.0.0/command.go:887
main.main()
	istio.io/istio/pilot/cmd/pilot-agent/main.go:535 +0x2d

See https://zhuanlan.zhihu.com/p/166206610. In version 1.6.6 this issue leaves the affected Pods broken, so skip that version. It usually occurs when istio-init is cleaned up unexpectedly; after the container is restarted, istio-proxy cannot come up properly and the Pod ends up in an abnormal state. If a scheduled cleanup job is the cause, exclude the istio-init container with a command such as:

$ docker system prune -af --volumes --filter "label!=io.kubernetes.container.name=istio-init"

Use istioctl with Caution

When deploying with istioctl, running it again overwrites the previous configuration, so use it carefully.

Istio Install

Istio can be deployed with the following command:

$ istioctl manifest install --set profile=demo --set addonComponents.grafana.enabled=true --set addonComponents.kiali.enabled=true --set addonComponents.prometheus.enabled=true --set addonComponents.tracing.enabled=true

For upgrades, run the following (if you need to add resources, upgrade is the recommended path):

$ istioctl upgrade -f istio-default-operator.yaml --set profile=demo --set addonComponents.grafana.enabled=true --set addonComponents.kiali.enabled=true --set addonComponents.prometheus.enabled=true --set addonComponents.tracing.enabled=true

To inject the sidecar into an application:

$ istioctl kube-inject -f logging-dep.yaml |kubectl apply -f - -n logging
#or kubectl apply -f <(istioctl kube-inject -f <original application deployment yaml>)

To enable default sidecar injection for a namespace:

$ kubectl label namespace default istio-injection=enabled

Configuration to disable sidecar injection for a workload:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: ignored
spec:
  template:
    metadata:
      annotations:
        sidecar.istio.io/inject: "false"
    spec:
      containers:
      - name: ignored
        image: tutum/curl
        command: ["/bin/sleep","infinity"]

Deployment Docs

IstioOperator

After deploying Istio, be sure to deploy the IstioOperator and manage the ingressgateway through it; otherwise arbitrary changes may end up overwriting the ingressgateway:

$ istioctl operator init

When deploying, it can be driven by a default configuration file:

$ istioctl manifest install -f ./istio-operator.yaml  --set profile=demo
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  components:
    ingressGateways:
      - name: istio-ingressgateway
        enabled: true
      - name: aws-ingressgateway
        enabled: true
        label:
          app: aws-ingressgateway
          istio: aws-ingressgateway
        k8s:
          service:
            externalTrafficPolicy: Local
            ports:
            - name: status-port
              nodePort: 52536
              port: 15021
              protocol: TCP
              targetPort: 15021
            - name: http2
              nodePort: 21190
              port: 80
              protocol: TCP
              targetPort: 8080
            - name: https
              nodePort: 44693
              port: 443
              protocol: TCP
              targetPort: 8443
            - name: tcp
              nodePort: 58698
              port: 31400
              protocol: TCP
              targetPort: 31400
            - name: tls
              nodePort: 832
              port: 15443
              protocol: TCP
              targetPort: 15443

Docs

When defining a Gateway, reference the custom ingressgateway by name:

gateways:
  - istio-system/ireader-mobi-gateway

If the following error appears:

IPVS: rr: TCP 10.200.1.206:44693 - no destination available

Try resolving it by running one of the following commands:

$ kubectl rollout restart deployment/aws-ingressgateway -n istio-system
#or
$ kubectl scale deployment/aws-ingressgateway -n istio-system --replicas=3

Kiali Install

Kiali Operator Install.

This method does not install the Kiali Server.

$ kubectl create namespace kiali-operator
$ bash <(curl -L https://kiali.io/getLatestKialiOperator) --accessible-namespaces '**'

Quick Install · Advanced Install · Uninstall

Kiali Auth

Kiali (1.12) currently supports four authentication strategies:

  • anonymous: gives free access to Kiali.

  • openid: requires authentication through a third-party service to access Kiali.

  • openshift: requires authentication through the OpenShift authentication to access Kiali.

  • token: requires user to provide a Kubernetes ServiceAccount token to access Kiali.

Auth

Outdated auth documentation: https://kiali.io/documentation/v1.21/installation-guide/#_login_options

To change the auth strategy, edit the configuration managed by the kiali-operator:

$ kubectl edit kiali -n istio-system kiali
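In the Kiali CR this comes down to setting spec.auth.strategy, for example (a minimal sketch; valid values correspond to the list above):

spec:
  auth:
    strategy: "token"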

Kubelet Bug

The kubelet log shows:

Dec 21 13:01:03 BJ-M8-HADOOP-105-145 kubelet[353352]: I1221 13:01:03.731043  353352 kubelet_node_status.go:73] Successfully registered node bj-m8-hadoop-105-145
Dec 21 13:01:03 BJ-M8-HADOOP-105-145 kubelet[353352]: E1221 13:01:03.747928  353352 kubelet.go:1845] skipping pod synchronization - container runtime status check may not have completed yet
Dec 21 13:01:03 BJ-M8-HADOOP-105-145 kubelet[353352]: F1221 13:01:03.777407  353352 kubelet.go:1383] Failed to start ContainerManager failed to build map of initial containers from runtime: no PodsandBox found with Id 'a69881c50fe65f411010f10309ed653784650f48c1e4586814932b4acf7437e1'

This is a Docker storage-driver problem; switching to overlay2 resolves it.
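A minimal sketch of the switch (assuming Docker reads /etc/docker/daemon.json; note that changing the storage driver makes previously pulled images and existing containers invisible to Docker):

$ cat /etc/docker/daemon.json
{
  "storage-driver": "overlay2"
}
$ systemctl restart docker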

Related documentation: http://blog.ittour.net/2019/09/25/container-runtime-is-down-pleg-is-not-healthy/

A kubelet QoS error caused by an old kernel: (kernel 3.10.0-327.el7.x86_64)

Dec 24 10:30:01 BJ-M8-HADOOP-105-144 kubelet[350845]: E1224 10:30:01.932321  350845 dns.go:135] Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 10.100.20.105 10.100.105.11 10.100.21.240
Dec 24 10:30:08 BJ-M8-HADOOP-105-144 kubelet[350845]: E1224 10:30:08.150179  350845 qos_container_manager_linux.go:328] [ContainerManager]: Failed to update QoS cgroup configuration
Dec 24 10:30:08 BJ-M8-HADOOP-105-144 kubelet[350845]: W1224 10:30:08.150199  350845 qos_container_manager_linux.go:138] [ContainerManager] Failed to reserve QoS requests: failed to set supported cgroup subsystems for cgroup [kubepods burstable]: failed to find subsystem mount for required subsystem: pids

Upgrade the kernel to 4.14 or later to resolve it. This issue causes containers to crash abnormally after services start.
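One common way to upgrade on CentOS 7 (an assumption about the OS; these commands install the long-term-support kernel from ELRepo and require a reboot):

$ rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
$ yum install -y https://www.elrepo.org/elrepo-release-7.el7.elrepo.noarch.rpm
$ yum --enablerepo=elrepo-kernel install -y kernel-lt
$ grub2-set-default 0
$ reboot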

A kubelet DNS error:

Dec 24 11:13:31 BJ-M8-HADOOP-105-144 kubelet[3680]: E1224 11:13:31.677889    3680 dns.go:135] Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 10.100.20.105 10.100.105.11 10.100.21.240

Reduce the number of nameserver entries in /etc/resolv.conf to 3 or fewer.

Ansible Bug

If the run fails with an error like the following:

TASK [downloads-packages : Import Kubernetes Containerd Image] *****************************************************************************************************************************************************************************************************************
task path: /crane/crane/roles/downloads-packages/includes/crane/containerd/local_file.yaml:18
fatal: [192.168.6.96]: FAILED! => {"changed": true, "cmd": "ctr -n k8s.io i import /tmp/crane/kubernetes.tar.gz", "delta": "0:00:00.018852", "end": "2020-12-22 15:54:19.290911", "msg": "non-zero return code", "rc": 127, "start": "2020-12-22 15:54:19.272059", "stderr": "/bin/sh: ctr: command not found", "stderr_lines": ["/bin/sh: ctr: command not found"], "stdout": "", "stdout_lines": []}

Because Crane downloads components to /usr/local/bin by default, all binary invocations have now been prefixed with the {{ kubernetes_ctl_path }} parameter; if you still encounter a similar problem, resolve it on your own.

Docker build

This problem appears not only during build but also at run time: (docker 19.03.12)

 ---> Running in db89287c9932
OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:319: getting the final child's pid from pipe caused \"EOF\"": unknown

$ uname -a
Linux ip-10-200-1-206.ap-southeast-1.compute.internal 3.10.0-1127.el7.x86_64 #1 SMP Tue Mar 31 23:36:51 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

Related issue: moby/moby#40835

Upgrading the kernel resolves this issue.

Docker Load

Loading an image that was exported with containerd's ctr into Docker fails:

$ ctr image export crane-v1.20.1.5-image.tar.gz docker.io/slzcc/crane:v1.20.1.5

$ docker load -i crane-v1.20.1.5-image.tar.gz
open /data/docker/tmp/docker-import-011951360/blobs/json: no such file or directory

Workaround:

$ cat crane-v1.20.1.5-image.tar.gz |docker import - slzcc/crane
sha256:c7849d33e7d813be764ebd0669a33eec0d4d95818e2761aff14d3de1b833e0e4

Docker Run

The following problem may occur when starting a container: (17.12.1-ce)

$ docker run ..
/usr/local/bin/docker: Error response from daemon: OCI runtime create failed: container_linux.go:348: starting container process caused "process_linux.go:301: running exec setns process for init caused \"exit status 40\"": unknown.

This is a system memory bug; running the following cache-clearing commands resolves it.

$ sync
$ echo 1 > /proc/sys/vm/drop_caches

Using Nginx inside Istio

If Nginx runs inside Istio and proxies a backend service, the traffic can no longer be managed through the VirtualService and instead goes straight to the default Service. The reason is that Nginx resolves the proxied upstream to an IP address, which conflicts with how Istio routes traffic, so add the following to the location block:

proxy_set_header Host <svc>;

In short, make sure the Host header is not rewritten.
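A minimal location sketch (the service name, namespace, and port are placeholders):

location / {
    proxy_set_header Host <svc>;
    proxy_pass http://<svc>.<namespace>.svc.cluster.local:<port>;
}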

istio/istio#14450

kube-proxy May Fail to Start During an Upgrade

Add the following to the kubelet configuration:

featureGates:
  CSIMigration: false

This typically happens when upgrading from 1.16 to 1.17.

kubernetes/kubernetes#86094