zhangguanzhang's Blog

Troubleshooting broken cross-node networking in a k8s cluster on Kylin Linux (arm64)

2020/10/20

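Background

A cluster that a colleague deployed at a customer site kept running into problems. Right after I fixed one issue for him, he reported that the business pods were failing to start because DNS resolution did not work. After tracking it down I realised I had never run into this situation before; it was interesting enough to write up. The real investigation also wasted some time on wrong leads, which I have left out: what follows is the process along the path that turned out to be correct.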

Environment

Cluster information:

$ kubectl version -o json
{
  "clientVersion": {
    "major": "1",
    "minor": "15",
    "gitVersion": "v1.15.12",
    "gitCommit": "e2a822d9f3c2fdb5c9bfbe64313cf9f657f0a725",
    "gitTreeState": "clean",
    "buildDate": "2020-05-06T05:17:59Z",
    "goVersion": "go1.12.17",
    "compiler": "gc",
    "platform": "linux/arm64"
  },
  "serverVersion": {
    "major": "1",
    "minor": "15",
    "gitVersion": "v1.15.12",
    "gitCommit": "e2a822d9f3c2fdb5c9bfbe64313cf9f657f0a725",
    "gitTreeState": "clean",
    "buildDate": "2020-05-06T05:09:48Z",
    "goVersion": "go1.12.17",
    "compiler": "gc",
    "platform": "linux/arm64"
  }
}

The OS is Kylin Linux (银河麒麟) on arm64:

$ cat /etc/os-release
NAME="Kylin Linux Advanced Server"
VERSION="V10 (Tercel)"
ID="kylin"
VERSION_ID="V10"
PRETTY_NAME="Kylin Linux Advanced Server V10 (Tercel)"
ANSI_COLOR="0;31"
$ uname -a
Linux xxx 4.19.90-17.ky10.aarch64 #1 SMP Sun Jun 28 14:27:40 CST 2020 aarch64 aarch64 aarch64 GNU/Linux

Troubleshooting

First, look up the SVC IP of the cluster DNS.

$ kubectl -n kube-system get svc -l k8s-app=kube-dns
NAME       TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                  AGE
kube-dns   ClusterIP   10.186.0.2   <none>        53/UDP,53/TCP,9153/TCP   87m

Send a DNS request by hand with dig. At first I queried under cluster.local, but something felt off; checking the kubelet arguments showed that the cluster domain is actually cluster1.local.

$ dig @10.186.0.2 kubernetes.default.svc.cluster1.local +tcp
;; Connection to 10.186.0.2#53(10.186.0.2) for kubernetes.default.svc.cluster1.local failed: timed out.
;; Connection to 10.186.0.2#53(10.186.0.2) for kubernetes.default.svc.cluster1.local failed: timed out.
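
The cluster domain used in the query above was read off the kubelet's arguments. A minimal sketch of pulling it out automatically, assuming the kubelet receives it through the `--cluster-domain` command-line flag (it can also be set in a kubelet config file, which this does not cover); `cluster_domain` is a hypothetical helper name:

```shell
# Read a command line on stdin and print the value of --cluster-domain.
# Handles both "--cluster-domain=foo" and "--cluster-domain foo".
# Assumption: the flag is on the command line, not in a config file.
cluster_domain() {
    tr ' ' '\n' | awk '
        /^--cluster-domain=/ { sub(/^--cluster-domain=/, ""); print; exit }
        prev == "--cluster-domain" { print; exit }
        { prev = $0 }'
}

# Usage on a node (not executed here):
#   ps -o args= -C kubelet | cluster_domain
```

If nothing is printed, the flag was not passed on the command line and the domain has to be looked up elsewhere.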

It times out. Try the coredns metrics endpoint:

$ curl -I 10.186.0.2:9153/metrics
^C

Still a timeout. The flannel VTEP information on the nodes all looks correct:

$ kubectl get node -o yaml | grep -A3 Vtep
flannel.alpha.coreos.com/backend-data: '{"VtepMAC":"ea:77:37:86:ee:bf"}'
flannel.alpha.coreos.com/backend-type: vxlan
flannel.alpha.coreos.com/kube-subnet-manager: "true"
flannel.alpha.coreos.com/public-ip: 172.31.159.19
--
flannel.alpha.coreos.com/backend-data: '{"VtepMAC":"f2:d2:28:8e:4c:61"}'
flannel.alpha.coreos.com/backend-type: vxlan
flannel.alpha.coreos.com/kube-subnet-manager: "true"
flannel.alpha.coreos.com/public-ip: 172.31.159.20
--
flannel.alpha.coreos.com/backend-data: '{"VtepMAC":"2a:f1:d4:d0:32:24"}'
flannel.alpha.coreos.com/backend-type: vxlan
flannel.alpha.coreos.com/kube-subnet-manager: "true"
flannel.alpha.coreos.com/public-ip: 172.31.159.21
--
flannel.alpha.coreos.com/backend-data: '{"VtepMAC":"4a:e7:02:47:20:b8"}'
flannel.alpha.coreos.com/backend-type: vxlan
flannel.alpha.coreos.com/kube-subnet-manager: "true"
flannel.alpha.coreos.com/public-ip: 172.31.159.22
--
flannel.alpha.coreos.com/backend-data: '{"VtepMAC":"ce:ce:f3:fc:3f:77"}'
flannel.alpha.coreos.com/backend-type: vxlan
flannel.alpha.coreos.com/kube-subnet-manager: "true"
flannel.alpha.coreos.com/public-ip: 172.31.159.23

Look up the coredns pod IPs and test against a pod IP directly, bypassing the cluster SVC:

$ kubectl -n kube-system get po -o wide -l k8s-app=kube-dns
NAME                      READY   STATUS    RESTARTS   AGE   IP            NODE            NOMINATED NODE   READINESS GATES
coredns-677d9c57f-tdnd4   1/1     Running   0          10m   10.187.1.24   172.31.159.21   <none>           <none>
coredns-677d9c57f-x274j   1/1     Running   0          10m   10.187.4.24   172.31.159.22   <none>           <none>
$ curl -I 10.187.1.24:9153/metrics
^C

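Still a timeout. I kept using this curl against port 9153 for the rest of the tests: it is an uncommon port, which keeps the tcpdump filter simple (with a common port the filter would be much more cumbersome). Next, capture packets on the destination host: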

$ tcpdump -nn -i flannel.1 host 10.187.1.24 and port 9153 -vv
tcpdump: listening on flannel.1, link-type EN10MB (Ethernet), capture size 262144 bytes
16:39:35.019165 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto TCP (6), length 60)
10.187.1.24.9153 > 10.187.0.0.45920: Flags [S.], cksum 0xe94e (correct), seq 1670919335, ack 1103099581, win 64308, options [mss 1410,sackOK,TS val 3440201878 ecr 632709592,nop,wscale 7], length 0
16:39:35.068097 IP (tos 0x0, ttl 64, id 39684, offset 0, flags [DF], proto TCP (6), length 60)
10.187.0.0.45920 > 10.187.1.24.9153: Flags [S], cksum 0x80ed (correct), seq 1103099580, win 64860, options [mss 1410,sackOK,TS val 632716806 ecr 0,nop,wscale 7], length 0
16:39:35.068241 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto TCP (6), length 60)
10.187.1.24.9153 > 10.187.0.0.45920: Flags [S.], cksum 0xe91c (correct), seq 1670919335, ack 1103099581, win 64308, options [mss 1410,sackOK,TS val 3440201928 ecr 632709592,nop,wscale 7], length 0
16:39:43.419197 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto TCP (6), length 60)
10.187.1.24.9153 > 10.187.0.0.45920: Flags [S.], cksum 0xc87e (correct), seq 1670919335, ack 1103099581, win 64308, options [mss 1410,sackOK,TS val 3440210278 ecr 632709592,nop,wscale 7], length 0
16:39:43.708101 IP (tos 0x0, ttl 64, id 39685, offset 0, flags [DF], proto TCP (6), length 60)
10.187.0.0.45920 > 10.187.1.24.9153: Flags [S], cksum 0x5f2d (correct), seq 1103099580, win 64860, options [mss 1410,sackOK,TS val 632725446 ecr 0,nop,wscale 7], length 0
16:39:43.708233 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto TCP (6), length 60)
10.187.1.24.9153 > 10.187.0.0.45920: Flags [S.], cksum 0xc75c (correct), seq 1670919335, ack 1103099581, win 64308, options [mss 1410,sackOK,TS val 3440210568 ecr 632709592,nop,wscale 7], length 0
16:39:54.141929 IP (tos 0x0, ttl 64, id 12300, offset 0, flags [DF], proto TCP (6), length 60)
10.187.0.0.46388 > 10.187.1.24.9153: Flags [S], cksum 0x0a5a (correct), seq 3149899513, win 64860, options [mss 1410,sackOK,TS val 632735880 ecr 0,nop,wscale 7], length 0
16:39:54.142080 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto TCP (6), length 60)
10.187.1.24.9153 > 10.187.0.0.46388: Flags [S.], cksum 0xeb46 (correct), seq 14530549, ack 3149899514, win 64308, options [mss 1410,sackOK,TS val 3440221001 ecr 632735880,nop,wscale 7], length 0
16:39:55.148096 IP (tos 0x0, ttl 64, id 12301, offset 0, flags [DF], proto TCP (6), length 60)
10.187.0.0.46388 > 10.187.1.24.9153: Flags [S], cksum 0x066c (correct), seq 3149899513, win 64860, options [mss 1410,sackOK,TS val 632736886 ecr 0,nop,wscale 7], length 0
16:39:55.148381 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto TCP (6), length 60)
10.187.1.24.9153 > 10.187.0.0.46388: Flags [S.], cksum 0xe757 (correct), seq 14530549, ack 3149899514, win 64308, options [mss 1410,sackOK,TS val 3440222008 ecr 632735880,nop,wscale 7], length 0
16:39:56.219200 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto TCP (6), length 60)
10.187.1.24.9153 > 10.187.0.0.46388: Flags [S.], cksum 0xe329 (correct), seq 14530549, ack 3149899514, win 64308, options [mss 1410,sackOK,TS val 3440223078 ecr 632735880,nop,wscale 7], length 0
16:39:57.228103 IP (tos 0x0, ttl 64, id 12302, offset 0, flags [DF], proto TCP (6), length 60)
10.187.0.0.46388 > 10.187.1.24.9153: Flags [S], cksum 0xfe4b (correct), seq 3149899513, win 64860, options [mss 1410,sackOK,TS val 632738966 ecr 0,nop,wscale 7], length 0
16:39:57.228247 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto TCP (6), length 60)
10.187.1.24.9153 > 10.187.0.0.46388: Flags [S.], cksum 0xdf37 (correct), seq 14530549, ack 3149899514, win 64308, options [mss 1410,sackOK,TS val 3440224088 ecr 632735880,nop,wscale 7], length 0
16:39:59.259269 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto TCP (6), length 60)
10.187.1.24.9153 > 10.187.0.0.46388: Flags [S.], cksum 0xd748 (correct), seq 14530549, ack 3149899514, win 64308, options [mss 1410,sackOK,TS val 3440226119 ecr 632735880,nop,wscale 7], length 0
16:40:00.059221 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto TCP (6), length 60)
10.187.1.24.9153 > 10.187.0.0.45920: Flags [S.], cksum 0x877e (correct), seq 1670919335, ack 1103099581, win 64308, options [mss 1410,sackOK,TS val 3440226918 ecr 632709592,nop,wscale 7], length 0
16:40:01.308098 IP (tos 0x0, ttl 64, id 12303, offset 0, flags [DF], proto TCP (6), length 60)
10.187.0.0.46388 > 10.187.1.24.9153: Flags [S], cksum 0xee5b (correct), seq 3149899513, win 64860, options [mss 1410,sackOK,TS val 632743046 ecr 0,nop,wscale 7], length 0
16:40:01.308248 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto TCP (6), length 60)
10.187.1.24.9153 > 10.187.0.0.46388: Flags [S.], cksum 0xcf47 (correct), seq 14530549, ack 3149899514, win 64308, options [mss 1410,sackOK,TS val 3440228168 ecr 632735880,nop,wscale 7], length 0

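The destination did send replies, but every packet carries Flags [S] or [S.], which means the TCP SYN and SYN-ACK packets are being retransmitted and the handshake never completes. Back on the machine running curl, capture in another terminal: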

$ tcpdump -nn -i flannel.1 host 10.187.1.24 and port 9153 -vv
tcpdump: listening on flannel.1, link-type EN10MB (Ethernet), capture size 262144 bytes
16:46:20.324596 IP (tos 0x0, ttl 64, id 7952, offset 0, flags [DF], proto TCP (6), length 60)
10.187.0.0.53748 > 10.187.1.24.9153: Flags [S], cksum 0x29f7 (correct), seq 1340604575, win 64860, options [mss 1410,sackOK,TS val 633118295 ecr 0,nop,wscale 7], length 0
16:46:20.324636 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto TCP (6), length 60)
10.187.1.24.9153 > 10.187.0.0.53748: Flags [S.], cksum 0xdf09 (correct), seq 1764271535, ack 1340604576, win 64308, options [mss 1410,sackOK,TS val 3440603416 ecr 633118295,nop,wscale 7], length 0
16:46:21.346975 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto TCP (6), length 60)
10.187.1.24.9153 > 10.187.0.0.53748: Flags [S.], cksum 0xdb0b (correct), seq 1764271535, ack 1340604576, win 64308, options [mss 1410,sackOK,TS val 3440604438 ecr 633118295,nop,wscale 7], length 0
16:46:21.395375 IP (tos 0x0, ttl 64, id 7953, offset 0, flags [DF], proto TCP (6), length 60)
10.187.0.0.53748 > 10.187.1.24.9153: Flags [S], cksum 0x25c8 (correct), seq 1340604575, win 64860, options [mss 1410,sackOK,TS val 633119366 ecr 0,nop,wscale 7], length 0
16:46:21.395409 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto TCP (6), length 60)
10.187.1.24.9153 > 10.187.0.0.53748: Flags [S.], cksum 0xdada (correct), seq 1764271535, ack 1340604576, win 64308, options [mss 1410,sackOK,TS val 3440604487 ecr 633118295,nop,wscale 7], length 0
16:46:23.426969 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto TCP (6), length 60)
10.187.1.24.9153 > 10.187.0.0.53748: Flags [S.], cksum 0xd2eb (correct), seq 1764271535, ack 1340604576, win 64308, options [mss 1410,sackOK,TS val 3440606518 ecr 633118295,nop,wscale 7], length 0
16:46:23.475374 IP (tos 0x0, ttl 64, id 7954, offset 0, flags [DF], proto TCP (6), length 60)
10.187.0.0.53748 > 10.187.1.24.9153: Flags [S], cksum 0x1da8 (correct), seq 1340604575, win 64860, options [mss 1410,sackOK,TS val 633121446 ecr 0,nop,wscale 7], length 0
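
Captures like the two above are easier to judge when summarised by TCP flag. The sketch below is a hypothetical helper (`count_flags` is not part of tcpdump) and assumes the `tcpdump -nn -vv` text layout shown above, where the flags are the fifth whitespace-separated field of each `src > dst: Flags [...]` line:

```shell
# Read tcpdump -vv text on stdin and print "<count> <flags>" per flag set.
# Assumption: lines look like "SRC > DST: Flags [S.], cksum ..." as above.
count_flags() {
    awk '/Flags \[/ {
        flag = $5               # e.g. "[S.],"
        sub(/,$/, "", flag)     # drop the trailing comma
        count[flag]++
    }
    END { for (k in count) print count[k], k }' | sort
}

# Usage (not executed here), with capture.txt a saved copy of the output:
#   count_flags < capture.txt
```

A summary that contains only [S] and [S.] and never a plain [.] is the pattern seen here: SYN and SYN-ACK keep being repeated, but the final ACK of the handshake is never sent.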

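At the time I did not read these packets closely, so let's analyse them properly now. Every SYN sent from this host has seq 1340604575, and every reply from 10.187.1.24.9153 is the same SYN-ACK (seq 1764271535, ack 1340604576, i.e. the SYN's seq plus one). So the capture shows that the handshake reply really did come back to this host, yet the unchanged sequence numbers show that nothing accepted it: no final ACK was ever sent, the client kept resending its SYN, and the pod on the destination host kept retransmitting its SYN-ACK. I then checked the routing: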

$ cat /run/flannel/subnet.env
FLANNEL_NETWORK=10.187.0.0/16
FLANNEL_SUBNET=10.187.0.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true
$ ip a s flannel.1
542: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default
link/ether b6:9b:ed:b0:37:74 brd ff:ff:ff:ff:ff:ff
inet 10.187.0.0/32 scope global flannel.1
valid_lft forever preferred_lft forever
inet6 fe80::b49b:edff:feb0:3774/64 scope link
valid_lft forever preferred_lft forever
$ ip route get 10.187.0.0
local 10.187.0.0 dev lo src 10.187.0.0 uid 0
cache <local>

Unbelievable: the route is wrong, it inexplicably points at lo. NetworkManager turned out to be running, so I restarted it.

$ systemctl restart NetworkManager
$ ip route get 10.187.0.0
broadcast 10.187.0.0 dev cni0 src 10.187.0.1 uid 0
cache <local,brd>
$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         172.31.159.254  0.0.0.0         UG    100    0        0 eno1
10.185.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0
10.187.0.0      0.0.0.0         255.255.255.0   U     0      0        0 cni0
172.31.159.0    0.0.0.0         255.255.255.0   U     100    0        0 eno1

The route is correct now, but flannel's routes to the other nodes have disappeared, so flannel has to be restarted as well.

$ docker ps -a | grep flanneld
4b3f04e62b25 122cdb7aa710 "/opt/bin/flanneld -…" 2 hours ago Up 2 hours k8s_kube-flannel_kube-flannel-ds-22bwd_kube-system_6f5ce812-c5ae-4102-9398-c4a6fee4c7ab_0
$ docker restart 4b3
4b3
$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         172.31.159.254  0.0.0.0         UG    100    0        0 eno1
10.185.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0
10.187.0.0      0.0.0.0         255.255.255.0   U     0      0        0 cni0
10.187.1.0      10.187.1.0      255.255.255.0   UG    0      0        0 flannel.1
10.187.2.0      10.187.2.0      255.255.255.0   UG    0      0        0 flannel.1
10.187.3.0      10.187.3.0      255.255.255.0   UG    0      0        0 flannel.1
10.187.4.0      10.187.4.0      255.255.255.0   UG    0      0        0 flannel.1
172.31.159.0    0.0.0.0         255.255.255.0   U     100    0        0 eno1
$ ip route get 10.187.0.0
broadcast 10.187.0.0 dev cni0 src 10.187.0.1 uid 0
cache <local,brd>
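
To confirm that flanneld has put its per-node routes back, the `route -n` table can be filtered by interface. A minimal sketch, assuming the net-tools `route -n` column layout shown above (interface name in the last column); `flannel_routes` is a hypothetical helper name:

```shell
# Read `route -n` output on stdin and print the destination subnets that
# are routed through the given interface (default: flannel.1).
flannel_routes() {
    awk -v dev="${1:-flannel.1}" '$NF == dev { print $1 }'
}

# Usage on a node (not executed here):
#   route -n | flannel_routes
```

In this five-node cluster, a healthy node should list one subnet per peer node (four lines); an empty result matches the state seen right after NetworkManager was restarted.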

Try curl again:

$ curl -I 10.187.4.24:9153/metrics
HTTP/1.1 200 OK
Content-Length: 19491
Content-Type: text/plain; version=0.0.4; charset=utf-8
Date: Tue, 20 Oct 2020 10:14:42 GMT

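I then did the same on every machine, after which cross-node networking in the cluster worked without any problems. We have other K8S environments with NetworkManager enabled, but this is the first time we have hit this, and it was on Kylin.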
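
Since the same check had to be repeated on every node, it can help to reduce the `ip route get` output to the one field that differed in this article: the device. A minimal sketch; `route_dev` is a hypothetical helper name, and whether a given device is "wrong" is only what was observed in this environment (lo before the restart, cni0 after it):

```shell
# Read `ip route get` output on stdin and print the device it resolves to,
# i.e. the word following "dev" on the first line that contains it.
route_dev() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# Usage on a node (not executed here; 10.187.0.0 is this article's address):
#   ip route get 10.187.0.0 | route_dev
```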

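Some personal thoughts on NetworkManager

Personally I feel NetworkManager is immature. Once, a netmask that a colleague configured with nmcli broke a VIP: the config file used PREFIX, and things only worked after I changed it back to NETMASK. There have been other problems too, which I will not go into here. It is a daemon process, but new networking technologies keep appearing on Linux and it has not adapted to them in time, and its releases are slow.

Update 2021/01/12: I tried the following; running it before deploying the cluster solves the problem permanently.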

cat> /etc/NetworkManager/conf.d/k8s.conf << 'EOF'
[keyfile]
unmanaged-devices=interface-name:cali*;interface-name:tunl*;interface-name:vxlan.calico;interface-name:flannel*;interface-name:veth*;interface-name:cni0;interface-name:docker0
# `nmcli con show` lists the connection UUIDs
# dns=none works around `nmcli con show <uuid> | grep ipv4.dns` being empty,
# which leaves /etc/resolv.conf without a nameserver after a reboot
[main]
dns=none
EOF
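
After NetworkManager has been restarted with this file in place, it is worth checking that it really left the CNI interfaces alone. A minimal sketch, assuming input in the terse `DEVICE:STATE` form that `nmcli -t -f DEVICE,STATE device` prints; `still_managed` is a hypothetical helper name, and the name patterns simply mirror the unmanaged-devices list above:

```shell
# Read "DEVICE:STATE" lines on stdin and print every container-network
# device that NetworkManager does not report as "unmanaged".
still_managed() {
    awk -F: '
        $1 ~ /^(cali|tunl|flannel|veth)/ || $1 == "vxlan.calico" ||
        $1 == "cni0" || $1 == "docker0" {
            if ($2 != "unmanaged") print $1
        }'
}

# Usage on a node (not executed here):
#   systemctl restart NetworkManager
#   nmcli -t -f DEVICE,STATE device | still_managed
```

No output means every matching interface is unmanaged; any name printed is still under NetworkManager's control.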
