zhangguanzhang's Blog

Troubleshooting broken cross-node networking in a k8s cluster on Kylin (银河麒麟) arm64

2020/10/20

Background

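A cluster that a colleague deployed at a customer site kept running into problems. Right after I fixed one issue for him, he reported that business Pods were failing to start because DNS could not be resolved. After tracking it down I realized I had never seen this situation before, and it was interesting enough to write up. The real investigation also wasted some time and attempts in wrong directions; I have left those out and describe the process from the correct angle.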

Environment

Cluster information:

$ kubectl version -o json
{
  "clientVersion": {
    "major": "1",
    "minor": "15",
    "gitVersion": "v1.15.12",
    "gitCommit": "e2a822d9f3c2fdb5c9bfbe64313cf9f657f0a725",
    "gitTreeState": "clean",
    "buildDate": "2020-05-06T05:17:59Z",
    "goVersion": "go1.12.17",
    "compiler": "gc",
    "platform": "linux/arm64"
  },
  "serverVersion": {
    "major": "1",
    "minor": "15",
    "gitVersion": "v1.15.12",
    "gitCommit": "e2a822d9f3c2fdb5c9bfbe64313cf9f657f0a725",
    "gitTreeState": "clean",
    "buildDate": "2020-05-06T05:09:48Z",
    "goVersion": "go1.12.17",
    "compiler": "gc",
    "platform": "linux/arm64"
  }
}

The OS is Kylin (银河麒麟) for arm64.

$ cat /etc/os-release
NAME="Kylin Linux Advanced Server"
VERSION="V10 (Tercel)"
ID="kylin"
VERSION_ID="V10"
PRETTY_NAME="Kylin Linux Advanced Server V10 (Tercel)"
ANSI_COLOR="0;31"
$ uname -a
Linux xxx 4.19.90-17.ky10.aarch64 #1 SMP Sun Jun 28 14:27:40 CST 2020 aarch64 aarch64 aarch64 GNU/Linux

Troubleshooting

First, look at the SVC IP of the cluster DNS.

$ kubectl -n kube-system get svc -l k8s-app=kube-dns
NAME       TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                  AGE
kube-dns   ClusterIP   10.186.0.2   <none>        53/UDP,53/TCP,9153/TCP   87m

Send a DNS request by hand with dig. At first I queried cluster.local; when that felt off, I checked the kubelet arguments and found that the cluster domain is cluster1.local.

$ dig @10.186.0.2 kubernetes.default.svc.cluster1.local +tcp
;; Connection to 10.186.0.2#53(10.186.0.2) for kubernetes.default.svc.cluster1.local failed: timed out.
;; Connection to 10.186.0.2#53(10.186.0.2) for kubernetes.default.svc.cluster1.local failed: timed out.
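Since the wrong cluster domain cost me a first attempt, here is a minimal sketch of how the name passed to dig is put together. The helper name `service_fqdn` is my own illustration, not part of any Kubernetes tooling; it only assumes the usual `<service>.<namespace>.svc.<cluster domain>` layout for Service names.

```python
def service_fqdn(service: str, namespace: str, cluster_domain: str = "cluster.local") -> str:
    """Build the in-cluster DNS name of a Service.

    The default of "cluster.local" is only the common default; the real
    value is whatever the kubelet was started with (cluster1.local here).
    """
    # Strip stray dots so a trailing-dot domain does not produce "..".
    return f"{service}.{namespace}.svc.{cluster_domain.strip('.')}"


# The name used in the dig command above.
print(service_fqdn("kubernetes", "default", "cluster1.local"))
```

Querying a name under a domain the cluster does not serve fails regardless of network health, so it is worth ruling this out before looking at the network itself.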

It times out. Try the coredns metrics endpoint:

$ curl -I 10.186.0.2:9153/metrics
^C

Still a timeout. The flannel VTEPs all look correct:

$ kubectl get node -o yaml | grep -A3 Vtep
flannel.alpha.coreos.com/backend-data: '{"VtepMAC":"ea:77:37:86:ee:bf"}'
flannel.alpha.coreos.com/backend-type: vxlan
flannel.alpha.coreos.com/kube-subnet-manager: "true"
flannel.alpha.coreos.com/public-ip: 172.31.159.19
--
flannel.alpha.coreos.com/backend-data: '{"VtepMAC":"f2:d2:28:8e:4c:61"}'
flannel.alpha.coreos.com/backend-type: vxlan
flannel.alpha.coreos.com/kube-subnet-manager: "true"
flannel.alpha.coreos.com/public-ip: 172.31.159.20
--
flannel.alpha.coreos.com/backend-data: '{"VtepMAC":"2a:f1:d4:d0:32:24"}'
flannel.alpha.coreos.com/backend-type: vxlan
flannel.alpha.coreos.com/kube-subnet-manager: "true"
flannel.alpha.coreos.com/public-ip: 172.31.159.21
--
flannel.alpha.coreos.com/backend-data: '{"VtepMAC":"4a:e7:02:47:20:b8"}'
flannel.alpha.coreos.com/backend-type: vxlan
flannel.alpha.coreos.com/kube-subnet-manager: "true"
flannel.alpha.coreos.com/public-ip: 172.31.159.22
--
flannel.alpha.coreos.com/backend-data: '{"VtepMAC":"ce:ce:f3:fc:3f:77"}'
flannel.alpha.coreos.com/backend-type: vxlan
flannel.alpha.coreos.com/kube-subnet-manager: "true"
flannel.alpha.coreos.com/public-ip: 172.31.159.23
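With more nodes, eyeballing this output gets tedious. Below is a rough sketch that turns the grep output into a public-ip to VtepMAC mapping, which can then be compared with the MAC of flannel.1 on each node. The function name `parse_vteps` and the `SAMPLE` text (two of the nodes above) are illustrative only, and the parser assumes the annotations appear in the same order as in the output above.

```python
import json
import re

# Two of the nodes from the output above, used as sample input.
SAMPLE = """
flannel.alpha.coreos.com/backend-data: '{"VtepMAC":"ea:77:37:86:ee:bf"}'
flannel.alpha.coreos.com/backend-type: vxlan
flannel.alpha.coreos.com/kube-subnet-manager: "true"
flannel.alpha.coreos.com/public-ip: 172.31.159.19
--
flannel.alpha.coreos.com/backend-data: '{"VtepMAC":"f2:d2:28:8e:4c:61"}'
flannel.alpha.coreos.com/backend-type: vxlan
flannel.alpha.coreos.com/kube-subnet-manager: "true"
flannel.alpha.coreos.com/public-ip: 172.31.159.20
"""

BACKEND_RE = re.compile(r"flannel\.alpha\.coreos\.com/backend-data:\s*'(.*)'$")
PUBLIC_IP_RE = re.compile(r"flannel\.alpha\.coreos\.com/public-ip:\s*(\S+)$")


def parse_vteps(annotation_text: str) -> dict:
    """Map each node's flannel public-ip to the VtepMAC announced for it."""
    vteps = {}
    current_mac = None
    for raw in annotation_text.splitlines():
        line = raw.strip()
        backend = BACKEND_RE.match(line)
        if backend:
            # backend-data is a small JSON document holding the VTEP MAC.
            current_mac = json.loads(backend.group(1)).get("VtepMAC")
            continue
        public_ip = PUBLIC_IP_RE.match(line)
        if public_ip and current_mac is not None:
            vteps[public_ip.group(1)] = current_mac
            current_mac = None
    return vteps


for ip, mac in parse_vteps(SAMPLE).items():
    print(ip, mac)
```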

Look up the coredns pod IPs and test against a pod IP directly, bypassing the cluster SVC.

$ kubectl -n kube-system get po -o wide -l k8s-app=kube-dns
NAME                      READY   STATUS    RESTARTS   AGE   IP            NODE            NOMINATED NODE   READINESS GATES
coredns-677d9c57f-tdnd4   1/1     Running   0          10m   10.187.1.24   172.31.159.21   <none>           <none>
coredns-677d9c57f-x274j   1/1     Running   0          10m   10.187.4.24   172.31.159.22   <none>           <none>
$ curl -I 10.187.1.24:9153/metrics
^C

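Still a timeout. I kept using the curl above: it targets port 9153, which is not a common port, so the tcpdump filter below stays simple instead of getting too complicated. Now capture packets on the destination host: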

$ tcpdump -nn -i flannel.1 host 10.187.1.24 and port 9153 -vv
tcpdump: listening on flannel.1, link-type EN10MB (Ethernet), capture size 262144 bytes
16:39:35.019165 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto TCP (6), length 60)
10.187.1.24.9153 > 10.187.0.0.45920: Flags [S.], cksum 0xe94e (correct), seq 1670919335, ack 1103099581, win 64308, options [mss 1410,sackOK,TS val 3440201878 ecr 632709592,nop,wscale 7], length 0
16:39:35.068097 IP (tos 0x0, ttl 64, id 39684, offset 0, flags [DF], proto TCP (6), length 60)
10.187.0.0.45920 > 10.187.1.24.9153: Flags [S], cksum 0x80ed (correct), seq 1103099580, win 64860, options [mss 1410,sackOK,TS val 632716806 ecr 0,nop,wscale 7], length 0
16:39:35.068241 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto TCP (6), length 60)
10.187.1.24.9153 > 10.187.0.0.45920: Flags [S.], cksum 0xe91c (correct), seq 1670919335, ack 1103099581, win 64308, options [mss 1410,sackOK,TS val 3440201928 ecr 632709592,nop,wscale 7], length 0
16:39:43.419197 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto TCP (6), length 60)
10.187.1.24.9153 > 10.187.0.0.45920: Flags [S.], cksum 0xc87e (correct), seq 1670919335, ack 1103099581, win 64308, options [mss 1410,sackOK,TS val 3440210278 ecr 632709592,nop,wscale 7], length 0
16:39:43.708101 IP (tos 0x0, ttl 64, id 39685, offset 0, flags [DF], proto TCP (6), length 60)
10.187.0.0.45920 > 10.187.1.24.9153: Flags [S], cksum 0x5f2d (correct), seq 1103099580, win 64860, options [mss 1410,sackOK,TS val 632725446 ecr 0,nop,wscale 7], length 0
16:39:43.708233 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto TCP (6), length 60)
10.187.1.24.9153 > 10.187.0.0.45920: Flags [S.], cksum 0xc75c (correct), seq 1670919335, ack 1103099581, win 64308, options [mss 1410,sackOK,TS val 3440210568 ecr 632709592,nop,wscale 7], length 0
16:39:54.141929 IP (tos 0x0, ttl 64, id 12300, offset 0, flags [DF], proto TCP (6), length 60)
10.187.0.0.46388 > 10.187.1.24.9153: Flags [S], cksum 0x0a5a (correct), seq 3149899513, win 64860, options [mss 1410,sackOK,TS val 632735880 ecr 0,nop,wscale 7], length 0
16:39:54.142080 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto TCP (6), length 60)
10.187.1.24.9153 > 10.187.0.0.46388: Flags [S.], cksum 0xeb46 (correct), seq 14530549, ack 3149899514, win 64308, options [mss 1410,sackOK,TS val 3440221001 ecr 632735880,nop,wscale 7], length 0
16:39:55.148096 IP (tos 0x0, ttl 64, id 12301, offset 0, flags [DF], proto TCP (6), length 60)
10.187.0.0.46388 > 10.187.1.24.9153: Flags [S], cksum 0x066c (correct), seq 3149899513, win 64860, options [mss 1410,sackOK,TS val 632736886 ecr 0,nop,wscale 7], length 0
16:39:55.148381 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto TCP (6), length 60)
10.187.1.24.9153 > 10.187.0.0.46388: Flags [S.], cksum 0xe757 (correct), seq 14530549, ack 3149899514, win 64308, options [mss 1410,sackOK,TS val 3440222008 ecr 632735880,nop,wscale 7], length 0
16:39:56.219200 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto TCP (6), length 60)
10.187.1.24.9153 > 10.187.0.0.46388: Flags [S.], cksum 0xe329 (correct), seq 14530549, ack 3149899514, win 64308, options [mss 1410,sackOK,TS val 3440223078 ecr 632735880,nop,wscale 7], length 0
16:39:57.228103 IP (tos 0x0, ttl 64, id 12302, offset 0, flags [DF], proto TCP (6), length 60)
10.187.0.0.46388 > 10.187.1.24.9153: Flags [S], cksum 0xfe4b (correct), seq 3149899513, win 64860, options [mss 1410,sackOK,TS val 632738966 ecr 0,nop,wscale 7], length 0
16:39:57.228247 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto TCP (6), length 60)
10.187.1.24.9153 > 10.187.0.0.46388: Flags [S.], cksum 0xdf37 (correct), seq 14530549, ack 3149899514, win 64308, options [mss 1410,sackOK,TS val 3440224088 ecr 632735880,nop,wscale 7], length 0
16:39:59.259269 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto TCP (6), length 60)
10.187.1.24.9153 > 10.187.0.0.46388: Flags [S.], cksum 0xd748 (correct), seq 14530549, ack 3149899514, win 64308, options [mss 1410,sackOK,TS val 3440226119 ecr 632735880,nop,wscale 7], length 0
16:40:00.059221 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto TCP (6), length 60)
10.187.1.24.9153 > 10.187.0.0.45920: Flags [S.], cksum 0x877e (correct), seq 1670919335, ack 1103099581, win 64308, options [mss 1410,sackOK,TS val 3440226918 ecr 632709592,nop,wscale 7], length 0
16:40:01.308098 IP (tos 0x0, ttl 64, id 12303, offset 0, flags [DF], proto TCP (6), length 60)
10.187.0.0.46388 > 10.187.1.24.9153: Flags [S], cksum 0xee5b (correct), seq 3149899513, win 64860, options [mss 1410,sackOK,TS val 632743046 ecr 0,nop,wscale 7], length 0
16:40:01.308248 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto TCP (6), length 60)
10.187.1.24.9153 > 10.187.0.0.46388: Flags [S.], cksum 0xcf47 (correct), seq 14530549, ack 3149899514, win 64308, options [mss 1410,sackOK,TS val 3440228168 ecr 632735880,nop,wscale 7], length 0

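Replies are being sent, but every packet has Flags [S] or [S.], which means the TCP SYN packets are being retransmitted. Back on the machine running curl, open another window and capture there: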

$ tcpdump -nn -i flannel.1 host 10.187.1.24 and port 9153 -vv
tcpdump: listening on flannel.1, link-type EN10MB (Ethernet), capture size 262144 bytes
16:46:20.324596 IP (tos 0x0, ttl 64, id 7952, offset 0, flags [DF], proto TCP (6), length 60)
10.187.0.0.53748 > 10.187.1.24.9153: Flags [S], cksum 0x29f7 (correct), seq 1340604575, win 64860, options [mss 1410,sackOK,TS val 633118295 ecr 0,nop,wscale 7], length 0
16:46:20.324636 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto TCP (6), length 60)
10.187.1.24.9153 > 10.187.0.0.53748: Flags [S.], cksum 0xdf09 (correct), seq 1764271535, ack 1340604576, win 64308, options [mss 1410,sackOK,TS val 3440603416 ecr 633118295,nop,wscale 7], length 0
16:46:21.346975 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto TCP (6), length 60)
10.187.1.24.9153 > 10.187.0.0.53748: Flags [S.], cksum 0xdb0b (correct), seq 1764271535, ack 1340604576, win 64308, options [mss 1410,sackOK,TS val 3440604438 ecr 633118295,nop,wscale 7], length 0
16:46:21.395375 IP (tos 0x0, ttl 64, id 7953, offset 0, flags [DF], proto TCP (6), length 60)
10.187.0.0.53748 > 10.187.1.24.9153: Flags [S], cksum 0x25c8 (correct), seq 1340604575, win 64860, options [mss 1410,sackOK,TS val 633119366 ecr 0,nop,wscale 7], length 0
16:46:21.395409 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto TCP (6), length 60)
10.187.1.24.9153 > 10.187.0.0.53748: Flags [S.], cksum 0xdada (correct), seq 1764271535, ack 1340604576, win 64308, options [mss 1410,sackOK,TS val 3440604487 ecr 633118295,nop,wscale 7], length 0
16:46:23.426969 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto TCP (6), length 60)
10.187.1.24.9153 > 10.187.0.0.53748: Flags [S.], cksum 0xd2eb (correct), seq 1764271535, ack 1340604576, win 64308, options [mss 1410,sackOK,TS val 3440606518 ecr 633118295,nop,wscale 7], length 0
16:46:23.475374 IP (tos 0x0, ttl 64, id 7954, offset 0, flags [DF], proto TCP (6), length 60)
10.187.0.0.53748 > 10.187.1.24.9153: Flags [S], cksum 0x1da8 (correct), seq 1340604575, win 64860, options [mss 1410,sackOK,TS val 633121446 ecr 0,nop,wscale 7], length 0
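Repeated sequence numbers are easy to miss in a long capture. The following is a small sketch that counts how often the same (source, destination, flags, seq) combination appears in `tcpdump -vv` text like the output above; anything seen more than once is a retransmission. The names `count_retransmits` and `CAPTURE` are mine, and the sample lines are trimmed copies of the capture above (TCP options removed).

```python
import re
from collections import Counter

# Matches the second line of each packet in `tcpdump -nn -vv` output,
# e.g. "10.187.0.0.53748 > 10.187.1.24.9153: Flags [S], cksum ..., seq 1340604575, ..."
PACKET_RE = re.compile(r"^\s*(\S+) > (\S+): Flags \[([^\]]+)\],.*?\bseq (\d+)")

# Trimmed sample taken from the capture above.
CAPTURE = """
16:46:20.324596 IP (tos 0x0, ttl 64, id 7952, offset 0, flags [DF], proto TCP (6), length 60)
10.187.0.0.53748 > 10.187.1.24.9153: Flags [S], cksum 0x29f7 (correct), seq 1340604575, win 64860, length 0
16:46:20.324636 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto TCP (6), length 60)
10.187.1.24.9153 > 10.187.0.0.53748: Flags [S.], cksum 0xdf09 (correct), seq 1764271535, ack 1340604576, win 64308, length 0
16:46:21.395375 IP (tos 0x0, ttl 64, id 7953, offset 0, flags [DF], proto TCP (6), length 60)
10.187.0.0.53748 > 10.187.1.24.9153: Flags [S], cksum 0x25c8 (correct), seq 1340604575, win 64860, length 0
"""


def count_retransmits(capture: str) -> dict:
    """Return {(src, dst, flags, seq): count} for packets seen more than once."""
    counts = Counter()
    for line in capture.splitlines():
        match = PACKET_RE.match(line)
        if match:
            src, dst, flags, seq = match.groups()
            counts[(src, dst, flags, int(seq))] += 1
    return {key: n for key, n in counts.items() if n > 1}


for (src, dst, flags, seq), n in count_retransmits(CAPTURE).items():
    print(f"{src} > {dst} [{flags}] seq {seq} seen {n} times")
```

A handshake that completes shows each SYN and SYN-ACK once, so any entry reported here points at a side that never saw the answer it was waiting for.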

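At the time I did not read the capture carefully, so let's analyze it properly here. Every SYN sent from this host carries the same seq 1340604575, and every reply from 10.187.1.24.9153 is a SYN-ACK with the same seq 1764271535 and ack 1340604576. So the capture shows that the handshake reply really does arrive on this host, but the repeating sequence numbers show that nothing is accepting it: the client keeps resending its SYN, and the pod on the destination host keeps retransmitting its SYN-ACK. Next I checked the routes: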

$ cat /run/flannel/subnet.env
FLANNEL_NETWORK=10.187.0.0/16
FLANNEL_SUBNET=10.187.0.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true
$ ip a s flannel.1
542: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default
link/ether b6:9b:ed:b0:37:74 brd ff:ff:ff:ff:ff:ff
inet 10.187.0.0/32 scope global flannel.1
valid_lft forever preferred_lft forever
inet6 fe80::b49b:edff:feb0:3774/64 scope link
valid_lft forever preferred_lft forever
$ ip route get 10.187.0.0
local 10.187.0.0 dev lo src 10.187.0.0 uid 0
cache <local>
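To run the same check on every node, the first line of `ip route get` can be parsed mechanically. This is only a sketch under a few assumptions: IPv4 output, the first line shaped like the ones in this post, and a helper name `parse_route_get` that I made up for illustration.

```python
BEFORE = """local 10.187.0.0 dev lo src 10.187.0.0 uid 0
cache <local>"""

AFTER = """broadcast 10.187.0.0 dev cni0 src 10.187.0.1 uid 0
cache <local,brd>"""


def parse_route_get(output: str) -> dict:
    """Parse the first line of `ip route get` output (IPv4 only) into a dict."""
    tokens = output.strip().splitlines()[0].split()
    # Unicast routes are printed without a leading type keyword.
    route = {"type": "unicast"}
    if not tokens[0][0].isdigit():
        route["type"] = tokens.pop(0)
    route["dst"] = tokens.pop(0)
    # The rest comes as "key value" pairs such as "dev lo" or "src 10.187.0.0".
    for key, value in zip(tokens[0::2], tokens[1::2]):
        route[key] = value
    return route


for label, text in (("before", BEFORE), ("after", AFTER)):
    route = parse_route_get(text)
    print(label, route["type"], route["dev"], route["src"])
```

`BEFORE` is the output above and `AFTER` is the output seen later in this post once NetworkManager has been restarted, so the two can be compared on the `type` and `dev` fields.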

Unbelievable: the route is wrong. For no obvious reason it goes via lo. NetworkManager turned out to be running, so I restarted it.

$ systemctl restart NetworkManager
$ ip route get 10.187.0.0
broadcast 10.187.0.0 dev cni0 src 10.187.0.1 uid 0
cache <local,brd>
$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         172.31.159.254  0.0.0.0         UG    100    0        0 eno1
10.185.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0
10.187.0.0      0.0.0.0         255.255.255.0   U     0      0        0 cni0
172.31.159.0    0.0.0.0         255.255.255.0   U     100    0        0 eno1

The route is now correct, but flannel's routes to the other nodes have disappeared, so flannel needs a restart.

$ docker ps -a | grep flanneld
4b3f04e62b25 122cdb7aa710 "/opt/bin/flanneld -…" 2 hours ago Up 2 hours k8s_kube-flannel_kube-flannel-ds-22bwd_kube-system_6f5ce812-c5ae-4102-9398-c4a6fee4c7ab_0
$ docker restart 4b3
4b3
$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         172.31.159.254  0.0.0.0         UG    100    0        0 eno1
10.185.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0
10.187.0.0      0.0.0.0         255.255.255.0   U     0      0        0 cni0
10.187.1.0      10.187.1.0      255.255.255.0   UG    0      0        0 flannel.1
10.187.2.0      10.187.2.0      255.255.255.0   UG    0      0        0 flannel.1
10.187.3.0      10.187.3.0      255.255.255.0   UG    0      0        0 flannel.1
10.187.4.0      10.187.4.0      255.255.255.0   UG    0      0        0 flannel.1
172.31.159.0    0.0.0.0         255.255.255.0   U     100    0        0 eno1
$ ip route get 10.187.0.0
broadcast 10.187.0.0 dev cni0 src 10.187.0.1 uid 0
cache <local,brd>
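Whether the flannel routes are present can also be checked from `route -n` output rather than by eye. The sketch below collects the networks routed through a given interface and reports which expected peer subnets are missing. `routes_via` and the `EXPECTED` set are illustrative assumptions; in practice the expected subnets would come from the other nodes' pod CIDRs.

```python
import ipaddress

# Trimmed copies of the two routing tables shown in this post.
AFTER_NM_RESTART = """Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 172.31.159.254 0.0.0.0 UG 100 0 0 eno1
10.187.0.0 0.0.0.0 255.255.255.0 U 0 0 0 cni0
"""

AFTER_FLANNEL_RESTART = AFTER_NM_RESTART + """10.187.1.0 10.187.1.0 255.255.255.0 UG 0 0 0 flannel.1
10.187.2.0 10.187.2.0 255.255.255.0 UG 0 0 0 flannel.1
10.187.3.0 10.187.3.0 255.255.255.0 UG 0 0 0 flannel.1
10.187.4.0 10.187.4.0 255.255.255.0 UG 0 0 0 flannel.1
"""

# Assumed peer subnets for this 5-node cluster (every /24 except the local 10.187.0.0/24).
EXPECTED = {ipaddress.ip_network(f"10.187.{i}.0/24") for i in range(1, 5)}


def routes_via(route_n_output: str, iface: str) -> set:
    """Return the networks that `route -n` output sends through `iface`."""
    networks = set()
    for line in route_n_output.splitlines():
        fields = line.split()
        # Data rows have 8 columns: Destination Gateway Genmask Flags Metric Ref Use Iface.
        if len(fields) != 8 or fields[-1] != iface:
            continue
        networks.add(ipaddress.ip_network(f"{fields[0]}/{fields[2]}"))
    return networks


for label, table in (("after NetworkManager restart", AFTER_NM_RESTART),
                     ("after flannel restart", AFTER_FLANNEL_RESTART)):
    missing = EXPECTED - routes_via(table, "flannel.1")
    print(label, "missing:", sorted(str(net) for net in missing))
```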

Try curl again:

$ curl -I 10.187.4.24:9153/metrics
HTTP/1.1 200 OK
Content-Length: 19491
Content-Type: text/plain; version=0.0.4; charset=utf-8
Date: Tue, 20 Oct 2020 10:14:42 GMT

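I then repeated these steps on every machine, and cross-node networking in the cluster was fully restored. We have other K8S environments with NetworkManager enabled, but this is the first time we have hit this on Kylin.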

Some personal opinions on NetworkManager

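From my personal point of view it feels immature. A colleague once configured a netmask with nmcli and the VIP stopped working: the config file contained PREFIX, and it only worked again after I changed it back to NETMASK. There have been other problems too, which I won't go into here. It is a daemon process, but new networking technologies keep emerging on Linux, and it has not adapted to them in time; its releases are also slow.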
