zhangguanzhang's Blog

Troubleshooting abnormal flannel vxlan behavior in a k8s cluster on RHEL 8.4 on ESXi

2022/07/28

This is a problem I spent the last few days on: a k8s environment built on RHEL 8.4 on ESXi, using flannel in vxlan mode, where pod networking behaved very strangely.

Background

A tester was deploying with our tooling and the deployment could not complete. The etcd component I'm responsible for looked broken, so I logged in to take a look.

Troubleshooting process

Symptoms

The etcd pod info:

$ kubectl get pod -o wide | grep etcd
etcd1-10.xx.xx.188 1/1 Running 0 27m 172.27.2.11 10.xx.xx.188 <none> <none>
etcd2-10.xx.xx.201 1/1 Running 0 27m 172.27.1.11 10.xx.xx.201 <none> <none>
etcd3-10.xx.xx.208 1/1 Running 0 27m 172.27.0.12 10.xx.xx.208 <none> <none>

The logs of all three etcds complained about being unable to reach the other members, so I checked how they talk to each other. The commands below are run on the 188 host:

$ curl 172.27.2.11:2380
404 page not found
$ curl 172.27.1.11:2380
^C

Strange: cross-node traffic doesn't work. Even stranger, pinging a pod IP on another node does work:

$ ping 172.27.1.11 
PING 172.27.1.11 (172.27.1.11) 56(84) bytes of data.
64 bytes from 172.27.1.11: icmp_seq=1 ttl=63 time=0.534 ms
64 bytes from 172.27.1.11: icmp_seq=2 ttl=63 time=0.464 ms
^C
--- 172.27.1.11 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1030ms

Packet capture

This by itself is already very odd: whether it's a ping or an application-layer request to a pod on another node, the traffic goes through flannel's vxlan encapsulation either way, so there's no reason ICMP should get through while curl doesn't.

Below is a capture on 201: the ICMP traffic shows up, the curl traffic doesn't:

$ tcpdump -nn -i flannel.1  host 172.27.1.11     
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on flannel.1, link-type EN10MB (Ethernet), snapshot length 262144 bytes
03:52:49.136420 IP 172.27.2.0 > 172.27.1.11: ICMP echo request, id 30780, seq 1, length 64
03:52:49.136586 IP 172.27.1.11 > 172.27.2.0: ICMP echo reply, id 30780, seq 1, length 64
03:52:50.165997 IP 172.27.2.0 > 172.27.1.11: ICMP echo request, id 30780, seq 2, length 64
03:52:50.166129 IP 172.27.1.11 > 172.27.2.0: ICMP echo reply, id 30780, seq 2, length 64
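
A useful cross-check at this point (my addition, not part of the original capture) is to tcpdump the underlay NIC on the vxlan UDP port instead of flannel.1, to see whether the encapsulated TCP segments ever make it onto the wire. ens192 and 8475 are the parent interface and flannel port that show up later in this post; adjust them to your environment:

# capture the vxlan-encapsulated traffic on the physical uplink instead of flannel.1
tcpdump -nn -i ens192 udp port 8475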

Wrong turns

Although the problem was eventually solved, I'll still write up my process along the timeline. First, look at the iptables rules:

$ iptables -S
-P INPUT ACCEPT
-P FORWARD ACCEPT
-P OUTPUT ACCEPT
-N LIBVIRT_INP
-N LIBVIRT_OUT
-N LIBVIRT_FWO
-N LIBVIRT_FWI
-N LIBVIRT_FWX
-N DOCKER
-N DOCKER-ISOLATION-STAGE-1
-N DOCKER-ISOLATION-STAGE-2
-N DOCKER-USER
-N KUBE-PROXY-CANARY
-N KUBE-EXTERNAL-SERVICES
-N KUBE-SERVICES
-N KUBE-FORWARD
-N KUBE-FIREWALL
-N KUBE-KUBELET-CANARY
-A INPUT -j KUBE-FIREWALL
-A INPUT -m conntrack --ctstate NEW -m comment --comment "kubernetes externally-visible service portals" -j KUBE-EXTERNAL-SERVICES
-A INPUT -j LIBVIRT_INP
-A INPUT -i cni0 -j ACCEPT
-A FORWARD -m comment --comment "kubernetes forwarding rules" -j KUBE-FORWARD
-A FORWARD -m conntrack --ctstate NEW -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A FORWARD -m conntrack --ctstate NEW -m comment --comment "kubernetes externally-visible service portals" -j KUBE-EXTERNAL-SERVICES
-A FORWARD -j DOCKER-USER
-A FORWARD -j DOCKER-ISOLATION-STAGE-1
-A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -o docker0 -j DOCKER
-A FORWARD -i docker0 ! -o docker0 -j ACCEPT
-A FORWARD -i docker0 -o docker0 -j ACCEPT
-A FORWARD -j LIBVIRT_FWX
-A FORWARD -j LIBVIRT_FWI
-A FORWARD -j LIBVIRT_FWO
-A FORWARD -j ACCEPT
-A OUTPUT -j KUBE-FIREWALL
-A OUTPUT -m conntrack --ctstate NEW -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A OUTPUT -j LIBVIRT_OUT
-A LIBVIRT_INP -i virbr0 -p udp -m udp --dport 53 -j ACCEPT
-A LIBVIRT_INP -i virbr0 -p tcp -m tcp --dport 53 -j ACCEPT
-A LIBVIRT_INP -i virbr0 -p udp -m udp --dport 67 -j ACCEPT
-A LIBVIRT_INP -i virbr0 -p tcp -m tcp --dport 67 -j ACCEPT
-A LIBVIRT_OUT -o virbr0 -p udp -m udp --dport 53 -j ACCEPT
-A LIBVIRT_OUT -o virbr0 -p tcp -m tcp --dport 53 -j ACCEPT
-A LIBVIRT_OUT -o virbr0 -p udp -m udp --dport 68 -j ACCEPT
-A LIBVIRT_OUT -o virbr0 -p tcp -m tcp --dport 68 -j ACCEPT
-A LIBVIRT_FWO -s 192.168.122.0/24 -i virbr0 -j ACCEPT
-A LIBVIRT_FWO -i virbr0 -j REJECT --reject-with icmp-port-unreachable
-A LIBVIRT_FWI -d 192.168.122.0/24 -o virbr0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A LIBVIRT_FWI -o virbr0 -j REJECT --reject-with icmp-port-unreachable
-A LIBVIRT_FWX -i virbr0 -o virbr0 -j ACCEPT
-A DOCKER-ISOLATION-STAGE-1 -i docker0 ! -o docker0 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -j RETURN
-A DOCKER-ISOLATION-STAGE-2 -o docker0 -j DROP
-A DOCKER-ISOLATION-STAGE-2 -j RETURN
-A DOCKER-USER -j RETURN
-A KUBE-FORWARD -m conntrack --ctstate INVALID -j DROP
-A KUBE-FORWARD -m comment --comment "kubernetes forwarding rules" -m mark --mark 0x4000/0x4000 -j ACCEPT
-A KUBE-FORWARD -m comment --comment "kubernetes forwarding conntrack pod source rule" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A KUBE-FORWARD -m comment --comment "kubernetes forwarding conntrack pod destination rule" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A KUBE-FIREWALL -m comment --comment "kubernetes firewall for dropping marked packets" -m mark --mark 0x8000/0x8000 -j DROP
-A KUBE-FIREWALL ! -s 127.0.0.0/8 -d 127.0.0.0/8 -m comment --comment "block incoming localnet connections" -m conntrack ! --ctstate RELATED,ESTABLISHED,DNAT -j DROP
# Warning: iptables-legacy tables present, use iptables-legacy to see them

There are libvirt rules in iptables. I asked around: this machine was installed with the Server with GUI profile selected, and a GUI install of CentOS/RHEL usually pulls in the libvirt-related packages as well. As a first attempt, disable the services on every node:

systemctl disable --now \
libvirtd-admin.socket \
libvirtd-ro.socket \
libvirtd.socket \
libvirtd
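
I simply rebooted below, but as a side note (my addition), the leftover LIBVIRT_* chains could also be emptied in place without a reboot, assuming nothing else on the host still needs them:

# flush the libvirt chains left behind in the filter table
for c in LIBVIRT_INP LIBVIRT_OUT LIBVIRT_FWO LIBVIRT_FWI LIBVIRT_FWX; do
  iptables -F "$c"
done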

Check the rules again after a reboot:

$ iptables -S
-P INPUT ACCEPT
-P FORWARD ACCEPT
-P OUTPUT ACCEPT
-N KUBE-PROXY-CANARY
-N DOCKER
-N DOCKER-ISOLATION-STAGE-1
-N DOCKER-ISOLATION-STAGE-2
-N DOCKER-USER
-N KUBE-EXTERNAL-SERVICES
-N KUBE-SERVICES
-N KUBE-FORWARD
-N KUBE-FIREWALL
-N KUBE-KUBELET-CANARY
-A INPUT -j KUBE-FIREWALL
-A INPUT -m conntrack --ctstate NEW -m comment --comment "kubernetes externally-visible service portals" -j KUBE-EXTERNAL-SERVICES
-A INPUT -i cni0 -j ACCEPT
-A FORWARD -m comment --comment "kubernetes forwarding rules" -j KUBE-FORWARD
-A FORWARD -m conntrack --ctstate NEW -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A FORWARD -m conntrack --ctstate NEW -m comment --comment "kubernetes externally-visible service portals" -j KUBE-EXTERNAL-SERVICES
-A FORWARD -j DOCKER-USER
-A FORWARD -j DOCKER-ISOLATION-STAGE-1
-A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -o docker0 -j DOCKER
-A FORWARD -i docker0 ! -o docker0 -j ACCEPT
-A FORWARD -i docker0 -o docker0 -j ACCEPT
-A FORWARD -j ACCEPT
-A OUTPUT -j KUBE-FIREWALL
-A OUTPUT -m conntrack --ctstate NEW -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A DOCKER-ISOLATION-STAGE-1 -i docker0 ! -o docker0 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -j RETURN
-A DOCKER-ISOLATION-STAGE-2 -o docker0 -j DROP
-A DOCKER-ISOLATION-STAGE-2 -j RETURN
-A DOCKER-USER -j RETURN
-A KUBE-FORWARD -m conntrack --ctstate INVALID -j DROP
-A KUBE-FORWARD -m comment --comment "kubernetes forwarding rules" -m mark --mark 0x4000/0x4000 -j ACCEPT
-A KUBE-FORWARD -m comment --comment "kubernetes forwarding conntrack pod source rule" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A KUBE-FORWARD -m comment --comment "kubernetes forwarding conntrack pod destination rule" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A KUBE-FIREWALL -m comment --comment "kubernetes firewall for dropping marked packets" -m mark --mark 0x8000/0x8000 -j DROP
-A KUBE-FIREWALL ! -s 127.0.0.0/8 -d 127.0.0.0/8 -m comment --comment "block incoming localnet connections" -m conntrack ! --ctstate RELATED,ESTABLISHED,DNAT -j DROP
# Warning: iptables-legacy tables present, use iptables-legacy to see them

I hadn't paid attention to the Warning at the end before: RHEL 8 has switched to nf_tables.

$ iptables -V
iptables v1.8.4 (nf_tables)

I wanted to check whether leftover legacy iptables rules were interfering, but the machine has no iptables-legacy command, and I couldn't find a matching rpm on rpmfind either. Then it occurred to me to pull the kube-proxy image and look from inside it:

$ docker pull registry.aliyuncs.com/k8sxio/kube-proxy:v1.21.11
v1.21.11: Pulling from k8sxio/kube-proxy
20b09fbd3037: Pull complete
89906a4ae339: Pull complete
Digest: sha256:2dde58797be0da1f63ba386016c3b11d4447cfbf9b9bad9b72763ea24d9016f3
Status: Downloaded newer image for registry.aliyuncs.com/k8sxio/kube-proxy:v1.21.11
registry.aliyuncs.com/k8sxio/kube-proxy:v1.21.11
$ docker run --rm -ti --privileged -v /run/xtables.lock:/run/xtables.lock \
-v /lib/modules:/lib/modules \
registry.aliyuncs.com/k8sxio/kube-proxy:v1.21.11 sh
# iptables -V
iptables v1.8.5 (legacy)
# iptables -S
-P INPUT ACCEPT
-P FORWARD ACCEPT
-P OUTPUT ACCEPT
# iptables -t nat -S
-P PREROUTING ACCEPT
-P INPUT ACCEPT
-P OUTPUT ACCEPT
-P POSTROUTING ACCEPT
# exit

So the rules are fine on that side too. I also looked at the nftables rules with nft list ruleset and saw nothing wrong; they match what iptables shows.
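
For reference (my addition), since nft list ruleset dumps everything including the iptables-nft translation, it can be narrowed down with plain grep:

# dump the nftables ruleset and keep only the kube/CNI-related lines
nft list ruleset | grep -iE 'kube|cni0|flannel'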

Getting a clue

We had previously had customers where Sangfor's hyper-converged and aCloud virtualization platforms use 8472/udp themselves, which breaks flannel's vxlan mode for k8s clusters built on their VMs, so our flannel vxlan port had been changed to 8475:

$ kubectl -n kube-system get cm kube-flannel-cfg -o yaml | grep -PB2 '847\d\s*$'
"Backend": {
"Type": "vxlan",
"Port": 8475

Then, on a whim, I changed the port to try it out:

$ kubectl -n kube-system edit cm kube-flannel-cfg
$ kubectl -n kube-system get cm kube-flannel-cfg -o yaml | grep -PB2 '847\d\s*$'
"Backend": {
"Type": "vxlan",
"Port": 8472
$ kubectl -n kube-system get pod -o wide -l k8s-app=kube-dns
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
coredns-6jwmg 1/1 Running 2 3h54m 172.27.4.2 10.xx.xx.222 <none> <none>
coredns-bxqx5 1/1 Running 0 3h54m 172.27.0.2 10.xx.xx.201 <none> <none>
coredns-k84bn 1/1 Running 0 3h54m 172.27.5.2 10.xx.xx.224 <none> <none>
coredns-pzvgq 1/1 Running 3 3h54m 172.27.2.2 10.xx.xx.188 <none> <none>
coredns-rrhql 1/1 Running 1 3h54m 172.27.1.2 10.xx.xx.208 <none> <none>
coredns-s2kn4 1/1 Running 0 3h54m 172.27.7.2 10.xx.xx.223 <none> <none>
coredns-wh8qc 1/1 Running 0 3h54m 172.27.6.2 10.xx.xx.225 <none> <none>
coredns-wqpsw 1/1 Running 0 3h54m 172.27.3.2 10.xx.xx.221 <none> <none>
$ kubectl -n kube-system delete pod -l app=flannel
pod "kube-flannel-ds-6jjkb" deleted
pod "kube-flannel-ds-6xvwg" deleted
pod "kube-flannel-ds-7kpvl" deleted
pod "kube-flannel-ds-8zzbc" deleted
pod "kube-flannel-ds-hgnzz" deleted
pod "kube-flannel-ds-jcthk" deleted
pod "kube-flannel-ds-l757g" deleted
pod "kube-flannel-ds-mmnh8" deleted

And curl went through (coredns serves its metrics on 9153 over HTTP, so curling it is an easy way to test cross-node connectivity, since not every machine has telnet). Change the port back to 8475 and it stops working again:

$ curl 172.27.4.2:9153
404 page not found
$ kubectl -n kube-system edit cm kube-flannel-cfg
configmap/kube-flannel-cfg edited
$ kubectl -n kube-system get cm kube-flannel-cfg -o yaml | grep -PB2 '847\d\s*$'
"Backend": {
"Type": "vxlan",
"Port": 8475
$ kubectl -n kube-system delete pod -l app=flannel
pod "kube-flannel-ds-2622n" deleted
pod "kube-flannel-ds-2zssl" deleted
pod "kube-flannel-ds-85kbs" deleted
pod "kube-flannel-ds-89qp9" deleted
pod "kube-flannel-ds-c6qgm" deleted
pod "kube-flannel-ds-dtxzv" deleted
pod "kube-flannel-ds-kjldq" deleted
pod "kube-flannel-ds-ntfbs" deleted
$ curl 172.27.5.2:9153
^C
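
Since, as noted above, not every machine has telnet, a plain-bash TCP check also works (my addition; the IP/port are the coredns metrics endpoint used above):

# open a TCP connection with bash's /dev/tcp, giving up after 3 seconds
timeout 3 bash -c 'exec 3<>/dev/tcp/172.27.4.2/9153' && echo reachable || echo unreachable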

So it's confirmed: any port other than 8472 doesn't work. I searched the keywords redhat 8 vxlan esxi and found a Red Hat article on using vxlan; the port it uses is also 8472.

We ship a NetworkManager drop-in config so that NetworkManager does not manage the flannel.1 interface (a sketch of it follows the diff below). I tried letting NetworkManager manage it again, switched the flannel vxlan port, and diffed the connection attributes; there's no meaningful difference:

$ nmcli conn show flannel.1
vxlan.parent: ens192
vxlan.id: 1
vxlan.local: 10.xx.xx.222
vxlan.remote: --
vxlan.source-port-min: 0
vxlan.source-port-max: 0
vxlan.destination-port: 8475
vxlan.tos: 0
vxlan.ttl: 0
vxlan.ageing: 300
vxlan.limit: 0
vxlan.learning: no
vxlan.proxy: no
vxlan.rsc: no
vxlan.l2-miss: no
vxlan.l3-miss: no
$ nmcli conn show flannel.1 > 8475.txt
$ nmcli conn show flannel.1 > 8472.txt
$ diff 847*.txt
2c2
< connection.uuid: 7da6bb14-a551-49e9-affe-2569dc04c800
---
> connection.uuid: 1af65851-a9a5-4899-816e-f8c064882643
11c11
< connection.timestamp: 1658910968
---
> connection.timestamp: 1658910776
81c81
< vxlan.destination-port: 8472
---
> vxlan.destination-port: 8475
96c96
< GENERAL.UUID: 7da6bb14-a551-49e9-affe-2569dc04c800
---
> GENERAL.UUID: 1af65851-a9a5-4899-816e-f8c064882643
104,105c104,105
< GENERAL.DBUS-PATH: /org/freedesktop/NetworkManager/ActiveConnection/3
< GENERAL.CON-PATH: /org/freedesktop/NetworkManager/Settings/3
---
> GENERAL.DBUS-PATH: /org/freedesktop/NetworkManager/ActiveConnection/2
> GENERAL.CON-PATH: /org/freedesktop/NetworkManager/Settings/2
110,117c110,117
< IP4.ROUTE[1]: dst = 172.27.3.0/24, nh = 172.27.3.0, mt = 0
< IP4.ROUTE[2]: dst = 172.27.7.0/24, nh = 172.27.7.0, mt = 0
< IP4.ROUTE[3]: dst = 172.27.5.0/24, nh = 172.27.5.0, mt = 0
< IP4.ROUTE[4]: dst = 172.27.6.0/24, nh = 172.27.6.0, mt = 0
< IP4.ROUTE[5]: dst = 172.27.2.0/24, nh = 172.27.2.0, mt = 0
< IP4.ROUTE[6]: dst = 172.27.0.0/24, nh = 172.27.0.0, mt = 0
< IP4.ROUTE[7]: dst = 172.27.1.0/24, nh = 172.27.1.0, mt = 0
< IP6.ADDRESS[1]: fe80::f8a4:b9ff:fe5d:6c9d/64
---
> IP4.ROUTE[1]: dst = 172.27.0.0/24, nh = 172.27.0.0, mt = 0
> IP4.ROUTE[2]: dst = 172.27.1.0/24, nh = 172.27.1.0, mt = 0
> IP4.ROUTE[3]: dst = 172.27.2.0/24, nh = 172.27.2.0, mt = 0
> IP4.ROUTE[4]: dst = 172.27.3.0/24, nh = 172.27.3.0, mt = 0
> IP4.ROUTE[5]: dst = 172.27.5.0/24, nh = 172.27.5.0, mt = 0
> IP4.ROUTE[6]: dst = 172.27.6.0/24, nh = 172.27.6.0, mt = 0
> IP4.ROUTE[7]: dst = 172.27.7.0/24, nh = 172.27.7.0, mt = 0
> IP6.ADDRESS[1]: fe80::60f1:eeff:fe77:1633/64
119,120c119,120
< IP6.ROUTE[1]: dst = ff00::/8, nh = ::, mt = 256, table=255
< IP6.ROUTE[2]: dst = fe80::/64, nh = ::, mt = 256
---
> IP6.ROUTE[1]: dst = fe80::/64, nh = ::, mt = 256
> IP6.ROUTE[2]: dst = ff00::/8, nh = ::, mt = 256, table=255
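
For completeness, the NetworkManager drop-in mentioned above looks roughly like this (file name and interface list are illustrative, not a verbatim copy of ours):

# /etc/NetworkManager/conf.d/flannel.conf -- keep NetworkManager's hands off the CNI interfaces
[keyfile]
unmanaged-devices=interface-name:flannel.1;interface-name:cni0;interface-name:veth*

After dropping the file in, systemctl reload NetworkManager makes it take effect.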

Control group comparison

The RHEL 8.4 adaptation task was originally meant for me, but I was busy with another project, so it was assigned to a colleague. He said cross-node traffic was fine on the RHEL 8.4 he set up on an ESXi host in Wuhan.

I logged in and confirmed that. The problematic VMs sit on an ESXi host in Zhuhai, and the Wuhan VM is a minimal install, so a few of us put together a control group.

The OS image is the same RHEL 8.4 everywhere, installed as gui and minimal:

Type      ESXi site + version   Cross-node pod traffic OK?   Only works after changing flannel vxlan port to 8472
gui       Zhuhai 6.7.0          no                           yes
minimal   Zhuhai 6.7.0          no                           yes
gui       Wuhan 6.0.0           yes                          not needed
minimal   Wuhan 6.0.0           yes                          not needed

Moreover, the vxlan module info is exactly the same in all four scenarios:

$ modinfo vxlan
filename: /lib/modules/4.18.0-305.el8.x86_64/kernel/drivers/net/vxlan.ko.xz
alias: rtnl-link-vxlan
description: Driver for VXLAN encapsulated traffic
author: Stephen Hemminger <stephen@networkplumber.org>
version: 0.1
license: GPL
rhelversion: 8.4
srcversion: C4B9CCC8F1BB3F9CDEEDACF
depends: udp_tunnel,ip6_udp_tunnel
intree: Y
name: vxlan
vermagic: 4.18.0-305.el8.x86_64 SMP mod_unload modversions
sig_id: PKCS#7
signer: Red Hat Enterprise Linux kernel signing key
sig_key: 0D:85:6D:FE:90:3F:7B:A0:D7:04:19:55:4C:9C:D5:EE:1D:42:8D:B6
sig_hashalgo: sha256
signature: 93:09:AA:FE:BA:D1:10:CD:12:8F:2A:F9:43:D4:50:36:4F:36:51:0C:
4B:BD:3D:89:65:1F:5D:7E:24:EE:4E:90:8B:38:99:24:EE:0B:31:4F:
E5:DC:57:C8:60:4A:6F:FE:43:27:43:B1:EC:A4:A1:A4:9F:47:65:91:
0C:6D:6D:E0:A8:4C:97:95:75:27:D5:B0:CD:0A:77:40:A9:A6:ED:E6:
C9:72:26:23:07:4D:B7:D3:B8:B9:AF:C5:18:AF:EA:F8:B7:6C:90:B9:
FD:F1:8F:CE:73:A8:1F:92:F2:FA:A7:5E:53:BE:D6:64:55:06:5B:54:
29:DB:E3:2E:CC:DF:CF:1C:7D:DC:53:CB:92:38:BC:42:7D:89:1F:21:
0A:47:07:63:E6:B9:C6:1E:26:C5:4E:B2:9A:9F:DB:0D:86:31:EE:2A:
DA:87:AE:16:AA:6F:0D:B3:11:0B:44:FD:5E:11:82:8E:83:9D:E8:4F:
2E:1B:A9:AC:66:2D:12:11:43:B0:9B:1E:2C:1C:8B:8B:80:B8:16:9B:
8C:A3:C8:73:C5:D7:0F:E7:B5:F7:30:7D:57:CA:CE:74:3C:A2:DB:9F:
D6:ED:F3:A4:EE:D7:D2:FF:F0:46:1E:18:52:92:A5:6E:BA:30:7F:18:
BB:1C:49:A1:03:05:87:A2:6A:FE:07:8A:CE:14:1F:EE:C9:82:84:B4:
CC:2B:2E:BF:21:BF:78:7B:39:01:1C:EE:C4:48:7B:9C:BA:8C:3B:D3:
75:B5:1D:5A:57:9F:C6:FA:D7:2C:C3:30:49:3E:94:FC:1E:C2:5E:AA:
F4:D8:80:46:46:C1:BF:3C:80:54:46:78:5F:4D:A5:93:41:65:CC:E4:
ED:78:0E:28:2A:DF:EE:C4:8E:EF:25:82:9C:28:07:7D:C1:95:AA:AD:
E8:5C:A7:CC:91:22:03:BB:1F:AD:87:E9:AD:E3:DE:4B:6C:33:A2:FD:
15:E3:41:3B:C5:A7:84:89:25:2F:B7:1B:EF:1F:6A:D6:FE:A6:36:D6:
19:1E:0F:06
parm: udp_port:Destination UDP port (ushort)
parm: log_ecn_error:Log packets received with corrupted ECN (bool)

The ESXi DVS virtual switch already had promiscuous mode enabled (along the way we even migrated several of the VMs onto the same ESXi host), and it still didn't work. I asked our internal ESXi folks; one of them suggested checking whether NSX was the culprit and sent me an article:

https://kb.vmware.com/s/article/2149996?lang=zh_cn — the article says:
Starting with NSX 6.2.4, the default VXLAN port is 4789, the standard port assigned by IANA. In NSX versions earlier than 6.2.4, the default VXLAN UDP port number was 8472.

I know nothing about NSX, so I consulted a VMware expert, spark-go. He checked and there is no NSX on our ESXi. He then searched around and found a kernel bug in the Red Hat advisory RHSA-2021:2570:

[ESXi][RHEL-8] VMXNET3 v4 causes invalid checksums of inner packets of VXLAN tunnel (BZ#1960702)

This had actually been in my search results the day before, but I never clicked it; I only read the first hit, the Red Hat doc about libvirtd and vxlan. vmxnet3 is the virtual NIC that ESXi presents to the VM, and comparing the bug description against our kernel version, this had to be it:

$ uname -a
Linux localhost.localdomain 4.18.0-305.el8.x86_64 #1 SMP Thu Apr 29 08:54:30 EDT 2021 x86_64 x86_64 x86_64 GNU/Linux
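
Since the bug is about vmxnet3 corrupting the inner checksums of vxlan packets, it is also worth glancing at the offload features on the uplink NIC (my addition; ens192 is the parent interface from the nmcli output earlier):

# list checksum/segmentation offload state on the vmxnet3 uplink
ethtool -k ens192 | grep -iE 'checksum|udp.*segmentation'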

But I couldn't find anywhere to download a fixed kernel (yum on RHEL requires a subscription), and a CentOS 8 kernel I installed in the meantime wouldn't boot:

$ grubby --info DEFAULT
index=0
kernel="/boot/vmlinuz-4.18.0-348.7.1.el8_5.x86_64"
args="ro crashkernel=auto resume=/dev/mapper/rhel-swap rd.lvm.lv=rhel/root rd.lvm.lv=rhel/swap rhgb quiet $tuned_params"
root="/dev/mapper/rhel-root"
initrd="/boot/initramfs-4.18.0-348.7.1.el8_5.x86_64.img $tuned_initrd"
title="CentOS Linux (4.18.0-348.7.1.el8_5.x86_64) 8"
id="d0b79c361c3d4f728f1f9b86bb54acd5-4.18.0-348.7.1.el8_5.x86_64"

The fix's page says the issue only occurs with non-standard vxlan ports, and changing the port everywhere is quite a hassle. So, the quick and dirty approach: since the trigger is a bad checksum, try turning off checksum offload on flannel.1:

$ kubectl -n kube-system get cm kube-flannel-cfg -o yaml | grep -PB2 '847\d\s*$'
"Backend": {
"Type": "vxlan",
"Port": 8475
$ curl 172.27.1.2:9153
^C
$ /sbin/ethtool -K flannel.1 tx-checksum-ip-generic off
Actual changes:
tx-checksum-ip-generic: off
tx-tcp-segmentation: off [not requested]
tx-tcp-ecn-segmentation: off [not requested]
tx-tcp-mangleid-segmentation: off [not requested]
tx-tcp6-segmentation: off [not requested]
$ curl 172.27.1.2:9153
404 page not found
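
One caveat with this workaround (my note, not from the original post): the setting is lost whenever flannel.1 is recreated, for example after a reboot. A rough way to persist it is a udev rule that reapplies it when the interface appears; treat this as a sketch:

# /etc/udev/rules.d/90-flannel-txcsum.rules  (illustrative path and name)
ACTION=="add", SUBSYSTEM=="net", KERNEL=="flannel.1", RUN+="/sbin/ethtool -K flannel.1 tx-checksum-ip-generic off"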

The Zhuhai VMs' network adapter type is indeed VMXNET 3, while the Wuhan ones use E1000. To confirm the driver really is vmxnet3:

$ readlink -f /sys/class/net/ens192/device/driver/module
/sys/module/vmxnet3
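
Equivalently (my addition), ethtool reports the driver directly:

$ ethtool -i ens192 | head -1
driver: vmxnet3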
