A problem I worked on over the past few days: a Kubernetes cluster built on RHEL 8.4 VMs on ESXi, using flannel in VXLAN mode, where pod networking behaved very strangely.
Background
QA was deploying with our tooling and the deployment kept failing partway through. Since etcd, which I am responsible for, was reported as broken, I logged in to take a look.
Troubleshooting
Symptoms
The etcd pod information:
```
$ kubectl get pod -o wide | grep etcd
etcd1-10.xx.xx.188   1/1   Running   0   27m   172.27.2.11   10.xx.xx.188   <none>   <none>
etcd2-10.xx.xx.201   1/1   Running   0   27m   172.27.1.11   10.xx.xx.201   <none>   <none>
etcd3-10.xx.xx.208   1/1   Running   0   27m   172.27.0.12   10.xx.xx.208   <none>   <none>
```
The logs of all three etcd instances complained that they could not connect to the other members, so I checked their mutual connectivity. The commands below were run on the 188 host:
```
$ curl 172.27.2.11:2380
404 page not found
$ curl 172.27.1.11:2380
^C
```
Strange: cross-node traffic was blocked. Even stranger, pinging a pod IP on another node worked fine:
```
$ ping 172.27.1.11
PING 172.27.1.11 (172.27.1.11) 56(84) bytes of data.
64 bytes from 172.27.1.11: icmp_seq=1 ttl=63 time=0.534 ms
64 bytes from 172.27.1.11: icmp_seq=2 ttl=63 time=0.464 ms
^C
--- 172.27.1.11 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1030ms
```
Packet captures
This alone is already very odd: whether it is a ping or an application-layer request to a pod on another node, the traffic goes through flannel's VXLAN encapsulation either way, so ICMP should not succeed while curl fails.
Below is a capture taken on 201: the ICMP packets show up, but the curl traffic never does:
```
$ tcpdump -nn -i flannel.1 host 172.27.1.11
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on flannel.1, link-type EN10MB (Ethernet), snapshot length 262144 bytes
03:52:49.136420 IP 172.27.2.0 > 172.27.1.11: ICMP echo request, id 30780, seq 1, length 64
03:52:49.136586 IP 172.27.1.11 > 172.27.2.0: ICMP echo reply, id 30780, seq 1, length 64
03:52:50.165997 IP 172.27.2.0 > 172.27.1.11: ICMP echo request, id 30780, seq 2, length 64
03:52:50.166129 IP 172.27.1.11 > 172.27.2.0: ICMP echo reply, id 30780, seq 2, length 64
```
Failed attempts
Although the problem was eventually solved, I'll write down what I did in chronological order. First, the iptables rules:
```
$ iptables -S
-P INPUT ACCEPT
-P FORWARD ACCEPT
-P OUTPUT ACCEPT
-N LIBVIRT_INP
-N LIBVIRT_OUT
-N LIBVIRT_FWO
-N LIBVIRT_FWI
-N LIBVIRT_FWX
-N DOCKER
-N DOCKER-ISOLATION-STAGE-1
-N DOCKER-ISOLATION-STAGE-2
-N DOCKER-USER
-N KUBE-PROXY-CANARY
-N KUBE-EXTERNAL-SERVICES
-N KUBE-SERVICES
-N KUBE-FORWARD
-N KUBE-FIREWALL
-N KUBE-KUBELET-CANARY
-A INPUT -j KUBE-FIREWALL
-A INPUT -m conntrack --ctstate NEW -m comment --comment "kubernetes externally-visible service portals" -j KUBE-EXTERNAL-SERVICES
-A INPUT -j LIBVIRT_INP
-A INPUT -i cni0 -j ACCEPT
-A FORWARD -m comment --comment "kubernetes forwarding rules" -j KUBE-FORWARD
-A FORWARD -m conntrack --ctstate NEW -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A FORWARD -m conntrack --ctstate NEW -m comment --comment "kubernetes externally-visible service portals" -j KUBE-EXTERNAL-SERVICES
-A FORWARD -j DOCKER-USER
-A FORWARD -j DOCKER-ISOLATION-STAGE-1
-A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -o docker0 -j DOCKER
-A FORWARD -i docker0 ! -o docker0 -j ACCEPT
-A FORWARD -i docker0 -o docker0 -j ACCEPT
-A FORWARD -j LIBVIRT_FWX
-A FORWARD -j LIBVIRT_FWI
-A FORWARD -j LIBVIRT_FWO
-A FORWARD -j ACCEPT
-A OUTPUT -j KUBE-FIREWALL
-A OUTPUT -m conntrack --ctstate NEW -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A OUTPUT -j LIBVIRT_OUT
-A LIBVIRT_INP -i virbr0 -p udp -m udp --dport 53 -j ACCEPT
-A LIBVIRT_INP -i virbr0 -p tcp -m tcp --dport 53 -j ACCEPT
-A LIBVIRT_INP -i virbr0 -p udp -m udp --dport 67 -j ACCEPT
-A LIBVIRT_INP -i virbr0 -p tcp -m tcp --dport 67 -j ACCEPT
-A LIBVIRT_OUT -o virbr0 -p udp -m udp --dport 53 -j ACCEPT
-A LIBVIRT_OUT -o virbr0 -p tcp -m tcp --dport 53 -j ACCEPT
-A LIBVIRT_OUT -o virbr0 -p udp -m udp --dport 68 -j ACCEPT
-A LIBVIRT_OUT -o virbr0 -p tcp -m tcp --dport 68 -j ACCEPT
-A LIBVIRT_FWO -s 192.168.122.0/24 -i virbr0 -j ACCEPT
-A LIBVIRT_FWO -i virbr0 -j REJECT --reject-with icmp-port-unreachable
-A LIBVIRT_FWI -d 192.168.122.0/24 -o virbr0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A LIBVIRT_FWI -o virbr0 -j REJECT --reject-with icmp-port-unreachable
-A LIBVIRT_FWX -i virbr0 -o virbr0 -j ACCEPT
-A DOCKER-ISOLATION-STAGE-1 -i docker0 ! -o docker0 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -j RETURN
-A DOCKER-ISOLATION-STAGE-2 -o docker0 -j DROP
-A DOCKER-ISOLATION-STAGE-2 -j RETURN
-A DOCKER-USER -j RETURN
-A KUBE-FORWARD -m conntrack --ctstate INVALID -j DROP
-A KUBE-FORWARD -m comment --comment "kubernetes forwarding rules" -m mark --mark 0x4000/0x4000 -j ACCEPT
-A KUBE-FORWARD -m comment --comment "kubernetes forwarding conntrack pod source rule" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A KUBE-FORWARD -m comment --comment "kubernetes forwarding conntrack pod destination rule" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A KUBE-FIREWALL -m comment --comment "kubernetes firewall for dropping marked packets" -m mark --mark 0x8000/0x8000 -j DROP
-A KUBE-FIREWALL ! -s 127.0.0.0/8 -d 127.0.0.0/8 -m comment --comment "block incoming localnet connections" -m conntrack ! --ctstate RELATED,ESTABLISHED,DNAT -j DROP
# Warning: iptables-legacy tables present, use iptables-legacy to see them
```
There were libvirt rules in the iptables output. After asking around, it turned out this machine had been installed with the "Server with GUI" option; a GUI install usually pulls in the libvirt packages as well. The first attempt was to disable the services on every node:
```
systemctl disable --now \
  libvirtd-admin.socket \
  libvirtd-ro.socket \
  libvirtd.socket \
  libvirtd
```
After a reboot, the rules looked like this:
```
$ iptables -S
-P INPUT ACCEPT
-P FORWARD ACCEPT
-P OUTPUT ACCEPT
-N KUBE-PROXY-CANARY
-N DOCKER
-N DOCKER-ISOLATION-STAGE-1
-N DOCKER-ISOLATION-STAGE-2
-N DOCKER-USER
-N KUBE-EXTERNAL-SERVICES
-N KUBE-SERVICES
-N KUBE-FORWARD
-N KUBE-FIREWALL
-N KUBE-KUBELET-CANARY
-A INPUT -j KUBE-FIREWALL
-A INPUT -m conntrack --ctstate NEW -m comment --comment "kubernetes externally-visible service portals" -j KUBE-EXTERNAL-SERVICES
-A INPUT -i cni0 -j ACCEPT
-A FORWARD -m comment --comment "kubernetes forwarding rules" -j KUBE-FORWARD
-A FORWARD -m conntrack --ctstate NEW -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A FORWARD -m conntrack --ctstate NEW -m comment --comment "kubernetes externally-visible service portals" -j KUBE-EXTERNAL-SERVICES
-A FORWARD -j DOCKER-USER
-A FORWARD -j DOCKER-ISOLATION-STAGE-1
-A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -o docker0 -j DOCKER
-A FORWARD -i docker0 ! -o docker0 -j ACCEPT
-A FORWARD -i docker0 -o docker0 -j ACCEPT
-A FORWARD -j ACCEPT
-A OUTPUT -j KUBE-FIREWALL
-A OUTPUT -m conntrack --ctstate NEW -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A DOCKER-ISOLATION-STAGE-1 -i docker0 ! -o docker0 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -j RETURN
-A DOCKER-ISOLATION-STAGE-2 -o docker0 -j DROP
-A DOCKER-ISOLATION-STAGE-2 -j RETURN
-A DOCKER-USER -j RETURN
-A KUBE-FORWARD -m conntrack --ctstate INVALID -j DROP
-A KUBE-FORWARD -m comment --comment "kubernetes forwarding rules" -m mark --mark 0x4000/0x4000 -j ACCEPT
-A KUBE-FORWARD -m comment --comment "kubernetes forwarding conntrack pod source rule" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A KUBE-FORWARD -m comment --comment "kubernetes forwarding conntrack pod destination rule" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A KUBE-FIREWALL -m comment --comment "kubernetes firewall for dropping marked packets" -m mark --mark 0x8000/0x8000 -j DROP
-A KUBE-FIREWALL ! -s 127.0.0.0/8 -d 127.0.0.0/8 -m comment --comment "block incoming localnet connections" -m conntrack ! --ctstate RELATED,ESTABLISHED,DNAT -j DROP
# Warning: iptables-legacy tables present, use iptables-legacy to see them
```
I had not paid attention to the Warning at the end before: RHEL 8 has switched to the nf_tables backend.
```
$ iptables -V
iptables v1.8.4 (nf_tables)
```
I wanted to check whether leftover legacy iptables rules were interfering, but the machine had no iptables-legacy command and I could not find a matching rpm on rpmfind either. Then it occurred to me to pull the kube-proxy image and look from inside it:
```
$ docker pull registry.aliyuncs.com/k8sxio/kube-proxy:v1.21.11
v1.21.11: Pulling from k8sxio/kube-proxy
20b09fbd3037: Pull complete
89906a4ae339: Pull complete
Digest: sha256:2dde58797be0da1f63ba386016c3b11d4447cfbf9b9bad9b72763ea24d9016f3
Status: Downloaded newer image for registry.aliyuncs.com/k8sxio/kube-proxy:v1.21.11
registry.aliyuncs.com/k8sxio/kube-proxy:v1.21.11
$ docker run --rm -ti --privileged -v /run/xtables.lock:/run/xtables.lock \
    -v /lib/modules:/lib/modules \
    registry.aliyuncs.com/k8sxio/kube-proxy:v1.21.11 sh
# iptables -V
iptables v1.8.5 (legacy)
# iptables -S
-P INPUT ACCEPT
-P FORWARD ACCEPT
-P OUTPUT ACCEPT
# iptables -t nat -S
-P PREROUTING ACCEPT
-P INPUT ACCEPT
-P OUTPUT ACCEPT
-P POSTROUTING ACCEPT
# exit
```
The rules there were fine too. Listing the nftables rules with nft list ruleset showed nothing wrong either; they matched what iptables reported.
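For reference, this is roughly how the nftables side can be inspected; with the iptables-nft backend the familiar chains live in the ip filter table, so the commands below should show the same rules as iptables -S (output omitted here):

```
# Dump everything the nf_tables backend has programmed
nft list ruleset

# Or just the FORWARD chain that cross-node pod traffic traverses
nft list chain ip filter FORWARD
```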
A first clue
We had previously seen customer environments where Sangfor's hyper-converged / aCloud virtualization platform itself uses 8472/udp, which breaks flannel's VXLAN mode for Kubernetes clusters built on its VMs, so we had already changed our flannel VXLAN port to 8475:
```
$ kubectl -n kube-system get cm kube-flannel-cfg -o yaml | grep -PB2 '847\d\s*$'
      "Backend": {
        "Type": "vxlan",
        "Port": 8475
```
Then, on a hunch, I tried changing the port back:
```
$ kubectl -n kube-system edit cm kube-flannel-cfg
$ kubectl -n kube-system get cm kube-flannel-cfg -o yaml | grep -PB2 '847\d\s*$'
      "Backend": {
        "Type": "vxlan",
        "Port": 8472
$ kubectl -n kube-system get pod -o wide -l k8s-app=kube-dns
NAME            READY   STATUS    RESTARTS   AGE     IP           NODE           NOMINATED NODE   READINESS GATES
coredns-6jwmg   1/1     Running   2          3h54m   172.27.4.2   10.xx.xx.222   <none>           <none>
coredns-bxqx5   1/1     Running   0          3h54m   172.27.0.2   10.xx.xx.201   <none>           <none>
coredns-k84bn   1/1     Running   0          3h54m   172.27.5.2   10.xx.xx.224   <none>           <none>
coredns-pzvgq   1/1     Running   3          3h54m   172.27.2.2   10.xx.xx.188   <none>           <none>
coredns-rrhql   1/1     Running   1          3h54m   172.27.1.2   10.xx.xx.208   <none>           <none>
coredns-s2kn4   1/1     Running   0          3h54m   172.27.7.2   10.xx.xx.223   <none>           <none>
coredns-wh8qc   1/1     Running   0          3h54m   172.27.6.2   10.xx.xx.225   <none>           <none>
coredns-wqpsw   1/1     Running   0          3h54m   172.27.3.2   10.xx.xx.221   <none>           <none>
$ kubectl -n kube-system delete pod -l app=flannel
pod "kube-flannel-ds-6jjkb" deleted
pod "kube-flannel-ds-6xvwg" deleted
pod "kube-flannel-ds-7kpvl" deleted
pod "kube-flannel-ds-8zzbc" deleted
pod "kube-flannel-ds-hgnzz" deleted
pod "kube-flannel-ds-jcthk" deleted
pod "kube-flannel-ds-l757g" deleted
pod "kube-flannel-ds-mmnh8" deleted
```
And curl started working (coredns serves its metrics on 9153 over HTTP, so curling it is a convenient cross-node connectivity test; after all, not every machine has telnet installed). Changing the port back to 8475 broke it again:
```
$ curl 172.27.4.2:9153
404 page not found
$ kubectl -n kube-system edit cm kube-flannel-cfg
configmap/kube-flannel-cfg edited
$ kubectl -n kube-system get cm kube-flannel-cfg -o yaml | grep -PB2 '847\d\s*$'
      "Backend": {
        "Type": "vxlan",
        "Port": 8475
$ kubectl -n kube-system delete pod -l app=flannel
pod "kube-flannel-ds-2622n" deleted
pod "kube-flannel-ds-2zssl" deleted
pod "kube-flannel-ds-85kbs" deleted
pod "kube-flannel-ds-89qp9" deleted
pod "kube-flannel-ds-c6qgm" deleted
pod "kube-flannel-ds-dtxzv" deleted
pod "kube-flannel-ds-kjldq" deleted
pod "kube-flannel-ds-ntfbs" deleted
$ curl 172.27.5.2:9153
^C
```
Further verification showed that any port other than 8472 failed. Searching for the keywords redhat 8 vxlan exsi turned up a Red Hat article on using VXLAN, and the port it uses is also 8472.
We had added a NetworkManager drop-in config so that NetworkManager would not manage the flannel.1 interface (the drop-in is shown after the diff below). I tried letting NetworkManager manage it again, switched the flannel VXLAN port, and diffed the connection properties; there was no meaningful difference:
```
$ nmcli conn show flannel.1
vxlan.parent:               ens192
vxlan.id:                   1
vxlan.local:                10.xx.xx.222
vxlan.remote:               --
vxlan.source-port-min:      0
vxlan.source-port-max:      0
vxlan.destination-port:     8475
vxlan.tos:                  0
vxlan.ttl:                  0
vxlan.ageing:               300
vxlan.limit:                0
vxlan.learning:             no
vxlan.proxy:                no
vxlan.rsc:                  no
vxlan.l2-miss:              no
vxlan.l3-miss:              no
$ nmcli conn show flannel.1 > 8475.txt
$ nmcli conn show flannel.1 > 8472.txt
$ diff 847*.txt
2c2
< connection.uuid:            7da6bb14-a551-49e9-affe-2569dc04c800
---
> connection.uuid:            1af65851-a9a5-4899-816e-f8c064882643
11c11
< connection.timestamp:       1658910968
---
> connection.timestamp:       1658910776
81c81
< vxlan.destination-port:     8472
---
> vxlan.destination-port:     8475
96c96
< GENERAL.UUID:               7da6bb14-a551-49e9-affe-2569dc04c800
---
> GENERAL.UUID:               1af65851-a9a5-4899-816e-f8c064882643
104,105c104,105
< GENERAL.DBUS-PATH:          /org/freedesktop/NetworkManager/ActiveConnection/3
< GENERAL.CON-PATH:           /org/freedesktop/NetworkManager/Settings/3
---
> GENERAL.DBUS-PATH:          /org/freedesktop/NetworkManager/ActiveConnection/2
> GENERAL.CON-PATH:           /org/freedesktop/NetworkManager/Settings/2
110,117c110,117
< IP4.ROUTE[1]:               dst = 172.27.3.0/24, nh = 172.27.3.0, mt = 0
< IP4.ROUTE[2]:               dst = 172.27.7.0/24, nh = 172.27.7.0, mt = 0
< IP4.ROUTE[3]:               dst = 172.27.5.0/24, nh = 172.27.5.0, mt = 0
< IP4.ROUTE[4]:               dst = 172.27.6.0/24, nh = 172.27.6.0, mt = 0
< IP4.ROUTE[5]:               dst = 172.27.2.0/24, nh = 172.27.2.0, mt = 0
< IP4.ROUTE[6]:               dst = 172.27.0.0/24, nh = 172.27.0.0, mt = 0
< IP4.ROUTE[7]:               dst = 172.27.1.0/24, nh = 172.27.1.0, mt = 0
< IP6.ADDRESS[1]:             fe80::f8a4:b9ff:fe5d:6c9d/64
---
> IP4.ROUTE[1]:               dst = 172.27.0.0/24, nh = 172.27.0.0, mt = 0
> IP4.ROUTE[2]:               dst = 172.27.1.0/24, nh = 172.27.1.0, mt = 0
> IP4.ROUTE[3]:               dst = 172.27.2.0/24, nh = 172.27.2.0, mt = 0
> IP4.ROUTE[4]:               dst = 172.27.3.0/24, nh = 172.27.3.0, mt = 0
> IP4.ROUTE[5]:               dst = 172.27.5.0/24, nh = 172.27.5.0, mt = 0
> IP4.ROUTE[6]:               dst = 172.27.6.0/24, nh = 172.27.6.0, mt = 0
> IP4.ROUTE[7]:               dst = 172.27.7.0/24, nh = 172.27.7.0, mt = 0
> IP6.ADDRESS[1]:             fe80::60f1:eeff:fe77:1633/64
119,120c119,120
< IP6.ROUTE[1]:               dst = ff00::/8, nh = ::, mt = 256, table=255
< IP6.ROUTE[2]:               dst = fe80::/64, nh = ::, mt = 256
---
> IP6.ROUTE[1]:               dst = fe80::/64, nh = ::, mt = 256
> IP6.ROUTE[2]:               dst = ff00::/8, nh = ::, mt = 256, table=255
```
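For completeness, the NetworkManager drop-in that keeps flannel.1 (and the CNI bridge) out of NetworkManager's hands looks roughly like this; the file name and the inclusion of cni0 are our own convention, not something flannel requires:

```
# /etc/NetworkManager/conf.d/99-flannel-unmanaged.conf
[keyfile]
unmanaged-devices=interface-name:flannel.1;interface-name:cni0
```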
Control-group comparison
The RHEL 8.4 adaptation task was originally supposed to be assigned to me, but I was busy with another project, so it went to a colleague. He said that on his RHEL 8.4 setup, on an ESXi host in Wuhan, cross-node communication worked fine.
I logged in and confirmed it. The ESXi host running the problematic VMs is in Zhuhai, and the Wuhan VM was a minimal install, so several of us put together a control group.
All systems were installed from the same RHEL 8.4 image, with both GUI and minimal install types:
| Install type | ESXi site + version | Cross-node pod communication OK? | Works only after changing the flannel VXLAN port to 8472 |
| --- | --- | --- | --- |
| gui | Zhuhai 6.7.0 | ✗ | yes |
| minimal | Zhuhai 6.7.0 | ✗ | yes |
| gui | Wuhan 6.0.0 | ✔ | not needed |
| minimal | Wuhan 6.0.0 | ✔ | not needed |
Moreover, the vxlan kernel module information was identical in all four scenarios:
```
$ modinfo vxlan
filename:       /lib/modules/4.18.0-305.el8.x86_64/kernel/drivers/net/vxlan.ko.xz
alias:          rtnl-link-vxlan
description:    Driver for VXLAN encapsulated traffic
author:         Stephen Hemminger <stephen@networkplumber.org>
version:        0.1
license:        GPL
rhelversion:    8.4
srcversion:     C4B9CCC8F1BB3F9CDEEDACF
depends:        udp_tunnel,ip6_udp_tunnel
intree:         Y
name:           vxlan
vermagic:       4.18.0-305.el8.x86_64 SMP mod_unload modversions
sig_id:         PKCS#7
signer:         Red Hat Enterprise Linux kernel signing key
sig_key:        0D:85:6D:FE:90:3F:7B:A0:D7:04:19:55:4C:9C:D5:EE:1D:42:8D:B6
sig_hashalgo:   sha256
signature:      93:09:AA:FE:BA:D1:10:CD:12:8F:2A:F9:43:D4:50:36:4F:36:51:0C:
                4B:BD:3D:89:65:1F:5D:7E:24:EE:4E:90:8B:38:99:24:EE:0B:31:4F:
                E5:DC:57:C8:60:4A:6F:FE:43:27:43:B1:EC:A4:A1:A4:9F:47:65:91:
                0C:6D:6D:E0:A8:4C:97:95:75:27:D5:B0:CD:0A:77:40:A9:A6:ED:E6:
                C9:72:26:23:07:4D:B7:D3:B8:B9:AF:C5:18:AF:EA:F8:B7:6C:90:B9:
                FD:F1:8F:CE:73:A8:1F:92:F2:FA:A7:5E:53:BE:D6:64:55:06:5B:54:
                29:DB:E3:2E:CC:DF:CF:1C:7D:DC:53:CB:92:38:BC:42:7D:89:1F:21:
                0A:47:07:63:E6:B9:C6:1E:26:C5:4E:B2:9A:9F:DB:0D:86:31:EE:2A:
                DA:87:AE:16:AA:6F:0D:B3:11:0B:44:FD:5E:11:82:8E:83:9D:E8:4F:
                2E:1B:A9:AC:66:2D:12:11:43:B0:9B:1E:2C:1C:8B:8B:80:B8:16:9B:
                8C:A3:C8:73:C5:D7:0F:E7:B5:F7:30:7D:57:CA:CE:74:3C:A2:DB:9F:
                D6:ED:F3:A4:EE:D7:D2:FF:F0:46:1E:18:52:92:A5:6E:BA:30:7F:18:
                BB:1C:49:A1:03:05:87:A2:6A:FE:07:8A:CE:14:1F:EE:C9:82:84:B4:
                CC:2B:2E:BF:21:BF:78:7B:39:01:1C:EE:C4:48:7B:9C:BA:8C:3B:D3:
                75:B5:1D:5A:57:9F:C6:FA:D7:2C:C3:30:49:3E:94:FC:1E:C2:5E:AA:
                F4:D8:80:46:46:C1:BF:3C:80:54:46:78:5F:4D:A5:93:41:65:CC:E4:
                ED:78:0E:28:2A:DF:EE:C4:8E:EF:25:82:9C:28:07:7D:C1:95:AA:AD:
                E8:5C:A7:CC:91:22:03:BB:1F:AD:87:E9:AD:E3:DE:4B:6C:33:A2:FD:
                15:E3:41:3B:C5:A7:84:89:25:2F:B7:1B:EF:1F:6A:D6:FE:A6:36:D6:
                19:1E:0F:06
parm:           udp_port:Destination UDP port (ushort)
parm:           log_ecn_error:Log packets received with corrupted ECN (bool)
```
Enabling promiscuous mode on the ESXi DVS virtual switch did not help either (at one point we even migrated several of the machines onto the same ESXi host, and the problem persisted). I asked our internal ESXi people; one of them suggested checking whether NSX was involved and sent me an article:
```
https://kb.vmware.com/s/article/2149996?lang=zh_cn
```
The article says: starting with NSX 6.2.4, the default VXLAN port is 4789, the standard port assigned by IANA; in releases prior to NSX 6.2.4, the default VXLAN UDP port number was 8472.
I was completely lost when it came to NSX, so I consulted a VMware expert, spark-go. He checked and confirmed there was no NSX on our ESXi hosts. He then helped me search and found a kernel bug published by Red Hat, RHSA-2021:2570:
```
[ESXi][RHEL-8] VMXNET3 v4 causes invalid checksums of inner packets of VXLAN tunnel (BZ#1960702)
```
This article had actually been in my search results the day before, but I never clicked through; I only read the first hit, a Red Hat document on using libvirtd with VXLAN. vmxnet3 is the virtual NIC that ESXi presents to the VM, and comparing the bug description with our kernel version, this had to be the problem:
```
$ uname -a
Linux localhost.localdomain 4.18.0-305.el8.x86_64 #1 SMP Thu Apr 29 08:54:30 EDT 2021 x86_64 x86_64 x86_64 GNU/Linux
```
But I could not find anywhere to download the fixed kernel (yum on RHEL requires a subscription), and when I tried installing a CentOS 8 kernel in between, the machine would not boot with it:
```
$ grubby --info DEFAULT
index=0
kernel="/boot/vmlinuz-4.18.0-348.7.1.el8_5.x86_64"
args="ro crashkernel=auto resume=/dev/mapper/rhel-swap rd.lvm.lv=rhel/root rd.lvm.lv=rhel/swap rhgb quiet $tuned_params"
root="/dev/mapper/rhel-root"
initrd="/boot/initramfs-4.18.0-348.7.1.el8_5.x86_64.img $tuned_initrd"
title="CentOS Linux (4.18.0-348.7.1.el8_5.x86_64) 8"
id="d0b79c361c3d4f728f1f9b86bb54acd5-4.18.0-348.7.1.el8_5.x86_64"
```
The links about the fix say the problem only affects non-standard VXLAN ports, and changing the port everywhere would have been a hassle. So I went for the crude approach: since the trigger is a checksum error, try turning off checksum offload on flannel.1:
```
$ kubectl -n kube-system get cm kube-flannel-cfg -o yaml | grep -PB2 '847\d\s*$'
      "Backend": {
        "Type": "vxlan",
        "Port": 8475
$ curl 172.27.1.2:9153
^C
$ /sbin/ethtool -K flannel.1 tx-checksum-ip-generic off
Actual changes:
tx-checksum-ip-generic: off
tx-tcp-segmentation: off [not requested]
tx-tcp-ecn-segmentation: off [not requested]
tx-tcp-mangleid-segmentation: off [not requested]
tx-tcp6-segmentation: off [not requested]
$ curl 172.27.1.2:9153
404 page not found
```
Checking the VM network adapter types: the Zhuhai VMs use VMXNET 3, while the Wuhan one uses E1000. To confirm from inside the guest that the NIC really is vmxnet3:
```
$ readlink -f /sys/class/net/ens192/device/driver/module
/sys/module/vmxnet3
```
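ethtool gives the same answer and also prints the driver version, which is handy when matching against the bug report (ens192 is the interface name in this environment):

```
# The "driver:" field should read vmxnet3 on the affected VMs
ethtool -i ens192
```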
References: