
flannel route mix-up

Word count: 3.8k · Reading time: 22 min
2025/09/03

A cross-node connectivity failure caused by mixed-up flannel routes

Background

A customer test environment ran into problems after the field team deployed the business workloads: the application logs showed that domain names could not be resolved. Because host-gw requires layer-2 adjacency between nodes, and some virtualization platforms enforce IP/MAC binding, we default to vxlan mode.
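The backend is selected in flannel's net-conf.json, which in this deployment is carried by the kube-flannel-cfg ConfigMap (dumped in full further down). A quick way to read just that part back, as a sketch:

$ kubectl -n kube-system get cm kube-flannel-cfg -o jsonpath='{.data.net-conf\.json}'

Switching between vxlan and host-gw is a matter of changing Backend.Type there and restarting the flannel pods.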

Troubleshooting

Wrong routes

Logging onto a node, the routes for the Pod subnets were wrong:

$  ip r s 
default via 172.16.0.1 dev eth0 proto dhcp metric 100
10.185.0.0/16 dev docker0 proto kernel scope link src 10.185.0.1 linkdown
10.187.0.0/24 via 172.16.0.250 dev eth0
10.187.1.0/24 via 172.16.0.202 dev eth0
10.187.2.0/24 via 172.16.0.104 dev eth0 #<--- these lines
10.187.3.0/24 via 172.16.0.104 dev eth0 #<--- these lines
10.187.3.0/24 dev cni0 proto kernel scope link src 10.187.3.1 #<--- these lines
172.16.0.0/24 dev eth0 proto kernel scope link src 172.16.0.231 metric 100
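Two routes for 10.187.3.0/24 coexist here, one via eth0 and one via cni0. `ip route get` shows which one the kernel actually uses for a destination in that subnet (the address below is an arbitrary example):

$ ip route get 10.187.3.5

On this node it should come back via 172.16.0.104 on eth0, i.e. the local cni0 route never gets hit.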

The cluster runs in vxlan mode, yet the routes to 10.187.2.0/24 and 10.187.3.0/24 both point at 172.16.0.104 via eth0 (in vxlan mode they should go via flannel.1), and this node itself owns 10.187.3.0/24, so the cni0 route can never be matched. We suspected routes the customer had added, so we asked the field team to check with the customer and to switch to a Pod CIDR that does not overlap with the customer's internal network. After the reinstall it looked the same, so I logged in and checked the flannel logs:

Backend mode mix-up

$ docker ps -a | grep flanneld
$ docker logs xxxx # flannel container ID
W0903 01:37:31.229597 1 main.go:540] no subnet found for key: FLANNEL_IPV6_NETWORK in file: /run/flannel/subnet.env
W0903 01:37:31.229636 1 main.go:540] no subnet found for key: FLANNEL_IPV6_SUBNET in file: /run/flannel/subnet.env
I0903 01:37:31.229643 1 iptables.go:125] Setting up masking rules
I0903 01:37:31.422425 1 iptables.go:226] Changing default FORWARD chain policy to ACCEPT
I0903 01:37:31.523863 1 main.go:396] Wrote subnet file to /run/flannel/subnet.env
I0903 01:37:31.523889 1 main.go:400] Running backend.
I0903 01:37:31.524023 1 route_network.go:56] Watching for new subnet leases
I0903 01:37:31.524247 1 subnet.go:152] Batch elem [0] is { lease.Event{Type:0, Lease:lease.Lease{EnableIPv4:true, EnableIPv6:false, Subnet:ip.IP4Net{IP:0xa610100, PrefixLen:0x18}, IPv6Subnet:ip.IP6Net{IP:(*ip.IP6)(nil), PrefixLen:0x0}, Attrs:lease.LeaseAttrs{PublicIP:0xac100068, PublicIPv6:(*ip.IP6)(nil), BackendType:"host-gw", BackendData:json.RawMessage{0x6e, 0x75, 0x6c, 0x6c}, BackendV6Data:json.RawMessage(nil)}, Expiration:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Asof:0}} }
I0903 01:37:31.524425 1 subnet.go:152] Batch elem [0] is { lease.Event{Type:0, Lease:lease.Lease{EnableIPv4:true, EnableIPv6:false, Subnet:ip.IP4Net{IP:0xa610300, PrefixLen:0x18}, IPv6Subnet:ip.IP6Net{IP:(*ip.IP6)(nil), PrefixLen:0x0}, Attrs:lease.LeaseAttrs{PublicIP:0xac1000e7, PublicIPv6:(*ip.IP6)(nil), BackendType:"host-gw", BackendData:json.RawMessage{0x6e, 0x75, 0x6c, 0x6c}, BackendV6Data:json.RawMessage(nil)}, Expiration:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Asof:0}} }
I0903 01:37:31.524449 1 route_network.go:93] Subnet added: 10.97.1.0/24 via 172.16.0.104
I0903 01:37:31.524705 1 route_network.go:166] Route to {Ifindex: 2 Dst: 10.97.1.0/24 Src: <nil> Gw: 172.16.0.104 Flags: [] Table: 0 Realm: 0} already exists, skipping.
I0903 01:37:31.524797 1 route_network.go:93] Subnet added: 10.97.3.0/24 via 172.16.0.231
I0903 01:37:31.524876 1 route_network.go:166] Route to {Ifindex: 2 Dst: 10.97.3.0/24 Src: <nil> Gw: 172.16.0.231 Flags: [] Table: 0 Realm: 0} already exists, skipping.
I0903 01:37:31.524903 1 subnet.go:152] Batch elem [0] is { lease.Event{Type:0, Lease:lease.Lease{EnableIPv4:true, EnableIPv6:false, Subnet:ip.IP4Net{IP:0xa610000, PrefixLen:0x18}, IPv6Subnet:ip.IP6Net{IP:(*ip.IP6)(nil), PrefixLen:0x0}, Attrs:lease.LeaseAttrs{PublicIP:0xac1000fa, PublicIPv6:(*ip.IP6)(nil), BackendType:"host-gw", BackendData:json.RawMessage{0x6e, 0x75, 0x6c, 0x6c}, BackendV6Data:json.RawMessage(nil)}, Expiration:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Asof:0}} }
I0903 01:37:31.524939 1 route_network.go:93] Subnet added: 10.97.0.0/24 via 172.16.0.250
I0903 01:37:31.525190 1 route_network.go:166] Route to {Ifindex: 2 Dst: 10.97.0.0/24 Src: <nil> Gw: 172.16.0.250 Flags: [] Table: 0 Realm: 0} already exists, skipping.
I0903 01:37:31.621427 1 main.go:421] Waiting for all goroutines to exit
I0903 01:37:31.922267 1 iptables.go:372] bootstrap done
I0903 01:37:32.129063 1 iptables.go:372] bootstrap done

The log above is odd; note a few key points (the hex fields in the lease dumps are decoded right after this list):

  • BackendType:"host-gw"
  • Subnet added: 10.97.0.0/24 via 172.16.0.250
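As an aside, the Subnet and PublicIP fields inside those lease.Event dumps are hex-encoded IPv4 addresses; decoding two values from the log above shows they match the human-readable lines that follow:

$ python3 -c 'import ipaddress; print(ipaddress.IPv4Address(0xa610100))'
10.97.1.0
$ python3 -c 'import ipaddress; print(ipaddress.IPv4Address(0xac100068))'
172.16.0.104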

Why would it be host-gw mode? Check the routes:

[root@vm172-16-0-202 ~]# ip r s 
default via 172.16.0.1 dev eth0 proto dhcp metric 100
10.97.0.0/24 via 172.16.0.250 dev eth0
10.97.1.0/24 via 172.16.0.104 dev eth0
10.97.2.0/24 via 172.16.0.250 dev eth0
10.97.2.0/24 dev cni0 proto kernel scope link src 10.97.2.1
10.97.3.0/24 via 172.16.0.231 dev eth0
10.185.0.0/16 dev docker0 proto kernel scope link src 10.185.0.1 linkdown
10.187.0.0/24 via 172.16.0.250 dev eth0
10.187.2.0/24 via 172.16.0.104 dev eth0
10.187.3.0/24 via 172.16.0.231 dev eth0
172.16.0.0/24 dev eth0 proto kernel scope link src 172.16.0.202 metric 100

Ignore the leftover routes from the old CIDR; the new 10.97.x routes are clearly host-gw style (the next hop is the peer node's IP over eth0 rather than flannel.1). Check which backend the configmap configures:

[root@vm172-16-0-202 ~]# kubectl -n kube-system get cm kube-flannel-cfg -o yaml
apiVersion: v1
data:
  cni-conf.json: |
    {
      "name": "cbr0",
      "cniVersion": "0.3.1",
      "plugins": [
        {
          "type": "flannel",
          "delegate": {
            "hairpinMode": true,
            "isDefaultGateway": true
          }
        },
        {
          "type": "portmap",
          "capabilities": {
            "portMappings": true
          }
        }
      ]
    }
  net-conf.json: |
    {
      "Network": "10.97.0.0/16",
      "Backend": {
        "Type": "vxlan",
        "Port": 8475
      }
    }
kind: ConfigMap

The configmap says vxlan, which is correct. Check the subnet file flannel writes on the node:

[root@vm172-16-0-202 ~]# cat /run/flannel/subnet.env 
FLANNEL_NETWORK=10.97.0.0/16
FLANNEL_SUBNET=10.97.2.1/24
FLANNEL_MTU=1500
FLANNEL_IPMASQ=true
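For reference, vxlan encapsulation costs 50 bytes, so with a 1500-byte uplink the subnet file should say 1450. A quick cross-check on the node (assuming eth0 is the uplink; flannel.1 only exists once the vxlan backend has actually started):

$ cat /sys/class/net/eth0/mtu                        # physical MTU, 1500 here
$ ip -d link show flannel.1 | grep -o 'mtu [0-9]*'   # should report 1450 for vxlan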

Strange: FLANNEL_MTU is 1500, which is what host-gw writes, not the 1450 a vxlan backend would write. Remove the flannel container and check again:

[root@vm172-16-0-202 ~]# docker rm -f xxx # flanneld container ID
[root@vm172-16-0-202 ~]# cat /run/flannel/subnet.env
FLANNEL_NETWORK=10.97.0.0/16
FLANNEL_SUBNET=10.97.2.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true
[root@vm172-16-0-202 ~]# docker ps -a | grep flanneld
e288c4442ee2 reg.xxx.lan:5000/xxx/flannel "/opt/bin/flanneld -…" 41 seconds ago Up 40 seconds k8s_kube-flannel_kube-flannel-ds-w7rk2_kube-system_581101b1-cfa5-4ccc-80be-e78a9c248b96_1
[root@vm172-16-0-202 ~]# docker logs e288
I0903 01:47:13.123249 1 main.go:211] CLI flags config: {etcdEndpoints:http://127.0.0.1:4001,http://127.0.0.1:2379 etcdPrefix:/coreos.com/network etcdKeyfile: etcdCertfile: etcdCAFile: etcdUsername: etcdPassword: version:false kubeSubnetMgr:true kubeApiUrl: kubeAnnotationPrefix:flannel.alpha.coreos.com kubeConfigFile: iface:[] ifaceRegex:[] ipMasq:true ifaceCanReach: subnetFile:/run/flannel/subnet.env publicIP: publicIPv6: subnetLeaseRenewMargin:60 healthzIP:0.0.0.0 healthzPort:0 iptablesResyncSeconds:5 iptablesForwardRules:true netConfPath:/etc/kube-flannel/net-conf.json setNodeNetworkUnavailable:true}
W0903 01:47:13.123387 1 client_config.go:618] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0903 01:47:13.223377 1 kube.go:139] Waiting 10m0s for node controller to sync
I0903 01:47:13.223476 1 kube.go:469] Starting kube subnet manager
I0903 01:47:13.230593 1 kube.go:490] Creating the node lease for IPv4. This is the n.Spec.PodCIDRs: [10.97.1.0/24]
I0903 01:47:13.230628 1 kube.go:490] Creating the node lease for IPv4. This is the n.Spec.PodCIDRs: [10.97.2.0/24]
I0903 01:47:13.230637 1 kube.go:490] Creating the node lease for IPv4. This is the n.Spec.PodCIDRs: [10.97.3.0/24]
I0903 01:47:13.230644 1 kube.go:490] Creating the node lease for IPv4. This is the n.Spec.PodCIDRs: [10.97.0.0/24]
I0903 01:47:14.223684 1 kube.go:146] Node controller sync successful
I0903 01:47:14.223761 1 main.go:231] Created subnet manager: Kubernetes Subnet Manager - 172.16.0.202
I0903 01:47:14.223767 1 main.go:234] Installing signal handlers
I0903 01:47:14.224176 1 main.go:452] Found network config - Backend type: vxlan
I0903 01:47:14.229346 1 kube.go:669] List of node(172.16.0.202) annotations: map[string]string{"flannel.alpha.coreos.com/backend-data":"null", "flannel.alpha.coreos.com/backend-type":"host-gw", "flannel.alpha.coreos.com/kube-subnet-manager":"true", "flannel.alpha.coreos.com/public-ip":"172.16.0.202", "node.alpha.kubernetes.io/ttl":"0", "volumes.kubernetes.io/controller-managed-attach-detach":"true"}
I0903 01:47:14.229398 1 match.go:210] Determining IP address of default interface
I0903 01:47:14.229758 1 match.go:263] Using interface with name eth0 and address 172.16.0.202
I0903 01:47:14.229788 1 match.go:285] Defaulting external address to interface address (172.16.0.202)
I0903 01:47:14.229841 1 vxlan.go:141] VXLAN config: VNI=1 Port=8475 GBP=false Learning=false DirectRouting=false
I0903 01:47:14.233427 1 kube.go:636] List of node(172.16.0.202) annotations: map[string]string{"flannel.alpha.coreos.com/backend-data":"null", "flannel.alpha.coreos.com/backend-type":"host-gw", "flannel.alpha.coreos.com/kube-subnet-manager":"true", "flannel.alpha.coreos.com/public-ip":"172.16.0.202", "node.alpha.kubernetes.io/ttl":"0", "volumes.kubernetes.io/controller-managed-attach-detach":"true"}
I0903 01:47:14.249292 1 iptables.go:51] Starting flannel in iptables mode...
W0903 01:47:14.249452 1 main.go:540] no subnet found for key: FLANNEL_IPV6_NETWORK in file: /run/flannel/subnet.env
W0903 01:47:14.249491 1 main.go:540] no subnet found for key: FLANNEL_IPV6_SUBNET in file: /run/flannel/subnet.env
I0903 01:47:14.249498 1 iptables.go:125] Setting up masking rules
I0903 01:47:14.250112 1 kube.go:490] Creating the node lease for IPv4. This is the n.Spec.PodCIDRs: [10.97.2.0/24]
I0903 01:47:14.529553 1 iptables.go:226] Changing default FORWARD chain policy to ACCEPT
I0903 01:47:14.622211 1 main.go:396] Wrote subnet file to /run/flannel/subnet.env
I0903 01:47:14.622240 1 main.go:400] Running backend.
I0903 01:47:14.622545 1 vxlan_network.go:65] watching for new subnet leases
I0903 01:47:14.622590 1 subnet.go:152] Batch elem [0] is { lease.Event{Type:0, Lease:lease.Lease{EnableIPv4:true, EnableIPv6:false, Subnet:ip.IP4Net{IP:0xa610100, PrefixLen:0x18}, IPv6Subnet:ip.IP6Net{IP:(*ip.IP6)(nil), PrefixLen:0x0}, Attrs:lease.LeaseAttrs{PublicIP:0xac100068, PublicIPv6:(*ip.IP6)(nil), BackendType:"host-gw", BackendData:json.RawMessage{0x6e, 0x75, 0x6c, 0x6c}, BackendV6Data:json.RawMessage(nil)}, Expiration:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Asof:0}} }
I0903 01:47:14.622651 1 subnet.go:152] Batch elem [0] is { lease.Event{Type:0, Lease:lease.Lease{EnableIPv4:true, EnableIPv6:false, Subnet:ip.IP4Net{IP:0xa610300, PrefixLen:0x18}, IPv6Subnet:ip.IP6Net{IP:(*ip.IP6)(nil), PrefixLen:0x0}, Attrs:lease.LeaseAttrs{PublicIP:0xac1000e7, PublicIPv6:(*ip.IP6)(nil), BackendType:"host-gw", BackendData:json.RawMessage{0x6e, 0x75, 0x6c, 0x6c}, BackendV6Data:json.RawMessage(nil)}, Expiration:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Asof:0}} }
I0903 01:47:14.622694 1 vxlan_network.go:100] Received Subnet Event with VxLan: BackendType: host-gw, PublicIP: 172.16.0.104, PublicIPv6: (nil), BackendData: null, BackendV6Data: (nil)
W0903 01:47:14.622710 1 vxlan_network.go:102] ignoring non-vxlan v4Subnet(10.97.1.0/24) v6Subnet(::/0): type=host-gw
I0903 01:47:14.622724 1 vxlan_network.go:100] Received Subnet Event with VxLan: BackendType: host-gw, PublicIP: 172.16.0.231, PublicIPv6: (nil), BackendData: null, BackendV6Data: (nil)
W0903 01:47:14.622727 1 vxlan_network.go:102] ignoring non-vxlan v4Subnet(10.97.3.0/24) v6Subnet(::/0): type=host-gw
I0903 01:47:14.622737 1 subnet.go:152] Batch elem [0] is { lease.Event{Type:0, Lease:lease.Lease{EnableIPv4:true, EnableIPv6:false, Subnet:ip.IP4Net{IP:0xa610000, PrefixLen:0x18}, IPv6Subnet:ip.IP6Net{IP:(*ip.IP6)(nil), PrefixLen:0x0}, Attrs:lease.LeaseAttrs{PublicIP:0xac1000fa, PublicIPv6:(*ip.IP6)(nil), BackendType:"host-gw", BackendData:json.RawMessage{0x6e, 0x75, 0x6c, 0x6c}, BackendV6Data:json.RawMessage(nil)}, Expiration:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Asof:0}} }
I0903 01:47:14.622759 1 vxlan_network.go:100] Received Subnet Event with VxLan: BackendType: host-gw, PublicIP: 172.16.0.250, PublicIPv6: (nil), BackendData: null, BackendV6Data: (nil)
W0903 01:47:14.622764 1 vxlan_network.go:102] ignoring non-vxlan v4Subnet(10.97.0.0/24) v6Subnet(::/0): type=host-gw
I0903 01:47:14.722083 1 main.go:421] Waiting for all goroutines to exit
I0903 01:47:15.123247 1 iptables.go:372] bootstrap done
I0903 01:47:15.525861 1 iptables.go:372] bootstrap done

The subnet file now has the right content, and this flannel instance is running the vxlan backend, but it ignores every other node's lease ("ignoring non-vxlan ... type=host-gw"), so no vxlan routes or FDB entries get programmed. And the node annotations printed in the log look wrong:

kube.go:636] List of node(172.16.0.202) annotations: \
map[string]string{"flannel.alpha.coreos.com/backend-data":"null", \
"flannel.alpha.coreos.com/backend-type":"host-gw", \
"flannel.alpha.coreos.com/kube-subnet-manager":"true", \
"flannel.alpha.coreos.com/public-ip":"172.16.0.202", \
"node.alpha.kubernetes.io/ttl":"0", "volumes.kubernetes.io/controller-managed-attach-detach":"true"}

It reports host-gw. Check the annotations on all the nodes:

$ kubectl get node -o yaml | grep flannel
flannel.alpha.coreos.com/backend-data: "null"
flannel.alpha.coreos.com/backend-type: host-gw
flannel.alpha.coreos.com/kube-subnet-manager: "true"
flannel.alpha.coreos.com/public-ip: 172.16.0.104
- reg.xxx.lan:5000/xxx/flannel@sha256:13cddb14533a10394aa9436bd96a4c866a139b7ef01e71526aae013e724acca7
- flannel/flannel:v0.25.4
- reg.xxx.lan:5000/xxx/flannel:v0.25.4
- reg.xxx.lan:5000/xxx/flannel-cni-plugin@sha256:85aa4c338969e97b1ab751fdc2c167af228a241a224e2d0e5b81ca0f3e93e1fa
- flannel/flannel-cni-plugin:v1.4.1-flannel1
- reg.xxx.lan:5000/xxx/flannel-cni-plugin:v1.4.1
flannel.alpha.coreos.com/backend-data: '{"VNI":1,"VtepMAC":"f6:12:96:a0:5a:14"}'
flannel.alpha.coreos.com/backend-type: vxlan
flannel.alpha.coreos.com/kube-subnet-manager: "true"
flannel.alpha.coreos.com/public-ip: 172.16.0.202
- reg.xxx.lan:5000/xxx/flannel@sha256:13cddb14533a10394aa9436bd96a4c866a139b7ef01e71526aae013e724acca7
- flannel/flannel:v0.25.4
- reg.xxx.lan:5000/xxx/flannel:v0.25.4
- reg.xxx.lan:5000/xxx/flannel-cni-plugin@sha256:85aa4c338969e97b1ab751fdc2c167af228a241a224e2d0e5b81ca0f3e93e1fa
- flannel/flannel-cni-plugin:v1.4.1-flannel1
- reg.xxx.lan:5000/xxx/flannel-cni-plugin:v1.4.1
flannel.alpha.coreos.com/backend-data: "null"
flannel.alpha.coreos.com/backend-type: host-gw
flannel.alpha.coreos.com/kube-subnet-manager: "true"
flannel.alpha.coreos.com/public-ip: 172.16.0.231
- reg.xxx.lan:5000/xxx/flannel@sha256:13cddb14533a10394aa9436bd96a4c866a139b7ef01e71526aae013e724acca7
- flannel/flannel:v0.25.4
- reg.xxx.lan:5000/xxx/flannel:v0.25.4
- reg.xxx.lan:5000/xxx/flannel-cni-plugin@sha256:85aa4c338969e97b1ab751fdc2c167af228a241a224e2d0e5b81ca0f3e93e1fa
- flannel/flannel-cni-plugin:v1.4.1-flannel1
- reg.xxx.lan:5000/xxx/flannel-cni-plugin:v1.4.1
flannel.alpha.coreos.com/backend-data: "null"
flannel.alpha.coreos.com/backend-type: host-gw
flannel.alpha.coreos.com/kube-subnet-manager: "true"
flannel.alpha.coreos.com/public-ip: 172.16.0.250
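A more compact per-node view of the backend annotation, as a jsonpath sketch (the only trick is escaping the dots in the annotation key):

$ kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.annotations.flannel\.alpha\.coreos\.com/backend-type}{"\n"}{end}'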

Why are these nodes still annotated as host-gw? Since deleting the flannel container alone did not help, I used kubectl edit to remove the flannel.alpha.coreos.com/backend-type annotation from the nodes:

[root@vm172-16-0-202 ~]# kubectl edit node 172.16.0.202
[root@vm172-16-0-202 ~]# kubectl -n kube-system delete pod kube-flannel-ds-2cxkf kube-flannel-ds-hml7r
pod "kube-flannel-ds-2cxkf" deleted
pod "kube-flannel-ds-hml7r" deleted
[root@vm172-16-0-202 ~]# kubectl get no -o yaml | grep flannel
flannel.alpha.coreos.com/backend-data: '{"VNI":1,"VtepMAC":"de:87:3d:b7:74:fc"}'
flannel.alpha.coreos.com/backend-type: vxlan
flannel.alpha.coreos.com/kube-subnet-manager: "true"
flannel.alpha.coreos.com/public-ip: 172.16.0.104
- reg.xxx.lan:5000/xxx/flannel@sha256:13cddb14533a10394aa9436bd96a4c866a139b7ef01e71526aae013e724acca7
- flannel/flannel:v0.25.4
- reg.xxx.lan:5000/xxx/flannel:v0.25.4
- reg.xxx.lan:5000/xxx/flannel-cni-plugin@sha256:85aa4c338969e97b1ab751fdc2c167af228a241a224e2d0e5b81ca0f3e93e1fa
- flannel/flannel-cni-plugin:v1.4.1-flannel1
- reg.xxx.lan:5000/xxx/flannel-cni-plugin:v1.4.1
flannel.alpha.coreos.com/backend-data: '{"VNI":1,"VtepMAC":"f6:12:96:a0:5a:14"}'
flannel.alpha.coreos.com/backend-type: vxlan
flannel.alpha.coreos.com/kube-subnet-manager: "true"
flannel.alpha.coreos.com/public-ip: 172.16.0.202
- reg.xxx.lan:5000/xxx/flannel@sha256:13cddb14533a10394aa9436bd96a4c866a139b7ef01e71526aae013e724acca7
- flannel/flannel:v0.25.4
- reg.xxx.lan:5000/xxx/flannel:v0.25.4
- reg.xxx.lan:5000/xxx/flannel-cni-plugin@sha256:85aa4c338969e97b1ab751fdc2c167af228a241a224e2d0e5b81ca0f3e93e1fa
- flannel/flannel-cni-plugin:v1.4.1-flannel1
- reg.xxx.lan:5000/xxx/flannel-cni-plugin:v1.4.1
flannel.alpha.coreos.com/backend-data: '{"VNI":1,"VtepMAC":"1e:24:24:5e:b8:84"}'
flannel.alpha.coreos.com/backend-type: vxlan
flannel.alpha.coreos.com/kube-subnet-manager: "true"
flannel.alpha.coreos.com/public-ip: 172.16.0.231
- reg.xxx.lan:5000/xxx/flannel@sha256:13cddb14533a10394aa9436bd96a4c866a139b7ef01e71526aae013e724acca7
- flannel/flannel:v0.25.4
- reg.xxx.lan:5000/xxx/flannel:v0.25.4
- reg.xxx.lan:5000/xxx/flannel-cni-plugin@sha256:85aa4c338969e97b1ab751fdc2c167af228a241a224e2d0e5b81ca0f3e93e1fa
- flannel/flannel-cni-plugin:v1.4.1-flannel1
- reg.xxx.lan:5000/xxx/flannel-cni-plugin:v1.4.1
flannel.alpha.coreos.com/backend-data: '{"VNI":1,"VtepMAC":"36:a5:28:4d:ec:e3"}'
flannel.alpha.coreos.com/backend-type: vxlan
flannel.alpha.coreos.com/kube-subnet-manager: "true"
flannel.alpha.coreos.com/public-ip: 172.16.0.250
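Editing each node by hand works, but the same cleanup can be scripted: a trailing - on kubectl annotate removes the key. A sketch for this cluster (the three nodes below are the ones that were still annotated host-gw; the pod label comes from the upstream kube-flannel manifest and may differ in a customized deployment):

for n in 172.16.0.104 172.16.0.231 172.16.0.250; do
  kubectl annotate node "$n" flannel.alpha.coreos.com/backend-type-
done
kubectl -n kube-system delete pod -l app=flannel   # let the DaemonSet recreate flannel with clean annotations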

After handling the remaining nodes the same way:

[root@vm172-16-0-202 ~]# ip r s
default via 172.16.0.1 dev eth0 proto dhcp metric 100
10.97.0.0/24 via 10.97.0.0 dev flannel.1 onlink
10.97.1.0/24 via 10.97.1.0 dev flannel.1 onlink
10.97.2.0/24 via 172.16.0.250 dev eth0
10.97.2.0/24 dev cni0 proto kernel scope link src 10.97.2.1
10.97.3.0/24 via 10.97.3.0 dev flannel.1 onlink
10.185.0.0/16 dev docker0 proto kernel scope link src 10.185.0.1 linkdown
10.187.0.0/24 via 172.16.0.250 dev eth0
10.187.2.0/24 via 172.16.0.104 dev eth0
10.187.3.0/24 via 172.16.0.231 dev eth0
172.16.0.0/24 dev eth0 proto kernel scope link src 172.16.0.202 metric 100
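Besides the routes, a vxlan backend also programs FDB and ARP entries for the remote VTEPs; they can be inspected directly to confirm the overlay is fully wired up:

$ bridge fdb show dev flannel.1   # one entry per remote VTEP MAC -> node IP
$ ip neigh show dev flannel.1     # remote flannel.1 gateway IPs -> VTEP MACs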

After a reboot the routes still look right:

[root@vm172-16-0-202 ~]# ip r
default via 172.16.0.1 dev eth0 proto dhcp metric 100
10.97.0.0/24 via 10.97.0.0 dev flannel.1 onlink
10.97.1.0/24 via 10.97.1.0 dev flannel.1 onlink
10.97.2.0/24 dev cni0 proto kernel scope link src 10.97.2.1
10.97.3.0/24 via 10.97.3.0 dev flannel.1 onlink
10.185.0.0/16 dev docker0 proto kernel scope link src 10.185.0.1 linkdown
172.16.0.0/24 dev eth0 proto kernel scope link src 172.16.0.202 metric 100
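As a final end-to-end check mirroring the original DNS symptom (the image name and pod IP below are placeholders; substitute whatever exists in this environment):

$ kubectl run net-test --rm -it --restart=Never --image=busybox:1.36 -- nslookup kubernetes.default
$ kubectl get pod -A -o wide          # pick a pod IP hosted on another node
$ ping -c 3 <pod-ip-on-another-node>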

I asked whether anyone had changed the backend mode during the initial deployment; they said no. Strange.
