zhangguanzhang's Blog

Troubleshooting: requests arriving via hostPort show the cni0 address as their source IP

2024/06/17

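Background

The operation logs of our internal gateway showed the cni0 address as the source IP of every request. The gateway developers and testers reported that it had worked correctly before, and they pointed me to an environment that was still behaving normally.

Troubleshooting

Basic information

Our gateway is exposed through hostPort 80, so that is the only path that matters here and everything else can be ignored. The two environments: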

$ kubectl get node -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
...
10.xx.x6.112 Ready master,node 18d v1.27.12 10.xx.x6.112 <none> Red Hat Enterprise Linux Server 7.9 (Maipo) 3.10.0-1160.el7.x86_64 docker://25.0.5

The environment above is the faulty one. The healthy environment is below:

$ kubectl get node -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
10.xx.x7.111 Ready master,node 9d v1.27.12 10.xx.x7.111 <none> CentOS Linux 7 (Core) 3.10.0-1160.el7.x86_64 docker://25.0.5

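I checked the versions and md5sums of docker, k8s, and cni-plugins on both environments, and they are identical. From an external host, 10.2xx.xx.30, I ran curl against ip:80 of each environment and then inspected conntrack on both machines (faulty environment first, healthy environment second):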

$ conntrack -L |& grep 10.2xx.xx.30
tcp 6 9 CLOSE src=10.2xx.xx.30 dst=10.xx.x6.112 sport=41388 dport=80 src=10.18x.2.38 dst=10.18x.2.1 sport=80 dport=55471 [ASSURED] mark=0 use=1

$ conntrack -L |& grep 10.2xx.xx.30
tcp 6 8 CLOSE src=10.2xx.xx.30 dst=10.xx.x7.111 sport=55658 dport=80 src=10.18x.0.36 dst=10.2xx.xx.30 sport=80 dport=55658 [ASSURED] mark=0 use=1
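Each conntrack entry carries two tuples: the first describes the original direction and the second the expected reply. When no source NAT has been applied, the reply's dst is the same address as the original src. The sketch below checks that for a single line of `conntrack -L` output. The function names `parse_conntrack` and `reply_returns_to_client` are hypothetical helpers written for this illustration, not part of any tool.

```python
import re


def parse_conntrack(line):
    """Split one `conntrack -L` line into (original, reply) tuples.

    Each of src/dst/sport/dport appears twice in a TCP entry: the first
    occurrence belongs to the original direction, the second to the reply.
    """
    original, reply = {}, {}
    for key, value in re.findall(r"\b(src|dst|sport|dport)=(\S+)", line):
        target = original if key not in original else reply
        target[key] = value
    return original, reply


def reply_returns_to_client(line):
    """True if the reply tuple is addressed to the original client,
    i.e. the connection's source address was not rewritten."""
    original, reply = parse_conntrack(line)
    return reply["dst"] == original["src"]
```

Run against the two entries above, the check fails for the first entry (reply dst 10.18x.2.1) and passes for the second (reply dst 10.2xx.xx.30).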

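In the first entry, the dst IP in the right-hand (reply) tuple is wrong: it is 10.18x.2.1, the cni0 address, where it should be the client address 10.2xx.xx.30. The TCP source address that the gateway prints in its logs matches this IP. Since address translation is involved, the natural next step is to look at the nat table: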

$ iptables -t nat -S | grep -Ev 'KUBE-(SVC|SEP)' | grep MASQ
-N CNI-HOSTPORT-MASQ
-N KUBE-MARK-MASQ
-A POSTROUTING -m comment --comment "CNI portfwd requiring masquerade" -j CNI-HOSTPORT-MASQ
-A POSTROUTING -s 10.185.0.0/16 ! -o docker0 -j MASQUERADE
-A CNI-HOSTPORT-MASQ -m mark --mark 0x2000/0x2000 -j MASQUERADE
-A FLANNEL-POSTRTG -s 10.18x.0.0/16 ! -d 224.0.0.0/4 -m comment --comment "flanneld masq" -j MASQUERADE
-A FLANNEL-POSTRTG ! -s 10.18x.0.0/16 -d 10.18x.0.0/16 -m comment --comment "flanneld masq" -j MASQUERADE
-A FLANNEL-POSTRTG -s 10.18x.0.0/16 ! -d 224.0.0.0/4 -m comment --comment "flanneld masq" -j MASQUERADE
-A FLANNEL-POSTRTG ! -s 10.18x.0.0/16 -d 10.18x.0.0/16 -m comment --comment "flanneld masq" -j MASQUERADE
-A KUBE-MARK-MASQ -j MARK --set-xmark 0x4000/0x4000
-A KUBE-POSTROUTING -m comment --comment "kubernetes service traffic requiring SNAT" -j MASQUERADE

$ iptables -t nat -S | grep -Ev 'KUBE-(SVC|SEP|EXT)' | grep MASQ
-N CNI-HOSTPORT-MASQ
-N KUBE-MARK-MASQ
-A POSTROUTING -s 10.185.0.0/16 ! -o docker0 -j MASQUERADE
-A POSTROUTING -m comment --comment "CNI portfwd requiring masquerade" -j CNI-HOSTPORT-MASQ
-A CNI-HOSTPORT-MASQ -m mark --mark 0x2000/0x2000 -j MASQUERADE
-A FLANNEL-POSTRTG -s 10.18x.0.0/16 ! -d 224.0.0.0/4 -m comment --comment "flanneld masq" -j MASQUERADE
-A FLANNEL-POSTRTG ! -s 10.18x.0.0/16 -d 10.18x.0.0/16 -m comment --comment "flanneld masq" -j MASQUERADE
-A KUBE-MARK-MASQ -j MARK --set-xmark 0x4000/0x4000
-A KUBE-POSTROUTING -m comment --comment "kubernetes service traffic requiring SNAT" -j MASQUERADE
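The two listings are short enough to compare by eye. For longer rule sets, a minimal sketch like the one below counts identical `-A` lines in saved `iptables -t nat -S` output. `duplicated_rules` is a hypothetical helper name, and the sketch only reports repeated rules; it says nothing about whether a repeat is harmful.

```python
from collections import Counter


def duplicated_rules(rules_text):
    """Return {rule: count} for every `-A` line that appears more than once."""
    counts = Counter(
        line.strip()
        for line in rules_text.splitlines()
        if line.strip().startswith("-A ")
    )
    return {rule: n for rule, n in counts.items() if n > 1}
```

To use it, capture the rules to a file first (for example `iptables -t nat -S > nat.rules`) and pass the file contents to the function.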

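Comparing the two, the FLANNEL-POSTRTG rules on the faulty environment have been added twice. Both machines run the same flanneld docker image, so I looked at the flanneld logs on the faulty machine: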

# docker logs 3743
I0605 10:08:06.702126 1 main.go:209] CLI flags config: {etcdEndpoints:http://127.0.0.1:4001,http://127.0.0.1:2379 etcdPrefix:/coreos.com/network etcdKeyfile: etcdCertfile: etcdCAFile: etcdUsername: etcdPassword: version:false kubeSubnetMgr:true kubeApiUrl: kubeAnnotationPrefix:flannel.alpha.coreos.com kubeConfigFile: iface:[] ifaceRegex:[] ipMasq:true ifaceCanReach: subnetFile:/run/flannel/subnet.env publicIP: publicIPv6: subnetLeaseRenewMargin:60 healthzIP:0.0.0.0 healthzPort:0 iptablesResyncSeconds:5 iptablesForwardRules:true netConfPath:/etc/kube-flannel/net-conf.json setNodeNetworkUnavailable:true}
W0605 10:08:06.702229 1 client_config.go:618] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0605 10:08:06.898029 1 kube.go:139] Waiting 10m0s for node controller to sync
I0605 10:08:06.898109 1 kube.go:461] Starting kube subnet manager
I0605 10:08:06.905502 1 kube.go:482] Creating the node lease for IPv4. This is the n.Spec.PodCIDRs: [10.18x.0.0/24]
I0605 10:08:06.905578 1 kube.go:482] Creating the node lease for IPv4. This is the n.Spec.PodCIDRs: [10.18x.1.0/24]
I0605 10:08:06.905587 1 kube.go:482] Creating the node lease for IPv4. This is the n.Spec.PodCIDRs: [10.18x.2.0/24]
I0605 10:08:07.898972 1 kube.go:146] Node controller sync successful
I0605 10:08:07.899005 1 main.go:229] Created subnet manager: Kubernetes Subnet Manager - 10.xx.x6.112
I0605 10:08:07.899011 1 main.go:232] Installing signal handlers
I0605 10:08:07.899099 1 main.go:452] Found network config - Backend type: vxlan
I0605 10:08:07.899119 1 match.go:210] Determining IP address of default interface
I0605 10:08:07.899406 1 match.go:263] Using interface with name ens192 and address 10.xx.x6.112
I0605 10:08:07.899429 1 match.go:285] Defaulting external address to interface address (10.xx.x6.112)
I0605 10:08:07.899499 1 vxlan.go:141] VXLAN config: VNI=1 Port=8475 GBP=false Learning=false DirectRouting=false
I0605 10:08:07.951679 1 kube.go:627] List of node(10.xx.x6.112) annotations: map[string]string{"flannel.alpha.coreos.com/backend-data":"{\"VNI\":1,\"VtepMAC\":\"b2:be:c3:c7:1b:c0\"}", "flannel.alpha.coreos.com/backend-type":"vxlan", "flannel.alpha.coreos.com/kube-subnet-manager":"true", "flannel.alpha.coreos.com/public-ip":"10.xx.x6.112", "node.alpha.kubernetes.io/ttl":"0", "volumes.kubernetes.io/controller-managed-attach-detach":"true"}
I0605 10:08:07.951761 1 vxlan.go:155] Setup flannel.1 mac address to b2:be:c3:c7:1b:c0 when flannel restarts
W0605 10:08:08.358170 1 main.go:505] no subnet found for key: FLANNEL_SUBNET in file: /run/flannel/subnet.env
W0605 10:08:08.358192 1 main.go:540] no subnet found for key: FLANNEL_IPV6_SUBNET in file: /run/flannel/subnet.env
I0605 10:08:08.358208 1 iptables.go:65] Current network or subnet (10.18x.0.0/16, 10.18x.2.0/24) is not equal to previous one (0.0.0.0/0, 0.0.0.0/0), trying to recycle old iptables rules
I0605 10:08:09.407576 1 iptables.go:75] Setting up masking rules
I0605 10:08:09.496912 1 iptables.go:214] Changing default FORWARD chain policy to ACCEPT
I0605 10:08:09.497996 1 iptables.go:373] generated 7 rules
I0605 10:08:09.498948 1 iptables.go:373] generated 3 rules
I0605 10:08:09.498957 1 main.go:396] Wrote subnet file to /run/flannel/subnet.env
I0605 10:08:09.499017 1 main.go:400] Running backend.
I0605 10:08:09.499138 1 vxlan_network.go:65] watching for new subnet leases
I0605 10:08:09.499200 1 subnet.go:160] Batch elem [0] is { lease.Event{Type:0, Lease:lease.Lease{EnableIPv4:true, EnableIPv6:false, Subnet:ip.IP4Net{IP:0xabb0000, PrefixLen:0x18}, IPv6Subnet:ip.IP6Net{IP:(*ip.IP6)(nil), PrefixLen:0x0}, Attrs:lease.LeaseAttrs{PublicIP:0xa0d0431, PublicIPv6:(*ip.IP6)(nil), BackendType:"vxlan", BackendData:json.RawMessage{0x7b, 0x22, 0x56, 0x4e, 0x49, 0x22, 0x3a, 0x31, 0x2c, 0x22, 0x56, 0x74, 0x65, 0x70, 0x4d, 0x41, 0x43, 0x22, 0x3a, 0x22, 0x64, 0x36, 0x3a, 0x36, 0x63, 0x3a, 0x33, 0x34, 0x3a, 0x33, 0x34, 0x3a, 0x33, 0x31, 0x3a, 0x36, 0x30, 0x22, 0x7d}, BackendV6Data:json.RawMessage(nil)}, Expiration:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Asof:0}} }
I0605 10:08:09.499324 1 subnet.go:160] Batch elem [0] is { lease.Event{Type:0, Lease:lease.Lease{EnableIPv4:true, EnableIPv6:false, Subnet:ip.IP4Net{IP:0xabb0100, PrefixLen:0x18}, IPv6Subnet:ip.IP6Net{IP:(*ip.IP6)(nil), PrefixLen:0x0}, Attrs:lease.LeaseAttrs{PublicIP:0xa0d0434, PublicIPv6:(*ip.IP6)(nil), BackendType:"vxlan", BackendData:json.RawMessage{0x7b, 0x22, 0x56, 0x4e, 0x49, 0x22, 0x3a, 0x31, 0x2c, 0x22, 0x56, 0x74, 0x65, 0x70, 0x4d, 0x41, 0x43, 0x22, 0x3a, 0x22, 0x34, 0x65, 0x3a, 0x36, 0x65, 0x3a, 0x32, 0x33, 0x3a, 0x32, 0x36, 0x3a, 0x64, 0x31, 0x3a, 0x33, 0x36, 0x22, 0x7d}, BackendV6Data:json.RawMessage(nil)}, Expiration:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Asof:0}} }
I0605 10:08:09.695633 1 main.go:421] Waiting for all goroutines to exit
I0605 10:08:09.700710 1 iptables.go:366] bootstrap done
I0605 10:08:09.898448 1 iptables.go:366] bootstrap done
I0611 17:53:28.697794 1 iptables.go:504] Some iptables rules are missing; deleting and recreating rules
I0611 17:53:29.496576 1 iptables.go:366] bootstrap done
E0611 18:40:00.096822 1 iptables.go:403] Failed to ensure iptables rules: error checking rule existence: failed to check rule existence: fork/exec /sbin/iptables: cannot allocate memory
E0611 18:40:00.981247 1 iptables.go:403] Failed to ensure iptables rules: error checking rule existence: failed to check rule existence: fork/exec /sbin/iptables: cannot allocate memory
E0611 19:10:06.900507 1 iptables.go:403] Failed to ensure iptables rules: error checking rule existence: failed to check rule existence: fork/exec /sbin/iptables: cannot allocate memory
E0611 19:10:07.696829 1 iptables.go:403] Failed to ensure iptables rules: error checking rule existence: failed to check rule existence: fork/exec /sbin/iptables: cannot allocate memory
E0611 19:10:11.904013 1 iptables.go:403] Failed to ensure iptables rules: error checking rule existence: failed to check rule existence: fork/exec /sbin/iptables: cannot allocate memory
E0611 19:10:12.699112 1 iptables.go:403] Failed to ensure iptables rules: error checking rule existence: failed to check rule existence: fork/exec /sbin/iptables: cannot allocate memory
E0611 19:10:16.906109 1 iptables.go:403] Failed to ensure iptables rules: error checking rule existence: failed to check rule existence: fork/exec /sbin/iptables: cannot allocate memory
E0611 19:10:17.700350 1 iptables.go:403] Failed to ensure iptables rules: error checking rule existence: failed to check rule existence: fork/exec /sbin/iptables: cannot allocate memory
E0611 19:10:21.907087 1 iptables.go:403] Failed to ensure iptables rules: error checking rule existence: failed to check rule existence: fork/exec /sbin/iptables: cannot allocate memory
E0611 19:10:22.701113 1 iptables.go:403] Failed to ensure iptables rules: error checking rule existence: failed to check rule existence: fork/exec /sbin/iptables: cannot allocate memory
E0611 23:26:26.010120 1 iptables.go:403] Failed to ensure iptables rules: error checking rule existence: failed to check rule existence: fork/exec /sbin/iptables: cannot allocate memory
E0612 01:54:09.295176 1 iptables.go:403] Failed to ensure iptables rules: error checking rule existence: failed to check rule existence: fork/exec /sbin/iptables: cannot allocate memory
E0612 01:55:10.203362 1 iptables.go:403] Failed to ensure iptables rules: error checking rule existence: failed to check rule existence: fork/exec /sbin/iptables: cannot allocate memory
E0612 02:04:14.795485 1 iptables.go:403] Failed to ensure iptables rules: error checking rule existence: failed to check rule existence: fork/exec /sbin/iptables: cannot allocate memory
E0612 02:05:44.800163 1 iptables.go:403] Failed to ensure iptables rules: error checking rule existence: failed to check rule existence: fork/exec /sbin/iptables: cannot allocate memory
E0612 02:14:16.207339 1 iptables.go:403] Failed to ensure iptables rules: error checking rule existence: failed to check rule existence: fork/exec /sbin/iptables: cannot allocate memory
E0612 02:14:16.312867 1 iptables.go:403] Failed to ensure iptables rules: error checking rule existence: failed to check rule existence: fork/exec /sbin/iptables: cannot allocate memory
E0612 02:16:04.703252 1 iptables.go:403] Failed to ensure iptables rules: error checking rule existence: failed to check rule existence: fork/exec /sbin/iptables: cannot allocate memory
E0612 02:18:05.999282 1 iptables.go:403] Failed to ensure iptables rules: error checking rule existence: failed to check rule existence: fork/exec /sbin/iptables: cannot allocate memory
E0612 02:18:34.006303 1 iptables.go:403] Failed to ensure iptables rules: error checking rule existence: failed to check rule existence: fork/exec /sbin/iptables: cannot allocate memory
E0612 02:18:39.009225 1 iptables.go:403] Failed to ensure iptables rules: error checking rule existence: failed to check rule existence: fork/exec /sbin/iptables: cannot allocate memory
E0612 02:18:41.396200 1 iptables.go:403] Failed to ensure iptables rules: error checking rule existence: failed to check rule existence: fork/exec /sbin/iptables: cannot allocate memory
E0612 02:18:44.011983 1 iptables.go:403] Failed to ensure iptables rules: error checking rule existence: failed to check rule existence: fork/exec /sbin/iptables: cannot allocate memory
E0612 02:18:46.397122 1 iptables.go:403] Failed to ensure iptables rules: error checking rule existence: failed to check rule existence: fork/exec /sbin/iptables: cannot allocate memory
E0612 02:18:49.014794 1 iptables.go:403] Failed to ensure iptables rules: error checking rule existence: failed to check rule existence: fork/exec /sbin/iptables: cannot allocate memory
E0612 02:18:51.398528 1 iptables.go:403] Failed to ensure iptables rules: error checking rule existence: failed to check rule existence: fork/exec /sbin/iptables: cannot allocate memory
E0612 02:18:54.016252 1 iptables.go:403] Failed to ensure iptables rules: error checking rule existence: failed to check rule existence: fork/exec /sbin/iptables: cannot allocate memory
E0612 02:18:56.400014 1 iptables.go:403] Failed to ensure iptables rules: error checking rule existence: failed to check rule existence: fork/exec /sbin/iptables: cannot allocate memory
E0612 02:18:59.019059 1 iptables.go:403] Failed to ensure iptables rules: error checking rule existence: failed to check rule existence: fork/exec /sbin/iptables: cannot allocate memory
E0612 02:19:01.401163 1 iptables.go:403] Failed to ensure iptables rules: error checking rule existence: failed to check rule existence: fork/exec /sbin/iptables: cannot allocate memory
E0612 02:19:04.021511 1 iptables.go:403] Failed to ensure iptables rules: error checking rule existence: failed to check rule existence: fork/exec /sbin/iptables: cannot allocate memory
E0612 02:19:20.302554 1 iptables.go:403] Failed to ensure iptables rules: error checking rule existence: failed to check rule existence: fork/exec /sbin/iptables: cannot allocate memory
E0612 02:19:59.906583 1 iptables.go:403] Failed to ensure iptables rules: error checking rule existence: failed to check rule existence: fork/exec /sbin/iptables: cannot allocate memory
E0612 02:21:29.195932 1 iptables.go:403] Failed to ensure iptables rules: error checking rule existence: failed to check rule existence: fork/exec /sbin/iptables: cannot allocate memory
E0612 02:22:49.310222 1 iptables.go:403] Failed to ensure iptables rules: error checking rule existence: failed to check rule existence: fork/exec /sbin/iptables: cannot allocate memory

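The errors show that flanneld could not fork/exec iptables because memory could not be allocated. I restarted flannel, ran curl again, and the conntrack entry was now correct. I then added the duplicate rules back by hand as a test, and the result was still correct: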

iptables -w -t nat -A FLANNEL-POSTRTG -s 10.18x.0.0/16 ! -d 224.0.0.0/4 -m comment --comment "flanneld masq" -j MASQUERADE
iptables -w -t nat -A FLANNEL-POSTRTG ! -s 10.18x.0.0/16 -d 10.18x.0.0/16 -m comment --comment "flanneld masq" -j MASQUERADE

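I told my gateway colleague that it was fixed. He said he was still seeing the wrong IP, and it only returned to normal after the old conntrack entries had aged out. I then checked the host monitoring: around the first error (E0611 18:40:00.096822), memory, load, and traffic were all normal. Going by the error message alone it looks like a shortage of physical memory, but no other service reported errors, which points more toward kernel memory. I found nothing abnormal in the system logs either, so it may also be an iptables bug.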
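The wrong IP lingering after the fix fits how conntrack works: the NAT mapping is decided when a connection is first seen and stays with its entry until the entry expires. As a minimal sketch, the third field of a TCP line in `conntrack -L` output is the remaining timeout in seconds, so the entries still pointing at the cni0 address can be listed along with how long they have left. `remaining_timeouts` is a hypothetical helper name, and the cni0 address is passed in by the caller.

```python
import re


def remaining_timeouts(conntrack_text, cni0_ip):
    """Return [(timeout_seconds, state, client_ip)] for TCP entries whose
    reply tuple is addressed to cni0_ip, longest-lived first."""
    results = []
    for line in conntrack_text.splitlines():
        fields = line.split()
        # Expected layout: tcp 6 <timeout> <STATE> src=... dst=... ...
        if len(fields) < 4 or fields[0] != "tcp":
            continue
        srcs = re.findall(r"\bsrc=(\S+)", line)
        dsts = re.findall(r"\bdst=(\S+)", line)
        # The second dst= belongs to the reply direction.
        if len(srcs) < 1 or len(dsts) < 2 or dsts[1] != cni0_ip:
            continue
        results.append((int(fields[2]), fields[3], srcs[0]))
    return sorted(results, reverse=True)
```

An empty result means no tracked TCP connection is still using the stale mapping.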
