Troubleshooting an unreachable hostPort and digging down to the root cause

kubernetes hostPort iptables

 2024/09/16 

Notes from a recent investigation into a hostPort that could not be reached.

Background

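We have an internal service exposed through hostPort. Whenever it broke in the past, someone simply cleaned up the leftovers by hand. This time I am writing down the whole process.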

Process

iptables

Only one service is exposed through hostPort. QA reported that it was unreachable after deployment, so I looked at the iptables nat table:

$ iptables -t nat -S CNI-HOSTPORT-DNAT
-N CNI-HOSTPORT-DNAT
-A CNI-HOSTPORT-DNAT -p tcp -m comment --comment "dnat name: \"cbr0\" id: \"fb7cdb0a4f8da673da0a9818ec9a1576f953a8e8decf87a081827c11e4aa7138\"" \
   -m multiport --dports 443,9001,80,9002,9003,19004,19003,50051,25,465,993,995,7010,7020,12321 -j CNI-DN-b24a4eb3a38dc5843c23a
-A CNI-HOSTPORT-DNAT -p tcp -m comment --comment "dnat name: \"cbr0\" id: \"fb7cdb0a4f8da673da0a9818ec9a1576f953a8e8decf87a081827c11e4aa7138\"" \
   -m multiport --dports 10001 -j CNI-DN-b24a4eb3a38dc5843c23a
-A CNI-HOSTPORT-DNAT -p tcp -m comment --comment "dnat name: \"cbr0\" id: \"0961bad27f1a52a28701b7038120b8188e7a78df21f57a08428f021ab3c071e2\"" \
   -m multiport --dports 443,9001,80,9002,9003,19004,19003,50051,25,465,993,995,7010,7020,12321 -j CNI-DN-f2390ee4e08c581b1ea73
-A CNI-HOSTPORT-DNAT -p tcp -m comment --comment "dnat name: \"cbr0\" id: \"0961bad27f1a52a28701b7038120b8188e7a78df21f57a08428f021ab3c071e2\"" \
   -m multiport --dports 10001 -j CNI-DN-f2390ee4e08c581b1ea73

Investigation

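The same ports show up twice under two different container IDs, even though only one service uses hostPort, so one pair of rules is presumably left over from an old Pod. Starting from the CNI config file: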

$ cat /etc/cni/net.d/10-flannel.conflist 
{
  "name": "cbr0",
  "cniVersion": "0.3.1",
  "plugins": [
    {
      "type": "flannel",
      "delegate": {
        "hairpinMode": true,
        "isDefaultGateway": true
      }
    },
    {
      "type": "portmap",
      "capabilities": {
        "portMappings": true
      }
    }
  ]
}

The only CNI plugins in use are flannel and portmap, and portmap is already at the latest version, 1.5.1:

$ portmap --version
CNI portmap plugin v1.5.1
CNI protocol versions supported: 0.1.0, 0.2.0, 0.3.0, 0.3.1, 0.4.0, 1.0.0

Cluster and kernel information:

$ uname -a
Linux centos79 3.10.0-1160.el7.x86_64 #1 SMP Mon Oct 19 16:18:59 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

$ kubectl  get node -o wide
NAME          STATUS   ROLES         AGE     VERSION    INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                KERNEL-VERSION           CONTAINER-RUNTIME
xx.xx.xx.37   Ready    master,node   4h10m   v1.27.16   xx.xx.xx.37   <none>        CentOS Linux 7 (Core)   3.10.0-1160.el7.x86_64   docker://26.1.4
xx.xx.xx.38   Ready    master,node   4h10m   v1.27.16   xx.xx.xx.38   <none>        CentOS Linux 7 (Core)   3.10.0-1160.el7.x86_64   docker://26.1.4
xx.xx.xx.26   Ready    master,node   4h10m   v1.27.16   xx.xx.xx.26   <none>        CentOS Linux 7 (Core)   3.10.0-1160.el7.x86_64   docker://26.1.4

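I skimmed the portmap source in cni-plugins, and the add and delete logic both look fine: on delete it checks whether the chain exists, removes it if it does, and skips it otherwise. Since it is the container runtime that invokes the CNI plugins, I searched the cri-dockerd logs for the container ID found in the iptables rule comments above: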

$ journalctl -xe --no-pager -u cri-dockerd | grep fb7cdb0a4f8da
9月 14 11:39:49 centos79 cri-dockerd[9246]: {"cniVersion":"0.3.1","hairpinMode":true,"ipMasq":false,"ipam":{"ranges":[[{"subnet":"10.187.2.0/24"}]],"routes":[{"dst":"10.187.0.0/16"}],"type":"host-local"},"isDefaultGateway":true,"isGateway":true,"mtu":1450,"name":"cbr0","type":"bridge"}time="2024-09-14T11:39:49+08:00" level=info msg="Will attempt to re-write config file /data/kube/docker/containers/fb7cdb0a4f8da673da0a9818ec9a1576f953a8e8decf87a081827c11e4aa7138/resolv.conf as [nameserver 10.186.0.2 search default123.svc.cluster1.local. svc.cluster1.local. cluster1.local. options ndots:5]"

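According to the portmap source, the name of the jump target chain, CNI-DN-XXX, is derived from the SHA-512 of the network name concatenated with the container ID. Computing it by hand: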

$ echo -n 'cbr0fb7cdb0a4f8da673da0a9818ec9a1576f953a8e8decf87a081827c11e4aa7138' \
   | sha512sum | cut -c 1-21
b24a4eb3a38dc5843c23a
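
To avoid retyping the pipeline for every rule, the calculation above can be wrapped in a small function. This is only a sketch: it assumes, as the manual calculation above suggests, that the chain suffix is the first 21 hex characters of SHA-512(network name + container ID), and cni_dnat_chain is a hypothetical helper name, not something portmap ships.

```shell
# Sketch: derive the portmap DNAT chain name for a CNI network and a sandbox ID.
# Assumption: suffix = first 21 hex chars of SHA-512(name + id), as computed by hand above.
# cni_dnat_chain is a hypothetical helper name.
cni_dnat_chain() {
  # $1 = CNI network name (e.g. cbr0), $2 = full container/sandbox ID
  printf 'CNI-DN-%s\n' "$(printf '%s%s' "$1" "$2" | sha512sum | cut -c 1-21)"
}
```

Calling it as `cni_dnat_chain cbr0 fb7cdb0a4f8da673da0a9818ec9a1576f953a8e8decf87a081827c11e4aa7138` should print the same CNI-DN-b24a4eb3a38dc5843c23a chain that the first pair of rules jumps to.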

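So it really is this container. The iptables library that portmap uses runs its commands with the --wait option, so waiting on the xtables lock should not make a delete fail even when there are a lot of rules: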

$ iptables -w -t nat -S | wc -l
1705
$ iptables -w -t nat -S | grep -Pv 'KUBE-SVC|KUBE-SEP' | wc -l
180

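In theory, the remaining possibility is that cri-dockerd did not tear down the network when it removed the old Pod, but the code does perform the teardown: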

// https://github.com/Mirantis/cri-dockerd/blob/v0.3.14/core/sandbox_stop.go#L87-L92
func (ds *dockerService) StopPodSandbox(
   ...
	ready, ok := ds.getNetworkReady(podSandboxID)
	if !hostNetwork && (ready || !ok) {
		// Only tear down the pod network if we haven't done so already
		cID := config.BuildContainerID(runtimeName, podSandboxID)
		err := ds.network.TearDownPod(namespace, name, cID)

The cleanup writes a log line, and that line is present as well:

$ journalctl -xe --no-pager -u docker | grep fb7cdb0a4f8da
9月 14 11:44:13 centos79 dockerd[2708]: time="2024-09-14T11:44:12.872235404+08:00" level=warning msg="cleaning up after shim disconnected" id=fb7cdb0a4f8da673da0a9818ec9a1576f953a8e8decf87a081827c11e4aa7138 namespace=moby

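Meanwhile, the deployment log shows the docker uninstall starting right around that time, 11:44:13 (the message reads "starting to uninstall docker..."):

2024-09-14 11:44:02,249 - xxx INFO - 开始卸载docker...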

I asked the tester, who confirmed that the uninstall step was run at that time. So the full sequence of events was:

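The uninstall procedure first runs kubectl delete on all resources, which triggers Pod deletion.
kubelet calls cri-dockerd to stop and clean up the affected containers, but at the same time our uninstall step stops docker and deletes the docker directory.
cri-dockerd can no longer reach docker in the middle of the cleanup, so it cannot run the full StopPodSandbox flow, and the rules in the CNI-HOSTPORT-DNAT chain of the nat table are never removed.
Once the uninstall finishes, the environment is deployed again, and the stale hostPort rules are still there and match first.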
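
To check whether a node is in this state, one option is to compare the container IDs embedded in the rule comments with the containers that actually exist. The following is a minimal sketch, not a polished tool: stale_hostport_ids is a hypothetical helper name, and it assumes that every 64-character hex string in the rule dump is a sandbox ID and that the caller supplies the live IDs in a file (for example from `docker ps -q --no-trunc`).

```shell
# Sketch: list container IDs referenced by CNI-HOSTPORT-DNAT rules that are no longer alive.
# stale_hostport_ids is a hypothetical helper name.
#   stdin: output of `iptables -w -t nat -S CNI-HOSTPORT-DNAT`
#   $1:    file with one full (64-char) live container ID per line
stale_hostport_ids() {
  # Pull every 64-hex ID out of the rule comments, de-duplicate,
  # then drop the IDs that appear in the live list.
  # `|| true` keeps the exit status 0 when nothing stale is found.
  grep -oE '[0-9a-f]{64}' | sort -u | grep -vxFf "$1" || true
}

# Possible usage on a node (assumes docker is still running):
#   docker ps -q --no-trunc > /tmp/live_ids
#   iptables -w -t nat -S CNI-HOSTPORT-DNAT | stale_hostport_ids /tmp/live_ids
```

Any ID it prints belongs to a rule whose container is gone, which is exactly the kind of leftover that shadowed the new Pod here.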

The way to avoid this is to clean up the related nat table rules as part of uninstalling docker.
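
One way that cleanup step could look is sketched below. It is an assumption-laden example rather than the script we actually run: gen_hostport_cleanup is a hypothetical helper name, it only handles the CNI-HOSTPORT-DNAT and CNI-DN-* chains seen earlier in this post, and it removes every hostPort mapping on the node, so it only makes sense during an uninstall when no Pod is supposed to survive. It prints the commands instead of running them so they can be reviewed first.

```shell
# Sketch: generate the iptables commands that remove leftover portmap DNAT state.
# gen_hostport_cleanup is a hypothetical helper name.
#   stdin:  output of `iptables -w -t nat -S`
#   stdout: commands to run (nothing is executed here)
gen_hostport_cleanup() {
  awk '
    # Remember whether the dispatch chain exists.
    $1 == "-N" && $2 == "CNI-HOSTPORT-DNAT" { dispatch = 1 }
    # Collect every per-container chain in the order it was declared.
    $1 == "-N" && $2 ~ /^CNI-DN-/           { chains[++n] = $2 }
    END {
      # Flush the dispatch chain first so nothing references the CNI-DN-* chains.
      if (dispatch) print "iptables -w -t nat -F CNI-HOSTPORT-DNAT"
      # Then flush and delete each per-container chain.
      for (i = 1; i <= n; i++) {
        print "iptables -w -t nat -F " chains[i]
        print "iptables -w -t nat -X " chains[i]
      }
    }'
}

# Possible usage in the uninstall script:
#   iptables -w -t nat -S | gen_hostport_cleanup        # preview
#   iptables -w -t nat -S | gen_hostport_cleanup | sh   # apply
```

Flushing the dispatch chain before deleting the per-container chains matters because iptables refuses to delete a chain that is still referenced by a rule.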