zhangguanzhang's Blog

k8s pod has no IP, error: failed to read pod IP from plugin/docker

Word count: 1.2k | Reading time: 6 min
2022/07/12

Background

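I had just come back to my desk and hadn't even sat down when a colleague came over and asked me to help with a problem in a customer's production environment. In short: to tighten security, the customer had turned on ipset, and then found that their services were affected.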

Troubleshooting

$ kubectl get pod -o wide | grep -v Runn
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
xxxx-privilege-r97z4 0/1 CrashLoopBackOff 51 107m <none> 10.x.xx.xx <none> <none>
etcd1-10.x.xx.xx 0/1 CrashLoopBackOff 61 44m <none> 10.x.xx.xx <none> <none>
promtail-7jk8j 0/1 CrashLoopBackOff 51 107m <none> 10.x.xx.xx <none> <none>
zookeeper-1-10.x.xx.xx 0/1 CrashLoopBackOff 83 107m <none> 10.x.xx.xx <none> <none>
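The grep -v Runn above filters on the STATUS column. A minimal sketch that filters on the IP field itself instead, assuming kubectl's jsonpath output is used to emit one "name<TAB>podIP" line per pod; the function name pods_without_ip is hypothetical:

```shell
# Print the names of pods whose podIP is empty.
# Input on stdin: one "name<TAB>podIP" line per pod.
pods_without_ip() {
  awk -F '\t' '$2 == "" { print $1 }'
}

# Hypothetical usage against a live cluster:
# kubectl get pod -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.podIP}{"\n"}{end}' | pods_without_ip
```

Keep in mind that Pending pods which have not been scheduled yet also have no podIP, so they will show up in this list too.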

Let's look at the etcd logs first. Why etcd? Because it's written in Go, and Go services log more clearly than Java ones 😉

$ docker ps -a |grep etcd | head -n 3
a01d26e5668a mirrorgooglecontainers/pause-amd64:3.1 "/pause" 1 second ago Created k8s_POD_etcd1-10.x.xx.xx_default_90bb7a6b237dd87a85e03ed7981e90f3_3182
2434b43b2026 mirrorgooglecontainers/pause-amd64:3.1 "/pause" 3 seconds ago Exited (0) 1 second ago k8s_POD_etcd1-10.x.xx.xx_default_90bb7a6b237dd87a85e03ed7981e90f3_3181
8e434211ee24 b5d94f31df3a "/app/etcd --name=et…" 5 seconds ago Exited (1) 4 seconds ago k8s_etcd1_etcd1-10.x.xx.xx_default_90bb7a6b237dd87a85e03ed7981e90f3_62
$ docker logs 8e43
2022-07-12 09:11:45.659987 W | pkg/flags: unrecognized environment variable ETCD_PORT_2379_TCP_PORT=2379
2022-07-12 09:11:45.660118 W | pkg/flags: unrecognized environment variable ETCD_SERVICE_PORT_ETCD_CLIENT_2379=2379
2022-07-12 09:11:45.660149 W | pkg/flags: unrecognized environment variable ETCD_PORT_2379_TCP_ADDR=xxx.xx.145.219
2022-07-12 09:11:45.660161 W | pkg/flags: unrecognized environment variable ETCD_PORT_2379_TCP_PROTO=tcp
2022-07-12 09:11:45.660180 W | pkg/flags: unrecognized environment variable ETCD_SERVICE_HOST=xxx.xx.145.219
2022-07-12 09:11:45.660202 W | pkg/flags: unrecognized environment variable ETCD_PORT_2379_TCP=tcp://xxx.xx.145.219:2379
2022-07-12 09:11:45.660227 W | pkg/flags: unrecognized environment variable ETCD_SERVICE_PORT=2379
2022-07-12 09:11:45.660322 W | pkg/flags: unrecognized environment variable ETCD_PORT=tcp://xxx.xx.145.219:2379
2022-07-12 09:11:45.660376 E | etcdmain: error verifying flags, expected IP in URL for binding (http://:2380). See 'etcd --help'.
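The bind URL in that error, http://:2380, has an empty host part, which is exactly what you get when an empty IP is spliced into the URL. How this etcd image builds its flags isn't shown here; the sketch below assumes, hypothetically, a start script that takes the pod IP from a POD_IP variable (for example injected through the downward API field status.podIP), and shows how to fail early with a clearer message:

```shell
# Hypothetical: assumes the start script builds etcd's peer URL from POD_IP.
build_peer_url() {
  # ${POD_IP:?...} aborts (non-interactive shell) when POD_IP is unset or empty,
  # instead of letting etcd fail later on "http://:2380".
  : "${POD_IP:?POD_IP is empty, the pod has no IP}"
  printf 'http://%s:2380\n' "$POD_IP"
}
```

Because ${VAR:?} exits the whole shell rather than just the function, call it from the top level of the entrypoint script, where exiting is the desired behavior.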

The log complains that there is no IP in the bind URL (http://:2380). The flannel pods were all running normally, so let's look at the kubelet logs:

$ journalctl -xe --no-pager -u kubelet
Jul 12 17:13:31 xxx.xxx.xxx kubelet[761]: with error: exit status 1
Jul 12 17:13:31 xxx.xxx.xxx kubelet[761]: I0712 17:13:31.384403 761 kubelet.go:1933] SyncLoop (PLEG): "xxxx-privilege-r97z4_default(3b3e512c-61e4-4e0f-ae19-217f9d23bdce)", event: &pleg.PodLifecycleEvent{ID:"3b3e512c-61e4-4e0f-ae19-217f9d23bdce", Type:"ContainerStarted", Data:"9584c21ffd29fcc723f9855b3235e652058c8f8dd66bcc1539fd8d213c059482"}
Jul 12 17:13:31 xxx.xxx.xxx kubelet[761]: I0712 17:13:31.385046 761 kuberuntime_manager.go:434] Sandbox for pod "xxxx-privilege-r97z4_default(3b3e512c-61e4-4e0f-ae19-217f9d23bdce)" has no IP address. Need to start a new one
Jul 12 17:13:31 xxx.xxx.xxx kubelet[761]: W0712 17:13:31.398508 761 docker_sandbox.go:384] failed to read pod IP from plugin/docker: NetworkPlugin cni failed on the status hook for pod "zookeeper-1-10.x.xx.xx_default": Unexpected command output nsenter: failed to execute ip: No such file or directory
Jul 12 17:13:31 xxx.xxx.xxx kubelet[761]: with error: exit status 1
Jul 12 17:13:31 xxx.xxx.xxx kubelet[761]: I0712 17:13:31.418775 761 kubelet.go:1933] SyncLoop (PLEG): "zookeeper-1-10.x.xx.xx_default(9ede2a2352bf8cd0cf86767166391721)", event: &pleg.PodLifecycleEvent{ID:"9ede2a2352bf8cd0cf86767166391721", Type:"ContainerStarted", Data:"1f7169d3e335fd0e79c717290d8ac42de5d089dae5fa4647a13ebe53b8611c88"}
Jul 12 17:13:31 xxx.xxx.xxx kubelet[761]: I0712 17:13:31.419167 761 kuberuntime_manager.go:434] Sandbox for pod "zookeeper-1-10.x.xx.xx_default(9ede2a2352bf8cd0cf86767166391721)" has no IP address. Need to start a new one
Jul 12 17:13:31 xxx.xxx.xxx kubelet[761]: W0712 17:13:31.430948 761 docker_sandbox.go:384] failed to read pod IP from plugin/docker: NetworkPlugin cni failed on the status hook for pod "xxxx-gateway-7dd6cdc85d-6hsz7_default": Unexpected command output nsenter: failed to execute ip: No such file or directory
Jul 12 17:13:31 xxx.xxx.xxx kubelet[761]: with error: exit status 1
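The log shows kubelet reading the pod IP by running ip through nsenter inside the sandbox's network namespace. A rough sketch for reproducing that probe by hand, assuming the pod interface is eth0 and that the command has the form ip -o -4 addr show dev eth0 scope global (the exact arguments kubelet uses may differ between versions); the function names are hypothetical:

```shell
# Take the first line of `ip -o -4 addr show` output and print the address
# in its 4th field ("10.244.1.5/24"), without the prefix length.
first_ipv4() {
  awk 'NR == 1 { split($4, a, "/"); print a[1] }'
}

# $1: ID of the pod's pause container (k8s_POD_...), which owns the netns.
# Needs root, docker and nsenter.
probe_pod_ip() {
  pid=$(docker inspect --format '{{.State.Pid}}' "$1") || return 1
  # An exited sandbox reports Pid 0, so there is no namespace to enter.
  [ "$pid" -gt 0 ] || return 1
  out=$(nsenter --net="/proc/$pid/ns/net" -- ip -o -4 addr show dev eth0 scope global) || return 1
  printf '%s\n' "$out" | first_ipv4
}
```

On a node without the ip binary, the nsenter call fails with the same "nsenter: failed to execute ip: No such file or directory" seen in the kubelet log.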

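The error means that, when reading the pod IP, kubelet ran the ip command inside the pod's network namespace via nsenter, and ip could not be executed. I checked, and sure enough the ip command was gone. The system is CentOS 7.9, where ip is provided by the iproute package, so I downloaded a CentOS 7 rpm from http://www.rpmfind.net/, had someone upload it and install it, and after that the pods came back: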

$ docker ps -a |grep etcd  | head -n 5
52b7b3a6a5b2 b5d94f31df3a "/app/etcd --name=et…" 30 seconds ago Up 29 seconds k8s_etcd1_etcd1-10.x.xx.xx_default_90bb7a6b237dd87a85e03ed7981e90f3_68
93f00c95f7ba mirrorgooglecontainers/pause-amd64:3.1 "/pause" 33 seconds ago Up 32 seconds k8s_POD_etcd1-10.x.xx.xx_default_90bb7a6b237dd87a85e03ed7981e90f3_3378
fb75458a53b8 mirrorgooglecontainers/pause-amd64:3.1 "/pause" 36 seconds ago Exited (0) 34 seconds ago k8s_POD_etcd1-10.x.xx.xx_default_90bb7a6b237dd87a85e03ed7981e90f3_3377
e291ec9ed471 mirrorgooglecontainers/pause-amd64:3.1 "/pause" 39 seconds ago Exited (0) 37 seconds ago k8s_POD_etcd1-10.x.xx.xx_default_90bb7a6b237dd87a85e03ed7981e90f3_3376
e6eeed41ef31 b5d94f31df3a "/app/etcd --name=et…" 3 minutes ago Exited (1) 3 minutes ago k8s_etcd1_etcd1-10.x.xx.xx_default_90bb7a6b237dd87a85e03ed7981e90f3_67
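Since kubelet depends on host binaries like this, a small preflight check can catch the problem before pods start failing. A minimal sketch; the function name check_cmds is hypothetical, and the list of commands to check is up to you:

```shell
# Report each command from the argument list that is not found in PATH.
# Returns 1 if any are missing, 0 otherwise.
check_cmds() {
  missing=0
  for c in "$@"; do
    if ! command -v "$c" >/dev/null 2>&1; then
      echo "missing: $c" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example on a node: nsenter comes from util-linux, ip from iproute on CentOS 7.
# check_cmds nsenter ip || echo "reinstall util-linux / iproute"
```

On an offline CentOS 7 node, the missing package can then be installed from a downloaded rpm with rpm -ivh, as was done here.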

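Normally nobody would uninstall it on purpose, so maybe it was removed along with something else it depends on. Sure enough, the yum log shows it: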

$ grep -C 20  iprout /var/log/yum.log
Jul 04 22:43:24 Erased: plymouth-0.8.9-0.34.20140113.el7.centos.x86_64
Jul 04 22:43:24 Erased: plymouth-scripts-0.8.9-0.34.20140113.el7.centos.x86_64
Jul 04 22:43:24 Erased: iptables-services-1.4.21-35.el7.x86_64
Jul 04 22:43:25 Erased: kbd-1.15.5-15.el7.x86_64
Jul 04 22:43:25 Erased: kexec-tools-2.0.15-51.el7_9.3.x86_64
Jul 04 22:43:25 Erased: dracut-network-033-572.el7.x86_64
Jul 04 22:43:25 Erased: 12:dhclient-4.2.5-82.el7.centos.x86_64
Jul 04 22:43:26 Erased: initscripts-9.49.53-1.el7_9.1.x86_64
Jul 04 22:43:27 Erased: open-vm-tools-11.0.5-3.el7_9.3.x86_64
Jul 04 22:43:27 Erased: iproute-4.11.0-30.el7.x86_64
Jul 04 22:43:27 Erased: iptables-1.4.21-35.el7.x86_64
Jul 04 22:44:33 Installed: iptables-1.4.21-35.el7.x86_64
$ uptime -s
2022-07-12 13:37:19

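Even with -C 20 these are the only lines, so this is everything the log file contains. Within a few seconds iptables was erased together with a batch of other packages, iproute among them (presumably as its dependents), and a minute later only iptables was reinstalled. It looks like this happened when the customer earlier set up, on their own, the iptables service that imports the rules in one go, so this one is on the customer. The uptime -s output also shows the node was rebooted on 07-12, which is presumably why the problem only surfaced now, although iproute had been gone since Jul 04.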
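To look for the same kind of collateral removal on other nodes, a minimal sketch that lists every Erased entry in a yum log. It assumes the line format shown above (month, day, time, action, package); the function name erased_packages is hypothetical:

```shell
# Print the package of every "Erased:" line in a yum log.
# $1: log file, defaults to /var/log/yum.log.
erased_packages() {
  awk '$4 == "Erased:" { print $5 }' "${1:-/var/log/yum.log}"
}
```

For example, erased_packages | grep iproute, followed by rpm -q iproute to see whether it has been reinstalled since. If the yum history database is intact, yum history list and yum history info show the same transactions together with the command line that triggered them.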
