Troubleshooting very slow docker login and pod image pulls
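Background

A customer's on-site single-node test environment reported that docker login, docker pull, and pod image pulls were all very slow. The on-site staff took a quick look, could not find the cause, and I then logged in remotely to investigate.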
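Process

Our private deployments always include an image registry, and all images are pushed to it. The on-site staff reported that both image pulls and docker login against this registry were slow.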
Environment

    $ cat /etc/redhat-release
    Red Hat Enterprise Linux Server release 7.8 (Maipo)
    $ uname -a
    Linux poc-xxxx 3.10.0-1127.el7.x86_64 #1 SMP Tue Feb 18 16:39:12 EST 2020 x86_64 x86_64 x86_64 GNU/Linux
Investigation

After logging in, I confirmed that login was very slow:
    docker login -u xxx -p xxxx reg.xxx.lan:5000
Running it under strace:
    $ strace docker login -u xxx -p xxxx reg.xxx.lan:5000
    ...
    write(3, "HEAD /_ping HTTP/1.1\r\nHost: api."..., 92) = 92
    futex(0xc000700148, FUTEX_WAKE_PRIVATE, 1) = 1
    futex(0x1ac5448, FUTEX_WAIT_PRIVATE, 0, NULLWARNING! Using --password via the CLI is insecure. Use --password-stdin.
    ) = 0
    futex(0x1ac5448, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
    futex(0x1ac5448, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
It hung at this point, yet a HEAD request sent with curl came back normally (a 404, but with no delay):
    $ curl -v reg.xxx.lan:5000/_ping -I
    * About to connect() to reg.xxx.lan port 5000 (#0)
    *   Trying 7.xx.x.125...
    * Connected to reg.xxx.lan (7.xx.x.125) port 5000 (#0)
    > GET /_ping HTTP/1.1
    > User-Agent: curl/7.29.0
    > Host: reg.xxx.lan:5000
    > Accept: */*
    >
    < HTTP/1.1 404 Not Found
    < Content-Type: text/plain; charset=utf-8
    < Docker-Distribution-Api-Version: registry/2.0
    < X-Content-Type-Options: nosniff
    < Date: Wed, 08 Jan 2025 07:05:33 GMT
    < Content-Length: 19
    <
    404 page not found
    * Connection #0 to host reg.xxx.lan left intact
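I then asked the customer to prepare a second machine with nothing but the same Docker version installed. docker login from that machine was fast, which ruled out the image registry itself. I took a look at the relevant client code: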
    https://github.com/moby/moby/blob/6c523afaedcbb2e3e219dbf4d417efad5b9397b3/client/ping.go#L21
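The ping logic at that link sends a HEAD request to /_ping first and falls back to GET when HEAD does not succeed. As a rough illustration only (this is not the moby code), the same pattern can be sketched in shell with curl. The helper names `http_status` and `ping_with_fallback`, the 5-second timeout, and the choice to treat only 200 as success are all assumptions made for this sketch.

```shell
# http_status METHOD URL (hypothetical helper)
# Prints the HTTP status code; curl prints 000 when the request fails outright.
http_status() {
  local method=$1 url=$2
  if [ "$method" = HEAD ]; then
    # -I makes curl send a HEAD request
    curl -s -o /dev/null -I --max-time 5 -w '%{http_code}' "$url"
  else
    curl -s -o /dev/null --max-time 5 -w '%{http_code}' "$url"
  fi
}

# ping_with_fallback URL (hypothetical helper)
# Tries HEAD first; if that does not return 200, retries the same URL with GET.
# Prints the method that produced the final answer and its status code.
ping_with_fallback() {
  local url=$1 code
  code=$(http_status HEAD "$url")
  if [ "$code" = 200 ]; then
    echo "HEAD $code"
    return 0
  fi
  code=$(http_status GET "$url")
  echo "GET $code"
  [ "$code" = 200 ]
}
```

Calling `ping_with_fallback` with a URL prints which of the two methods produced the final status, which makes it easy to see whether the fallback path was taken.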
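The logic looked fine: it sends a HEAD request first and retries with GET if that fails. There was no security software on the machine either. As a last test, I tried logging in through 127.0.0.1: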
    docker login -u xxx -p xxxx 127.0.0.1:5000
That was fast, which made me suspect hostname resolution, so I captured DNS traffic:

    tcpdump -nn -i ens224 port 53 -v | grep -A2 reg.xxx.lan
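Sure enough, the lookups for the registry name were being sent out to an external DNS server. I checked /etc/nsswitch.conf and found that it had indeed been modified: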
    $ grep hosts /etc/nsswitch.conf
    # hosts: db files nisplus nis dns
    hosts: dns files myhostname
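One way to observe the effect of this lookup order without involving Docker is to resolve the name through NSS with `getent hosts`, which follows the `hosts:` line of /etc/nsswitch.conf. The sketch below was not part of the original session; `resolve_timed` is a hypothetical helper name, and the timing is deliberately coarse (whole seconds), which is enough to expose multi-second resolver delays.

```shell
# resolve_timed NAME (hypothetical helper)
# Resolves NAME through NSS and prints how many whole seconds the lookup took.
resolve_timed() {
  local name=$1 start end
  start=$(date +%s)
  # getent consults the sources on the hosts: line in their configured order
  getent hosts "$name" || echo "no answer for $name"
  end=$(date +%s)
  echo "elapsed: $((end - start))s"
}
```

Running `resolve_timed reg.xxx.lan` before and after changing the `hosts:` line gives a quick comparison of lookup time.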
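With dns listed ahead of files, each lookup of reg.xxx.lan went out to the DNS server before /etc/hosts was consulted, which matches the port 53 traffic captured with tcpdump. After restoring the original order, everything returned to normal. The customer said nobody had changed the file, so it was most likely modified when they built the VM template.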
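Since the change came in through a VM template, a small pre-flight check on new machines could catch it early. The following is a minimal sketch under stated assumptions: `check_hosts_order` is a hypothetical helper name, it looks only at the first active `hosts:` line, and it treats "dns before files" as the only condition worth warning about.

```shell
# check_hosts_order [FILE] (hypothetical helper)
# Reports whether dns precedes files on the active hosts: line.
# FILE defaults to /etc/nsswitch.conf.
check_hosts_order() {
  awk '
    $1 == "hosts:" {                 # commented-out lines such as "# hosts: ..." do not match
      for (i = 2; i <= NF; i++) {
        if ($i == "dns"   && !dns)   dns = i      # field position of the first dns entry
        if ($i == "files" && !files) files = i    # field position of the first files entry
      }
      line = $0
      exit                           # stop at the first active hosts: line
    }
    END {
      if (line == "") { print "no active hosts: line found" }
      else if (dns && (!files || dns < files)) { print "WARN dns before files: " line }
      else { print "OK: " line }
    }
  ' "${1:-/etc/nsswitch.conf}"
}
```

Run without arguments it inspects /etc/nsswitch.conf. Passing a path makes it usable against a template image mounted elsewhere.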