
kubelet 1.15 keeps reclaiming ephemeral-storage even though nodefs has plenty of free capacity

2021/10/29

Failure

On site, many pods on a k8s node were being hard-evicted and showed Evicted. The on-site engineers checked the partition capacity and inodes and both looked normal, yet the node kept trying to reclaim ephemeral-storage.
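
To see the blast radius quickly, the evicted pods can be listed with a field selector (a generic check, not a capture from the site; pods evicted by kubelet are left in phase Failed with reason Evicted):

$ kubectl get pods --all-namespaces --field-selector=status.phase=Failed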

Troubleshooting

Environment

$ uname -a
Linux xxx-2 3.10.0-693.el7.x86_64 #1 SMP Tue Aug 22 21:09:27 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
$ cat /etc/redhat-release
CentOS Linux release 7.4.1708 (Core)
$ kubectl version -o json
{
  "clientVersion": {
    "major": "1",
    "minor": "15",
    "gitVersion": "v1.15.5",
    "gitCommit": "20c265fef0741dd71a66480e35bd69f18351daea",
    "gitTreeState": "clean",
    "buildDate": "2019-10-15T19:16:51Z",
    "goVersion": "go1.12.10",
    "compiler": "gc",
    "platform": "linux/amd64"
  },
  "serverVersion": {
    "major": "1",
    "minor": "15",
    "gitVersion": "v1.15.5",
    "gitCommit": "20c265fef0741dd71a66480e35bd69f18351daea",
    "gitTreeState": "clean",
    "buildDate": "2019-10-15T19:07:57Z",
    "goVersion": "go1.12.10",
    "compiler": "gc",
    "platform": "linux/amd64"
  }
}
$ docker info
Containers: 5
 Running: 4
 Paused: 0
 Stopped: 1
Images: 40
Server Version: 18.09.3
Storage Driver: overlay2
 Backing Filesystem: xfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: e6b3f5632f50dbc4e9cb6288d911bf4f5e95b18e
runc version: 6635b4f0c6af3810594d2770f662f34ddc15b40d
init version: fec3683
Security Options:
 seccomp
  Profile: default
Kernel Version: 3.10.0-693.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 32
Total Memory: 62.91GiB
Name: SCJY-2
ID: XZ33:PHUQ:U2CI:7PXH:SYFG:Y6LK:3K3U:XXM6:QJWP:U3B3:MW4M:XPJS
Docker Root Dir: /data/kube/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 reg.xxx.lan:5000
 treg.yun.xxx.cn
 127.0.0.0/8
Registry Mirrors:
 https://registry.docker-cn.com/
 https://docker.mirrors.ustc.edu.cn/
Live Restore Enabled: false
Product License: Community Engine

Investigation

Remoted in via Sunflower (向日葵) to take a look: the root partition's capacity was normal, and so were its inodes. uptime -s showed the machine had been rebooted; the on-site engineers said the reboot hadn't helped. Restarting kubelet didn't help either: it just kept trying to reclaim ephemeral-storage.

$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/rootvg-lvroot 30G 5.2G 25G 18% /
devtmpfs 32G 0 32G 0% /dev
tmpfs 32G 160K 32G 1% /dev/shm
tmpfs 32G 26M 32G 1% /run
tmpfs 32G 0 32G 0% /sys/fs/cgroup
/dev/sdb 600G 36G 565G 6% /data
/dev/sda1 1014M 160M 855M 16% /boot
/dev/mapper/rootvg-lvopt 10G 33M 10G 1% /opt
/dev/mapper/rootvg-lvhome 1014M 39M 976M 4% /home
/dev/mapper/rootvg-lvvar 2.0G 1.2G 888M 57% /var
overlay 600G 36G 565G 6% /data/kube/docker/overlay2/788ee4620da0a3f76ef5f4b24755a68de0e66c8f2425d8332d5a792116d7659f/merged
overlay 600G 36G 565G 6% /data/kube/docker/overlay2/d2b5f08e9873f5c9365aaf57eeca492734631a3842ccb2f379aa89998b0c7304/merged
overlay 600G 36G 565G 6% /data/kube/docker/overlay2/c4793b6c3f774cc960ef23e18b61405040698be698306ee993d4d501bdcf485a/merged
overlay 600G 36G 565G 6% /data/kube/docker/overlay2/b5a0fc544935db77c92bd978db9c1c7018e5e09bba9d2bf53bd300e96c656cec/merged
shm 64M 0 64M 0% /data/kube/docker/containers/ad86ab9b01e1ce0d62e1f98249274d9bfe75eca6efd8ce0e8f1c591d5570d75f/mounts/shm
shm 64M 0 64M 0% /data/kube/docker/containers/e3ebeac9a82264869429f44ea6834bcbc94b79013621490c071ef002b4b8e90e/mounts/shm
shm 64M 0 64M 0% /data/kube/docker/containers/a917bd3b8006198a58900efb5c82c6e162cfc4e732c7e588eaadfb59294ea22b/mounts/shm
shm 64M 0 64M 0% /data/kube/docker/containers/aa52df1894ad495f4f269d77ddd90954fdc7bbd0fbf25d9d4aa0674a76ff6a6c/mounts/shm
tmpfs 6.3G 12K 6.3G 1% /run/user/42
tmpfs 6.3G 0 6.3G 0% /run/user/1003
tmpfs 6.3G 0 6.3G 0% /run/user/1000

$ df -i
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/mapper/rootvg-lvroot 15726592 184878 15541714 2% /
devtmpfs 8242230 527 8241703 1% /dev
tmpfs 8246150 41 8246109 1% /dev/shm
tmpfs 8246150 735 8245415 1% /run
tmpfs 8246150 16 8246134 1% /sys/fs/cgroup
/dev/sdb 314572800 303816 314268984 1% /data
/dev/sda1 524288 327 523961 1% /boot
/dev/mapper/rootvg-lvopt 5242880 7 5242873 1% /opt
/dev/mapper/rootvg-lvhome 524288 397 523891 1% /home
/dev/mapper/rootvg-lvvar 1048576 10179 1038397 1% /var
overlay 314572800 303816 314268984 1% /data/kube/docker/overlay2/788ee4620da0a3f76ef5f4b24755a68de0e66c8f2425d8332d5a792116d7659f/merged
overlay 314572800 303816 314268984 1% /data/kube/docker/overlay2/d2b5f08e9873f5c9365aaf57eeca492734631a3842ccb2f379aa89998b0c7304/merged
overlay 314572800 303816 314268984 1% /data/kube/docker/overlay2/c4793b6c3f774cc960ef23e18b61405040698be698306ee993d4d501bdcf485a/merged
overlay 314572800 303816 314268984 1% /data/kube/docker/overlay2/b5a0fc544935db77c92bd978db9c1c7018e5e09bba9d2bf53bd300e96c656cec/merged
shm 8246150 1 8246149 1% /data/kube/docker/containers/ad86ab9b01e1ce0d62e1f98249274d9bfe75eca6efd8ce0e8f1c591d5570d75f/mounts/shm
shm 8246150 1 8246149 1% /data/kube/docker/containers/e3ebeac9a82264869429f44ea6834bcbc94b79013621490c071ef002b4b8e90e/mounts/shm
shm 8246150 1 8246149 1% /data/kube/docker/containers/a917bd3b8006198a58900efb5c82c6e162cfc4e732c7e588eaadfb59294ea22b/mounts/shm
shm 8246150 1 8246149 1% /data/kube/docker/containers/aa52df1894ad495f4f269d77ddd90954fdc7bbd0fbf25d9d4aa0674a76ff6a6c/mounts/shm
tmpfs 8246150 9 8246141 1% /run/user/42
tmpfs 8246150 1 8246149 1% /run/user/1003
tmpfs 8246150 1 8246149 1% /run/user/1000

$ kubectl describe node xx.xx.112.135
...
Capacity:
 cpu:                32
 ephemeral-storage:  2038Mi
 hugepages-2Mi:      0
 memory:             65969200Ki
 pods:               110
Allocatable:
 cpu:                31800m
 ephemeral-storage:  1014Mi
 hugepages-2Mi:      0
 memory:             65469200Ki
 pods:               110
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning EvictionThresholdMet 3m57s (x1434 over 4h3m) kubelet, xx.xx.112.135 Attempting to reclaim ephemeral-storage
Normal Starting 37s kubelet, xx.xx.112.135 Starting kubelet.
Normal NodeHasSufficientMemory 37s kubelet, xx.xx.112.135 Node xx.xx.112.135 status is now: NodeHasSufficientMemory
Normal NodeHasNoDiskPressure 37s (x2 over 37s) kubelet, xx.xx.112.135 Node xx.xx.112.135 status is now: NodeHasNoDiskPressure
Normal NodeHasSufficientPID 37s kubelet, xx.xx.112.135 Node xx.xx.112.135 status is now: NodeHasSufficientPID
Normal NodeNotReady 37s kubelet, xx.xx.112.135 Node xx.xx.112.135 status is now: NodeNotReady
Normal NodeAllocatableEnforced 37s kubelet, xx.xx.112.135 Updated Node Allocatable limit across pods
Normal NodeReady 37s kubelet, xx.xx.112.135 Node xx.xx.112.135 status is now: NodeReady
Normal NodeHasDiskPressure 27s kubelet, xx.xx.112.135 Node xx.xx.112.135 status is now: NodeHasDiskPressure
Warning EvictionThresholdMet 7s (x4 over 37s) kubelet, xx.xx.112.135 Attempting to reclaim ephemeral-storage

After staring at this for a while I noticed that the ephemeral-storage figures above were wrong: Capacity was somehow only 2038Mi.
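
In hindsight, the quickest check would have been to ask which filesystem backs kubelet's root directory (assuming the default /var/lib/kubelet; the unit file later in this post shows WorkingDirectory=/var/lib/kubelet and no --root-dir flag). kubelet derives the node's ephemeral-storage capacity from that filesystem, and the df output above shows a 2.0G /var volume that lines up with the bogus 2038Mi Capacity:

$ df -h /var/lib/kubelet
Filesystem                 Size  Used Avail Use% Mounted on
/dev/mapper/rootvg-lvvar   2.0G  1.2G  888M  57% /var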

Some source-code exploration

I started a kubelet in my local dev environment and poked at it with a debugger. Some notes:

./build/run.sh make kubelet GOFLAGS="-v -tags=nokmem" GOGCFLAGS="all=-N -l"  KUBE_BUILD_PLATFORMS=linux/amd64

cp _output/dockerized/bin/linux/amd64/kubelet .

dlv exec --check-go-version=false ./kubelet -- --cgroup-driver=systemd

# two breakpoints worth setting
vendor/github.com/google/cadvisor/container/docker/handler.go:421

vendor/github.com/google/cadvisor/container/docker/handler.go:364

   724:	func (self *manager) GetFsInfo(label string) ([]v2.FsInfo, error) {
=> 725:		var empty time.Time
   726:		// Get latest data from filesystems hanging off root container.
   727:		stats, err := self.memoryCache.RecentStats("/", empty, empty, 1)
   728:		if err != nil {
   729:			return nil, err
   730:		}
(dlv) so
> k8s.io/kubernetes/vendor/github.com/google/cadvisor/manager.(*manager).getFsInfoByDeviceName() _output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/google/cadvisor/manager/manager.go:1311 (PC: 0x1fc7180)
Values returned:
	~r1: []k8s.io/kubernetes/vendor/github.com/google/cadvisor/info/v2.FsInfo len: 2, cap: 2, [
		{
			Timestamp: (*time.Time)(0xc0002262d0),
			Device: "/dev/sda1",
			Mountpoint: "/",
			Capacity: 75150372864,
			Available: 36613033984,
			Usage: 38537338880,
			Labels: []string len: 2, cap: 2, [
				"docker-images",
				"root",
			],
			Inodes: *36699584,
			InodesFree: *35850609,
		},
		{
			Timestamp: (*time.Time)(0xc000226348),
			Device: "tmpfs",
			Mountpoint: "/dev/shm",
			Capacity: 1986203648,
			Available: 1986203648,
			Usage: 0,
			Labels: []string len: 0, cap: 0, [],
			Inodes: *484913,
			InodesFree: *484912,
		},
	]
	~r2: error nil

For the capacity part, on site I turned the feature off with --feature-gates=LocalStorageCapacityIsolation=false, deleted the node object and restarted kubelet; after that describe no longer showed ephemeral-storage, but the problem was still there. Reading the source, this capacity value is obtained from docker under vendor/github.com/google/cadvisor/container/docker, but the interfaces are nested too deeply to trace through comfortably. The machine had already been rebooted on site, and restarting docker and going through its logs turned up nothing useful either.
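
Independent of the feature gate, it helps to read the filesystem numbers kubelet itself is acting on rather than df's. The kubelet stats summary endpoint exposes them, and proxying through the API server avoids dealing with kubelet client certs (node name as above; jq is assumed to be available):

$ kubectl get --raw "/api/v1/nodes/xx.xx.112.135/proxy/stats/summary" | jq '.node.fs'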

Final fix

The ephemeral-storage limit is an alpha feature in 1.15 and I didn't want to keep digging for now, so I tried moving kubelet's root directory instead.

$ systemctl cat kubelet
# /etc/systemd/system/kubelet.service
[Unit]
...
[Service]
WorkingDirectory=/var/lib/kubelet
ExecStart=/data/kube/bin/kubelet \
...

The main changes are WorkingDirectory plus two extra kubelet flags, --root-dir and --docker-root. Since /data is a separate partition on site, move the root dir to /data/kube/kubelet; --docker-root points at docker's data-root:

$ vi /etc/systemd/system/kubelet.service
...
WorkingDirectory=/data/kube/kubelet
ExecStart=/data/kube/bin/kubelet \
--root-dir=/data/kube/kubelet \
--docker-root=/data/kube/docker \
systemctl daemon-reload
systemctl restart kubelet
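
After the restart, describing the node again should show an ephemeral-storage capacity derived from the 600G /data partition instead of 2038Mi; a quick way to verify (output not captured on site):

$ kubectl describe node xx.xx.112.135 | grep ephemeral-storage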

Problem solved. Only afterwards did we find out how /var became a separate partition: the customer had changed the partition layout on site. Originally /var was not separate; at some point they created an LV for it and added it to /etc/fstab, but never mounted it or rebooted. A week ago they rebooted, the 2G /var LV finally got mounted, and with several services writing logs under /var/log, kubelet's default root directory ended up on that tiny filesystem and the eviction thresholds kept firing, hence this failure.
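
A pair of checks that would have caught the latent fstab entry before the reboot (generic commands, not from the site): compare what /etc/fstab says should be mounted on /var with what is actually mounted there:

$ grep /var /etc/fstab
$ findmnt /var

If fstab lists a /var entry but findmnt shows /var is not a mountpoint, that entry will take effect on the next reboot and silently shrink whatever lives under /var, as happened here.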
