zhangguanzhang's Blog

docker's containerd fails to start after a crash caused by disk problems

2022/10/09

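A machine's disk ran into problems, and after the reboot docker would not start. I fixed that the same way as in the earlier docker-panic post, and then found that containerd would not start either.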

Process

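docker was installed from the package manager, so there is a separate containerd process managed by systemd. With the docker startup panic fixed, containerd also panicked right after starting, so I debugged it in the foreground. First, see how the unit starts it: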

$ systemctl cat containerd
...
[Service]
ExecStartPre=-/sbin/modprobe overlay
ExecStart=/usr/bin/containerd
...

Check containerd's log level option:

$ containerd --help
...
...
--log-level value, -l value set the logging level [trace, debug, info, warn, error, fatal, panic]
...

Run it in the foreground at debug level:

$ systemctl stop containerd
$ containerd --log-level debug
INFO[2022-10-09T18:08:06.383235215+08:00] loading plugin "io.containerd.snapshotter.v1.aufs"... type=io.containerd.snapshotter.v1
INFO[2022-10-09T18:08:06.385029947+08:00] skip loading plugin "io.containerd.snapshotter.v1.aufs"... error="aufs is not supported (modprobe aufs failed: exit status 1 \"modprobe: FATAL: Module aufs not found.\\n\"): skip plugin" type=io.containerd.snapshotter.v1
INFO[2022-10-09T18:08:06.385109752+08:00] loading plugin "io.containerd.snapshotter.v1.devmapper"... type=io.containerd.snapshotter.v1
WARN[2022-10-09T18:08:06.385166683+08:00] failed to load plugin io.containerd.snapshotter.v1.devmapper error="devmapper not configured"
INFO[2022-10-09T18:08:06.385195079+08:00] loading plugin "io.containerd.snapshotter.v1.native"... type=io.containerd.snapshotter.v1
INFO[2022-10-09T18:08:06.385254847+08:00] loading plugin "io.containerd.snapshotter.v1.overlayfs"... type=io.containerd.snapshotter.v1
INFO[2022-10-09T18:08:06.385399412+08:00] loading plugin "io.containerd.snapshotter.v1.zfs"... type=io.containerd.snapshotter.v1
INFO[2022-10-09T18:08:06.385742371+08:00] skip loading plugin "io.containerd.snapshotter.v1.zfs"... error="path /var/lib/containerd/io.containerd.snapshotter.v1.zfs must be a zfs filesystem to be used with the zfs snapshotter: skip plugin" type=io.containerd.snapshotter.v1
INFO[2022-10-09T18:08:06.385776799+08:00] loading plugin "io.containerd.metadata.v1.bolt"... type=io.containerd.metadata.v1
WARN[2022-10-09T18:08:06.385799868+08:00] could not use snapshotter devmapper in metadata plugin error="devmapper not configured"
INFO[2022-10-09T18:08:06.385811466+08:00] metadata content store policy set policy=shared
INFO[2022-10-09T18:08:06.385937321+08:00] loading plugin "io.containerd.differ.v1.walking"... type=io.containerd.differ.v1
INFO[2022-10-09T18:08:06.385958141+08:00] loading plugin "io.containerd.gc.v1.scheduler"... type=io.containerd.gc.v1
INFO[2022-10-09T18:08:06.386007407+08:00] loading plugin "io.containerd.service.v1.introspection-service"... type=io.containerd.service.v1
INFO[2022-10-09T18:08:06.386045975+08:00] loading plugin "io.containerd.service.v1.containers-service"... type=io.containerd.service.v1
INFO[2022-10-09T18:08:06.386060301+08:00] loading plugin "io.containerd.service.v1.content-service"... type=io.containerd.service.v1
INFO[2022-10-09T18:08:06.386075847+08:00] loading plugin "io.containerd.service.v1.diff-service"... type=io.containerd.service.v1
INFO[2022-10-09T18:08:06.386092598+08:00] loading plugin "io.containerd.service.v1.images-service"... type=io.containerd.service.v1
INFO[2022-10-09T18:08:06.386111730+08:00] loading plugin "io.containerd.service.v1.leases-service"... type=io.containerd.service.v1
INFO[2022-10-09T18:08:06.386129763+08:00] loading plugin "io.containerd.service.v1.namespaces-service"... type=io.containerd.service.v1
INFO[2022-10-09T18:08:06.386143081+08:00] loading plugin "io.containerd.service.v1.snapshots-service"... type=io.containerd.service.v1
INFO[2022-10-09T18:08:06.386156250+08:00] loading plugin "io.containerd.runtime.v1.linux"... type=io.containerd.runtime.v1
INFO[2022-10-09T18:08:06.386204083+08:00] loading plugin "io.containerd.runtime.v2.task"... type=io.containerd.runtime.v2
INFO[2022-10-09T18:08:06.386258107+08:00] loading plugin "io.containerd.monitor.v1.cgroups"... type=io.containerd.monitor.v1
INFO[2022-10-09T18:08:06.386855490+08:00] loading plugin "io.containerd.service.v1.tasks-service"... type=io.containerd.service.v1
INFO[2022-10-09T18:08:06.386949124+08:00] loading plugin "io.containerd.internal.v1.restart"... type=io.containerd.internal.v1
INFO[2022-10-09T18:08:06.387096230+08:00] loading plugin "io.containerd.grpc.v1.containers"... type=io.containerd.grpc.v1
INFO[2022-10-09T18:08:06.387135843+08:00] loading plugin "io.containerd.grpc.v1.content"... type=io.containerd.grpc.v1
INFO[2022-10-09T18:08:06.387171618+08:00] loading plugin "io.containerd.grpc.v1.diff"... type=io.containerd.grpc.v1
INFO[2022-10-09T18:08:06.387200578+08:00] loading plugin "io.containerd.grpc.v1.events"... type=io.containerd.grpc.v1
INFO[2022-10-09T18:08:06.387229772+08:00] loading plugin "io.containerd.grpc.v1.healthcheck"... type=io.containerd.grpc.v1
INFO[2022-10-09T18:08:06.387264273+08:00] loading plugin "io.containerd.grpc.v1.images"... type=io.containerd.grpc.v1
INFO[2022-10-09T18:08:06.387293089+08:00] loading plugin "io.containerd.grpc.v1.leases"... type=io.containerd.grpc.v1
INFO[2022-10-09T18:08:06.387321593+08:00] loading plugin "io.containerd.grpc.v1.namespaces"... type=io.containerd.grpc.v1
INFO[2022-10-09T18:08:06.387349820+08:00] loading plugin "io.containerd.internal.v1.opt"... type=io.containerd.internal.v1
INFO[2022-10-09T18:08:06.387426214+08:00] loading plugin "io.containerd.grpc.v1.snapshots"... type=io.containerd.grpc.v1
INFO[2022-10-09T18:08:06.387460111+08:00] loading plugin "io.containerd.grpc.v1.tasks"... type=io.containerd.grpc.v1
INFO[2022-10-09T18:08:06.387515152+08:00] loading plugin "io.containerd.grpc.v1.version"... type=io.containerd.grpc.v1
INFO[2022-10-09T18:08:06.387547837+08:00] loading plugin "io.containerd.grpc.v1.introspection"... type=io.containerd.grpc.v1
INFO[2022-10-09T18:08:06.387942931+08:00] serving... address=/run/containerd/containerd.sock.ttrpc
INFO[2022-10-09T18:08:06.388045242+08:00] serving... address=/run/containerd/containerd.sock
DEBU[2022-10-09T18:08:06.388086098+08:00] sd notification error="<nil>" notified=false state="READY=1"
INFO[2022-10-09T18:08:06.388117730+08:00] containerd successfully booted in 0.039426s
panic: invalid page type: 12: 10

goroutine 69 [running]:
github.com/containerd/containerd/vendor/go.etcd.io/bbolt.(*Cursor).search(0xc000530fe0, 0x7f7c3c366040, 0xa, 0xa, 0xc)
/go/src/github.com/containerd/containerd/vendor/go.etcd.io/bbolt/cursor.go:250 +0x355
github.com/containerd/containerd/vendor/go.etcd.io/bbolt.(*Cursor).searchPage(0xc000530fe0, 0x7f7c3c366040, 0xa, 0xa, 0x7f7c3c36c000)
/go/src/github.com/containerd/containerd/vendor/go.etcd.io/bbolt/cursor.go:308 +0x164
github.com/containerd/containerd/vendor/go.etcd.io/bbolt.(*Cursor).search(0xc000530fe0, 0x7f7c3c366040, 0xa, 0xa, 0x14)
/go/src/github.com/containerd/containerd/vendor/go.etcd.io/bbolt/cursor.go:265 +0x18c
github.com/containerd/containerd/vendor/go.etcd.io/bbolt.(*Cursor).seek(0xc000530fe0, 0x7f7c3c366040, 0xa, 0xa, 0xe, 0x556761035f72, 0x18, 0x556761017762, 0x4, 0x556761035f8a, ...)
/go/src/github.com/containerd/containerd/vendor/go.etcd.io/bbolt/cursor.go:159 +0x7f
github.com/containerd/containerd/vendor/go.etcd.io/bbolt.(*Bucket).Bucket(0xc00051c0c0, 0x7f7c3c366040, 0xa, 0xa, 0x55676102559d)
/go/src/github.com/containerd/containerd/vendor/go.etcd.io/bbolt/bucket.go:105 +0xd6
github.com/containerd/containerd/metadata.scanRoots.func2(0x7f7c3c366040, 0xa, 0xa, 0x0, 0x0, 0x0, 0x0, 0x55675fff4520)
/go/src/github.com/containerd/containerd/metadata/gc.go:98 +0xba
github.com/containerd/containerd/vendor/go.etcd.io/bbolt.(*Bucket).ForEach(0xc00051c0c0, 0xc000531658, 0x6, 0x6)
/go/src/github.com/containerd/containerd/vendor/go.etcd.io/bbolt/bucket.go:390 +0x100
github.com/containerd/containerd/metadata.scanRoots(0x556761ae0860, 0xc00051c000, 0xc000518000, 0xc000522000, 0xc000520000, 0x51a000)
/go/src/github.com/containerd/containerd/metadata/gc.go:94 +0x86c
github.com/containerd/containerd/metadata.(*DB).getMarked.func1(0xc000518000, 0x0, 0x0)
/go/src/github.com/containerd/containerd/metadata/db.go:384 +0x193
github.com/containerd/containerd/vendor/go.etcd.io/bbolt.(*DB).View(0xc0000f4200, 0xc000080850, 0x0, 0x0)
/go/src/github.com/containerd/containerd/vendor/go.etcd.io/bbolt/db.go:725 +0xaa
github.com/containerd/containerd/metadata.(*DB).getMarked(0xc0000d0c40, 0x556761ae08a0, 0xc000040098, 0x203000, 0x203000, 0x0)
/go/src/github.com/containerd/containerd/metadata/db.go:367 +0x7e
github.com/containerd/containerd/metadata.(*DB).GarbageCollect(0xc0000d0c40, 0x556761ae08a0, 0xc000040098, 0x0, 0x656d2e6472656e01, 0x762e617461646174, 0x746c6f622e31)
/go/src/github.com/containerd/containerd/metadata/db.go:284 +0xa3
github.com/containerd/containerd/gc/scheduler.(*gcScheduler).run(0xc0000c0b40, 0x556761ae08a0, 0xc000040098)
/go/src/github.com/containerd/containerd/gc/scheduler/scheduler.go:310 +0x516
created by github.com/containerd/containerd/gc/scheduler.init.0.func1
/go/src/github.com/containerd/containerd/gc/scheduler/scheduler.go:132 +0x429
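
A trace like the one above is easier to read with the vendored bbolt frames filtered out, leaving only the panic message and containerd's own functions. The following is a minimal sketch, assuming the foreground output was saved to a file; the function name `containerd_frames` is made up for illustration and is not part of any containerd tooling.

```shell
# Hypothetical helper: print the panic message and the non-vendored
# containerd frames from a Go panic trace saved in a file.
containerd_frames() {
    trace_file="$1"
    # The panic message itself.
    grep '^panic:' "$trace_file"
    # Function frames start with the package path; file/line frames start
    # with a path such as /go/src/... and are skipped by the anchor.
    # Drop vendored packages and strip the trailing argument list.
    grep '^github.com/containerd/containerd/' "$trace_file" \
        | grep -v '/vendor/' \
        | sed 's/(0x[^)]*)$//'
}
```

Something like `containerd --log-level debug 2>&1 | tee /tmp/containerd-debug.log` followed by `containerd_frames /tmp/containerd-debug.log` would list the callers in order, innermost first. The log path is only an example.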

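As with the earlier docker panic, the panic message shows that containerd also uses boltdb to store some of its data: bbolt hit a page of a type it did not expect while walking the tree, which fits a database file damaged by the disk problem. The call chain shows the read came from metadata.(*DB).getMarked, called by the garbage-collection scheduler. The database lives under /var/lib/containerd: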

$ ls -l /var/lib/containerd/
total 0
drwxr-xr-x. 4 root root 33 May 26 2021 io.containerd.content.v1.content
drwx--x--x. 2 root root 21 May 26 2021 io.containerd.metadata.v1.bolt
drwx--x--x. 2 root root 6 May 26 2021 io.containerd.runtime.v1.linux
drwx--x--x. 3 root root 18 Jun 18 2021 io.containerd.runtime.v2.task
drwx------. 3 root root 23 May 26 2021 io.containerd.snapshotter.v1.native
drwx------. 3 root root 23 May 26 2021 io.containerd.snapshotter.v1.overlayfs
drwx------. 2 root root 6 May 26 2021 tmpmounts
$ ls -l /var/lib/containerd/io.containerd.metadata.v1.bolt/
total 264
-rw-r--r--. 1 root root 270336 Oct 9 15:01 meta.db
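
Before touching the file, the damage can optionally be confirmed with the `bbolt` command-line tool from the bbolt project, which has a `check` subcommand. This is only a sketch under assumptions: the tool is not installed with docker or containerd, so it has to be built separately (for example with `go install go.etcd.io/bbolt/cmd/bbolt@latest`), and the wrapper name `check_boltdb` is hypothetical. It was not part of the original troubleshooting.

```shell
# Hypothetical helper: check a bolt database file with the bbolt CLI.
# Assumes the `bbolt` binary (go.etcd.io/bbolt/cmd/bbolt) is on PATH.
check_boltdb() {
    db="$1"
    if [ ! -f "$db" ]; then
        echo "no such file: $db" >&2
        return 1
    fi
    if ! command -v bbolt >/dev/null 2>&1; then
        echo "bbolt CLI not installed, skipping check" >&2
        return 2
    fi
    # Work on a copy so the original file is never touched.
    copy="$(mktemp)"
    cp -- "$db" "$copy"
    rc=0
    bbolt check "$copy" || rc=$?
    rm -f -- "$copy"
    return "$rc"
}
```

With containerd stopped, `check_boltdb /var/lib/containerd/io.containerd.metadata.v1.bolt/meta.db` should return non-zero for a damaged file. On a badly damaged database the check itself may panic, which is still a useful signal.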

Rename meta.db and start containerd:

$ mv /var/lib/containerd/io.containerd.metadata.v1.bolt/meta.db{,.bak}
$ systemctl start containerd
$ systemctl status containerd
● containerd.service - containerd container runtime
Loaded: loaded (/usr/lib/systemd/system/containerd.service; disabled; vendor preset: disabled)
Active: active (running) since Sun 2022-10-09 18:08:36 CST; 10s ago
Docs: https://containerd.io
Process: 10573 ExecStartPre=/sbin/modprobe overlay (code=exited, status=0/SUCCESS)
Main PID: 10576 (containerd)
Tasks: 14
Memory: 27.5M
CGroup: /system.slice/containerd.service
└─10576 /usr/bin/containerd

Oct 09 18:08:36 localhost.localdomain containerd[10576]: time="2022-10-09T18:08:36.664056264+08:00" level=info msg="loading plugin \"io.containerd.grpc.v1.leases\"..." type=io.containerd.grpc.v1
Oct 09 18:08:36 localhost.localdomain containerd[10576]: time="2022-10-09T18:08:36.664069435+08:00" level=info msg="loading plugin \"io.containerd.grpc.v1.namespaces\"..." type=io.containerd.grpc.v1
Oct 09 18:08:36 localhost.localdomain containerd[10576]: time="2022-10-09T18:08:36.664082887+08:00" level=info msg="loading plugin \"io.containerd.internal.v1.opt\"..." type=io.containerd.internal.v1
Oct 09 18:08:36 localhost.localdomain containerd[10576]: time="2022-10-09T18:08:36.664135587+08:00" level=info msg="loading plugin \"io.containerd.grpc.v1.snapshots\"..." type=io.containerd.grpc.v1
Oct 09 18:08:36 localhost.localdomain containerd[10576]: time="2022-10-09T18:08:36.664152107+08:00" level=info msg="loading plugin \"io.containerd.grpc.v1.tasks\"..." type=io.containerd.grpc.v1
Oct 09 18:08:36 localhost.localdomain containerd[10576]: time="2022-10-09T18:08:36.664167475+08:00" level=info msg="loading plugin \"io.containerd.grpc.v1.version\"..." type=io.containerd.grpc.v1
Oct 09 18:08:36 localhost.localdomain containerd[10576]: time="2022-10-09T18:08:36.664183225+08:00" level=info msg="loading plugin \"io.containerd.grpc.v1.introspection\"..." type=io.containerd.grpc.v1
Oct 09 18:08:36 localhost.localdomain containerd[10576]: time="2022-10-09T18:08:36.664386986+08:00" level=info msg=serving... address=/run/containerd/containerd.sock.ttrpc
Oct 09 18:08:36 localhost.localdomain containerd[10576]: time="2022-10-09T18:08:36.664433173+08:00" level=info msg=serving... address=/run/containerd/containerd.sock
Oct 09 18:08:36 localhost.localdomain containerd[10576]: time="2022-10-09T18:08:36.664508047+08:00" level=info msg="containerd successfully booted in 0.048852s"
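
The rename above uses a fixed `.bak` suffix, so repeating it after a second incident would overwrite the first backup. A slightly more defensive variant, as a minimal sketch: the helper name `set_aside` and the timestamp suffix are my own choices for illustration, not something containerd requires.

```shell
# Hypothetical helper: move a file aside under a timestamped name and
# print the new path. Refuses to overwrite an existing backup.
set_aside() {
    src="$1"
    if [ ! -f "$src" ]; then
        echo "not found: $src" >&2
        return 1
    fi
    dst="$src.bak.$(date +%Y%m%d%H%M%S)"
    if [ -e "$dst" ]; then
        echo "already exists: $dst" >&2
        return 1
    fi
    mv -- "$src" "$dst" && echo "$dst"
}
```

Usage would be `systemctl stop containerd`, then `set_aside /var/lib/containerd/io.containerd.metadata.v1.bolt/meta.db`, then `systemctl start containerd`. Either way, containerd starts with a fresh, empty metadata database, so whatever it had recorded in the old one is no longer known to it; keeping the old file around leaves room for a later recovery attempt.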

Nothing technically deep here; I wrote it up to show the troubleshooting process and the line of thinking.
