zhangguanzhang's Blog

Fixing docker-18.06.3-ce failing to start with panic: invalid page type: 0: 0

2020/01/08

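In the early morning some pods went into an abnormal state and did not recover. I logged in to check and found that `docker ps -a` hung without responding. The kubelet log was also flooded with context deadline / cancel errors from its RPC calls to docker.
I restarted docker, but it would not come back up. So I ran dockerd in the foreground with the debug log level enabled.
The first error said that /var/lib/docker/tmp was a file. On other machines this path is a directory. I renamed the file, created a tmp directory with the same permissions as on a healthy docker host, and started dockerd again.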

# pkill dockerd
# dockerd -D
DEBU[2020-01-08T04:41:29.600747707+08:00] Listener created for HTTP on unix (/var/run/docker.sock)
INFO[2020-01-08T04:41:29.601361374+08:00] libcontainerd: started new docker-containerd process pid=10411
INFO[2020-01-08T04:41:29.601396369+08:00] parsed scheme: "unix" module=grpc
INFO[2020-01-08T04:41:29.601409680+08:00] scheme "unix" not registered, fallback to default scheme module=grpc
INFO[2020-01-08T04:41:29.601457335+08:00] ccResolverWrapper: sending new addresses to cc: [{unix:///var/run/docker/containerd/docker-containerd.sock 0 <nil>}] module=grpc
INFO[2020-01-08T04:41:29.601475129+08:00] ClientConn switching balancer to "pick_first" module=grpc
INFO[2020-01-08T04:41:29.601537196+08:00] pickfirstBalancer: HandleSubConnStateChange: 0xc42021bf20, CONNECTING module=grpc
INFO[0000] starting containerd revision=468a545b9edcd5932818eb9de8e72413e616e86e version=v1.1.2
DEBU[0000] changing OOM score to -500
INFO[0000] loading plugin "io.containerd.content.v1.content"... type=io.containerd.content.v1
INFO[0000] loading plugin "io.containerd.snapshotter.v1.btrfs"... type=io.containerd.snapshotter.v1
WARN[0000] failed to load plugin io.containerd.snapshotter.v1.btrfs error="path /var/lib/docker/containerd/daemon/io.containerd.snapshotter.v1.btrfs must be a btrfs filesystem to be used with the btrfs snapshotter"
INFO[0000] loading plugin "io.containerd.snapshotter.v1.aufs"... type=io.containerd.snapshotter.v1
WARN[0000] failed to load plugin io.containerd.snapshotter.v1.aufs error="modprobe aufs failed: "modprobe: FATAL: Module aufs not found.\n": exit status 1"
INFO[0000] loading plugin "io.containerd.snapshotter.v1.native"... type=io.containerd.snapshotter.v1
INFO[0000] loading plugin "io.containerd.snapshotter.v1.overlayfs"... type=io.containerd.snapshotter.v1
INFO[0000] loading plugin "io.containerd.snapshotter.v1.zfs"... type=io.containerd.snapshotter.v1
WARN[0000] failed to load plugin io.containerd.snapshotter.v1.zfs error="path /var/lib/docker/containerd/daemon/io.containerd.snapshotter.v1.zfs must be a zfs filesystem to be used with the zfs snapshotter"
INFO[0000] loading plugin "io.containerd.metadata.v1.bolt"... type=io.containerd.metadata.v1
WARN[0000] could not use snapshotter aufs in metadata plugin error="modprobe aufs failed: "modprobe: FATAL: Module aufs not found.\n": exit status 1"
WARN[0000] could not use snapshotter zfs in metadata plugin error="path /var/lib/docker/containerd/daemon/io.containerd.snapshotter.v1.zfs must be a zfs filesystem to be used with the zfs snapshotter"
WARN[0000] could not use snapshotter btrfs in metadata plugin error="path /var/lib/docker/containerd/daemon/io.containerd.snapshotter.v1.btrfs must be a btrfs filesystem to be used with the btrfs snapshotter"
INFO[0000] loading plugin "io.containerd.differ.v1.walking"... type=io.containerd.differ.v1
INFO[0000] loading plugin "io.containerd.gc.v1.scheduler"... type=io.containerd.gc.v1
INFO[0000] loading plugin "io.containerd.service.v1.containers-service"... type=io.containerd.service.v1
INFO[0000] loading plugin "io.containerd.service.v1.content-service"... type=io.containerd.service.v1
INFO[0000] loading plugin "io.containerd.service.v1.diff-service"... type=io.containerd.service.v1
INFO[0000] loading plugin "io.containerd.service.v1.images-service"... type=io.containerd.service.v1
INFO[0000] loading plugin "io.containerd.service.v1.leases-service"... type=io.containerd.service.v1
INFO[0000] loading plugin "io.containerd.service.v1.namespaces-service"... type=io.containerd.service.v1
INFO[0000] loading plugin "io.containerd.service.v1.snapshots-service"... type=io.containerd.service.v1
INFO[0000] loading plugin "io.containerd.monitor.v1.cgroups"... type=io.containerd.monitor.v1
INFO[0000] loading plugin "io.containerd.runtime.v1.linux"... type=io.containerd.runtime.v1
INFO[0000] loading plugin "io.containerd.service.v1.tasks-service"... type=io.containerd.service.v1
INFO[0000] loading plugin "io.containerd.grpc.v1.containers"... type=io.containerd.grpc.v1
INFO[0000] loading plugin "io.containerd.grpc.v1.content"... type=io.containerd.grpc.v1
INFO[0000] loading plugin "io.containerd.grpc.v1.diff"... type=io.containerd.grpc.v1
INFO[0000] loading plugin "io.containerd.grpc.v1.events"... type=io.containerd.grpc.v1
INFO[0000] loading plugin "io.containerd.grpc.v1.healthcheck"... type=io.containerd.grpc.v1
INFO[0000] loading plugin "io.containerd.grpc.v1.images"... type=io.containerd.grpc.v1
INFO[0000] loading plugin "io.containerd.grpc.v1.leases"... type=io.containerd.grpc.v1
INFO[0000] loading plugin "io.containerd.grpc.v1.namespaces"... type=io.containerd.grpc.v1
INFO[0000] loading plugin "io.containerd.grpc.v1.snapshots"... type=io.containerd.grpc.v1
INFO[0000] loading plugin "io.containerd.grpc.v1.tasks"... type=io.containerd.grpc.v1
INFO[0000] loading plugin "io.containerd.grpc.v1.version"... type=io.containerd.grpc.v1
INFO[0000] loading plugin "io.containerd.grpc.v1.introspection"... type=io.containerd.grpc.v1
INFO[0000] serving... address="/var/run/docker/containerd/docker-containerd-debug.sock"
INFO[0000] serving... address="/var/run/docker/containerd/docker-containerd.sock"
INFO[0000] containerd successfully booted in 0.004495s
INFO[2020-01-08T04:41:29.632221318+08:00] pickfirstBalancer: HandleSubConnStateChange: 0xc42021bf20, READY module=grpc
DEBU[2020-01-08T04:41:29.633204984+08:00] Golang's threads limit set to 231300
INFO[2020-01-08T04:41:29.633598981+08:00] parsed scheme: "unix" module=grpc
INFO[2020-01-08T04:41:29.633618132+08:00] scheme "unix" not registered, fallback to default scheme module=grpc
INFO[2020-01-08T04:41:29.633660093+08:00] ccResolverWrapper: sending new addresses to cc: [{unix:///var/run/docker/containerd/docker-containerd.sock 0 <nil>}] module=grpc
INFO[2020-01-08T04:41:29.633674419+08:00] ClientConn switching balancer to "pick_first" module=grpc
INFO[2020-01-08T04:41:29.633718699+08:00] pickfirstBalancer: HandleSubConnStateChange: 0xc42036a4c0, CONNECTING module=grpc
INFO[2020-01-08T04:41:29.633866743+08:00] pickfirstBalancer: HandleSubConnStateChange: 0xc42036a4c0, READY module=grpc
DEBU[2020-01-08T04:41:29.633956178+08:00] Using default logging driver json-file
DEBU[2020-01-08T04:41:29.633970269+08:00] [graphdriver] trying provided driver: overlay2
DEBU[2020-01-08T04:41:29.634657377+08:00] processing event stream module=libcontainerd namespace=plugins.moby
DEBU[2020-01-08T04:41:29.638820441+08:00] Cleaning up old mountid : start.
Error starting daemon: error initializing graphdriver: lstat /var/lib/docker/overlay2/8b41eae72b8bf6a80745b46bd052afed10bd8ed32a2722e41262bc88b9c32391: structure needs cleaning

I went into the directory and tried deleting it by hand:

cd /var/lib/docker/overlay2
rm -rf 8b41eae72b8bf6a80745b46bd052afed10bd8ed32a2722e41262bc88b9c32391
rm: cannot remove '8b41eae72b8bf6a80745b46bd052afed10bd8ed32a2722e41262bc88b9c32391': Structure needs cleaning

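Taken together with the earlier symptom, this meant the filesystem was corrupted and metadata had been lost. That is why /var/lib/docker/tmp had turned from a directory into a file. I shut the machine down, booted the CentOS ISO into rescue mode, and repaired the filesystem. After booting back up, docker still would not start: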

# dockerd -D
DEBU[2020-01-08T05:27:43.975383491+08:00] Listener created for HTTP on unix (/var/run/docker.sock)
INFO[2020-01-08T05:27:43.976013056+08:00] libcontainerd: started new docker-containerd process pid=10687
INFO[2020-01-08T05:27:43.976062081+08:00] parsed scheme: "unix" module=grpc
INFO[2020-01-08T05:27:43.976070261+08:00] scheme "unix" not registered, fallback to default scheme module=grpc
INFO[2020-01-08T05:27:43.976070261+08:00] ccResolverWrapper: sending new addresses to cc: [{unix:///var/run/docker/containerd/docker-containerd.sock 0 <nil>}] module=grpc
INFO[2020-01-08T05:27:43.976129855+08:00] ClientConn switching balancer to "pick_first" module=grpc
INFO[2020-01-08T05:27:43.976180248+08:00] pickfirstBalancer: HandleSubConnStateChange: 0xc42051f4d0, CONNECTING module=grpc
INFO[0000] starting containerd revision=468a545b9edcd5932818eb9de8e72413e616e86e version=v1.1.2
DEBU[0000] changing OOM score to -500
INFO[0000] loading plugin "io.containerd.content.v1.content"... type=io.containerd.content.v1
INFO[0000] loading plugin "io.containerd.snapshotter.v1.btrfs"... type=io.containerd.snapshotter.v1
WARN[0000] failed to load plugin io.containerd.snapshotter.v1.btrfs error="path /var/lib/docker/containerd/daemon/io.containerd.snapshotter.v1.btrfs must be a btrfs filesystem to be used with the btrfs snapshotter"
INFO[0000] loading plugin "io.containerd.snapshotter.v1.aufs"... type=io.containerd.snapshotter.v1
WARN[0000] failed to load plugin io.containerd.snapshotter.v1.aufs error="modprobe aufs failed: "modprobe: FATAL: Module aufs not found.\n": exit s
INFO[0000] loading plugin "io.containerd.snapshotter.v1.native"... type=io.containerd.snapshotter.v1
INFO[0000] loading plugin "io.containerd.snapshotter.v1.overlayfs"... type=io.containerd.snapshotter.v1
INFO[0000] loading plugin "io.containerd.snapshotter.v1.zfs"... type=io.containerd.snapshotter.v1
WARN[0000] failed to load plugin io.containerd.snapshotter.v1.zfs error="path /var/lib/docker/containerd/daemon/io.containerd.snapshotter.v1.zfs must be a zfs filesystem to be used with the zfs snapshotter"
INFO[0000] loading plugin "io.containerd.metadata.v1.bolt"... type=io.containerd.metadata.v1
WARN[0000] could not use snapshotter zfs in metadata plugin error="path /var/lib/docker/containerd/daemon/io.containerd.snapshotter.v1.zfs must be a zfs filesystem to be used with the zfs snapshotter"
WARN[0000] could not use snapshotter btrfs in metadata plugin error="path /var/lib/docker/containerd/daemon/io.containerd.snapshotter.v1.btrfs must be a btrfs filesystem to be used with the btrfs snapshotter"
WARN[0000] could not use snapshotter aufs in metadata plugin error="modprobe aufs failed: "modprobe: FATAL: Module aufs not found.\n": exit status 1"
panic: invalid page type: 0: 0

goroutine 1 [running]:
github.com/containerd/containerd/vendor/github.com/boltdb/bolt.(*Cursor).search(0xc4204ef2b8, 0xc4204ef3b8, 0x2, 0x20, 0x241)
/go/src/github.com/containerd/containerd/vendor/github.com/boltdb/bolt/cursor.go:256 +0x38a
github.com/containerd/containerd/vendor/github.com/boltdb/bolt.(*Cursor).seek(0xc4204ef2b8, 0xc4204ef3b8, 0x2, 0x20, 0x0, 0x0, 0x18, 0x564ff51d24a0, 0x0, 0xc420420000, ...)
/go/src/github.com/containerd/containerd/vendor/github.com/boltdb/bolt/cursor.go:159 +0xa7
github.com/containerd/containerd/vendor/github.com/boltdb/bolt.(*Bucket).Bucket(0xc4200e8018, 0xc4204ef3b8, 0x2, 0x20, 0xc4204ef3b8)
/go/src/github.com/containerd/containerd/vendor/github.com/boltdb/bolt/bucket.go:112 +0xe0
github.com/containerd/containerd/vendor/github.com/boltdb/bolt.(*Tx).Bucket(0xc4200e8000, 0xc4204ef3b8, 0x2, 0x20, 0x2)
/go/src/github.com/containerd/containerd/vendor/github.com/boltdb/bolt/tx.go:101 +0x51
github.com/containerd/containerd/metadata.(*DB).Init.func1(0xc4200e8000, 0x564ff5305da8, 0xc4200e8000)
/go/src/github.com/containerd/containerd/metadata/db.go:121 +0x174
github.com/containerd/containerd/vendor/github.com/boltdb/bolt.(*DB).Update(0xc4201fc3c0, 0xc4204ef540, 0x0, 0x0)
/go/src/github.com/containerd/containerd/vendor/github.com/boltdb/bolt/db.go:598 +0x92
github.com/containerd/containerd/metadata.(*DB).Init(0xc42029c540, 0x564ff5328b40, 0xc42003c018, 0xc4204ef660, 0xc42029c540)
/go/src/github.com/containerd/containerd/metadata/db.go:103 +0xb1
github.com/containerd/containerd/server.LoadPlugins.func2(0xc4201dc5b0, 0xc4200ca3c0, 0x21, 0xc42025cfe0, 0x1e)
/go/src/github.com/containerd/containerd/server/server.go:255 +0x53f
github.com/containerd/containerd/plugin.(*Registration).Init(0xc4200ceeb0, 0xc4201dc5b0, 0xc4200ceeb0)
/go/src/github.com/containerd/containerd/plugin/plugin.go:98 +0x3a
github.com/containerd/containerd/server.New(0x7f2931983158, 0xc42003c018, 0xc420100000, 0x1, 0xc4204efc80, 0x564ff40747cd)
/go/src/github.com/containerd/containerd/server/server.go:106 +0x600
github.com/containerd/containerd/cmd/containerd/command.App.func1(0xc420102160, 0xc420102160, 0xc4204efd07)
/go/src/github.com/containerd/containerd/cmd/containerd/command/main.go:132 +0x5fb
github.com/containerd/containerd/vendor/github.com/urfave/cli.HandleAction(0x564ff5115760, 0x564ff5305718, 0xc420102160, 0xc4205182a0, 0x0)
/go/src/github.com/containerd/containerd/vendor/github.com/urfave/cli/app.go:502 +0xca
github.com/containerd/containerd/vendor/github.com/urfave/cli.(*App).Run(0xc4201de000, 0xc42003a090, 0x3, 0x3, 0x0, 0x0)
/go/src/github.com/containerd/containerd/vendor/github.com/urfave/cli/app.go:268 +0x60e
main.main()
github.com/containerd/containerd/cmd/containerd/main.go:28 +0x51
Error containerd did not exit successfully error="exit status 2" module=libcontainerd

The panic stack trace points into the containerd/containerd code on GitHub, so next I checked the version information:

$ docker info
...
Server Version: 18.06.3-ce
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: systemd
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 468a545b9edcd5932818eb9de8e72413e616e86e
runc version: a592beb5bc4c4092b1b1bac971afed27687340c5
init version: fec3683
Security Options:
seccomp
Profile: default
Kernel Version: 3.10.0-957.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 16
Total Memory: 47.25GiB
...

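Using the containerd commit ID 468a545b9edcd5932818eb9de8e72413e616e86e and the plugin-loading errors in the log, I found the plugin-loading code: https://github.com/containerd/containerd/blob/468a545b9edcd5932818eb9de8e72413e616e86e/server/server.go#L84-L125
In that for loop, an error while loading a plugin is only logged and does not stop startup. I use overlay2 storage, so the aufs kernel-module load error can be ignored. Next I looked up the code that panicked: https://github.com/containerd/containerd/blob/468a545b9edcd5932818eb9de8e72413e616e86e/vendor/github.com/boltdb/bolt/cursor.go#L256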

func (c *Cursor) search(key []byte, pgid pgid) {
	p, n := c.bucket.pageNode(pgid)
	if p != nil && (p.flags&(branchPageFlag|leafPageFlag)) == 0 {
		panic(fmt.Sprintf("invalid page type: %d: %x", p.id, p.flags))
	}
	// ... (rest of search omitted)
}

Its caller is line 159 of the same file:

c.search(seek, c.bucket.root)
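The check that fired can be reproduced outside bolt. The sketch below decodes the 16-byte bolt page header (id, flags, count, overflow) and applies the same flag test as `Cursor.search`.

The flag values are copied from boltdb's page.go. Little-endian byte order (x86_64) is an assumption, since bolt reads its memory-mapped file in host order. The helper names `parseHeader` and `checkSearchable` are hypothetical.

An id of 0 with flags 0 reproduces the panic message exactly. That is consistent with the page having been read back as all zeros.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Page flag values as defined in boltdb's page.go.
const (
	branchPageFlag   = 0x01
	leafPageFlag     = 0x02
	metaPageFlag     = 0x04
	freelistPageFlag = 0x10
)

// pageHeader mirrors the first 16 bytes of a bolt page.
type pageHeader struct {
	ID       uint64
	Flags    uint16
	Count    uint16
	Overflow uint32
}

// parseHeader decodes a page header, assuming little-endian host order.
func parseHeader(b []byte) (pageHeader, error) {
	if len(b) < 16 {
		return pageHeader{}, fmt.Errorf("short page: %d bytes", len(b))
	}
	return pageHeader{
		ID:       binary.LittleEndian.Uint64(b[0:8]),
		Flags:    binary.LittleEndian.Uint16(b[8:10]),
		Count:    binary.LittleEndian.Uint16(b[10:12]),
		Overflow: binary.LittleEndian.Uint32(b[12:16]),
	}, nil
}

// checkSearchable applies the same test as Cursor.search: a page reached while
// searching a bucket must be a branch or a leaf page.
func checkSearchable(h pageHeader) error {
	if h.Flags&(branchPageFlag|leafPageFlag) == 0 {
		return fmt.Errorf("invalid page type: %d: %x", h.ID, h.Flags)
	}
	return nil
}

func main() {
	leaf := make([]byte, 16)
	binary.LittleEndian.PutUint64(leaf[0:8], 7)
	binary.LittleEndian.PutUint16(leaf[8:10], leafPageFlag)

	zeroed := make([]byte, 16) // a page whose contents came back as zeros

	for _, raw := range [][]byte{leaf, zeroed} {
		h, err := parseHeader(raw)
		if err != nil {
			panic(err)
		}
		fmt.Println(checkSearchable(h)) // <nil>, then: invalid page type: 0: 0
	}
}
```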

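So bolt appears to be a key-value store backed by a single file. It looks data up in that file and panics when a page it reads does not pass the type check. The stack trace frame /go/src/github.com/containerd/containerd/server/server.go:255 +0x53f corresponds to https://github.com/containerd/containerd/blob/468a545b9edcd5932818eb9de8e72413e616e86e/server/server.go#L215-L247. There we can see the registration of the metadata plugin, which sets up its db file: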

plugin.Register(&plugin.Registration{
	Type: plugin.MetadataPlugin,
	ID:   "bolt",
	Requires: []plugin.Type{
		plugin.ContentPlugin,
		plugin.SnapshotPlugin,
	},
	InitFn: func(ic *plugin.InitContext) (interface{}, error) {
		if err := os.MkdirAll(ic.Root, 0711); err != nil {
			return nil, err
		}
		cs, err := ic.Get(plugin.ContentPlugin)
		if err != nil {
			return nil, err
		}

		snapshottersRaw, err := ic.GetByType(plugin.SnapshotPlugin)
		if err != nil {
			return nil, err
		}

		snapshotters := make(map[string]snapshots.Snapshotter)
		for name, sn := range snapshottersRaw {
			sn, err := sn.Instance()
			if err != nil {
				log.G(ic.Context).WithError(err).
					Warnf("could not use snapshotter %v in metadata plugin", name)
				continue
			}
			snapshotters[name] = sn.(snapshots.Snapshotter)
		}

		path := filepath.Join(ic.Root, "meta.db")
		// ... (rest of InitFn omitted)
	},
})

The last line of the code above joins the path, so the database file on disk has the basename meta.db. I went into the docker directory to look for db files:

$ cd /var/lib/docker
$ find -type f -name '*.db' -exec md5sum {} \;
....
995edbf15ba217c7835ec28fb7295cb4 ./network/files/local-kv.db
9701191ad9ef12f668f3adfc3667d668 ./containerd/daemon/io.containerd.metadata.v1.bolt/meta.db
f27af34fccaea0094c8ff67242abdd05 ./builder/fscache.db
1e38cfd83bc66e216cfcbbd78c2ddd5c ./buildkit/cache.db
f27af34fccaea0094c8ff67242abdd05 ./buildkit/metadata.db
f27af34fccaea0094c8ff67242abdd05 ./buildkit/snapshots.db
ed13bbe3a2c3ab7ae9f60822241b4910 ./volumes/metadata.db

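./containerd/daemon/io.containerd.metadata.v1.bolt/meta.db is clearly the bolt db file.
Running cat on it showed contents different from the same file on a healthy docker of the same version. On a healthy host, cat prints mostly binary garbage, but you can make out a lot of JSON. This file contained only a single line resembling a sha256 string. I renamed it and started dockerd again: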

$ cd containerd/daemon/io.containerd.metadata.v1.bolt/
$ mv meta.db{,.bak}
$ dockerd -D
...
...
DEBU[2020-01-08T06:23:14.604593395+08:00] Registering GET, /plugins/privileges
DEBU[2020-01-08T06:23:14.605104585+08:00] Registering DELETE, /plugins/{name:.*}
DEBU[2020-01-08T06:23:14.605624293+08:00] Registering POST, /plugins/{name:.*}/enable
DEBU[2020-01-08T06:23:14.606147187+08:00] Registering POST, /plugins/{name:.*}/disable
DEBU[2020-01-08T06:23:14.606680782+08:00] Registering POST, /plugins/pull
DEBU[2020-01-08T06:23:14.607348680+08:00] Registering POST, /plugins/{name:.*}/push
DEBU[2020-01-08T06:23:14.607885151+08:00] Registering POST, /plugins/{name:.*}/upgrade
DEBU[2020-01-08T06:23:14.608422094+08:00] Registering POST, /plugins/{name:.*}/set
DEBU[2020-01-08T06:23:14.608942233+08:00] Registering POST, /plugins/create
DEBU[2020-01-08T06:23:14.609468422+08:00] Registering GET, /distribution/{name:.*}/json
DEBU[2020-01-08T06:23:14.610006431+08:00] Registering GET, /networks
DEBU[2020-01-08T06:23:14.610523576+08:00] Registering GET, /networks/
DEBU[2020-01-08T06:23:14.611101665+08:00] Registering GET, /networks/{id:.+}
DEBU[2020-01-08T06:23:14.611691486+08:00] Registering POST, /networks/create
DEBU[2020-01-08T06:23:14.612211635+08:00] Registering POST, /networks/{id:.*}/connect
DEBU[2020-01-08T06:23:14.612755402+08:00] Registering POST, /networks/{id:.*}/disconnect
DEBU[2020-01-08T06:23:14.613293237+08:00] Registering POST, /networks/prune
DEBU[2020-01-08T06:23:14.613818840+08:00] Registering DELETE, /networks/{id:.*}
INFO[2020-01-08T06:23:14.614587823+08:00] API listen on /var/run/docker.sock
^CINFO[2020-01-08T06:23:20.018648309+08:00] Processing signal 'interrupt'
DEBU[2020-01-08T06:23:20.019573968+08:00] daemon configured with a 15 seconds minimum shutdown timeout
DEBU[2020-01-08T06:23:20.020244626+08:00] start clean shutdown of all containers with a 15 seconds timeout...
DEBU[2020-01-08T06:23:20.021228579+08:00] Unix socket /run/docker/libnetwork/959ebdb74deb5e4b941deffa8c69cb1338874ef1e66b7964a13e79b0db3c6b32.sock doesn't exist. cannot accept client connections
DEBU[2020-01-08T06:23:20.021506723+08:00] Cleaning up old mountid : start.
DEBU[2020-01-08T06:23:20.023363414+08:00] Cleaning up old mountid : done.
DEBU[2020-01-08T06:23:20.024124509+08:00] Clean shutdown succeeded
INFO[2020-01-08T06:23:20.024842690+08:00] stopping event stream following graceful shutdown error="context canceled" module=libcontainerd namespace=moby
DEBU[0018] received signal signal=terminated
INFO[2020-01-08T06:23:20.024842296+08:00] stopping event stream following graceful shutdown error="context canceled" module=libcontainerd namespace=plugins.moby
INFO[2020-01-08T06:23:20.024847832+08:00] stopping healthcheck following graceful shutdown module=libcontainerd
INFO[2020-01-08T06:23:20.027148599+08:00] pickfirstBalancer: HandleSubConnStateChange: 0xc420189620, TRANSIENT_FAILURE module=grpc
INFO[2020-01-08T06:23:20.027167365+08:00] pickfirstBalancer: HandleSubConnStateChange: 0xc420190d60, TRANSIENT_FAILURE module=grpc
INFO[2020-01-08T06:23:20.032788185+08:00] pickfirstBalancer: HandleSubConnStateChange: 0xc420190d60, CONNECTING module=grpc
INFO[2020-01-08T06:23:20.027224233+08:00] pickfirstBalancer: HandleSubConnStateChange: 0xc420398c00, TRANSIENT_FAILURE module=grpc
INFO[2020-01-08T06:23:20.035701179+08:00] pickfirstBalancer: HandleSubConnStateChange: 0xc420398c00, CONNECTING module=grpc

It started normally. I stopped it with Ctrl+C and then started it the regular way, which also worked:

$ systemctl start docker

Later I went through the source. The docker daemon launches containerd as a child process, in the style of Go's os/exec. See the following code for details:
https://github.com/docker/docker-ce/blob/d7080c17a580919f5340a15a8e5e013133089680/components/engine/libcontainerd/remote_daemon.go#L244
https://github.com/docker/docker-ce/blob/d7080c17a580919f5340a15a8e5e013133089680/components/engine/cmd/dockerd/daemon.go#L147

$ find -type f -name '*.go' -exec grep -Pl 'containerd.New' {} \;
./components/engine/cmd/dockerd/daemon.go
./components/engine/libcontainerd/remote_daemon.go
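The child-process relationship also explains the last line of the panic log, `containerd did not exit successfully error="exit status 2"`. An unrecovered Go panic terminates the process with exit status 2, and the parent sees that status when it waits on the child.

The sketch below shows only the generic os/exec pattern. dockerd's real code in remote_daemon.go does considerably more than this. The helper name `runChild` is hypothetical, and `sh -c 'exit 2'` stands in for the crashing containerd.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
)

// runChild starts a child process, waits for it, and returns its exit code.
// A start failure (e.g. binary not found) is returned as an error.
func runChild(name string, args ...string) (int, error) {
	cmd := exec.Command(name, args...)
	if err := cmd.Start(); err != nil {
		return -1, err
	}
	err := cmd.Wait()
	var exitErr *exec.ExitError
	if errors.As(err, &exitErr) {
		return exitErr.ExitCode(), nil // child ran but exited non-zero
	}
	if err != nil {
		return -1, err
	}
	return 0, nil
}

func main() {
	// "sh -c 'exit 2'" stands in for a containerd that died from a Go panic.
	code, err := runChild("sh", "-c", "exit 2")
	if err != nil {
		panic(err)
	}
	fmt.Println("child exit code:", code) // child exit code: 2
}
```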

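In older releases (before 19.x), containerd ships as docker-containerd; newer releases call it containerd. Do not run it by hand in a terminal, or docker and kubelet will exit.
As an aside, that panic line in old versions of the bolt library has crashed many well-known applications, so use a newer docker where possible.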

Afterwards I noticed that kubelet was flooding its log with errors:

Jan 08 20:43:29 docker_sandbox.go:538] Failed to retrieve checkpoint for sandbox "cf46e57be21fcce97e0b34f14df1174c9f43784ae6c871fe306559239afeb0c0.bak": invalid character 'd' in literal false (expecting 'a')
Jan 08 20:43:30 docker_sandbox.go:538] Failed to retrieve checkpoint for sandbox "05135ebf2e0a463207245ac71efefb9c25c255b73aef50bb7774ef9ef17818bf": invalid character 'q' in numeric literal
Jan 08 20:43:30 docker_sandbox.go:538] Failed to retrieve checkpoint for sandbox "2aa54bfda449d33e0e69d8c972f9838d0653c8f04bb5e31a9a1bcb7efcea3f47": invalid character 'e' looking for beginning of value
Jan 08 20:43:30 docker_sandbox.go:538] Failed to retrieve checkpoint for sandbox "42a46846bde5fc3bd713c5eb4a9b0781a8e8497472b2770b4ae557981c46f510": invalid character 'f' after top-level value
Jan 08 20:43:30 docker_sandbox.go:538] Failed to retrieve checkpoint for sandbox "cf46e57be21fcce97e0b34f14df1174c9f43784ae6c871fe306559239afeb0c0.bak": invalid character 'd' in literal false (expecting 'a')
Jan 08 20:43:31 docker_sandbox.go:538] Failed to retrieve checkpoint for sandbox "05135ebf2e0a463207245ac71efefb9c25c255b73aef50bb7774ef9ef17818bf": invalid character 'q' in numeric literal
Jan 08 20:43:31 docker_sandbox.go:538] Failed to retrieve checkpoint for sandbox "2aa54bfda449d33e0e69d8c972f9838d0653c8f04bb5e31a9a1bcb7efcea3f47": invalid character 'e' looking for beginning of value
Jan 08 20:43:31 docker_sandbox.go:538] Failed to retrieve checkpoint for sandbox "42a46846bde5fc3bd713c5eb4a9b0781a8e8497472b2770b4ae557981c46f510": invalid character 'f' after top-level value
Jan 08 20:43:31 docker_sandbox.go:538] Failed to retrieve checkpoint for sandbox "cf46e57be21fcce97e0b34f14df1174c9f43784ae6c871fe306559239afeb0c0.bak": invalid character 'd' in literal false (expecting 'a')
Jan 08 20:43:32 docker_sandbox.go:538] Failed to retrieve checkpoint for sandbox "05135ebf2e0a463207245ac71efefb9c25c255b73aef50bb7774ef9ef17818bf": invalid character 'q' in numeric literal
Jan 08 20:43:32 docker_sandbox.go:538] Failed to retrieve checkpoint for sandbox "2aa54bfda449d33e0e69d8c972f9838d0653c8f04bb5e31a9a1bcb7efcea3f47": invalid character 'e' looking for beginning of value
Jan 08 20:43:32 docker_sandbox.go:538] Failed to retrieve checkpoint for sandbox "42a46846bde5fc3bd713c5eb4a9b0781a8e8497472b2770b4ae557981c46f510": invalid character 'f' after top-level value
Jan 08 20:43:32 docker_sandbox.go:538] Failed to retrieve checkpoint for sandbox "cf46e57be21fcce97e0b34f14df1174c9f43784ae6c871fe306559239afeb0c0.bak": invalid character 'd' in literal false (expecting 'a')
Jan 08 20:43:33 docker_sandbox.go:538] Failed to retrieve checkpoint for sandbox "05135ebf2e0a463207245ac71efefb9c25c255b73aef50bb7774ef9ef17818bf": invalid character 'q' in numeric literal
Jan 08 20:43:33 docker_sandbox.go:538] Failed to retrieve checkpoint for sandbox "2aa54bfda449d33e0e69d8c972f9838d0653c8f04bb5e31a9a1bcb7efcea3f47": invalid character 'e' looking for beginning of value
Jan 08 20:43:33 docker_sandbox.go:538] Failed to retrieve checkpoint for sandbox "42a46846bde5fc3bd713c5eb4a9b0781a8e8497472b2770b4ae557981c46f510": invalid character 'f' after top-level value
Jan 08 20:43:33 docker_sandbox.go:538] Failed to retrieve checkpoint for sandbox "cf46e57be21fcce97e0b34f14df1174c9f43784ae6c871fe306559239afeb0c0.bak": invalid character 'd' in literal false (expecting 'a')
Jan 08 20:43:34 docker_sandbox.go:538] Failed to retrieve checkpoint for sandbox "05135ebf2e0a463207245ac71efefb9c25c255b73aef50bb7774ef9ef17818bf": invalid character 'q' in numeric literal
Jan 08 20:43:34 docker_sandbox.go:538] Failed to retrieve checkpoint for sandbox "2aa54bfda449d33e0e69d8c972f9838d0653c8f04bb5e31a9a1bcb7efcea3f47": invalid character 'e' looking for beginning of value
Jan 08 20:43:34 docker_sandbox.go:538] Failed to retrieve checkpoint for sandbox "42a46846bde5fc3bd713c5eb4a9b0781a8e8497472b2770b4ae557981c46f510": invalid character 'f' after top-level value
Jan 08 20:43:34 docker_sandbox.go:538] Failed to retrieve checkpoint for sandbox "cf46e57be21fcce97e0b34f14df1174c9f43784ae6c871fe306559239afeb0c0.bak": invalid character 'd' in literal false (expecting 'a')
Jan 08 20:43:35 docker_sandbox.go:538] Failed to retrieve checkpoint for sandbox "05135ebf2e0a463207245ac71efefb9c25c255b73aef50bb7774ef9ef17818bf": invalid character 'q' in numeric literal
Jan 08 20:43:35 docker_sandbox.go:538] Failed to retrieve checkpoint for sandbox "2aa54bfda449d33e0e69d8c972f9838d0653c8f04bb5e31a9a1bcb7efcea3f47": invalid character 'e' looking for beginning of value
Jan 08 20:43:35 docker_sandbox.go:538] Failed to retrieve checkpoint for sandbox "42a46846bde5fc3bd713c5eb4a9b0781a8e8497472b2770b4ae557981c46f510": invalid character 'f' after top-level value
Jan 08 20:43:35 docker_sandbox.go:538] Failed to retrieve checkpoint for sandbox "cf46e57be21fcce97e0b34f14df1174c9f43784ae6c871fe306559239afeb0c0.bak": invalid character 'd' in literal false (expecting 'a')
Jan 08 20:43:36 docker_sandbox.go:538] Failed to retrieve checkpoint for sandbox "05135ebf2e0a463207245ac71efefb9c25c255b73aef50bb7774ef9ef17818bf": invalid character 'q' in numeric literal
Jan 08 20:43:36 docker_sandbox.go:538] Failed to retrieve checkpoint for sandbox "2aa54bfda449d33e0e69d8c972f9838d0653c8f04bb5e31a9a1bcb7efcea3f47": invalid character 'e' looking for beginning of value
Jan 08 20:43:36 docker_sandbox.go:538] Failed to retrieve checkpoint for sandbox "42a46846bde5fc3bd713c5eb4a9b0781a8e8497472b2770b4ae557981c46f510": invalid character 'f' after top-level value
Jan 08 20:43:36 docker_sandbox.go:538] Failed to retrieve checkpoint for sandbox "cf46e57be21fcce97e0b34f14df1174c9f43784ae6c871fe306559239afeb0c0.bak": invalid character 'd' in literal false (expecting 'a')
Jan 08 20:43:37 docker_sandbox.go:538] Failed to retrieve checkpoint for sandbox "05135ebf2e0a463207245ac71efefb9c25c255b73aef50bb7774ef9ef17818bf": invalid character 'q' in numeric literal
Jan 08 20:43:37 docker_sandbox.go:538] Failed to retrieve checkpoint for sandbox "2aa54bfda449d33e0e69d8c972f9838d0653c8f04bb5e31a9a1bcb7efcea3f47": invalid character 'e' looking for beginning of value
Jan 08 20:43:37 docker_sandbox.go:538] Failed to retrieve checkpoint for sandbox "42a46846bde5fc3bd713c5eb4a9b0781a8e8497472b2770b4ae557981c46f510": invalid character 'f' after top-level value
Jan 08 20:43:37 docker_sandbox.go:538] Failed to retrieve checkpoint for sandbox "cf46e57be21fcce97e0b34f14df1174c9f43784ae6c871fe306559239afeb0c0.bak": invalid character 'd' in literal false (expecting 'a')
Jan 08 20:43:38 docker_sandbox.go:538] Failed to retrieve checkpoint for sandbox "05135ebf2e0a463207245ac71efefb9c25c255b73aef50bb7774ef9ef17818bf": invalid character 'q' in numeric literal

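The fix was to go into the checkpoint directory and delete the related entries (files or directories, I no longer remember which):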

cd /var/lib/dockershim/sandbox/
rm -rf 05135ebf2e0a463207245ac71efefb9c25c255b73aef50bb7774ef9ef17818bf \
cf46e57be21fcce97e0b34f14df1174c9f43784ae6c871fe306559239afeb0c0 \
42a46846bde5fc3bd713c5eb4a9b0781a8e8497472b2770b4ae557981c46f510 \
2aa54bfda449d33e0e69d8c972f9838d0653c8f04bb5e31a9a1bcb7efcea3f47
