zhangguanzhang's Blog

Docker fails to start after data disk corruption: Error starting daemon: Error initializing network controller...

2021/12/12

Preface

The data disk at a customer site was corrupted. After the disk was repaired and the machine booted, docker would not start:

[root@db1 docker]# /data/kube/bin/dockerd
WARN[0000] The "graph" config file option is deprecated. Please use "data-root" instead.
WARN[2021-12-11T21:16:07.917969366+08:00] could not change group /var/run/docker.sock to docker: group docker not found
WARN[2021-12-11T21:16:07.942745757+08:00] failed to load plugin io.containerd.snapshotter.v1.btrfs error="path /data/kube/docker/containerd/daemon/io.containerd.snapshotter.v1.btrfs must be a btrfs filesystem to be used with the btrfs snapshotter"
WARN[2021-12-11T21:16:07.944020734+08:00] failed to load plugin io.containerd.snapshotter.v1.aufs error="modprobe aufs failed: "modprobe: FATAL: Module aufs not found.\n": exit status 1"
WARN[2021-12-11T21:16:07.944275670+08:00] failed to load plugin io.containerd.snapshotter.v1.zfs error="path /data/kube/docker/containerd/daemon/io.containerd.snapshotter.v1.zfs must be a zfs filesystem to be used with the zfs snapshotter"
WARN[2021-12-11T21:16:07.944314186+08:00] could not use snapshotter btrfs in metadata plugin error="path /data/kube/docker/containerd/daemon/io.containerd.snapshotter.v1.btrfs must be a btrfs filesystem to be used with the btrfs snapshotter"
WARN[2021-12-11T21:16:07.944324941+08:00] could not use snapshotter aufs in metadata plugin error="modprobe aufs failed: "modprobe: FATAL: Module aufs not found.\n": exit status 1"
WARN[2021-12-11T21:16:07.944333098+08:00] could not use snapshotter zfs in metadata plugin error="path /data/kube/docker/containerd/daemon/io.containerd.snapshotter.v1.zfs must be a zfs filesystem to be used with the zfs snapshotter"
WARN[2021-12-11T21:16:09.131994686+08:00] Running modprobe bridge br_netfilter failed with message: modprobe: ERROR: could not insert 'bridge': Key was rejected by service
modprobe: ERROR: could not insert 'br_netfilter': Key was rejected by service
insmod /lib/modules/3.10.0-514.el7.x86_64/kernel/net/bridge/bridge.ko
insmod /lib/modules/3.10.0-514.el7.x86_64/kernel/net/bridge/bridge.ko
, error: exit status 1
Error starting daemon: Error initializing network controller: Error creating default "bridge" network: package not installed

Troubleshooting

Environment

$ dockerd --version
Docker version 18.09.3, build 774a1f4
$ uname -a
Linux db1 3.10.0-514.el7.x86_64 #1 SMP Tue Nov 22 16:42:41 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
$ cat /etc/redhat-release
CentOS Linux release 7.3.1611 (Core)

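Investigation

First I checked whether the kernel module had been blacklisted. It had not, and loading it by hand failed with the same error: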

$ grep -r black /etc/modprobe.d/*.conf
$ modprobe overlay
$ modprobe bridge
modprobe: ERROR: could not insert 'bridge': Key was rejected by service
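
The grep above matches any line containing "black". A slightly stricter check can be sketched as follows; the function name `is_blacklisted` and its optional directory argument are hypothetical and only here for illustration. It looks for an actual `blacklist <module>` line and ignores commented-out entries.

```shell
#!/bin/sh
# Sketch: return 0 if a modprobe.d config file blacklists the given module.
# is_blacklisted is a hypothetical helper, not a standard tool.
is_blacklisted() {
    module=$1
    conf_dir=${2:-/etc/modprobe.d}
    # Match "blacklist <module>" at the start of a line (leading whitespace
    # allowed), so "# blacklist bridge" and "blacklist bridge_foo" do not match.
    grep -Eqs "^[[:space:]]*blacklist[[:space:]]+${module}([[:space:]]|\$)" \
        "$conf_dir"/*.conf
}

# Example: is_blacklisted bridge && echo "bridge is blacklisted"
```

This only inspects `blacklist` lines; it does not cover other ways a module could be disabled, such as an `install` override.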

Module signature enforcement (enforcemodulesig) was not enabled either:

$ dmesg | grep enforcemodulesig=1
$ cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-3.10.0-514.el7.x86_64 root=UUID=5ab681a0-7e5c-4ab7-9c88-27d788f725b3 ro crashkernel=auto rhgb quiet LANG=en_US.UTF-8
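
The same check can be scripted instead of reading the command line by eye. A minimal sketch, assuming enforcement would show up as a boot parameter: `sig_enforce_requested` is a hypothetical helper name, and besides the `enforcemodulesig=1` spelling used above it also accepts the upstream `module.sig_enforce` spelling, which is an addition not taken from this post.

```shell
#!/bin/sh
# Sketch: return 0 if the kernel command line asks for module signature
# enforcement. sig_enforce_requested is a hypothetical helper.
sig_enforce_requested() {
    # $1: command line to inspect; defaults to the running kernel's.
    cmdline=${1:-$(cat /proc/cmdline)}
    for arg in $cmdline; do
        case $arg in
            enforcemodulesig=1|module.sig_enforce|module.sig_enforce=1)
                return 0 ;;
        esac
    done
    return 1
}

# Example: sig_enforce_requested || echo "enforcement not requested at boot"
```

It only looks at boot parameters, so it says nothing about enforcement switched on by other means.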

Yet modinfo could still read the module's metadata:

$ modinfo bridge
filename: /lib/modules/3.10.0-514.el7.x86_64/kernel/net/bridge/bridge.ko
alias: rtnl-link-bridge
version: 2.3
license: GPL
rhelversion: 7.3
srcversion: FF0448CD85C271287DE1963
depends: stp,llc
intree: Y
vermagic: 3.10.0-514.el7.x86_64 SMP mod_unload modversions
signer: CentOS Linux kernel signing key
sig_key: D4:88:63:A7:C1:6F:CC:27:41:23:E6:29:8F:74:F0:57:AF:19:FC:54
sig_hashalgo: sha256

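That pointed to a signature mismatch: the module is signed (see `signer` above), so a file whose contents had changed would plausibly fail verification. I checked the file's hash: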

$ md5sum /lib/modules/3.10.0-514.el7.x86_64/kernel/net/bridge/bridge.ko
62001928100a30bace9bc6493b956e2f /lib/modules/3.10.0-514.el7.x86_64/kernel/net/bridge/bridge.ko

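Comparing against another machine (db2) running the same kernel showed a different hash, so the module file on db1 was corrupted: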

$ uname -a
Linux db2 3.10.0-514.el7.x86_64 #1 SMP Tue Nov 22 16:42:41 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
$ modinfo bridge
filename: /lib/modules/3.10.0-514.el7.x86_64/kernel/net/bridge/bridge.ko
alias: rtnl-link-bridge
version: 2.3
license: GPL
rhelversion: 7.3
srcversion: FF0448CD85C271287DE1963
depends: stp,llc
intree: Y
vermagic: 3.10.0-514.el7.x86_64 SMP mod_unload modversions
signer: CentOS Linux kernel signing key
sig_key: D4:88:63:A7:C1:6F:CC:27:41:23:E6:29:8F:74:F0:57:AF:19:FC:54
sig_hashalgo: sha256
$ md5sum /lib/modules/3.10.0-514.el7.x86_64/kernel/net/bridge/bridge.ko
41c62afa67e66d107cc2a9e471910726 /lib/modules/3.10.0-514.el7.x86_64/kernel/net/bridge/bridge.ko
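
Since the damage came from a corrupted disk, other modules may be affected as well. The single-file comparison above can be extended to the whole module tree; the sketch below assumes both hosts run exactly the same kernel build, and the helper names `make_manifest` and `check_manifest` are made up for this example.

```shell
#!/bin/sh
# Sketch: compare every kernel module against checksums from a healthy host.
# make_manifest and check_manifest are hypothetical helper names.

# Run on the healthy host: print "checksum  ./relative/path" for each module.
make_manifest() {
    ( cd "$1" && find . -type f -name '*.ko' -exec md5sum {} + | sort -k2 )
}

# Run on the suspect host: print only the files that fail verification.
# $2 must be an absolute path because the check runs inside the module tree.
# Exit status is non-zero if any file is missing or differs.
check_manifest() {
    ( cd "$1" && md5sum -c --quiet "$2" )
}

# Example:
#   healthy$ make_manifest /lib/modules/3.10.0-514.el7.x86_64 > /tmp/modules.md5
#   suspect$ check_manifest /lib/modules/3.10.0-514.el7.x86_64 /tmp/modules.md5
```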

Fix

cd /lib/modules/3.10.0-514.el7.x86_64/kernel/net/bridge/
cp bridge.ko bridge.ko.bak
scp root@xxxx:/lib/modules/3.10.0-514.el7.x86_64/kernel/net/bridge/bridge.ko .
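
A slightly more defensive variant of the steps above verifies the copied file against the hash from the healthy machine before overwriting anything. This is only a sketch: `replace_if_verified` is a hypothetical helper, and it assumes the replacement has already been copied to a temporary location.

```shell
#!/bin/sh
# Sketch: replace a module only if the candidate matches a known-good MD5.
# replace_if_verified is a hypothetical helper, not part of any package.
replace_if_verified() {
    candidate=$1   # file copied from the healthy host
    target=$2      # module path on the broken host
    expected=$3    # checksum recorded on the healthy host
    actual=$(md5sum "$candidate" | awk '{print $1}')
    if [ "$actual" != "$expected" ]; then
        echo "checksum mismatch for $candidate, not replacing $target" >&2
        return 1
    fi
    # Keep the damaged file as <target>.bak, then put the good copy in place.
    cp -p "$target" "$target.bak" && cp "$candidate" "$target"
}

# Example, using the hash from the healthy machine shown earlier:
#   replace_if_verified /tmp/bridge.ko \
#       /lib/modules/3.10.0-514.el7.x86_64/kernel/net/bridge/bridge.ko \
#       41c62afa67e66d107cc2a9e471910726
```

Running `modprobe bridge` afterwards should confirm that the module loads before dockerd is started again.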

After that, docker started normally.
