zhangguanzhang's Blog

docker and AppArmor

2024/01/08

Exploring how docker handles AppArmor, through searching and reading the source

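Background

This is an ancient SUSE 12 SP5 system. As I recall, when I took it over it ran docker 17.05 installed from an rpm, and later I found a 19.03.15 rpm and installed that. I used rpms because the docker static binaries would not start (I forget the exact error). I have now tried 24.0.5: the daemon starts, but containers do not, and this turned out to be related to AppArmor. AppArmor, seccomp and SELinux are all security mechanisms that harden a Linux system. They add an extra layer of access control over processes, applications and system resources.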

Process

The error

System information:

suse12sp5:~ # cat /etc/os-release 
NAME="SLES"
VERSION="12-SP5"
VERSION_ID="12.5"
PRETTY_NAME="SUSE Linux Enterprise Server 12 SP5"
ID="sles"
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:suse:sles:12:sp5"
suse12sp5:~ # uname -a
Linux suse12sp5 4.12.14-120-default #1 SMP Thu Nov 7 16:39:09 UTC 2019 (fd9dc36) x86_64 x86_64 x86_64 GNU/Linux
suse12sp5:~ # apparmor_parser --version
AppArmor parser version 2.8.2
Copyright (C) 1999-2008 Novell Inc.
Copyright 2009-2012 Canonical Ltd.
suse12sp5:~ # rpm -qa | grep libsecc
libseccomp2-32bit-2.3.1-10.1.x86_64
libseccomp2-2.3.1-10.1.x86_64

With the 24.0.5 docker static binaries installed and an image imported, no container could be started. The error:

Error response from daemon: AppArmor enabled on system but the docker-default profile could not be loaded: running `/sbin/apparmor_parser apparmor_parser -Kr /data/kube/docker/tmp/docker-default1881723382` failed with output: AppArmor parser error for /data/kube/docker/tmp/docker-default1881723382 in /data/kube/docker/tmp/docker-default1881723382 at line 16: syntax error, unexpected TOK_OPENPAREN, expecting TOK_MODE

Searching the code for the message AppArmor enabled on system but the leads to the ensureDefaultAppArmorProfile function:

// https://github.com/moby/moby/blob/v24.0.5/daemon/apparmor_default.go#L27
func ensureDefaultAppArmorProfile() error {
	if apparmor.HostSupports() {
		loaded, err := aaprofile.IsLoaded(defaultAppArmorProfile)
		if err != nil {
			return fmt.Errorf("Could not check if %s AppArmor profile was loaded: %s", defaultAppArmorProfile, err)
		}

		// Nothing to do.
		if loaded {
			return nil
		}

		// Load the profile.
		if err := aaprofile.InstallDefault(defaultAppArmorProfile); err != nil {
			return fmt.Errorf("AppArmor enabled on system but the %s profile could not be loaded: %s", defaultAppArmorProfile, err)
		}
	}

	return nil
}

apparmor.HostSupports is satisfied when os.Getenv("container") == "", /sys/module/apparmor/parameters/enabled contains Y, and the command /sbin/apparmor_parser exists.
aaprofile.IsLoaded is equivalent to grep 'docker-default ' /sys/kernel/security/apparmor/profiles: it checks whether the docker-default policy is already loaded, and if it is not, the daemon loads it.

How docker generates and loads the AppArmor profile

Next come generateDefault and InstallDefault in profiles/apparmor/apparmor.go:

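  • The docker daemon reads /proc/self/attr/current; if it is empty, the daemon's own profile name is taken to be unconfined
  • It creates a temporary file with os.CreateTemp and renders the template from profiles/apparmor/template.go into it
  • It loads the policy with /sbin/apparmor_parser -Kr <tmp_file>
  • Whether loading succeeds or fails, the temporary policy file is deleted at the end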

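Solving it

The error above means the content of the policy file is wrong, but the docker daemon deletes that file. At first I cloned the source, changed it to keep the temporary file, ran make and swapped in the binary. Later I found a simple, brute-force trick: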

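# Keep the real parser under another name, then put a logging wrapper in its place.
cp /sbin/apparmor_parser /sbin/apparmor_parser_REAL
cat > /sbin/apparmor_parser << 'EOF'
#!/bin/bash

# Record the arguments of every invocation.
echo "$*" >> /root/apparmor_parser_arg_log

# docker calls "apparmor_parser -Kr <tmp_file>": save the profile before it is deleted.
if [ $# -eq 2 ]; then
    cat "$2" > /root/apparmor_parser_profile_log
fi

# Hand over to the real parser with the arguments unchanged.
exec /sbin/apparmor_parser_REAL "$@"
EOF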

After a docker start, /root/apparmor_parser_profile_log had content:



#include <tunables/global>


profile docker-default flags=(attach_disconnected,mediate_deleted) {

#include <abstractions/base>


network,
capability,
file,
umount,
# Host (privileged) processes may send signals to container processes.
signal (receive) peer=unconfined,
# dockerd may send signals to container processes (for "docker kill").
signal (receive) peer=unconfined
,
# Container processes may send signals amongst themselves.
signal (send,receive) peer=docker-default,

deny @{PROC}/* w, # deny write for all files directly in /proc (not in a subdir)
# deny write to files not in /proc/<number>/** or /proc/sys/**
deny @{PROC}/{[^1-9],[^1-9][^0-9],[^1-9s][^0-9y][^0-9s],[^1-9][^0-9][^0-9][^0-9/]*}/** w,
deny @{PROC}/sys/[^k]** w, # deny /proc/sys except /proc/sys/k* (effectively /proc/sys/kernel)
deny @{PROC}/sys/kernel/{?,??,[^s][^h][^m]**} w, # deny everything except shm* in /proc/sys/kernel/
deny @{PROC}/sysrq-trigger rwklx,
deny @{PROC}/kcore rwklx,

deny mount,

deny /sys/[^f]*/** wklx,
deny /sys/f[^s]*/** wklx,
deny /sys/fs/[^c]*/** wklx,
deny /sys/fs/c[^g]*/** wklx,
deny /sys/fs/cg[^r]*/** wklx,
deny /sys/firmware/** rwklx,
deny /sys/kernel/security/** rwklx,

# suppress ptrace denials when using 'docker ps' or using 'ps' inside a container
ptrace (trace,read,tracedby,readby) peer=docker-default,
}

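Running /sbin/apparmor_parser -Kr /root/apparmor_parser_profile_log gives the same error as above. After some experimenting I found that it only loads once the signal and ptrace lines are removed. On the 19.03 rpm environment, the policy file captured with the same hack has no signal or ptrace lines either. With these two rule types deleted, loading only prints a warning: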

# -KR unloads the profile
$ apparmor_parser -Kr /root/apparmor_parser_profile_log
Warning from test (test line 38): profile docker-default mount rules not enforced
$ aa-status
...
docker-default (xxxx)
docker-default (xxxx)
docker-default (xxxx)
...

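Containers can start now as well. To make this permanent, store the file at /etc/apparmor.d/docker-default. Since it is then loaded ahead of time, the docker daemon sees the profile as already loaded and does not load its built-in template. Later I added the following: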

@{PROC}/sys/kernel/ r,
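
The manual removal of the signal and ptrace rules described in this section could be scripted along the following lines. This is a sketch under assumptions rather than a general profile parser: stripRules is a made-up helper, it treats a rule as ending at the first line that finishes with a comma (so the split "signal (receive) peer=unconfined" / "," pair is dropped as a unit), and it does not handle rules followed by a trailing comment or rules prefixed with deny.

```go
package main

import (
	"fmt"
	"strings"
)

// stripRules removes whole rules whose first word is one of kinds.
// Hypothetical helper for illustration only.
func stripRules(profile string, kinds ...string) string {
	drop := make(map[string]bool, len(kinds))
	for _, k := range kinds {
		drop[k] = true
	}
	var out []string
	inDropped := false // true while inside a dropped rule that has not ended
	for _, line := range strings.Split(profile, "\n") {
		trimmed := strings.TrimSpace(line)
		if inDropped {
			// Still dropping: the rule ends on a line finishing with ",".
			inDropped = !strings.HasSuffix(trimmed, ",")
			continue
		}
		fields := strings.Fields(trimmed)
		if len(fields) > 0 && drop[strings.TrimSuffix(fields[0], ",")] {
			inDropped = !strings.HasSuffix(trimmed, ",")
			continue
		}
		out = append(out, line)
	}
	return strings.Join(out, "\n")
}

func main() {
	profile := "network,\nsignal (receive) peer=unconfined\n,\n" +
		"ptrace (trace,read) peer=docker-default,\nfile,\n"
	fmt.Print(stripRules(profile, "signal", "ptrace"))
}
```

Comment lines are kept because their first word is "#", so the explanatory comments above the removed rules stay in the file.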

AppArmor and some other information

The rule file explained

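#include: if what follows is not an absolute path, it is relative to /etc/apparmor.d/; the included files provide variable definitions and directory permissions
profile <name|bin-path> flags=(xx,xxx,xxxx) {: either a name (used by external callers such as docker/k8s to select the policy) or the path of the binary to confine; the flags can be separated by commas or spaces
- attach_disconnected: paths that are disconnected from the process's namespace are attached to the root, so access to them can still be mediated
- mediate_deleted: keep mediating access to files that have already been deleted
- complain: the default mode is enforce, where violating operations are denied; complain only logs them, and kill kills the process on a deny. cat /proc/<pid>/attr/current shows a process's policy name and mode, for example docker-default (enforce)
The other rule types, such as file, network and mount, are not covered here; look them up in the documentation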
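
As a small illustration of the /proc/<pid>/attr/current format mentioned above, the label can be split into profile name and mode like this. The function name parseConfinement is made up for this sketch, and it assumes the label is either "<name> (<mode>)" or a bare name such as unconfined.

```go
package main

import (
	"fmt"
	"strings"
)

// parseConfinement splits a label such as "docker-default (enforce)" into
// profile name and mode. A bare label like "unconfined" yields an empty
// mode. Hypothetical helper for illustration only.
func parseConfinement(label string) (profile, mode string) {
	label = strings.TrimSpace(label)
	i := strings.LastIndex(label, " (")
	if i < 0 || !strings.HasSuffix(label, ")") {
		return label, ""
	}
	return label[:i], label[i+2 : len(label)-1]
}

func main() {
	// Sample labels instead of reading /proc/<pid>/attr/current.
	for _, s := range []string{"docker-default (enforce)\n", "unconfined\n"} {
		p, m := parseConfinement(s)
		fmt.Printf("profile=%q mode=%q\n", p, m)
	}
}
```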

With auditing turned on, the messages show up in the system log:

echo 0 > /proc/sys/kernel/printk_ratelimit
echo -n "all" > /sys/module/apparmor/parameters/audit

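docker example

While searching related issues I found that the template used to check the apparmor_parser version, and that this check was later removed following an upstream submission.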

$ ls -l contrib/apparmor/
total 12
-rw-r--r-- 1 root root 916 Jan 8 21:25 main.go
-rw-r--r-- 1 root root 5357 Jan 8 21:24 template.go

Looking at the code, it can generate a reference policy:

$ go run contrib/apparmor/*.go 111
$ cat 111
@{DOCKER_GRAPH_PATH}=/var/lib/docker

profile /usr/bin/docker (attach_disconnected, complain) {
# Prevent following links to these files during container setup.
deny /etc/** mkl,
deny /dev/** kl,
deny /sys/** mkl,
deny /proc/** mkl,

mount -> @{DOCKER_GRAPH_PATH}/**,
mount -> /,
mount -> /proc/**,
mount -> /sys/**,
mount -> /run/docker/netns/**,
mount -> /.pivot_root[0-9]*/,

/ r,

umount,
pivot_root,
signal (receive) peer=@{profile_name},
signal (receive) peer=unconfined,
signal (send),
network,
capability,
owner /** rw,
@{DOCKER_GRAPH_PATH}/** rwl,
@{DOCKER_GRAPH_PATH}/network/files/boltdb.db k,
@{DOCKER_GRAPH_PATH}/network/files/local-kv.db k,
# For user namespaces:
@{DOCKER_GRAPH_PATH}/[0-9]*.[0-9]*/network/files/boltdb.db k,
@{DOCKER_GRAPH_PATH}/[0-9]*.[0-9]*/network/files/local-kv.db k,

# For non-root client use:
/dev/urandom r,
/dev/null rw,
/dev/pts/[0-9]* rw,
/run/docker.sock rw,
/proc/** r,
/proc/[0-9]*/attr/exec w,
/sys/kernel/mm/hugepages/ r,
/etc/localtime r,
/etc/ld.so.cache r,
/etc/passwd r,

ptrace peer=@{profile_name},
ptrace (read) peer=docker-default,
deny ptrace (trace) peer=docker-default,
deny ptrace peer=/usr/bin/docker///bin/ps,

/usr/lib/** rm,
/lib/** rm,

/usr/bin/docker pix,
/sbin/xtables-multi rCx,
/sbin/iptables rCx,
/sbin/modprobe rCx,
/sbin/auplink rCx,
/sbin/mke2fs rCx,
/sbin/tune2fs rCx,
/sbin/blkid rCx,
/bin/kmod rCx,
/usr/bin/xz rCx,
/bin/ps rCx,
/bin/tar rCx,
/bin/cat rCx,
/sbin/zfs rCx,
/sbin/apparmor_parser rCx,

# Transitions
change_profile -> docker-*,
change_profile -> unconfined,

profile /bin/cat (complain) {
/etc/ld.so.cache r,
/lib/** rm,
/dev/null rw,
/proc r,
/bin/cat mr,

# For reading in 'docker stats':
/proc/[0-9]*/net/dev r,
}
profile /bin/ps (complain) {
/etc/ld.so.cache r,
/etc/localtime r,
/etc/passwd r,
/etc/nsswitch.conf r,
/lib/** rm,
/proc/[0-9]*/** r,
/dev/null rw,
/bin/ps mr,

# We don't need ptrace so we'll deny and ignore the error.
deny ptrace (read, trace),

# Quiet dac_override denials
deny capability dac_override,
deny capability dac_read_search,
deny capability sys_ptrace,

/dev/tty r,
/proc/stat r,
/proc/cpuinfo r,
/proc/meminfo r,
/proc/uptime r,
/sys/devices/system/cpu/online r,
/proc/sys/kernel/pid_max r,
/proc/ r,
/proc/tty/drivers r,
}
profile /sbin/iptables (complain) {
signal (receive) peer=/usr/bin/docker,
capability net_admin,
}
profile /sbin/auplink flags=(attach_disconnected, complain) {
signal (receive) peer=/usr/bin/docker,
capability sys_admin,
capability dac_override,

@{DOCKER_GRAPH_PATH}/aufs/** rw,
@{DOCKER_GRAPH_PATH}/tmp/** rw,
# For user namespaces:
@{DOCKER_GRAPH_PATH}/[0-9]*.[0-9]*/** rw,

/sys/fs/aufs/** r,
/lib/** rm,
/apparmor/.null r,
/dev/null rw,
/etc/ld.so.cache r,
/sbin/auplink rm,
/proc/fs/aufs/** rw,
/proc/[0-9]*/mounts rw,
}
profile /sbin/modprobe /bin/kmod (complain) {
signal (receive) peer=/usr/bin/docker,
capability sys_module,
/etc/ld.so.cache r,
/lib/** rm,
/dev/null rw,
/apparmor/.null rw,
/sbin/modprobe rm,
/bin/kmod rm,
/proc/cmdline r,
/sys/module/** r,
/etc/modprobe.d{/,/**} r,
}
# xz works via pipes, so we do not need access to the filesystem.
profile /usr/bin/xz (complain) {
signal (receive) peer=/usr/bin/docker,
/etc/ld.so.cache r,
/lib/** rm,
/usr/bin/xz rm,
deny /proc/** rw,
deny /sys/** rw,
}
profile /sbin/xtables-multi (attach_disconnected, complain) {
/etc/ld.so.cache r,
/lib/** rm,
/sbin/xtables-multi rm,
/apparmor/.null w,
/dev/null rw,

/proc r,

capability net_raw,
capability net_admin,
network raw,
}
profile /sbin/zfs (attach_disconnected, complain) {
file,
capability,
}
profile /sbin/mke2fs (complain) {
/sbin/mke2fs rm,

/lib/** rm,

/apparmor/.null w,

/etc/ld.so.cache r,
/etc/mke2fs.conf r,
/etc/mtab r,

/dev/dm-* rw,
/dev/urandom r,
/dev/null rw,

/proc/swaps r,
/proc/[0-9]*/mounts r,
}
profile /sbin/tune2fs (complain) {
/sbin/tune2fs rm,

/lib/** rm,

/apparmor/.null w,

/etc/blkid.conf r,
/etc/mtab r,
/etc/ld.so.cache r,

/dev/null rw,
/dev/.blkid.tab r,
/dev/dm-* rw,

/proc/swaps r,
/proc/[0-9]*/mounts r,
}
profile /sbin/blkid (complain) {
/sbin/blkid rm,

/lib/** rm,
/apparmor/.null w,

/etc/ld.so.cache r,
/etc/blkid.conf r,

/dev/null rw,
/dev/.blkid.tab rl,
/dev/.blkid.tab* rwl,
/dev/dm-* r,

/sys/devices/virtual/block/** r,

capability mknod,

mount -> @{DOCKER_GRAPH_PATH}/**,
}
profile /sbin/apparmor_parser (complain) {
/sbin/apparmor_parser rm,

/lib/** rm,

/etc/ld.so.cache r,
/etc/apparmor/** r,
/etc/apparmor.d/** r,
/etc/apparmor.d/cache/** w,

/dev/null rw,

/sys/kernel/security/apparmor/** r,
/sys/kernel/security/apparmor/.replace w,

/proc/[0-9]*/mounts r,
/proc/sys/kernel/osrelease r,
/proc r,

capability mac_admin,
}
}

Some SUSE information

The SUSE secrets patch

Some failures

On a UOS system, containers could not start:

root@user-PC:~# cat /etc/os-release 
PRETTY_NAME="UnionTech OS Server 20 Enterprise"
NAME="UnionTech OS Server 20 Enterprise"
VERSION_ID="20"
VERSION="20"
ID=UOS
HOME_URL="https://www.chinauos.com/"
BUG_REPORT_URL="http://bbs.chinauos.com"
VERSION_CODENAME=fou
root@user-PC:~# uname -a
Linux user-PC 4.19.0-arm64-server #3211 SMP Thu Apr 15 10:21:53 CST 2021 aarch64 GNU/Linux
root@user-PC:~# lscpu
Architecture: aarch64
CPU op-mode(s): 64-bit
Byte Order: Little Endian
CPU(s): 64
On-line CPU(s) list: 0-63
Thread(s) per core: 1
Core(s) per socket: 32
Socket(s): 2
NUMA node(s): 4
Vendor ID: 0x48
Model: 0
Model name: HUAWEI Kunpeng 920 5231K
Stepping: 0x1
BogoMIPS: 200.00
L1d cache: 4 MiB
L1i cache: 4 MiB
L2 cache: 32 MiB
L3 cache: 128 MiB
NUMA node0 CPU(s): 0-15
NUMA node1 CPU(s): 16-31
NUMA node2 CPU(s): 32-47
NUMA node3 CPU(s): 48-63
Flags: half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt lpae evtstrm
root@user-PC:~# docker info
Client:
Version: 25.0.5
Context: default
Debug Mode: false

Server:
Containers: 1
Running: 1
Paused: 0
Stopped: 0
Images: 1
Server Version: 25.0.5
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Using metacopy: true
Native Overlay Diff: false
userxattr: false
Logging Driver: json-file
Cgroup Driver: cgroupfs
Cgroup Version: 1
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
Swarm: inactive
Runtimes: io.containerd.runc.v2 runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 7c3aca7a610df76212171d200ca3811ff6096eb8
runc version: v1.1.12-0-g51d5e94
init version: de40ad0
Security Options:
seccomp
Profile: builtin
Kernel Version: 4.19.0-arm64-server
Operating System: UnionTech OS Server 20 Enterprise
OSType: linux
Architecture: aarch64
CPUs: 64
Total Memory: 255.4GiB
Name: user-PC
ID: 36c34e5e-d83f-4b96-902d-a5b3b605cec2
Docker Root Dir: /data3/kube/docker
Debug Mode: false
Experimental: false
Insecure Registries:
reg.xxx.lan:5000
treg.yun.xxx.cn
0.0.0.0/0
127.0.0.0/8
Registry Mirrors:
https://registry.docker-cn.com/
https://docker.mirrors.ustc.edu.cn/
Live Restore Enabled: false
Product License: Community Engine

WARNING: No swap limit support

The error on start:

$ docker run -d --name t1 2e72
50e7dc221804297487e057055663c8315cd3a3218aacb631394798cdf9e9f8da
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: unable to apply apparmor profile: apparmor failed to apply profile: write /proc/self/attr/exec: invalid argument: unknown.

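I looked through several related issues. The docker and runc versions are both new enough, and /sbin/apparmor_parser is installed on the machine. I had previously seen incomplete AppArmor support on 4.1x kernels, so I tried disabling AppArmor, and that fixed it: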

# GRUB_CMDLINE_LINUX or GRUB_CMDLINE_LINUX_DEFAULT
# append apparmor=0 security=
vi '+:set mouse-=a' /etc/default/grub

Update grub:

systemctl disable --now apparmor
# apt
update-grub
# yum
grub2-mkconfig -o /etc/grub2.cfg

reboot

After the reboot:

$ cat /proc/cmdline 
BOOT_IMAGE=/vmlinuz-4.19.0-arm64-server root=UUID=a17eff26-xxxx-4899-xxxx-975b0ec48ca4 ro splash quiet console=tty plymouth.ignore-serial-consoles apparmor=0 security= DEEPIN_GFXMODE=
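
To check such a command line from a script rather than by eye, something like the following could be used. It is a minimal sketch: kernelParam is a made-up helper, and keeping the last occurrence of a repeated key is an assumption of this example.

```go
package main

import (
	"fmt"
	"strings"
)

// kernelParam returns the value of key in a kernel command line and whether
// the key was present at all. A bare "key" or "key=" yields an empty value.
// Hypothetical helper for illustration only.
func kernelParam(cmdline, key string) (string, bool) {
	val, found := "", false
	for _, tok := range strings.Fields(cmdline) {
		kv := strings.SplitN(tok, "=", 2)
		if kv[0] != key {
			continue
		}
		found = true
		val = ""
		if len(kv) == 2 {
			val = kv[1] // assumption: the last occurrence of a key wins
		}
	}
	return val, found
}

func main() {
	// Shortened sample; on a real host this would be the content of /proc/cmdline.
	cmdline := "BOOT_IMAGE=/vmlinuz-4.19.0-arm64-server ro splash quiet apparmor=0 security= DEEPIN_GFXMODE="
	v, ok := kernelParam(cmdline, "apparmor")
	fmt.Printf("apparmor=%q present=%v\n", v, ok)
}
```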

References
