zhangguanzhang's Blog

Evaluating Podman on CentOS 7 for On-Premises Deployments

2025/08/06

Notes from evaluating Podman on CentOS 7.

Background

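Internally we needed to evaluate replacing docker with podman. Our on-premises deliveries have to run on many operating systems (many customers mandate a specific OS internally, so we have to support it), so I tested on the most common one, CentOS 7: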

$ cat /etc/redhat-release 
CentOS Linux release 7.8.2003 (Core)
$ uname -a
Linux xxx 3.10.0-1127.el7.x86_64 #1 SMP Tue Mar 31 23:36:51 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

Process

Offline installation

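The official installation docs have no instructions for CentOS 7, because it has already reached EOL. Since customer sites may have no network access, we need something like docker-static; searching Google and GitHub turned up podman-static. The latest podman major version is v5, so I downloaded the v5.5.2 tarball and extracted it.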

The daemon process

Because we also need to call the API, we need a daemon process listening on both a unix socket and TCP, which means using the following command:

$ podman system service -h
Run API service

Description:
Run an API service

Enable a listening service for API access to Podman commands.


Usage:
podman system service [options] [URI]

Examples:
podman system service --time=0 unix:///tmp/podman.sock
podman system service --time=0 tcp://localhost:8888

Options:
--cors string Set CORS Headers
-t, --time uint Time until the service session expires in seconds. Use 0 to disable the timeout (default 5)

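It turns out the service takes a single URI, so it cannot listen on TCP and a unix socket at the same time, and the TCP listener has no TLS options. The issue "Support (m)TLS API socket" confirms this is not supported yet, so the only option is to listen on a unix socket.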

Next, the service would not listen on the socket path I specified:

API service listening on "/run/podman/podman.sock". URI: "unix:///var/run/docker.sock"

Reading the podman source shows that it goes through the following logic:

// https://github.com/containers/podman/blob/v5.5.2/cmd/podman/system/service_abi.go#L58-L77
switch uri.Scheme {
case "unix":
	path, err := filepath.Abs(uri.Path)
	if err != nil {
		return err
	}
	if os.Getenv("LISTEN_FDS") != "" {
		// If it is activated by systemd, use the first LISTEN_FD (3)
		// instead of opening the socket file.
		f := os.NewFile(uintptr(3), "podman.sock")
		listener, err = net.FileListener(f)
		if err != nil {
			return err
		}
	} else {
		listener, err = net.Listen(uri.Scheme, path)
		if err != nil {
			return fmt.Errorf("unable to create socket: %w", err)
		}
	}

Checking the process environment confirms that LISTEN_FDS is indeed set:

$ xargs -n1 -0 < /proc/$(pgrep podman)/environ
LANG=zh_CN.UTF-8
PATH="/data/kube/bin:/bin:/sbin:/usr/bin:/usr/sbin"
LISTEN_PID=9047
LISTEN_FDS=1
LOGGING="--log-level=info"

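The code comment explains that this is systemd socket activation: when LISTEN_FDS is set, podman takes the listener from the inherited file descriptor 3 and ignores the path in the URI. We don't need socket activation, so I removed the following from the tarball:

  1. system/podman.socket
  2. the Requires= and After= references to podman.socket in system/podman.service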

After that, the service starts on the requested path:

API service listening on "/var/run/docker.sock". URI: "unix:///var/run/docker.sock"
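
The environment check done with xargs earlier in this section can also be scripted, which helps when verifying many hosts. The following is a small Go sketch under my own assumptions: parseEnviron and socketActivated are illustrative names, and treating a non-empty LISTEN_FDS as "socket-activated" simply mirrors the condition in the podman snippet quoted above.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
)

// parseEnviron (hypothetical helper) splits the NUL-separated KEY=VALUE
// records found in /proc/<pid>/environ into a map.
func parseEnviron(raw []byte) map[string]string {
	env := make(map[string]string)
	for _, rec := range bytes.Split(raw, []byte{0}) {
		key, value, found := bytes.Cut(rec, []byte("="))
		if !found || len(key) == 0 {
			continue // skip empty or malformed records
		}
		env[string(key)] = string(value)
	}
	return env
}

// socketActivated (hypothetical helper) applies the same test as the quoted
// podman code: a non-empty LISTEN_FDS means the listener is inherited.
func socketActivated(env map[string]string) bool {
	return env["LISTEN_FDS"] != ""
}

func main() {
	// Default to our own process; pass a PID to inspect another one.
	path := "/proc/self/environ"
	if len(os.Args) > 1 {
		path = "/proc/" + os.Args[1] + "/environ"
	}
	raw, err := os.ReadFile(path)
	if err != nil {
		fmt.Fprintln(os.Stderr, "read environ:", err)
		return
	}
	env := parseEnviron(raw)
	fmt.Printf("LISTEN_FDS=%q LISTEN_PID=%q activated=%v\n",
		env["LISTEN_FDS"], env["LISTEN_PID"], socketActivated(env))
}
```

Run against the podman PID before and after removing podman.socket, the activated flag should flip from true to false.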

Differences in info format fields

Some of the info format fields we rely on differ between docker and podman:

  1. '{{.OSType}}' -> '{{.Host.OS}}'
  2. '{{.DockerRootDir}}' -> '{{.Store.GraphRoot}}'
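
If the format strings live in existing scripts, one option is to translate them in a single place rather than editing every caller. The sketch below is only an illustration of that idea: translateInfoFormat is a made-up helper, and the table contains just the two fields listed above, not a complete docker-to-podman mapping.

```go
package main

import (
	"fmt"
	"strings"
)

// dockerToPodmanInfo holds only the two field renames mentioned in this
// post; extend it as more differences turn up.
var dockerToPodmanInfo = map[string]string{
	".OSType":        ".Host.OS",
	".DockerRootDir": ".Store.GraphRoot",
}

// translateInfoFormat (hypothetical helper) rewrites known docker info
// template fields into their podman equivalents and leaves the rest alone.
func translateInfoFormat(format string) string {
	pairs := make([]string, 0, len(dockerToPodmanInfo)*2)
	for from, to := range dockerToPodmanInfo {
		pairs = append(pairs, "{{"+from+"}}", "{{"+to+"}}")
	}
	return strings.NewReplacer(pairs...).Replace(format)
}

func main() {
	for _, f := range []string{"{{.OSType}}", "{{.DockerRootDir}}"} {
		fmt.Printf("%s -> %s\n", f, translateInfoFormat(f))
	}
}
```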

Containers on non-host networks

After deployment, containers that do not use host networking failed to start:

$ docker run --name registry_pass --entrypoint htpasswd registry:2.7.1
Error: creating network namespace for container 3de0fd230fd7693a107de4b56e5ab1444a558ae1e835e78e292ed364915a6362: failed to create namespace: failed to bind mount ns at /run/netns/netns-b0f807fe-e630-3182-e16b-5b5837e2b1a3: no such file or directory

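Go code builds error messages by wrapping, with each layer prepending its own context, so you can search the source directly for the outermost text, creating network namespace for container, to find the code that reports it: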

// https://github.com/containers/podman/blob/v5.5.2/libpod/networking_linux.go#L77-L81
func (r *Runtime) createNetNS(ctr *Container) (n string, q map[string]types.StatusBlock, retErr error) {
	ctrNS, err := netns.NewNS()
	if err != nil {
		return "", nil, fmt.Errorf("creating network namespace for container %s: %w", ctr.ID(), err)
	}

Jumping into netns.NewNS():

func NewNS() (ns.NetNS, error) {
	nsRunDir, err := GetNSRunDir()
	if err != nil {
		return nil, err
	}

	// Create the directory for mounting network namespaces
	// This needs to be a shared mountpoint in case it is mounted in to
	// other namespaces (containers)
	err = makeNetnsDir(nsRunDir)
	if err != nil {
		return nil, err
	}

	for range 10000 {
		nsName, err := getRandomNetnsName()
		if err != nil {
			return nil, err
		}
		nsPath := path.Join(nsRunDir, nsName)
		ns, err := newNSPath(nsPath)
		if err == nil {
			return ns, nil
		}
		// retry when the name already exists
		if errors.Is(err, os.ErrExist) {
			continue
		}
		return nil, err
	}
	return nil, errNoFreeName
}

The failure appeared to come from the makeNetnsDir() call, which only performs basic file and namespace mount operations:

func makeNetnsDir(nsRunDir string) error {
	err := os.MkdirAll(nsRunDir, 0o755)
	if err != nil {
		return err
	}
	// Important, the bind mount setup is racy if two process try to set it up in parallel.
	// This can have very bad consequences because we end up with two duplicated mounts
	// for the netns file that then might have a different parent mounts.
	// Also because as root netns dir is also created by ip netns we should not race against them.
	// Use a lock on the netns dir like they do, compare the iproute2 ip netns add code.
	// https://github.com/iproute2/iproute2/blob/8b9d9ea42759c91d950356ca43930a975d0c352b/ip/ipnetns.c#L806-L815

	dirFD, err := unix.Open(nsRunDir, unix.O_RDONLY|unix.O_DIRECTORY|unix.O_CLOEXEC, 0)
	if err != nil {
		return &os.PathError{Op: "open", Path: nsRunDir, Err: err}
	}
	// closing the fd will also unlock so we do not have to call flock(fd,LOCK_UN)
	defer unix.Close(dirFD)

	err = unix.Flock(dirFD, unix.LOCK_EX)
	if err != nil {
		return fmt.Errorf("failed to lock %s dir: %w", nsRunDir, err)
	}

	// Remount the namespace directory shared. This will fail with EINVAL
	// if it is not already a mountpoint, so bind-mount it on to itself
	// to "upgrade" it to a mountpoint.
	err = unix.Mount("", nsRunDir, "none", unix.MS_SHARED|unix.MS_REC, "")
	if err == nil {
		return nil
	}
	if err != unix.EINVAL {
		return fmt.Errorf("mount --make-rshared %s failed: %q", nsRunDir, err)
	}

	// Recursively remount /run/netns on itself. The recursive flag is
	// so that any existing netns bindmounts are carried over.
	err = unix.Mount(nsRunDir, nsRunDir, "none", unix.MS_BIND|unix.MS_REC, "")
	if err != nil {
		return fmt.Errorf("mount --rbind %s %s failed: %q", nsRunDir, nsRunDir, err)
	}

	// Now we can make it shared
	err = unix.Mount("", nsRunDir, "none", unix.MS_SHARED|unix.MS_REC, "")
	if err != nil {
		return fmt.Errorf("mount --make-rshared %s failed: %q", nsRunDir, err)
	}

	return nil
}
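
One way to inspect the result of these mount calls without a debugger is to read /proc/self/mountinfo and check whether the netns directory ended up as a shared mount point. The Go sketch below is my own illustration of that check, not part of podman: mountPropagation and isShared are hypothetical helpers, the field positions follow the mountinfo layout described in proc(5), and /run/netns is taken from the error message earlier in this post.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// mountPropagation (hypothetical helper) scans mountinfo-formatted text and
// returns the optional fields (for example "shared:23") of the last mount
// whose mount point equals dir. found is false if dir is not a mount point.
func mountPropagation(mountinfo, dir string) (optional []string, found bool) {
	sc := bufio.NewScanner(strings.NewReader(mountinfo))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		// fields[4] is the mount point; optional fields start at index 6
		// and run up to the "-" separator.
		if len(fields) < 7 || fields[4] != dir {
			continue
		}
		optional = nil
		for _, f := range fields[6:] {
			if f == "-" {
				break
			}
			optional = append(optional, f)
		}
		found = true
	}
	return optional, found
}

// isShared (hypothetical helper) reports whether the optional fields mark
// the mount as shared.
func isShared(optional []string) bool {
	for _, f := range optional {
		if strings.HasPrefix(f, "shared:") {
			return true
		}
	}
	return false
}

func main() {
	raw, err := os.ReadFile("/proc/self/mountinfo")
	if err != nil {
		fmt.Fprintln(os.Stderr, "read mountinfo:", err)
		return
	}
	opts, found := mountPropagation(string(raw), "/run/netns")
	fmt.Printf("mountpoint=%v shared=%v optional=%v\n", found, isShared(opts), opts)
}
```

If the directory is reported as a shared mount point after a failed run, the makeNetnsDir() steps themselves most likely succeeded and the problem lies in a later step.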

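I downloaded the source, built podman, and tried to debug it to see exactly which step of makeNetnsDir() fails, but execution never reached the breakpoint: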

$ dlv exec bin/podman --  run  -ti --entrypoint ls docker.io/library/registry:2.7.1
Type 'help' for list of commands.
(dlv) b libpod/networking_linux.go:78
Breakpoint 1 set at 0x1555494 for github.com/containers/podman/v5/libpod.(*Runtime).createNetNS() ./libpod/networking_linux.go:78
(dlv) c
WARN[0000] Using cgroups-v1 which is deprecated in favor of cgroups-v2 with Podman v5 and will be removed in a future version. Set environment variable `PODMAN_IGNORE_CGROUPSV1_WARNING` to hide this warning.
WARN[0000] The input device is not a TTY. The --tty and --interactive flags might not work properly
received SIGINT, stopping process (will not forward signal)
> runtime.futex() /usr/local/go/src/runtime/sys_linux_amd64.s:558 (PC: 0x492243)
Warning: debugging optimized function
553: MOVQ ts+16(FP), R10
554: MOVQ addr2+24(FP), R8
555: MOVL val3+32(FP), R9
556: MOVL $SYS_futex, AX
557: SYSCALL
=> 558: MOVL AX, ret+40(FP)

It then turned out that podman info hangs as well:

$ bin/podman  info
WARN[0000] Using cgroups-v1 which is deprecated in favor of cgroups-v2 with Podman v5 and will be removed in a future version. Set environment variable `PODMAN_IGNORE_CGROUPSV1_WARNING` to hide this warning.
^C

I traced the podman info call chain in the debugger and found this:

(dlv) n
> github.com/containers/podman/v5/libpod.(*Runtime).hostInfo() ./libpod/info.go:106 (PC: 0x21d942e)
101: cpuUtil, err := getCPUUtilization()
102: if err != nil {
103: return nil, err
104: }
105:
=> 106: locksFree, err := r.lockManager.AvailableLocks()
107: if err != nil {
108: return nil, fmt.Errorf("getting free locks: %w", err)
109: }
110:
111: info := define.HostInfo{
(dlv) n

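According to the code, it is stuck in the cgo shared-memory (shm) lock, so I could not use the debugger to track down the network namespace creation problem.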

Trying v4

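v4 is no longer maintained and the official mainline is v5. I still tried the latest v4 release, v4.9.5, and starting a container failed with a "Not supported" error: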

$ podman run --entrypoint ls alpine:latest
Error: netavark: create veth pair: Netlink error: Not supported (os error 95)

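Searching for both the v5 and v4 errors turned up similar reports, all with the same explanation:

The 3.10 kernel on CentOS 7 does not meet the minimum required version, 4.18.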
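
Because our installer has to run on whatever kernel the customer provides, a pre-flight check for this requirement is easy to add. The Go sketch below is an assumption-laden illustration: parseKernelRelease and kernelAtLeast are made-up helpers, the 4.18 threshold is the figure from the search results above rather than something I verified in podman's documentation, and the release string is the one from the uname output at the top of this post.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseKernelRelease (hypothetical helper) extracts major and minor from a
// release string such as "3.10.0-1127.el7.x86_64".
func parseKernelRelease(release string) (major, minor int, err error) {
	parts := strings.SplitN(release, ".", 3)
	if len(parts) < 2 {
		return 0, 0, fmt.Errorf("unexpected kernel release %q", release)
	}
	major, err = strconv.Atoi(parts[0])
	if err != nil {
		return 0, 0, fmt.Errorf("bad major in %q: %w", release, err)
	}
	// The minor part may carry a suffix such as "8-rc1"; keep leading digits.
	digits := parts[1]
	if i := strings.IndexFunc(digits, func(r rune) bool { return r < '0' || r > '9' }); i >= 0 {
		digits = digits[:i]
	}
	minor, err = strconv.Atoi(digits)
	if err != nil {
		return 0, 0, fmt.Errorf("bad minor in %q: %w", release, err)
	}
	return major, minor, nil
}

// kernelAtLeast (hypothetical helper) reports whether release is at least
// wantMajor.wantMinor.
func kernelAtLeast(release string, wantMajor, wantMinor int) (bool, error) {
	major, minor, err := parseKernelRelease(release)
	if err != nil {
		return false, err
	}
	return major > wantMajor || (major == wantMajor && minor >= wantMinor), nil
}

func main() {
	release := "3.10.0-1127.el7.x86_64" // from the uname output in this post
	ok, err := kernelAtLeast(release, 4, 18)
	if err != nil {
		fmt.Println("cannot parse kernel release:", err)
		return
	}
	fmt.Printf("kernel %s meets 4.18 minimum: %v\n", release, ok)
}
```

On a real host the release string would come from uname -r or /proc/sys/kernel/osrelease instead of a constant.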

Conclusion

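  1. podman currently does not support TLS for its API, and it cannot listen on a unix socket and TCP at the same time. For security reasons only the socket file is usable, so remote management is not possible.
  2. On-premises delivery has to support many operating systems with kernel versions ranging from old to new, so podman is not usable for us. If you are not doing on-premises delivery and instead run your own single environment with network access, it is usable.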