zhangguanzhang's Blog

ansible reload user group

2020/11/23

A note on a user group problem I hit today, caused by Ansible's persistent SSH sessions.

Background

This happened on 2022/03/08; I changed the post's date so that it ranks near the top.

Our docker deployment steps look like this:

- name: "add group: 'docker'"
  group:
    name: docker
    state: present
- name: adding existing user to group docker
  user:
    name: '{{ item }}'
    groups: docker
    append: yes
  with_items:
    - root
    - "{{ ansible_ssh_user | default('root') }}"
  # if ansible_connection=local is used, the variable ansible_ssh_user will not be defined
  when: ansible_ssh_user is defined
...
# start docker

Then, once the registry has been deployed, we run docker login:

- include_tasks: install_registry.yml
  when:
    - not registry.third_enable
    - inventory_hostname == groups['node'][0]

- name: perform docker login
  environment:
    PATH: "{{ bin_dir }}:{{ ansible_env.PATH }}"
  shell: docker login {{ REGISTRY_DOMAIN }}:{{ REGISTRY_PORT }} -u xxx -p xxx
# when running as non-root with -u xxx -b --become-method=sudo, also log in as that user
- name: perform docker login for ansible_ssh_user
  shell: docker login {{ REGISTRY_DOMAIN }}:{{ REGISTRY_PORT }} -u xxx -p xxx
  become: yes
  become_user: "{{ ansible_ssh_user }}"
  when:
    # if ansible_connection=local is used, the variable ansible_ssh_user will not be defined
    - ansible_ssh_user is defined
    - ansible_ssh_user != 'root'

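After the run, however, a tester got a permission error when running docker pull as the non-root user.

Troubleshooting

The execution log showed that perform docker login for ansible_ssh_user had not run. I took a clean environment and ran just the following tasks to debug: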

- hosts:
    - all
  tasks:
    - name: test
      shell: docker login {{ REGISTRY_DOMAIN }}:{{ REGISTRY_PORT }} -u xxx -p xxx
      when:
        # if ansible_connection=local is used, the variable ansible_ssh_user will not be defined
        - ansible_ssh_user is defined
        - ansible_ssh_user != 'root'
    - name: test2
      shell: docker login {{ REGISTRY_DOMAIN }}:{{ REGISTRY_PORT }} -u xxx -p xxx
      become: yes
      become_user: "{{ ansible_ssh_user }}"
      when:
        # if ansible_connection=local is used, the variable ansible_ssh_user will not be defined
        - ansible_ssh_user is defined
        - ansible_ssh_user != 'root'

We deploy as a non-root user with sudo:

ansible-playbook -i conf/k8s.conf tasks/test.yml --private-key=/root/.ssh/id_rsa -u xxx -b --become-method=sudo 

The second task failed:

Post http://%2Fvar%2Frun%2Fdocker.sock/v1.40/auth: dial unix /var/run/docker.sock: connect: permission denied

I logged in over SSH and checked: /var/run/docker.sock belongs to group docker, and ansible_ssh_user is in the docker group as well:

$ id
uid=1000(xxx) gid=1000(xxx) groups=1000(xxx),1001(docker)

Finally, I put an id command in front of the docker login for debugging, and that gave the first clue:

...
    - name: test2
      shell: |
        id;docker login {{ REGISTRY_DOMAIN }}:{{ REGISTRY_PORT }} -u xxx -p xxx
      become: yes
      become_user: "{{ ansible_ssh_user }}"
      when:
        # if ansible_connection=local is used, the variable ansible_ssh_user will not be defined
        - ansible_ssh_user is defined
        - ansible_ssh_user != 'root'

Output:

"stdout": "uid=1000(xxx) gid=1000(xxx) 组=1000(xxx)", "stdout_lines": ["uid=1000(xxx) gid=1000(xxx) 组=1000(xxx)"]

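The timeline of our installation steps is:

  • The persistent session starts and ansible runs; at this point user xxx is not yet in the docker group
  • Before docker is installed, the docker group is added, user xxx is appended to it, and docker is started
  • docker login runs as root and as the non-root user, but the non-root docker login fails
  • With id; in front of the docker login, the output shows that user xxx is not in the docker group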
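
The mismatch in the last step can be made more explicit with a small debugging task. This is only a sketch, not part of the original playbook: the task names and the registered variable group_check are made up for illustration. It relies on the fact that id -Gn without an argument prints the groups of the running process, while id -Gn with a user name looks the user up in the group database.

```yaml
# Debugging sketch (hypothetical task names and variable name).
- name: compare the process's groups with the group database
  shell: |
    echo "process groups:  $(id -Gn)"
    echo "database groups: $(id -Gn "$(id -un)")"
  become: yes
  become_user: "{{ ansible_ssh_user }}"
  register: group_check
  changed_when: false
  when:
    - ansible_ssh_user is defined
    - ansible_ssh_user != 'root'

- name: show both group lists
  debug:
    var: group_check.stdout_lines
  when: group_check is not skipped
```

If docker shows up only on the second line, the group change has reached the system but not the session that the task is running in.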

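It suddenly dawned on me that the SSH session persistence configured for ansible was the cause: the persistent session was opened before xxx was added to the docker group, and a session's group list is fixed when it starts, so tasks that reuse that session never see the new group. Deleting the session-persistence file path made it work, but that is not user-friendly. Searching for ansible reload user group turned up someone with the same problem, but the fix suggested there: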

- name: reset ssh connection
  meta: reset_connection

fails when run:

[WARNING]: Reset is not implemented for this connection

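I eventually solved it myself by using su -c instead of become. I also noticed that ansible_ssh_user occasionally goes missing for no obvious reason, so I now save it into another variable, HOST_USER, with set_fact beforehand (the code below has not been changed to use it; if you hit this, run set_fact yourself in advance):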

- name: test2
  shell: |
    su -c 'docker login {{ REGISTRY_DOMAIN }}:{{ REGISTRY_PORT }} -u xxx -p xxx' {{ ansible_ssh_user }}
  when:
    # if ansible_connection=local is used, the variable ansible_ssh_user will not be defined
    - ansible_ssh_user is defined
    - ansible_ssh_user != 'root'
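
For the HOST_USER variant mentioned earlier, a minimal sketch could look like the following. The variable name HOST_USER comes from the text; the task names are made up, and the sketch assumes the play still runs with -b so that the shell task executes as root and su does not ask for a password.

```yaml
# Sketch only: save the SSH user early in the play, before it can go missing.
- name: remember the SSH user in HOST_USER
  set_fact:
    HOST_USER: "{{ ansible_ssh_user }}"
  # if ansible_connection=local is used, ansible_ssh_user will not be defined
  when: ansible_ssh_user is defined

# Later, after the registry is deployed: log in through su instead of become.
- name: perform docker login for HOST_USER
  shell: |
    su -c 'docker login {{ REGISTRY_DOMAIN }}:{{ REGISTRY_PORT }} -u xxx -p xxx' {{ HOST_USER }}
  when:
    - HOST_USER is defined
    - HOST_USER != 'root'
```

Because set_fact stores the value as a host fact for the rest of the play, the later task no longer depends on ansible_ssh_user still being present.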

References
