
OpenShift 4.5.9 etcd corruption and split-brain recovery

2021/06/08

Background

The internal machines and environment all live in vCenter. The OCP cluster was 3 masters + 1 worker, i.e. the environment from my earlier OpenShift 4.5.9 offline install. Later a few hypervisor hosts got overloaded, and a colleague, seeing my machines carried the highest load, shut several of them down. These past few days I needed the OpenShift environment again: logging in to the bastion, `get` timed out, and the haproxy stats web page showed everything red... After powering all the machines back on, the cluster still would not come up.

Procedure

OpenShift master nodes are a lot like kubeadm ones: the control-plane components run as static pods. The container client isn't docker either; just use crictl.

Checking kube-apiserver

SSH to master1 and check the logs; it turned out etcd couldn't start:

ssh -i ~/.ssh/new_rsa core@10.x.45.251
crictl ps -a | grep kube-apiserver
crictl logs xxx

Only one etcd was healthy; one node's log complained about a corrupted snap file, and the other about the revision being too low. First exec into the etcd container on the healthy node (every etcdctl command below is run inside the etcd container; entering it looks like this):

crictl ps -a | grep etcd
crictl exec -ti xxx bash

$ env | grep ETCDCTL
ETCDCTL_CERT=/etc/kubernetes/static-pod-certs/secrets/etcd-all-peer/etcd-peer-master2.openshift4.example.com.crt
ETCDCTL_ENDPOINTS=https://10.x.45.251:2379,https://10.x.45.252:2379,https://10.x.45.222:2379
ETCDCTL_CACERT=/etc/kubernetes/static-pod-certs/configmaps/etcd-serving-ca/ca-bundle.crt
ETCDCTL_API=3
ETCDCTL_KEY=/etc/kubernetes/static-pod-certs/secrets/etcd-all-peer/etcd-peer-master2.openshift4.example.com.key

$ etcd --version
etcd Version: 3.4.9
Git SHA: 4657b9e
Go Version: go1.13.4
Go OS/Arch: linux/amd64

The start of the trouble

A useful detail here: etcd and etcdctl command-line options can all be supplied via environment variables, like the ones above (I forget which version introduced this). For etcdctl snapshot save, the endpoints may only contain a single node. I tried to take a backup, and it hung:

$ ETCDCTL_ENDPOINTS=https://10.x.45.252:2379 etcdctl snapshot save 0608-etcd.db
{"level":"info","ts":1623124958.5931418,"caller","snapshot/v3_snapshot.go:119","msg":"create temporary db file","path":"0608-etcd.db.part"}

A bit of progress

When this cluster was set up I expected to use it only briefly, so backups were never considered. While poking around on the machine, though, I found that newer clusters keep automatic backups (at least my version does), under /etc/kubernetes/rollbackcopy:

$ cd /etc/kubernetes/
$ ll rollbackcopy/*
rollbackcopy/currentVersion.latest:
total 707152
-rw-r--r--. 1 root root 58 Jun 8 09:17 backupenv.json
-rw-------. 1 root root 724054048 Jun 8 09:17 snapshot_2021-06-08_091655.db
-rw-------. 1 root root 59426 Jun 8 09:17 static_kuberesources_2021-06-08_091655.tar.gz

rollbackcopy/currentVersion.prev:
total 707152
-rw-r--r--. 1 root root 58 Jun 8 08:11 backupenv.json
-rw-------. 1 root root 724054048 Jun 8 08:11 snapshot_2021-06-08_081148.db
-rw-------. 1 root root 59426 Jun 8 08:11 static_kuberesources_2021-06-08_081148.tar.gz

Then I wanted to restore the backup with etcdctl and copy it out of the container, but crictl has no cp subcommand. So I checked the etcd container's mounts, planning to copy etcdctl onto a mounted path from inside the container so it would show up on the host. After getting the binary out I moved it to /usr/local/bin/ out of habit, and tab completion then revealed the following scripts:

$ ls /usr/local/bin/
cluster-backup.sh cluster-restore.sh recover-kubeconfig.sh

The built-in backup and restore

Looking at cluster-restore.sh, its first argument is the backup directory, i.e. the directory found above. Try running it on the two unhealthy nodes:

$ cd /etc/kubernetes/rollbackcopy/currentVersion.latest
$ cluster-restore.sh .
...stopping kube-apiserver-pod.yml
...stopping kube-controller-manager-pod.yml
...stopping kube-scheduler-pod.yml
...stopping etcd-pod.yml
Waiting for container etcd to stop
complete
Waiting for container etcdctl to stop
...................................complete
Waiting for container etcd-metrics to stop
complete
Waiting for container kube-controller-manager to stop
complete
Waiting for container kube-apiserver to stop
.........................complete
Waiting for container kube-scheduler to stop
complete
starting restore-etcd static pod
starting kube-apiserver-pod.yml
static-pod-resource/kube-apiserver-pod-50/kube-apiserver-pod.yaml
starting kube-controller-manager-pod.yml
static-pod-resource/kube-controller-manager-pod-7/kube-controller-manager-pod.yml
starting kube-scheduler-pod.yml
static-pod-resource/kube-scheduler-pod-7/kube-scheduler-pod.yml

While this script is waiting for containers to stop, depending on the situation you may need to stop them manually yourself; the following can be used to stop the related containers:

# static pod containers that cluster-restore.sh waits for
STATIC_POD_CONTAINERS=("etcd" "etcdctl" "etcd-metrics" "kube-controller-manager" "kube-apiserver" "kube-scheduler")

function wait_for_containers_to_stop(){
  local CONTAINERS=("$@") ctrID

  for NAME in "${CONTAINERS[@]}"; do
    echo "Waiting for container ${NAME} to stop"
    # look up the running container by its kubernetes container name and stop it
    ctrID="$(crictl ps --label io.kubernetes.container.name=${NAME} -q)"
    if [ -n "$ctrID" ]; then
      crictl stop "$ctrID"
    fi
  done
}

wait_for_containers_to_stop "${STATIC_POD_CONTAINERS[@]}"

State after it finished running:

$ crictl ps -a
CONTAINER IMAGE CREATED STATE NAME ATTEMPT POD ID
44f09a621803e 5e0c1e21da05b4b6455632cb70d9e29c76a5710e0bfa129ebef00d1cc1d5ee85 39 seconds ago Running etcd 0 ad5b9a73adb3a
b728530b11ea7 b7838c3ae6383695ca8c6b3e900e9b9ce221d843bf16a7c61fe1a5e13f58f4a6 40 seconds ago Running kube-scheduler 1 af1e63ab536fd
ec9b583321b64 b7838c3ae6383695ca8c6b3e900e9b9ce221d843bf16a7c61fe1a5e13f58f4a6 40 seconds ago Running kube-apiserver 26 f68ce6fa73ad6
05431a2d159b9 b7838c3ae6383695ca8c6b3e900e9b9ce221d843bf16a7c61fe1a5e13f58f4a6 40 seconds ago Running kube-controller-manager 1 cc1a03361154d
4a1d734b14faf b7838c3ae6383695ca8c6b3e900e9b9ce221d843bf16a7c61fe1a5e13f58f4a6 36 minutes ago Exited kube-apiserver 25 f68ce6fa73ad6
...

kube-apiserver came up, and oc works again:

$ oc get node
NAME STATUS ROLES AGE VERSION
master1.openshift4.example.com Ready master,worker 262d v1.18.3+6c42de8
master2.openshift4.example.com Ready master,worker 262d v1.18.3+6c42de8
master3.openshift4.example.com Ready master,worker 262d v1.18.3+6c42de8
worker1.openshift4.example.com Ready worker 259d v1.18.3+6c42de8

The etcd split-brain

Then a pod in my dev namespace was stuck Pending. Deleting it with kubectl returned an error, roughly that etcd couldn't delete the key. So I checked etcd's status:

$ etcdctl endpoint status --write-out=table
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://10.x.45.251:2379 | 7ad933dac58f4549 | 3.4.9 | 724 MB | true | false | 2 | 30260441 | 30260441 | |
| https://10.x.45.252:2379 | f4351098cae1d407 | 3.4.9 | 726 MB | true | false | 2 | 40195520 | 40195520 | |
| https://10.x.45.222:2379 | 2399ef0cea33ebf3 | 3.4.9 | 724 MB | true | false | 2 | 50408739 | 50408739 | |
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+

Sure enough, split-brain. Each of the three only sees itself:

$ ETCDCTL_ENDPOINTS=https://10.x.45.251:2379 etcdctl member list
7ad933dac58f4549, started, master1.openshift4.example.com, https://10.x.45.251:2380, https://10.x.45.251:2379, false
$ ETCDCTL_ENDPOINTS=https://10.x.45.252:2379 etcdctl member list
f7f6c198cb519536, started, master2.openshift4.example.com, https://10.x.45.252:2380, https://10.x.45.252:2379, false
$ ETCDCTL_ENDPOINTS=https://10.x.45.222:2379 etcdctl member list
ac62bec820f40228, started, master3.openshift4.example.com, https://10.x.45.222:2380, https://10.x.45.222:2379, false

Dealing with the split-brain

Preparation

Try the move-leader command to see whether it can help:

$ ETCDCTL_ENDPOINTS=https://10.x.45.222:2379 etcdctl move-leader 7ad933dac58f4549
2021-06-08 02:59:35.782766 C | pkg/flags: conflicting environment variable "ETCDCTL_ENDPOINTS" is shadowed by corresponding command-line flag (either unset environment variable or disable flag)

It complains that endpoints is set both in the environment and on the command line; unset the variable and try again:

$ unset ETCDCTL_ENDPOINTS
$ export ETCDCTL_ENDPOINTS=https://10.x.45.222:2379
$ etcdctl move-leader 7ad933dac58f4549
2021-06-08 03:00:11.019150 C | pkg/flags: conflicting environment variable "ETCDCTL_CERT" is shadowed by corresponding command-line flag (either unset environment variable or disable flag)

Then another variable triggers the same error. A quick search shows this is a bug in etcdctl move-leader, see the PR. After manually unsetting the related ETCDCTL_xxx variables and running with explicit flags, it still errors:

$ etcdctl --endpoints https://10.x.45.222:2379 \
--cacert=/etc/kubernetes/static-pod-certs/configmaps/etcd-serving-ca/ca-bundle.crt \
--cert=/etc/kubernetes/static-pod-certs/secrets/etcd-all-peer/etcd-peer-master3.openshift4.example.com.crt \
--key=/etc/kubernetes/static-pod-certs/secrets/etcd-all-peer/etcd-peer-master3.openshift4.example.com.key \
move-leader 7ad933dac58f4549
{"level":"warn","ts":"2021-06-08T03:07:08.767Z","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"endpoint://client-27e0f779-7d90-4be3-9491-f0e915374f3c/10.xxx.45.222:2379","attempt":0,"error":"rpc error: code = FailedPrecondition desc = etcdserver: bad leader transferee"}

OK, move-leader is no use in this scenario. But now that all three are up, taking a backup should work, and the plan is to restore from that backup on the other nodes. First check the etcd static pod yaml /etc/kubernetes/manifests/etcd-pod.yaml to see whether its startup logic needs adjusting:

...
  command:
    - /bin/sh
    - -c
    - |
      #!/bin/sh
      set -euo pipefail
      ...
      if [ ! -z $(ls -A "/var/lib/etcd") ]; then
        echo "please delete the contents of data directory before restoring, running the restore script will do this for you"
        exit 1
      fi

      # check if we have backup file to be restored
      # if the file exist, check if it has not changed size in last 5 seconds
      if [ ! -f /var/lib/etcd-backup/snapshot.db ]; then
        echo "please make a copy of the snapshot db file, then move that copy to /var/lib/etcd-backup/snapshot.db"
        exit 1
      else
      ...

From the logic, at startup the data directory must be empty, and /var/lib/etcd-backup/ must contain the backup db file. Checking the mounts, the host uses the same paths. The plan: first take a snapshot inside the etcd container on the first master node, then use that backup file to restore on the other nodes.

[root@master1 /]$ ETCDCTL_ENDPOINTS=https://10.x.45.251:2379 etcdctl  snapshot save /var/lib/etcd-backup/snapshot.db
{"level":"info","ts":1623143517.6130972,"caller":"snapshot/v3_snapshot.go:119","msg":"created temporary db file","path":"/var/lib/etcd-backup/snapshot.db.part"}
{"level":"info","ts":"2021-06-08T09:11:57.621Z","caller":"clientv3/maintenance.go:200","msg":"opened snapshot stream; downloading"}
{"level":"info","ts":1623143517.6213868,"caller":"snapshot/v3_snapshot.go:127","msg":"fetching snapshot","endpoint":"https://10.x.45.251:2379"}
{"level":"info","ts":"2021-06-08T09:12:03.315Z","caller":"clientv3/maintenance.go:208","msg":"completed snapshot read; closing"}
{"level":"info","ts":1623143524.4544568,"caller":"snapshot/v3_snapshot.go:142","msg":"fetched snapshot","endpoint":"https://10.x.45.251:2379","size":"724 MB","took":6.841299825}
{"level":"info","ts":1623143524.4545853,"caller":"snapshot/v3_snapshot.go:152","msg":"saved","path":"/var/lib/etcd-backup/snapshot.db"}

Then stop etcd on the other two master nodes and rename the /var/lib/etcd directory (don't delete the directory right away; renaming is always the safest move):

cd /etc/kubernetes/
# keep a spare copy of the manifest, then move it out of manifests/ so kubelet stops the pod
cp manifests/etcd-pod.yaml .
mv manifests/etcd-pod.yaml /tmp/

etcdID="$(crictl ps --label io.kubernetes.container.name=etcd -q)"
if [ -n "$etcdID" ]; then
  crictl stop "$etcdID"
fi
# rename rather than delete the data directory
mv /var/lib/etcd /var/lib/etcd-bak

Then SSH to the other machines with the key, enable root and password login in sshd so we can scp, and copy the backup file /var/lib/etcd-backup/snapshot.db to the same path on the other masters.
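
A minimal sketch of that copy plus the pre-checks the manifest expects, assuming root SSH was enabled as described and using the master2/master3 addresses (redacted) from the outputs above:

# run on master1 (host side): push the snapshot to the same path on the other masters
for ip in 10.x.45.252 10.x.45.222; do
  ssh root@${ip} 'mkdir -p /var/lib/etcd-backup'
  scp /var/lib/etcd-backup/snapshot.db root@${ip}:/var/lib/etcd-backup/snapshot.db
done

# run on master2/master3: confirm what etcd-pod.yaml checks for --
# an empty data directory (we renamed it away) and the snapshot in place
ls -A /var/lib/etcd 2>/dev/null        # should print nothing
ls -lh /var/lib/etcd-backup/snapshot.db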

Rejoining the members

Looking carefully at the earlier output, each endpoint's member list contains only itself. So the plan is to add the other members back one at a time from master1:

$ ETCDCTL_ENDPOINTS=https://10.x.45.251:2379 etcdctl member add master2.openshift4.example.com --peer-urls=https://10.x.45.252:2380
Member 3e27197aa4521ea0 added to cluster 1c2134e7d41c45b1

ETCD_NAME="master2.openshift4.example.com"
ETCD_INITIAL_CLUSTER="master2.openshift4.example.com=https://10.x.45.252:2380,master1.openshift4.example.com=https://10.x.45.251:2380"
ETCD_INITIAL_ADVERTISE_PEER_URLS="https://10.x.45.252:2380"
ETCD_INITIAL_CLUSTER_STATE="existing"

Then on the second master, edit the etcd static pod yaml /tmp/etcd-pod.yaml. Adapt it to your own situation and delete the backup/restore related logic; the key change is setting ETCD_INITIAL_CLUSTER to the full cluster, in the form ${name1}=https://${ip1}:2380,${name2}=https://${ip2}:2380...

...
#export ETCD_INITIAL_CLUSTER=xxx
# rebuild ETCD_INITIAL_CLUSTER from the NODE_<name>_ETCD_NAME / NODE_<name>_ETCD_URL_HOST
# environment variables instead of hard-coding it
NAME_ETCD_ARRAY=()
for i in $(env | grep -Po '(?<=NODE_).+(?=_ETCD_NAME)' | sort); do
  etcd_name=NODE_${i}_ETCD_NAME
  url_host_var=NODE_${i}_ETCD_URL_HOST
  NAME_ETCD_ARRAY+=("${!etcd_name}=https://${!url_host_var}:2380")
done
export ETCD_INITIAL_CLUSTER=$(echo ${NAME_ETCD_ARRAY[*]} | tr ' ' ',')

After the edit, start etcd on master2:

mv /tmp/etcd-pod.yaml /etc/kubernetes/manifests/

Check from master1:

[root@master1 /]$ ETCDCTL_ENDPOINTS=https://10.x.45.251:2379 etcdctl member list
3e27197aa4521ea0, unstarted, , https://10.x.45.252:2380, , false
831fd1ef9bc83a2b, started, master1.openshift4.example.com, https://10.x.45.251:2380, https://10.x.45.251:2379, false
[root@master1 /]$ ETCDCTL_ENDPOINTS=https://10.x.45.251:2379 etcdctl member list
3e27197aa4521ea0, unstarted, , https://10.x.45.252:2380, , false
831fd1ef9bc83a2b, started, master1.openshift4.example.com, https://10.x.45.251:2380, https://10.x.45.251:2379, false
[root@master1 /]$ ETCDCTL_ENDPOINTS=https://10.x.45.251:2379 etcdctl member list
3e27197aa4521ea0, started, master2.openshift4.example.com, https://10.x.45.252:2380, https://10.x.45.252:2379, false
831fd1ef9bc83a2b, started, master1.openshift4.example.com, https://10.x.45.251:2380, https://10.x.45.251:2379, false
[root@master1 /]$ etcdctl endpoint status -w table
{"level":"warn","ts":"2021-06-08T09:26:16.669Z","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"passthrough:///https://10.x.45.222:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: connection error: desc = \"transport: Error while dialing dial tcp 10.x.45.222:2379: connect: connection refused\""}
Failed to get the status of endpoint https://10.x.45.222:2379 (context deadline exceeded)
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://10.x.45.251:2379 | 831fd1ef9bc83a2b | 3.4.9 | 724 MB | true | false | 632 | 1041 | 1041 | |
| https://10.x.45.252:2379 | 3e27197aa4521ea0 | 3.4.9 | 724 MB | false | false | 632 | 1041 | 1041 | |
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+

Good news. Now restore the third one: add the third member:

[root@master1 /]$ ETCDCTL_ENDPOINTS=https://10.x.45.251:2379 etcdctl member add master3.openshift4.example.com --peer-urls=https://10.x.45.222:2380
Member d319fa1cbb0e28fe added to cluster 1c2134e7d41c45b1

ETCD_NAME="master3.openshift4.example.com"
ETCD_INITIAL_CLUSTER="master2.openshift4.example.com=https://10.x.45.252:2380,master1.openshift4.example.com=https://10.x.45.251:2380,master3.openshift4.example.com=https://10.x.45.222:2380"
ETCD_INITIAL_ADVERTISE_PEER_URLS="https://10.x.45.222:2380"
ETCD_INITIAL_CLUSTER_STATE="existing"

Edit the third node's etcd yaml the same way as before. After starting it, keep watching:

[root@master1 /]$ etcdctl endpoint status -w table
{"level":"warn","ts":"2021-06-08T09:32:55.197Z","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"passthrough:///https://10.x.45.222:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
Failed to get the status of endpoint https://10.x.45.222:2379 (context deadline exceeded)
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://10.x.45.251:2379 | 831fd1ef9bc83a2b | 3.4.9 | 724 MB | true | false | 632 | 2388 | 2388 | |
| https://10.x.45.252:2379 | 3e27197aa4521ea0 | 3.4.9 | 724 MB | false | false | 632 | 2388 | 2388 | |
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
[root@master1 /]$ etcdctl endpoint status -w table
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://10.x.45.251:2379 | 831fd1ef9bc83a2b | 3.4.9 | 724 MB | true | false | 632 | 2593 | 2593 | |
| https://10.x.45.252:2379 | 3e27197aa4521ea0 | 3.4.9 | 724 MB | false | false | 632 | 2593 | 2593 | |
| https://10.x.45.222:2379 | d319fa1cbb0e28fe | 3.4.9 | 724 MB | false | false | 632 | 2596 | 2596 | |
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
[root@master1 /]$ etcdctl endpoint status -w table
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://10.x.45.251:2379 | 831fd1ef9bc83a2b | 3.4.9 | 724 MB | true | false | 632 | 2939 | 2939 | |
| https://10.x.45.252:2379 | 3e27197aa4521ea0 | 3.4.9 | 724 MB | false | false | 632 | 2939 | 2939 | |
| https://10.x.45.222:2379 | d319fa1cbb0e28fe | 3.4.9 | 724 MB | false | false | 632 | 2939 | 2939 | |
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+

Check each endpoint's member list individually to make sure they agree:

[root@master1 /]$ ETCDCTL_ENDPOINTS=https://10.x.45.251:2379 etcdctl  member list
3e27197aa4521ea0, started, master2.openshift4.example.com, https://10.x.45.252:2380, https://10.x.45.252:2379, false
831fd1ef9bc83a2b, started, master1.openshift4.example.com, https://10.x.45.251:2380, https://10.x.45.251:2379, false
d319fa1cbb0e28fe, started, master3.openshift4.example.com, https://10.x.45.222:2380, https://10.x.45.222:2379, false
[root@master1 /]$ ETCDCTL_ENDPOINTS=https://10.x.45.252:2379 etcdctl member list
3e27197aa4521ea0, started, master2.openshift4.example.com, https://10.x.45.252:2380, https://10.x.45.252:2379, false
831fd1ef9bc83a2b, started, master1.openshift4.example.com, https://10.x.45.251:2380, https://10.x.45.251:2379, false
d319fa1cbb0e28fe, started, master3.openshift4.example.com, https://10.x.45.222:2380, https://10.x.45.222:2379, false
[root@master1 /]$ ETCDCTL_ENDPOINTS=https://10.x.45.222:2379 etcdctl member list
3e27197aa4521ea0, started, master2.openshift4.example.com, https://10.x.45.252:2380, https://10.x.45.252:2379, false
831fd1ef9bc83a2b, started, master1.openshift4.example.com, https://10.x.45.251:2380, https://10.x.45.251:2379, false
d319fa1cbb0e28fe, started, master3.openshift4.example.com, https://10.x.45.222:2380, https://10.x.45.222:2379, false

Then I noticed the nodes had gone NotReady; approving all pending CSRs fixed that:

oc get csr
oc adm certificate approve xxx

After a quick test, kubectl could delete pods again. Once things settle down I'll take another manual backup.
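
A minimal sketch of that later manual backup, using the cluster-backup.sh found earlier (run as root on a master; the destination directory here is just an example, any directory with enough space works):

mkdir -p /home/core/assets/backup
/usr/local/bin/cluster-backup.sh /home/core/assets/backup
# the result should look like the rollbackcopy contents above:
# a snapshot_<timestamp>.db plus a static_kuberesources_<timestamp>.tar.gz
ls -lh /home/core/assets/backup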

Nothing schedules, and logs fail with remote error: tls: internal error

At the same time nothing could be scheduled, and on the masters the kube-apiserver log was flooded with:

authentication.go:53] Unable to authenticate the request due to an error: x509: certificate signed by unknown authority

Searching turned up nothing useful; in the end I found a fix on the masters by intuition:

while :; do
  sleep 2
  oc get csr -o name | xargs -r oc adm certificate approve
done

In another window, SSH to each master and stop the cert-syncer related containers:

crictl ps -a | awk '/Running/&&/-cert-syncer/{print $1}' | xargs -r crictl stop
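
To confirm the workaround actually took, a quick check like the following works (the etcd pod name below is an assumption based on the usual static-pod naming of etcd-<nodename>):

# no CSRs should stay Pending once the approve loop has drained them
oc get csr | grep Pending || echo "no pending CSRs"

# oc logs goes through the kubelet serving certificate, so it working again is the real test
oc -n openshift-etcd get pods -o wide
oc -n openshift-etcd logs etcd-master1.openshift4.example.com -c etcd --tail=5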

Some lingering questions

Later I confirmed that restoring with the stock yaml plus that backup script is exactly what produces the split-brain cluster. I asked someone with a 4.7.13 cluster and looked at their etcd yaml, shown below; it no longer restores from a backup before starting. This problem probably only exists in my version.

...
#!/bin/sh
set -euo pipefail

etcdctl member list || true

# this has a non-zero return code if the command is non-zero. If you use an export first, it doesn't and you
# will succeed when you should fail.
ETCD_INITIAL_CLUSTER=$(discover-etcd-initial-cluster \
  --cacert=/etc/kubernetes/static-pod-certs/configmaps/etcd-serving-ca/ca-bundle.crt \
  --cert=/etc/kubernetes/static-pod-certs/secrets/etcd-all-peer/etcd-peer-master11.cluster.lonlife.dev.crt \
  --key=/etc/kubernetes/static-pod-certs/secrets/etcd-all-peer/etcd-peer-master11.cluster.lonlife.dev.key \
  --endpoints=${ALL_ETCD_ENDPOINTS} \
  --data-dir=/var/lib/etcd \
  --target-peer-url-host=${NODE_master11_cluster_lonlife_dev_ETCD_URL_HOST} \
  --target-name=master11.cluster.lonlife.dev)
export ETCD_INITIAL_CLUSTER

# we cannot use the "normal" port conflict initcontainer because when we upgrade, the existing static pod will never yield,
# so we do the detection in etcd container itself.
echo -n "Waiting for ports 2379, 2380 and 9978 to be released."
while [ -n "$(ss -Htan '( sport = 2379 or sport = 2380 or sport = 9978 )')" ]; do
  echo -n "."
  sleep 1
done

export ETCD_NAME=${NODE_master11_cluster_lonlife_dev_ETCD_NAME}
env | grep ETCD | grep -v NODE

set -x
# See https://etcd.io/docs/v3.4.0/tuning/ for why we use ionice
exec ionice -c2 -n0 etcd \
  --log-level=info \
  --initial-advertise-peer-urls=https://${NODE_master11_cluster_lonlife_dev_IP}:2380 \
  --cert-file=/etc/kubernetes/static-pod-certs/secrets/etcd-all-serving/etcd-serving-master11.cluster.lonlife.dev.crt \
  --key-file=/etc/kubernetes/static-pod-certs/secrets/etcd-all-serving/etcd-serving-master11.cluster.lonlife.dev.key \
  --trusted-ca-file=/etc/kubernetes/static-pod-certs/configmaps/etcd-serving-ca/ca-bundle.crt \
  --client-cert-auth=true \
  --peer-cert-file=/etc/kubernetes/static-pod-certs/secrets/etcd-all-peer/etcd-peer-master11.cluster.lonlife.dev.crt \
  --peer-key-file=/etc/kubernetes/static-pod-certs/secrets/etcd-all-peer/etcd-peer-master11.cluster.lonlife.dev.key \
  --peer-trusted-ca-file=/etc/kubernetes/static-pod-certs/configmaps/etcd-peer-client-ca/ca-bundle.crt \
  --peer-client-cert-auth=true \
  --advertise-client-urls=https://${NODE_master11_cluster_lonlife_dev_IP}:2379 \
  --listen-client-urls=https://0.0.0.0:2379 \
  --listen-peer-urls=https://0.0.0.0:2380 \
  --listen-metrics-urls=https://0.0.0.0:9978 || mv /etc/kubernetes/etcd-backup-dir/etcd-member.yaml /etc/kubernetes/manifests
