Our production k8s clusters all run fairly old versions. Today a colleague happened to log into a node and found its root partition full; after cleaning it up we noticed one pod had never been created. Once the k8s processes that had died because of the full filesystem were brought back up, kubelet kept logging errors along the lines of `cannot remove /var/lib/kubelet/xxxxconfig/key resource busy now`. Running describe on the corresponding rc showed no Events at all:
```
$ kubectl describe rc rabbit3rc
Name:         rabbit3rc
Namespace:    default
Selector:     app=rabbitmq-cluster,node=rabbit3
Labels:       app=rabbitmq-cluster
              node=rabbit3
Annotations:  <none>
Replicas:     1 current / 1 desired
Pods Status:  0 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:  app=rabbitmq-cluster
           node=rabbit3
  Containers:
   rabbit3:
    Image:  cloud-base/rabbitmq-3.6.5:E3103-PUB-20181015-RC1
    Ports:  4369/TCP, 5672/TCP, 15672/TCP, 25672/TCP
    Limits:
      cpu:     16
      memory:  8Gi
    Requests:
      cpu:     400m
      memory:  500Mi
    Liveness:  exec [health_check.sh] delay=600s timeout=10s period=15s #success=1 #failure=3
    Environment:
      RABBITMQ_DEFAULT_USER:   xxx
      RABBITMQ_DEFAULT_PASS:   xxx
      RABBITMQ_ERLANG_COOKIE:  xxx
    Mounts:
      /etc/localtime from time (rw)
      /var/lib/rabbitmq from rabbitmqvar (rw)
  Volumes:
   time:
    Type:  HostPath (bare host directory volume)
    Path:  /etc/localtime
   rabbitmqvar:
    Type:  HostPath (bare host directory volume)
    Path:  /opt/cloud/rabbitmq
Events:  <none>
```
On the corresponding node there was no rabbitmq container, only a pause container, and it was in Dead state; there were many containers like that and they could not be deleted. After restarting docker the Dead containers were gone, but once everything came back up neither the rabbitmq container nor its pause container ever appeared, so I suspected a scheduling problem. The HA design of this old cluster has always been murky; after working out which machine was the leader by checking where the process was alive, I inspected kube-controller-manager on it with systemctl and saw the following:
```
Failed to update lock: Operation cannot be fulfilled on endpoints "kube-controller-manager": the object has been modified; please apply your changes to the latest version and try again
```
The controllers on the other nodes considered it the leader, yet it kept failing to update the election lock. I copied its systemd start arguments and launched it manually with the log level raised to 8, which showed the following:
```
I0322 20:26:36.966931   34218 round_trippers.go:395] PUT https://100.68.24.2:6443/api/v1/namespaces/kube-system/endpoints/kube-controller-manager
I0322 20:26:36.966938   34218 round_trippers.go:402] Request Headers:
I0322 20:26:36.966944   34218 round_trippers.go:405]     Accept: application/vnd.kubernetes.protobuf, */*
I0322 20:26:36.966951   34218 round_trippers.go:405]     Content-Type: application/vnd.kubernetes.protobuf
I0322 20:26:36.966956   34218 round_trippers.go:405]     User-Agent: kube-controller-manager/v1.6.7+095136c3078cc (linux/amd64) kubernetes/095136c/leader-election
I0322 20:26:36.967726   34218 round_trippers.go:420] Response Status: 409 Conflict in 0 milliseconds
I0322 20:26:36.967738   34218 round_trippers.go:423] Response Headers:
I0322 20:26:36.967744   34218 round_trippers.go:426]     Content-Type: application/vnd.kubernetes.protobuf
I0322 20:26:36.967749   34218 round_trippers.go:426]     Content-Length: 259
I0322 20:26:36.967754   34218 round_trippers.go:426]     Date: Fri, 22 Mar 2019 12:26:36 GMT
I0322 20:26:36.967888   34218 request.go:988] Response Body:
00000000  6b 38 73 00 0a 0c 0a 02 76 31 12 06 53 74 61 74  |k8s.....v1..Stat|
00000010  75 73 12 ea 01 0a 04 0a 00 12 00 12 07 46 61 69  |us...........Fai|
00000020  6c 75 72 65 1a a1 01 4f 70 65 72 61 74 69 6f 6e  |lure...Operation|
00000030  20 63 61 6e 6e 6f 74 20 62 65 20 66 75 6c 66 69  | cannot be fulfi|
00000040  6c 6c 65 64 20 6f 6e 20 65 6e 64 70 6f 69 6e 74  |lled on endpoint|
00000050  73 20 22 6b 75 62 65 2d 63 6f 6e 74 72 6f 6c 6c  |s "kube-controll|
00000060  65 72 2d 6d 61 6e 61 67 65 72 22 3a 20 74 68 65  |er-manager": the|
00000070  20 6f 62 6a 65 63 74 20 68 61 73 20 62 65 65 6e  | object has been|
00000080  20 6d 6f 64 69 66 69 65 64 3b 20 70 6c 65 61 73  | modified; pleas|
00000090  65 20 61 70 70 6c 79 20 79 6f 75 72 20 63 68 61  |e apply your cha|
000000a0  6e 67 65 73 20 74 6f 20 74 68 65 20 6c 61 74 65  |nges to the late|
000000b0  73 74 20 76 65 72 73 69 6f 6e 20 61 6e 64 20 74  |st version and t|
000000c0  72 79 20 61 67 61 69 6e 22 08 43 6f 6e 66 6c 69  |ry again".Confli|
000000d0  63 74 2a 28 0a 17 6b 75 62 65 2d 63 6f 6e 74 72  |ct*(..kube-contr|
000000e0  6f 6c 6c 65 72 2d 6d 61 6e 61 67 65 72 12 00 1a  |oller-manager...|
000000f0  09 65 6e 64 70 6f 69 6e 74 73 28 00 30 99 03 1a  |.endpoints(.0...|
00000100  00 22 00                                         |.".|
E0322 20:26:36.967960   34218 leaderelection.go:263] Failed to update lock: Operation cannot be fulfilled on endpoints "kube-controller-manager": the object has been modified; please apply your changes to the latest version and try again
I0322 20:26:36.967971   34218 leaderelection.go:185] failed to acquire lease kube-system/kube-controller-manager
^C
```
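The 409 Conflict here is the apiserver's optimistic concurrency control at work: every update must carry the `resourceVersion` the writer last read, and the write is rejected if the object has since changed. A toy model of that check (the `ApiStore` and `Conflict` names are mine for illustration, not real Kubernetes code):

```python
# Toy model of the apiserver's optimistic concurrency check that produces
# the 409 Conflict seen in the log above.

class Conflict(Exception):
    """Stands in for HTTP 409: the object was modified concurrently."""

class ApiStore:
    def __init__(self):
        self.objects = {}  # name -> (resourceVersion, data)
        self.next_rv = 0

    def get(self, name):
        return self.objects[name]

    def update(self, name, data, expected_rv):
        current_rv, _ = self.objects.get(name, (0, None))
        if expected_rv != current_rv:
            # "the object has been modified; please apply your changes
            #  to the latest version and try again"
            raise Conflict(f"sent rv {expected_rv}, current rv {current_rv}")
        self.next_rv += 1
        self.objects[name] = (self.next_rv, data)
        return self.next_rv

store = ApiStore()
rv = store.update("kube-controller-manager", {"holder": "A"}, 0)
try:
    # A writer whose cached resourceVersion is stale is rejected,
    # just like the PUT in the log above.
    store.update("kube-controller-manager", {"holder": "B"}, 0)
except Conflict as e:
    print("409 Conflict:", e)
```

So a controller that keeps sending a stale `resourceVersion` will loop on this error forever; rereading the latest object is the only way out.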
I went to the upstream code repository intending to read the election code and see how the lock is acquired, but could not make sense of it, so I looked for anyone who had already studied the election logic. A keyword search turned up a foreign blog post, http://gogosatellite.blogspot.com/2017/07/how-to-setup-high-availability.html , where one line of log output pointed me in the right direction:
```
I0607 11:04:32.485502   17291 leaderelection.go:248] lock is held by kuberm and has not yet expired
I0607 11:04:32.485506   17291 leaderelection.go:185] failed to acquire lease kube-system/kube-controller-manager
I0607 11:04:36.263032   17291 round_trippers.go:417] GET http://172.16.155.165:8080/api/v1/namespaces/kube-system/endpoints/kube-controller-manager 200 OK in 1 milliseconds
I0607 11:04:36.263122   17291 leaderelection.go:248] lock is held by kuberm and has not yet expired
I0607 11:04:36.263125   17291 leaderelection.go:185] failed to acquire lease kube-system/kube-controller-manager
```
So my guess was that every kube-controller-manager talks to the apiserver and competes for this ep to acquire the lock; further digging confirmed that leader election is indeed built on an endpoint:
```
$ kube-controller-manager --help |& grep leader
...
--leader-elect-resource-lock endpoints    The type of resource object that is used for locking during leader election. Supported options are endpoints (default) and `configmaps`. (default "endpoints")
```
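Putting the pieces together: each kube-controller-manager periodically tries to acquire or renew a lease stored in that endpoint's annotation, using the resourceVersion check for atomicity. The real implementation lives in client-go's leaderelection package (Go); below is only a simplified Python sketch of the acquire/renew decision, with field names mirroring the annotation but the function itself invented for illustration:

```python
# Simplified sketch of the leader-election acquire/renew decision.
# Field names mirror the control-plane.alpha.kubernetes.io/leader annotation;
# the function signature is illustrative, not client-go's real API.

def try_acquire_or_renew(record, my_identity, now):
    """Return an updated leader record if `my_identity` may hold the lease,
    or None if the lock is held by someone else and has not expired."""
    if record is None:
        # No lock object yet: create it and become leader.
        return {"holderIdentity": my_identity, "renewTime": now,
                "leaseDurationSeconds": 15}
    held_by_other = record["holderIdentity"] != my_identity
    expired = now - record["renewTime"] > record["leaseDurationSeconds"]
    if held_by_other and not expired:
        # "lock is held by <holder> and has not yet expired"
        return None
    # Either we already hold it (renew) or the lease expired (take over).
    return {"holderIdentity": my_identity, "renewTime": now,
            "leaseDurationSeconds": 15}

# A record whose renewTime stopped advancing: once leaseDurationSeconds
# pass, any candidate may take the lease over.
stale = {"holderIdentity": "S02", "renewTime": 100, "leaseDurationSeconds": 15}
print(try_acquire_or_renew(stale, "S01", now=110))                     # None
print(try_acquire_or_renew(stale, "S01", now=120)["holderIdentity"])   # S01
```

The catch in this incident: the take-over path still has to *write* the record back, so if that write keeps failing with 409 Conflict, the election never makes progress even though the lease looks expired.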
Requesting this ep with kubectl showed the following, and it never changed over time. The `holderIdentity` field inside the annotations is the node of the current leader:
```
$ kubectl get --raw /api/v1/namespaces/kube-system/endpoints/kube-controller-manager | jq .
{
  "kind": "Endpoints",
  "apiVersion": "v1",
  "metadata": {
    "name": "kube-controller-manager",
    "namespace": "kube-system",
    "selfLink": "/api/v1/namespaces/kube-system/endpoints/kube-controller-manager",
    "uid": "87e9ff0a-388b-11e9-949b-0cda411d3f00",
    "resourceVersion": "36217274",
    "creationTimestamp": "2019-02-24T23:25:54Z",
    "annotations": {
      "control-plane.alpha.kubernetes.io/leader": "{\"holderIdentity\":\"xxxxx{nodename}xxxxxx\",\"leaseDurationSeconds\":15,\"acquireTime\":\"2019-02-24T23:25:54Z\",\"renewTime\":\"2019-03-17T11:20:08Z\",\"leaderTransitions\":0}"
    }
  },
  "subsets": []
}
$ kubectl get --raw /api/v1/namespaces/kube-system/endpoints/kube-controller-manager | jq .
{
  "kind": "Endpoints",
  "apiVersion": "v1",
  "metadata": {
    "name": "kube-controller-manager",
    "namespace": "kube-system",
    "selfLink": "/api/v1/namespaces/kube-system/endpoints/kube-controller-manager",
    "uid": "87e9ff0a-388b-11e9-949b-0cda411d3f00",
    "resourceVersion": "36217274",
    "creationTimestamp": "2019-02-24T23:25:54Z",
    "annotations": {
      "control-plane.alpha.kubernetes.io/leader": "{\"holderIdentity\":\"Xxxxxxx-S02\",\"leaseDurationSeconds\":15,\"acquireTime\":\"2019-02-24T23:25:54Z\",\"renewTime\":\"2019-03-17T11:20:08Z\",\"leaderTransitions\":0}"
    }
  },
  "subsets": []
}
```
On a fresh cluster I built myself, the `resourceVersion` changes constantly, whereas on the production cluster above it never changed at all:
```
[root@k8s-m1 Kubernetes-ansible]# kubectl get --raw /api/v1/namespaces/kube-system/endpoints/kube-controller-manager | jq .
{
  "kind": "Endpoints",
  "apiVersion": "v1",
  "metadata": {
    "name": "kube-controller-manager",
    "namespace": "kube-system",
    "selfLink": "/api/v1/namespaces/kube-system/endpoints/kube-controller-manager",
    "uid": "0915773e-4c4d-11e9-a0b8-fa163e4edb6a",
    "resourceVersion": "52752",
    "creationTimestamp": "2019-03-22T02:48:56Z",
    "annotations": {
      "control-plane.alpha.kubernetes.io/leader": "{\"holderIdentity\":\"k8s-m1_00dbe494-4c4d-11e9-a89f-fa163ed10d54\",\"leaseDurationSeconds\":15,\"aransitions\":1}"
    }
  }
}
[root@k8s-m1 Kubernetes-ansible]# kubectl get --raw /api/v1/namespaces/kube-system/endpoints/kube-controller-manager | jq .
{
  "kind": "Endpoints",
  "apiVersion": "v1",
  "metadata": {
    "name": "kube-controller-manager",
    "namespace": "kube-system",
    "selfLink": "/api/v1/namespaces/kube-system/endpoints/kube-controller-manager",
    "uid": "0915773e-4c4d-11e9-a0b8-fa163e4edb6a",
    "resourceVersion": "52772",
    "creationTimestamp": "2019-03-22T02:48:56Z",
    "annotations": {
      "control-plane.alpha.kubernetes.io/leader": "{\"holderIdentity\":\"k8s-m1_00dbe494-4c4d-11e9-a89f-fa163ed10d54\",\"leaseDurationSeconds\":15,\"aransitions\":1}"
    }
  }
}
```
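The annotation value is itself a JSON blob, so staleness can also be checked directly: decode it and compare `renewTime` against `leaseDurationSeconds`. A small helper, assuming only the timestamp format visible in the dumps above (the helper names are mine):

```python
# Decode the leader annotation and measure how stale the lease is.
import json
from datetime import datetime, timezone

def parse_leader_record(annotation_value: str) -> dict:
    """Parse the control-plane.alpha.kubernetes.io/leader annotation value."""
    return json.loads(annotation_value)

def lease_age_seconds(record: dict, now: datetime) -> float:
    """Seconds since the holder last renewed the lease."""
    renew = datetime.strptime(
        record["renewTime"], "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)
    return (now - renew).total_seconds()

# Sample annotation shaped like the production dump above.
ann = ('{"holderIdentity":"Xxxxxxx-S02","leaseDurationSeconds":15,'
      '"acquireTime":"2019-02-24T23:25:54Z","renewTime":"2019-03-17T11:20:08Z"}')
rec = parse_leader_record(ann)
age = lease_age_seconds(rec, datetime(2019, 3, 22, 12, 26, 36, tzinfo=timezone.utc))
# A healthy leader renews every few seconds; here renewTime is days old.
print(age > rec["leaseDurationSeconds"])  # True
```

On a healthy cluster this age stays well under `leaseDurationSeconds`; a value of hours or days, as on the production cluster, means the lock object is stuck.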
To confirm that `holderIdentity` really identifies the leader, I tried stopping `kube-controller-manager` to see whether the field would change. It did change, and the ep showed up again (note the `creationTimestamp`: the object is not actually recreated):
```
[root@k8s-m1 Kubernetes-ansible]# systemctl stop kube-controller-manager.service
[root@k8s-m1 Kubernetes-ansible]# kubectl get --raw /api/v1/namespaces/kube-system/endpoints/kube-controller-manager | jq .
{
  "kind": "Endpoints",
  "apiVersion": "v1",
  "metadata": {
    "name": "kube-controller-manager",
    "namespace": "kube-system",
    "selfLink": "/api/v1/namespaces/kube-system/endpoints/kube-controller-manager",
    "uid": "0915773e-4c4d-11e9-a0b8-fa163e4edb6a",
    "resourceVersion": "52819",
    "creationTimestamp": "2019-03-22T02:48:56Z",
    "annotations": {
      "control-plane.alpha.kubernetes.io/leader": "{\"holderIdentity\":\"k8s-m1_00dbe494-4c4d-11e9-a89f-fa163ed10d54\",\"leaseDurationSeconds\":15,\"aransitions\":1}"
    }
  }
}
[root@k8s-m1 Kubernetes-ansible]# kubectl get --raw /api/v1/namespaces/kube-system/endpoints/kube-controller-manager | jq .
{
  "kind": "Endpoints",
  "apiVersion": "v1",
  "metadata": {
    "name": "kube-controller-manager",
    "namespace": "kube-system",
    "selfLink": "/api/v1/namespaces/kube-system/endpoints/kube-controller-manager",
    "uid": "0915773e-4c4d-11e9-a0b8-fa163e4edb6a",
    "resourceVersion": "52819",
    "creationTimestamp": "2019-03-22T02:48:56Z",
    "annotations": {
      "control-plane.alpha.kubernetes.io/leader": "{\"holderIdentity\":\"k8s-m1_00dbe494-4c4d-11e9-a89f-fa163ed10d54\",\"leaseDurationSeconds\":15,\"aransitions\":1}"
    }
  }
}
```
At this point it was essentially certain that for some reason (possibly the full filesystem, possibly something else) the lock object was stuck, so on production I deleted the ep to kick out the leader record that could no longer be updated. Although the same node re-acquired leadership, the `resourceVersion` started moving again, and the controller logs stopped reporting errors:
```
$ kubectl -n kube-system delete ep kube-controller-manager
endpoints "kube-controller-manager" deleted
$ kubectl get --raw /api/v1/namespaces/kube-system/endpoints/kube-controller-manager | jq .
{
  "kind": "Endpoints",
  "apiVersion": "v1",
  "metadata": {
    "name": "kube-controller-manager",
    "namespace": "kube-system",
    "selfLink": "/api/v1/namespaces/kube-system/endpoints/kube-controller-manager",
    "uid": "dec669dd-4c9f-11e9-949b-0cda411d3f00",
    "resourceVersion": "37542637",
    "creationTimestamp": "2019-03-22T12:41:53Z",
    "annotations": {
      "control-plane.alpha.kubernetes.io/leader": "{\"holderIdentity\":\"Xxxxxxx-S02\",\"leaseDurationSeconds\":15,\"acquireTime\":\"2019-03-22T12:41:53Z\",\"renewTime\":\"2019-03-22T12:41:53Z\",\"leaderTransitions\":0}"
    }
  },
  "subsets": []
}
$ kubectl get --raw /api/v1/namespaces/kube-system/endpoints/kube-controller-manager | jq .
{
  "kind": "Endpoints",
  "apiVersion": "v1",
  "metadata": {
    "name": "kube-controller-manager",
    "namespace": "kube-system",
    "selfLink": "/api/v1/namespaces/kube-system/endpoints/kube-controller-manager",
    "uid": "dec669dd-4c9f-11e9-949b-0cda411d3f00",
    "resourceVersion": "37542785",
    "creationTimestamp": "2019-03-22T12:41:53Z",
    "annotations": {
      "control-plane.alpha.kubernetes.io/leader": "{\"holderIdentity\":\"Xxxxxxx-S02\",\"leaseDurationSeconds\":15,\"acquireTime\":\"2019-03-22T12:41:53Z\",\"renewTime\":\"2019-03-22T12:41:59Z\",\"leaderTransitions\":0}"
    }
  },
  "subsets": []
}
```
Checking the rc description again, the corresponding pod had come up:
```
$ kubectl describe rc rabbit3rc
Name:         rabbit3rc
Namespace:    default
Selector:     app=rabbitmq-cluster,node=rabbit3
Labels:       app=rabbitmq-cluster
              node=rabbit3
Annotations:  <none>
Replicas:     1 current / 1 desired
Pods Status:  1 Running / 0 Waiting / 0 Succeeded / 0 Failed
...
```