
A newer client-go getting invalid bearer token with serviceAccountToken?

2024/02/06

On Kubernetes 1.27.4, coredns and flannel both get 401s, and kube-apiserver keeps logging "Unable to authenticate the request" err="[invalid bearer token, service account token is not valid yet]"

Background

Today a colleague reported that his pods could no longer operate on cluster resources. A quick look showed both flannel and coredns logging 401s:

[ERROR] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.26.1/tools/cache/reflector.go:169: Failed to watch *v1.Service: failed to list *v1.Service: Unauthorized

Meanwhile kube-apiserver kept logging:

authentication.go:70] "Unable to authenticate the request" err="[invalid bearer token, service account token is not valid yet]"

Investigation

A quick search turned this up: to improve security, the Kubernetes community enabled the BoundServiceAccountTokenVolume feature by default in 1.21. It gives ServiceAccount tokens an expiration time, an automatic refresh mechanism, and invalidates a pod's token once the pod is deleted. With client-go >= v11.0.0 or >= v0.15.0, the kubelet periodically refreshes the token and the client periodically reloads it from the mounted volume, so the token stays current. For backward compatibility, an expired token remains usable for one year; after that, a client-go older than v11.0.0 / v0.15.0 will not reload the refreshed token, so the token eventually goes stale.
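
As a minimal sketch of what this means for client code (assuming a recent client-go): rest.InClusterConfig() records the projected token path in BearerTokenFile, and the client-go transport re-reads that file periodically, so a pod picks up rotated tokens without restarting:

package main

import (
    "fmt"
    "log"

    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/rest"
)

func main() {
    // With a recent client-go, InClusterConfig sets BearerTokenFile to the
    // projected token path instead of reading the token once into memory,
    // so the transport re-reads the rotated token from the mounted volume.
    cfg, err := rest.InClusterConfig()
    if err != nil {
        log.Fatalf("in-cluster config: %v", err)
    }
    fmt.Println("token file:", cfg.BearerTokenFile)
    // typically /var/run/secrets/kubernetes.io/serviceaccount/token

    // the clientset then needs no manual token handling at all
    if _, err := kubernetes.NewForConfig(cfg); err != nil {
        log.Fatalf("clientset: %v", err)
    }
}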

But our cluster is already on 1.27, the client-go vendored into coredns and flannel is quite recent, and none of our other cluster environments showed this.

Troubleshooting

My colleague mentioned that after he restarted one coredns pod, that pod stopped getting 401s. Skimming the ServiceAccountToken docs: the token is simply a projected volume the kubelet mounts into the pod:

...
volumes:
- name: token-vol
  projected:
    sources:
    - serviceAccountToken:
        audience: api
        expirationSeconds: 3600
        path: token

Checking the kubelet logs on that node, it was spamming:

cri_stats_provider.go:757] "Failed updating cpu usage nano core" err="zero or negative interval (1707196606940341356 - 1711900953406532664)"
cri_stats_provider.go:757] "Failed updating cpu usage nano core" err="zero or negative interval (1707196606941261574 - 1711900953467321378)"
cri_stats_provider.go:757] "Failed updating cpu usage nano core" err="zero or negative interval (1707196606942318471 - 1711900953423406042)"
cri_stats_provider.go:757] "Failed updating cpu usage nano core" err="zero or negative interval (1707196606943293769 - 1711900953450007891)"
cri_stats_provider.go:757] "Failed updating cpu usage nano core" err="zero or negative interval (1707196606944280584 - 1711900953461851576)"
cri_stats_provider.go:757] "Failed updating cpu usage nano core" err="zero or negative interval (1707196606945180585 - 1711900953427074341)"
cri_stats_provider.go:757] "Failed updating cpu usage nano core" err="zero or negative interval (1707196606946762456 - 1711900953414233495)"
cri_stats_provider.go:757] "Failed updating cpu usage nano core" err="zero or negative interval (1707196606947733070 - 1711900953456650293)"
cri_stats_provider.go:757] "Failed updating cpu usage nano core" err="zero or negative interval (1707196606948747910 - 1711900953430313306)"
cri_stats_provider.go:757] "Failed updating cpu usage nano core" err="zero or negative interval (1707196606949808485 - 1711900953436742140)"
cri_stats_provider.go:757] "Failed updating cpu usage nano core" err="zero or negative interval (1707196606951948415 - 1711900953479775885)"
cri_stats_provider.go:757] "Failed updating cpu usage nano core" err="zero or negative interval (1707196606952993367 - 1711900953458452657)"

Searching the Kubernetes source for "zero or negative interval" shows the error comes from the function below:

// https://github.com/kubernetes/kubernetes/blob/v1.27.4/pkg/kubelet/stats/cri_stats_provider.go#L715-L760
func (p *criStatsProvider) getAndUpdateContainerUsageNanoCores(stats *runtimeapi.ContainerStats) *uint64 {
    if stats == nil || stats.Attributes == nil || stats.Cpu == nil {
        return nil
    }
    // Bypass the cache if the CRI implementation specified the UsageNanoCores.
    if stats.Cpu.UsageNanoCores != nil {
        return &stats.Cpu.UsageNanoCores.Value
    }
    // If there is no UsageNanoCores, nor UsageCoreNanoSeconds, there is no information to use
    if stats.Cpu.UsageCoreNanoSeconds == nil {
        return nil
    }
    id := stats.Attributes.Id
    usage, err := func() (*uint64, error) {
        p.mutex.Lock()
        defer p.mutex.Unlock()

        cached, ok := p.cpuUsageCache[id]
        if !ok || cached.stats.UsageCoreNanoSeconds == nil || stats.Cpu.UsageCoreNanoSeconds.Value < cached.stats.UsageCoreNanoSeconds.Value {
            // Cannot compute the usage now, but update the cached stats anyway
            p.cpuUsageCache[id] = &cpuUsageRecord{stats: stats.Cpu, usageNanoCores: nil}
            return nil, nil
        }

        newStats := stats.Cpu
        cachedStats := cached.stats
        nanoSeconds := newStats.Timestamp - cachedStats.Timestamp
        if nanoSeconds <= 0 {
            return nil, fmt.Errorf("zero or negative interval (%v - %v)", newStats.Timestamp, cachedStats.Timestamp)
        }
        usageNanoCores := uint64(float64(newStats.UsageCoreNanoSeconds.Value-cachedStats.UsageCoreNanoSeconds.Value) /
            float64(nanoSeconds) * float64(time.Second/time.Nanosecond))

        // Update cache with new value.
        usageToUpdate := usageNanoCores
        p.cpuUsageCache[id] = &cpuUsageRecord{stats: newStats, usageNanoCores: &usageToUpdate}

        return &usageNanoCores, nil
    }()

    if err != nil {
        // This should not happen. Log now to raise visibility
        klog.ErrorS(err, "Failed updating cpu usage nano core")
    }
    return usage
}

The logic, roughly: kubelet calls the CRI ContainerStats gRPC method and reads the returned cpu.Timestamp. On the first call cpuUsageCache[id] is empty, so the sample is just cached; on each later call the cached timestamp is subtracted from the fresh one to get nanoSeconds. Our error means execution reached the nanoSeconds <= 0 branch, i.e. the cached timestamp is newer than the fresh one.
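
To see the failure concretely, here is a tiny standalone re-run of that interval math, plugging in the first timestamp pair from our kubelet log (the usage counters in the normal path are made-up values, just to show the formula):

package main

import (
    "fmt"
    "time"
)

func main() {
    // first pair from the kubelet log, printed as "(new - cached)"
    newTS := int64(1707196606940341356)    // fresh cpu.Timestamp, Feb 2024
    cachedTS := int64(1711900953406532664) // cached cpu.Timestamp, Apr 2024 (!)

    nanoSeconds := newTS - cachedTS
    if nanoSeconds <= 0 {
        // exactly the branch kubelet keeps hitting
        fmt.Printf("zero or negative interval (%v - %v)\n", newTS, cachedTS)
        return
    }

    // normal path for comparison: delta of the cumulative counter divided by
    // the interval (hypothetical counter values, just to show the formula)
    var newUsage, cachedUsage uint64 = 2117105450225, 2117000000000
    usageNanoCores := uint64(float64(newUsage-cachedUsage) /
        float64(nanoSeconds) * float64(time.Second/time.Nanosecond))
    fmt.Println("usageNanoCores:", usageNanoCores)
}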

We run cri-dockerd + docker, so let's write a small gRPC client that calls ContainerStats directly:

package main

import (
    "context"
    "encoding/json"
    "fmt"
    "log"
    "os"

    "google.golang.org/grpc"
    runtimeapi "k8s.io/cri-api/pkg/apis/runtime/v1"
)

func main() {
    // create the gRPC connection to the cri-dockerd socket
    conn, err := grpc.Dial("unix:///var/run/cri-dockerd.sock", grpc.WithInsecure())
    if err != nil {
        log.Fatalf("Failed to connect: %v", err)
    }
    defer conn.Close()

    // create the CRI runtime client
    criClient := runtimeapi.NewRuntimeServiceClient(conn)

    // call the ContainerStats method
    req := &runtimeapi.ContainerStatsRequest{
        ContainerId: os.Args[1],
        // Verbose: true,
    }

    resp, err := criClient.ContainerStats(context.Background(), req)
    if err != nil {
        log.Fatalf("Failed to call ContainerStats: %v", err)
    }

    jsonData, err := json.MarshalIndent(resp, "", " ")
    if err != nil {
        log.Fatalf("Failed to marshal container stats: %v", err)
    }

    fmt.Println(string(jsonData))
}

Build it, copy it to the affected machine, pick a coredns container id, and run it:

$ ./stats-test f01c
{
  "stats": {
    ...
    "cpu": {
      "timestamp": 1707204235253509570,
      "usage_core_nano_seconds": {
        "value": 2117105450225
      }
    },
    ...

cpu.Timestamp comes back fine, so the cached value must be larger than every fresh cpu.Timestamp. Converting the second number from the kubelet log, 1711900953458452657, to a date gives:

Mon Apr 01 2024 00:02:33 GMT+0800
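
(A quick conversion sketch in Go; these stats timestamps are nanoseconds since the Unix epoch:)

package main

import (
    "fmt"
    "time"
)

func main() {
    // kubelet CRI stats timestamps are nanoseconds since the Unix epoch
    fmt.Println(time.Unix(0, 1711900953458452657))
    // 2024-04-01 00:02:33.458452657 +0800 CST with TZ=Asia/Shanghai
}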

That's April, almost two months in the future. The system logs confirm the clock went haywire:

Apr  1 00:02:21 redhat79 etcd: {"level":"warn","ts":"2024-04-01T00:02:21.54608+0800","caller":"rafthttp/probing_status.go:82","msg":"prober found high clock drift","round-tripper-name":"ROUND_TRIPPER_RAFT_MESSAGE","remote-peer-id":"1354c3808ef4eeb9","clock-drift":"38.480218393s","rtt":"6.133426ms"}
Apr 1 00:02:25 redhat79 kubelet: I0401 00:02:25.365462 3553 kubelet_pods.go:894] "Unable to retrieve pull secret, the image pull may not succeed." pod="default123/xxxx-browsersvc-100-55dc46dd54-2vlbf" secret="" err="secret \"regcred\" not found"
Apr 1 00:02:26 redhat79 kube-apiserver: E0401 00:02:26.604914 1390 controller.go:193] "Failed to update lease" err="Operation cannot be fulfilled on leases.coordination.k8s.io \"apiserver-s7gqnvvc7gfrpthq3uyibizgba\": the object has been modified; please apply your changes to the latest version and try again"
Feb 4 23:44:00 redhat79 systemd: Time has been changed
Feb 4 23:44:00 redhat79 chronyd[917]: Backward time jump detected!

So the node's clock had jumped forward to April 1 and chrony later yanked it back, and while the clock read April the kubelet rotated the projected token. A look at the coredns token:

1
2
3
4
5
6
7
8
9
10
11
$ docker inspect f01 | grep :/var/run
"/data/kube/kubelet/pods/d8abc4c1-48f9-4a2f-9ce7-a649126bb43c/volumes/kubernetes.io~projected/kube-api-access-mzqjx:/var/run/secrets/kubernetes.io/serviceaccount:ro",
$ ls -l /data/kube/kubelet/pods/d8abc4c1-48f9-4a2f-9ce7-a649126bb43c/volumes/kubernetes.io~projected/kube-api-access-mzqjx
total 0
lrwxrwxrwx 1 root root 13 Feb 1 18:16 ca.crt -> ..data/ca.crt
lrwxrwxrwx 1 root root 16 Feb 1 18:16 namespace -> ..data/namespace
lrwxrwxrwx 1 root root 12 Feb 1 18:16 token -> ..data/token
$ readlink -f /data/kube/kubelet/pods/d8abc4c1-48f9-4a2f-9ce7-a649126bb43c/volumes/kubernetes.io~projected/kube-api-access-mzqjx/token
/data/kube/kubelet/pods/d8abc4c1-48f9-4a2f-9ce7-a649126bb43c/volumes/kubernetes.io~projected/kube-api-access-mzqjx/..2024_03_31_15_57_54.3867697203/token
$ awk -F "." '{print $2}' /data/kube/kubelet/pods/d8abc4c1-48f9-4a2f-9ce7-a649126bb43c/volumes/kubernetes.io~projected/kube-api-access-mzqjx/token | base64 -d
{"aud":["https://kubernetes.default.svc"],"exp":1743436695,"iat":1711900695,"iss":"https://kubernetes.default.svc","kubernetes.io":{"namespace":"kube-system","pod":{"name":"coredns-f7bf7f79b-2brfr","uid":"d8abc4c1-48f9-4a2f-9ce7-a649126bb43c"},"serviceaccount":{"name":"coredns","uid":"b6255167-f016-48be-a118-9e6a9dcf3d7a"},"warnafter":1711904302},"nbf":1711900695,"sub":"system:serviceaccount:kube-system:coredns"}base64: invalid input

The second segment of the token file is the JWT payload; its claims:

  • iss (issuer): who issued the token
  • sub (subject): the subject the token represents (here the ServiceAccount)
  • aud (audience): the intended recipients of the token
  • exp (expiration time): when the token expires
  • nbf (not before): the time before which the token must not be accepted
  • iat (issued at): when the token was issued
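
During validation the apiserver effectively enforces nbf <= now <= exp; a toy version of that check against the payload above (trimmed to the time claims):

package main

import (
    "encoding/json"
    "fmt"
    "time"
)

func main() {
    // time claims from the payload decoded above
    payload := []byte(`{"exp":1743436695,"iat":1711900695,"nbf":1711900695}`)

    var claims struct {
        Exp int64 `json:"exp"`
        Iat int64 `json:"iat"`
        Nbf int64 `json:"nbf"`
    }
    if err := json.Unmarshal(payload, &claims); err != nil {
        panic(err)
    }

    now := time.Now().Unix()
    switch {
    case now < claims.Nbf:
        // our case: the clock went back, so nbf sits in the future
        fmt.Println("service account token is not valid yet")
    case now > claims.Exp:
        fmt.Println("token expired")
    default:
        fmt.Println("token valid")
    }
}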

Our nbf of 1711900695 converts to Sun Mar 31 2024 23:58:15 GMT+0800: the token was minted while the clock read late March, so once the clock jumped back its nbf sits in the future, the apiserver rejects it with "service account token is not valid yet", and coredns gets 401. Two fixes:

  • Restart the affected pods/containers
  • Restart the kubelet

If you're not using CSI, i.e. the kubelet never invokes mount itself, restarting the kubelet works; in some cases restarting the containers is still the safer move. For example:
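
(Assuming the stock k8s-app=kube-dns label on coredns; adjust selectors and node access to your cluster:)

# restart the affected pods, here coredns; flannel is analogous
kubectl -n kube-system delete pod -l k8s-app=kube-dns

# or restart the kubelet on the affected node (no CSI mounts involved)
systemctl restart kubelet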
