zhangguanzhang's Blog

Why I Don't Use nodePort: The Odds of Hitting the Jackpot by Accident

Word count: 755 · Reading time: 3 min
2019/07/08

So, an old 1.6.7 cluster. This morning someone reported that database instances could not be created. The TCP port of the exposed interface turned out to be unreachable, and the kube-proxy logs looked like this:

[root@HB1-xxxxx-S03 kubernetes]# tailf  kube-proxy.INFO 
E0708 09:44:03.437284 394350 proxier.go:1062] can't open "nodePort for cloudify/cloudae-cfy-nginx:port1" (:16081/tcp), skipping this nodePort: listen tcp :16081: bind: address already in use
E0708 09:44:03.439285 394350 proxier.go:1062] can't open "nodePort for default/cloudae-app-grafana-svc:cloudae-app-grafana-svc" (:16101/tcp), skipping this nodePort: listen tcp :16101: bind: address already in use
E0708 09:44:04.485556 394350 proxier.go:1062] can't open "nodePort for cloudify/cloudae-cfy-nginx:port2" (:16123/tcp), skipping this nodePort: listen tcp :16123: bind: address already in use
E0708 09:44:04.487749 394350 proxier.go:1062] can't open "nodePort for cloudify/cloudae-cfy-influxdb:web" (:16083/tcp), skipping this nodePort: listen tcp :16083: bind: address already in use
E0708 09:44:04.493137 394350 proxier.go:1062] can't open "nodePort for cloudify/cloudae-cfy-nginx:port1" (:16081/tcp), skipping this nodePort: listen tcp :16081: bind: address already in use

netstat showed no process bound to either of these ports at all, but lsof revealed they were in use on the client side:

[root@HB1-xxxxx-S03 kubernetes]# lsof -i :16081
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
etcd3 1871 root 42u IPv6 38338 0t0 TCP HB1-TJ1-Cloudos-S03:2379->HB1-TJ1-Cloudos-S03:16081 (ESTABLISHED)
kube-apis 1945 root 26u IPv4 21932 0t0 TCP HB1-TJ1-Cloudos-S03:16081->HB1-TJ1-Cloudos-S03:2379 (ESTABLISHED)
[root@HB1-xxxxx-S03 kubernetes]# lsof -i :16123
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
etcd3 1871 root 62u IPv6 42179 0t0 TCP HB1-TJ1-Cloudos-S03:2379->HB1-TJ1-Cloudos-S03:16123 (ESTABLISHED)
kube-apis 1945 root 46u IPv4 49719 0t0 TCP HB1-TJ1-Cloudos-S03:16123->HB1-TJ1-Cloudos-S03:2379 (ESTABLISHED)

It turned out to be the apiserver, acting as a client to etcd, that had taken these ports as its source ports. With odds like these I might as well go buy a lottery ticket...
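By the way, a quick way to tell whether a port is held by a real listener or merely grabbed as a client-side source port is to compare the listening sockets against all sockets. A small sketch with ss (16081 is just the port from the log above):

# listening sockets only -- nothing shows up, so no server has actually bound 16081
ss -tlnp | grep ':16081'
# all TCP sockets -- the ESTABLISHED apiserver->etcd connection appears,
# holding 16081 as its ephemeral source port
ss -tanp | grep ':16081'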

Our net.ipv4.ip_local_port_range is opened very wide, so the client-side ephemeral port range is huge (the default is 32768 60999). The database team's developers wanted this fixed by changing the port range, so I took a look at the nodePort spread on this old cluster:

[root@HB1-xxxxx-S03 kubernetes]# netstat -nlpt  | grep -Po ':::\K\d+(?=.+kube-proxy)' | sort -rn | xargs -n6
63200 53333 53229 48606 35357 33330
30834 30418 30408 30186 30166 30158
30157 30156 30155 30154 30152 30151
30150 30140 30120 29754 28000 27017
25906 21182 21181 21180 21160 21141
21140 21120 21102 21101 21100 21080
21060 21040 21020 21000 19002 18088
16400 16310 16300 16210 16200 16123
16101 16100 16086 16083 16081 16021
16000 15672 15320 15300 15102 15101
15100 15002 15000 12700 12345 12200
12000 11900 11820 11700 11680 11660
11620 11600 11520 11500 11460 11440
11420 11400 11360 11320 11300 11200
11100 10001 9779 9696 9443 9311
9292 9191 9030 9029 9028 9027
9026 9025 9024 9023 9022 9021
9020 9019 9018 9017 9016 9015
9014 9013 9012 9011 9000 8786
8779 8778 8776 8775 8774 8386
8082 8042 8041 8004 8001 7099
6385 6083 6080 5672 5432 5000
3306 3000 2359 21 20

There are too many product teams, and almost everything is a layer-7 web application, yet they all chose nodePort to expose things. With no coordination, the nodePort usage is scattered all over the range, which is a headache.
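The usual mitigation, sketched below rather than exactly what we rolled out, is to make sure the kernel's ephemeral range can never hand out a nodePort: either shrink net.ipv4.ip_local_port_range, or list the nodePort ports in net.ipv4.ip_local_reserved_ports so the kernel skips them when picking source ports while still allowing explicit binds. The 16000-32767 range below is only an illustration; as the output above shows, our nodePorts are scattered far beyond any single tidy range, which is exactly the headache.

# current client-side (ephemeral) source port range
sysctl net.ipv4.ip_local_port_range
# what nodePort range the apiserver actually allows
ps -ef | grep '[k]ube-apiserver' | grep -o 'service-node-port-range=[^ ]*'
# stop the kernel from using these ports as ephemeral source ports;
# kube-proxy (or anything else) can still bind them explicitly
sysctl -w net.ipv4.ip_local_reserved_ports=16000-32767
echo 'net.ipv4.ip_local_reserved_ports = 16000-32767' >> /etc/sysctl.conf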

When this contention happened, node_sockstat_TCP was also extremely high, and inuse was even running above alloc at one point:

# TYPE node_sockstat_TCP_alloc gauge
node_sockstat_TCP_alloc 33276
# HELP node_sockstat_TCP_inuse Number of TCP sockets in state inuse.
# TYPE node_sockstat_TCP_inuse gauge
node_sockstat_TCP_inuse 33059

After cleaning that up, the numbers came back down.
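For reference, the node_sockstat_TCP_* gauges come straight from /proc/net/sockstat, so the same counters can be checked on the node itself without going through Prometheus:

# inuse / orphan / tw / alloc / mem on a single line
grep '^TCP:' /proc/net/sockstat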

Follow-up

2021/06/16: I tested this on a 1.15.5 cluster and found that kube-proxy also adds rules to the nat table, which means that even when the nodePort fails to bind, the port is still reachable as long as the nat rules are in place. So it is not that the nodePort bind is useless either. See this issue for details: https://github.com/kubernetes/kubernetes/issues/75443
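To see this on an iptables-mode kube-proxy, the nodePort entries live in the KUBE-NODEPORTS chain of the nat table, reached from KUBE-SERVICES; a quick look, using the standard kube-proxy chain names and the port from the original log:

# the nodePort match kube-proxy programs; it jumps into the per-service
# chain that does the actual DNAT, regardless of whether the local bind succeeded
iptables -t nat -S KUBE-NODEPORTS | grep 16081
# traffic to local addresses falls through KUBE-SERVICES into KUBE-NODEPORTS
iptables -t nat -S KUBE-SERVICES | grep KUBE-NODEPORTS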
