zhangguanzhang's Blog

Why I don't use NodePort: the accidental jackpot

Word count: 600 · Reading time: 3 min
2019/07/08

So, an old 1.6.7 cluster. This morning someone reported that a database instance could not be created. The service's port was unreachable over TCP, and kube-proxy's log showed the following:

[root@HB1-xxxxx-S03 kubernetes]# tailf  kube-proxy.INFO 
E0708 09:44:03.437284 394350 proxier.go:1062] can't open "nodePort for cloudify/cloudae-cfy-nginx:port1" (:16081/tcp), skipping this nodePort: listen tcp :16081: bind: address already in use
E0708 09:44:03.439285 394350 proxier.go:1062] can't open "nodePort for default/cloudae-app-grafana-svc:cloudae-app-grafana-svc" (:16101/tcp), skipping this nodePort: listen tcp :16101: bind: address already in use
E0708 09:44:04.485556 394350 proxier.go:1062] can't open "nodePort for cloudify/cloudae-cfy-nginx:port2" (:16123/tcp), skipping this nodePort: listen tcp :16123: bind: address already in use
E0708 09:44:04.487749 394350 proxier.go:1062] can't open "nodePort for cloudify/cloudae-cfy-influxdb:web" (:16083/tcp), skipping this nodePort: listen tcp :16083: bind: address already in use
E0708 09:44:04.493137 394350 proxier.go:1062] can't open "nodePort for cloudify/cloudae-cfy-nginx:port1" (:16081/tcp), skipping this nodePort: listen tcp :16081: bind: address already in use

netstat showed no process bound to these ports at all; lsof then revealed that they were in use as client-side ports:

[root@HB1-xxxxx-S03 kubernetes]# lsof -i :16081
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
etcd3 1871 root 42u IPv6 38338 0t0 TCP HB1-TJ1-Cloudos-S03:2379->HB1-TJ1-Cloudos-S03:16081 (ESTABLISHED)
kube-apis 1945 root 26u IPv4 21932 0t0 TCP HB1-TJ1-Cloudos-S03:16081->HB1-TJ1-Cloudos-S03:2379 (ESTABLISHED)
[root@HB1-xxxxx-S03 kubernetes]# lsof -i :16123
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
etcd3 1871 root 62u IPv6 42179 0t0 TCP HB1-TJ1-Cloudos-S03:2379->HB1-TJ1-Cloudos-S03:16123 (ESTABLISHED)
kube-apis 1945 root 46u IPv4 49719 0t0 TCP HB1-TJ1-Cloudos-S03:16123->HB1-TJ1-Cloudos-S03:2379 (ESTABLISHED)

So it was the apiserver, acting as a client to etcd, whose outgoing connections had landed on those ports. With odds like that I should go buy a lottery ticket... The database team's developers wanted this fixed by changing the port range, so I took a look at this old cluster's nodePort range:

[root@HB1-xxxxx-S03 kubernetes]# netstat -nlpt  | grep -Po ':::\K\d+(?=.+kube-proxy)' | sort -rn | xargs -n6
63200 53333 53229 48606 35357 33330
30834 30418 30408 30186 30166 30158
30157 30156 30155 30154 30152 30151
30150 30140 30120 29754 28000 27017
25906 21182 21181 21180 21160 21141
21140 21120 21102 21101 21100 21080
21060 21040 21020 21000 19002 18088
16400 16310 16300 16210 16200 16123
16101 16100 16086 16083 16081 16021
16000 15672 15320 15300 15102 15101
15100 15002 15000 12700 12345 12200
12000 11900 11820 11700 11680 11660
11620 11600 11520 11500 11460 11440
11420 11400 11360 11320 11300 11200
11100 10001 9779 9696 9443 9311
9292 9191 9030 9029 9028 9027
9026 9025 9024 9023 9022 9021
9020 9019 9018 9017 9016 9015
9014 9013 9012 9011 9000 8786
8779 8778 8776 8775 8774 8386
8082 8042 8041 8004 8001 7099
6385 6083 6080 5672 5432 5000
3306 3000 2359 21 20
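The collision mechanism: the apiserver's outgoing connections to etcd take a source port from the kernel's ephemeral port range, and if that range overlaps the cluster's nodePort range, a client socket can occupy a port that kube-proxy later needs to bind. A minimal sketch of the overlap check (the nodePort bounds 16000-63200 are my assumption, eyeballed from the netstat output above, not a value from this cluster's config):

```shell
#!/bin/sh
# Read the kernel's ephemeral port range -- the pool used for
# outgoing client connections such as apiserver -> etcd.
read LOW HIGH < /proc/sys/net/ipv4/ip_local_port_range
echo "ephemeral port range: $LOW-$HIGH"

# Assumed nodePort bounds for this cluster (illustrative values).
NP_LOW=16000; NP_HIGH=63200
if [ "$LOW" -le "$NP_HIGH" ] && [ "$NP_LOW" -le "$HIGH" ]; then
  echo "overlap: a client socket can land on a nodePort"
fi
```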

There are too many product teams. Almost everything they run is a layer-7 web app, yet they all expose it via NodePort, and with no coordination the nodePort allocations are scattered across the whole range. Headache.
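Two common mitigations, sketched below. The sysctl and apiserver flag names are real; the concrete port values are illustrative assumptions, not what this cluster actually uses. Either tell the kernel never to hand out the nodePort range as ephemeral source ports, or keep nodePorts inside one contiguous window and make it not overlap the ephemeral range:

```shell
# Option 1: reserve the nodePort range so the kernel skips it when
# picking ephemeral source ports (assumed range 30000-32767; needs root):
sysctl -w net.ipv4.ip_local_reserved_ports=30000-32767
# Persist across reboots:
echo 'net.ipv4.ip_local_reserved_ports = 30000-32767' >> /etc/sysctl.d/k8s.conf

# Option 2: constrain Services to a single window via the
# kube-apiserver flag, then keep it out of ip_local_port_range:
#   --service-node-port-range=30000-32767
```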
