zhangguanzhang's Blog

Writing up the basics of kubeadm, as requested by many people

2019/11/24

Most kubeadm articles out there are either wrong or not detailed enough; the majority skip the system-level setup and go straight to kubeadm init, so many people who follow along hit errors.

I expect readers of this article to have at least the following:

  • familiarity with basic Linux directory conventions and systemd
  • some docker experience
  • an understanding of DNS and of combining /etc/hosts with curl to test the response status of a web endpoint (a small example follows this list)
  • no need for your own GitHub projects, but you should at least know how to browse GitHub
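To make the third point concrete, here is a throwaway illustration; the host name and IP are made up and only serve the example:

echo '192.168.1.10 myapp.example.com' >> /etc/hosts
# -s silences progress, -o /dev/null drops the body, -w prints only the HTTP status code
curl -s -o /dev/null -w '%{http_code}\n' http://myapp.example.com/healthz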

This tutorial deploys a Kubernetes cluster with the node counts and specs below. The OS is CentOS 7.6+ (7.7 if you can); do not use CentOS 7.4 or lower, because container runtimes lean heavily on kernel features and old releases cause problem after problem during and after deployment. A reader has tested Debian 10, and on the apt side Ubuntu 16.04 or newer should also work. In short, this tutorial applies to both yum-based and apt-based systems.

IP Hostname role CPU Memory
172.19.0.2 K8S-M1 master 4 8G
172.19.0.3 K8S-M2 master 4 8G
172.19.0.4 K8S-M3 master 4 8G
172.19.0.5 K8S-N1 node 2 4G
  • kubeadm requires something like 2 CPUs and 2 GB of memory as a minimum, if I remember correctly
  • all operations are performed as root; make the system disk reasonably large, otherwise once images pile up and disk usage reaches e.g. 85% the kubelet's image GC starts reclaiming them
  • for HA you normally want an odd number of masters, at least 3; I use 3 masters here
  • a single master also works and the difference is small; I'll note the differences in the article, and with a single master you simply leave out the other masters' IPs

Preparation (on every machine)

System-level settings

Assume the system was freshly installed from the official ISO with no further configuration (set up networking and DNS yourself); apt-based systems may need a domestic package mirror configured first. Where systems differ I prefix the command with the system family; commands without such a prefix apply to both.

  • Firewall and SELinux are disabled everywhere. For example on CentOS:
    Otherwise K8S may later report Permission denied when mounting directories. On some cloud providers (QingCloud, for example) the IP is managed by NetworkManager and stopping it breaks networking, so in that case leave it running.
    yum-based systems:
    systemctl disable --now firewalld NetworkManager
# disable selinux
    setenforce 0
    sed -ri '/^[^#]*SELINUX=/s#=.+$#=disabled#' /etc/selinux/config

apt-based systems:

ufw disable

Set the timezone

ln -sf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime

  • Disable dnsmasq (optional)
    When dnsmasq is enabled on a Linux system (e.g. in GUI environments), the system DNS server gets set to 127.0.0.1, which breaks DNS resolution inside docker containers, so turn it off

    systemctl disable --now dnsmasq
  • Kubernetes recommends turning off swap. On all machines run the following to disable swap and comment out the swap line in /etc/fstab. If you prefer to keep swap, skip this; a config option later handles it:

    swapoff -a && sysctl -w vm.swappiness=0
    sed -ri '/^[^#]*swap/s@^@#@' /etc/fstab
  • Install some basic dependencies and tools
    yum-based systems:

    yum install epel-release -y
    yum install -y \
    curl \
    wget \
    git \
    conntrack-tools \
    psmisc \
    nfs-utils \
    jq \
    socat \
    bash-completion \
    ipset \
    ipvsadm \
    conntrack \
    libseccomp \
    net-tools \
    crontabs \
    sysstat \
    unzip \
    bind-utils \
    tcpdump \
    telnet \
    lsof \
    htop

apt-based systems:

apt-get update && apt-get install -y wget \
git \
psmisc \
nfs-kernel-server \
nfs-common \
jq \
socat \
bash-completion \
ipset \
ipvsadm \
conntrack \
libseccomp2 \
net-tools \
cron \
sysstat \
unzip \
dnsutils \
tcpdump \
telnet \
lsof \
htop \
curl \
apt-transport-https \
ca-certificates

  • If kube-proxy will use ipvs mode, the following kernel modules need to be loaded at boot. Do it the proper way with systemd-modules-load instead of sticking modprobe into /etc/rc.local
    :> /etc/modules-load.d/ipvs.conf
    module=(
    ip_vs
    ip_vs_rr
    ip_vs_wrr
    ip_vs_sh
    nf_conntrack
    br_netfilter
    )
    for kernel_module in ${module[@]};do
    /sbin/modinfo -F filename $kernel_module |& grep -qv ERROR && echo $kernel_module >> /etc/modules-load.d/ipvs.conf || :
    done

On apt-based systems, first run systemctl cat systemd-modules-load to check whether the unit has an [Install] section; if it doesn't, run the following

cat>>/usr/lib/systemd/system/systemd-modules-load.service<<EOF
[Install]
WantedBy=multi-user.target
EOF

Start the module-loading service

systemctl daemon-reload
systemctl enable --now systemd-modules-load.service

If the systemctl enable above fails, run systemctl status -l systemd-modules-load.service to see which kernel module refuses to load, comment it out in /etc/modules-load.d/ipvs.conf and try enable again
Confirm the kernel modules are loaded

$ lsmod | grep ip_v
ip_vs_sh 12688 0
ip_vs_wrr 12697 0
ip_vs_rr 12600 11
ip_vs 145497 17 ip_vs_rr,ip_vs_sh,ip_vs_wrr
nf_conntrack 133095 7 ip_vs,nf_nat,nf_nat_ipv4,xt_conntrack,nf_nat_masquerade_ipv4,nf_conntrack_netlink,nf_conntrack_ipv4
libcrc32c 12644 3 ip_vs,nf_nat,nf_conntrack

  • All machines need the kernel parameters below in /etc/sysctl.d/k8s.conf. IPv6 support is still not great, so IPv6 is disabled there as well.
    cat <<EOF > /etc/sysctl.d/k8s.conf
    net.ipv6.conf.all.disable_ipv6 = 1
    net.ipv6.conf.default.disable_ipv6 = 1
    net.ipv6.conf.lo.disable_ipv6 = 1
    net.ipv4.neigh.default.gc_stale_time = 120
    net.ipv4.conf.all.rp_filter = 0
    net.ipv4.conf.default.rp_filter = 0
    net.ipv4.conf.default.arp_announce = 2
    net.ipv4.conf.lo.arp_announce = 2
    net.ipv4.conf.all.arp_announce = 2
    net.ipv4.ip_forward = 1
    net.ipv4.tcp_max_tw_buckets = 5000
    net.ipv4.tcp_syncookies = 1
    net.ipv4.tcp_max_syn_backlog = 1024
    net.ipv4.tcp_synack_retries = 2
# make bridged traffic pass through iptables, which kube-proxy and CNI plugins rely on
    net.bridge.bridge-nf-call-ip6tables = 1
    net.bridge.bridge-nf-call-iptables = 1
    net.bridge.bridge-nf-call-arptables = 1
    net.netfilter.nf_conntrack_max = 2310720
    fs.inotify.max_user_watches=89100
    fs.may_detach_mounts = 1
    fs.file-max = 52706963
    fs.nr_open = 52706963
    vm.overcommit_memory=1
    vm.panic_on_oom=0
    EOF

    sysctl --system

If you chose to disable swap, turn it off at the kernel level as well; skip this if you kept swap

echo 'vm.swappiness = 0' >> /etc/sysctl.d/k8s.conf

If kube-proxy uses ipvs, set the following TCP keepalive parameters to avoid timeouts

cat <<EOF >> /etc/sysctl.d/k8s.conf
# https://github.com/moby/moby/issues/31208
# ipvsadm -l --timeout
# fixes long-connection timeouts under ipvs mode; any value below 900 works
net.ipv4.tcp_keepalive_time = 600
net.ipv4.tcp_keepalive_intvl = 30
net.ipv4.tcp_keepalive_probes = 10
EOF
sysctl --system

  • Tune journald so logs are not collected twice and system resources are not wasted, raise the default open-file limits for services started by systemd, and turn off reverse DNS lookups for ssh

# apt-based systems don't have the next two lines; running them anyway is harmless
    sed -ri 's/^\$ModLoad imjournal/#&/' /etc/rsyslog.conf
    sed -ri 's/^\$IMJournalStateFile/#&/' /etc/rsyslog.conf

    sed -ri 's/^#(DefaultLimitCORE)=/\1=100000/' /etc/systemd/system.conf
    sed -ri 's/^#(DefaultLimitNOFILE)=/\1=100000/' /etc/systemd/system.conf

    sed -ri 's/^#(UseDNS )yes/\1no/' /etc/ssh/sshd_config
  • Maximum number of open files; per convention, put it in a drop-in config file

    cat>/etc/security/limits.d/kubernetes.conf<<EOF
    * soft nproc 131072
    * hard nproc 131072
    * soft nofile 131072
    * hard nofile 131072
    root soft nproc 131072
    root hard nproc 131072
    root soft nofile 131072
    root hard nofile 131072
    EOF

Cluster HA relies on consistent time across nodes, so install and configure chrony
yum:

yum install -y chrony

apt:

apt-get install -y chrony

The configuration file

cat>/etc/chrony.conf<<EOF
server cn.pool.ntp.org iburst minpoll 4 maxpoll 10
server s1b.time.edu.cn iburst minpoll 4 maxpoll 10
# Ignore the stratum of the sources
stratumweight 0

# Record the rate at which the system clock gains/losses time.
driftfile /var/lib/chrony/chrony.drift

# This directive enables kernel synchronisation (every 11 minutes) of the
# real-time clock. Note that it can’t be used along with the 'rtcfile' directive.
rtcsync

# Allow the system clock to be stepped in the first three updates
# if its offset is larger than 1 second.
makestep 1.0 3


# Enable hardware timestamping on all interfaces that support it.
#hwtimestamp *

# Increase the minimum number of selectable sources required to adjust
# the system clock.
#minsources 2

bindcmdaddress 127.0.0.1

#bindcmdaddress ::1

# Specify file containing keys for NTP authentication.
keyfile /etc/chrony/chrony.keys

logdir /var/log/chrony
# adjustments larger than 1 second will be logged to file
logchange 1
EOF

systemctl enable --now chronyd

  • Set the hostname
    kubelet and kube-proxy report node information using the hostname by default unless --hostname-override is given, so set the hostname yourself here (a loop example follows this list)

    hostnamectl set-hostname xxx
  • docker's official kernel check script recommends (RHEL7/CentOS7: User namespaces disabled; add 'user_namespace.enable=1' to boot command line); on yum-based systems enable it with the command below, apt-based systems don't need it
    yum:

    grubby --args="user_namespace.enable=1" --update-kernel="$(grubby --default-kernel)"
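As promised in the hostname bullet above, here is a small loop for setting all the hostnames from one machine. It is only a convenience sketch: it assumes passwordless root ssh, and the ip-to-name mapping mirrors the table at the top, so adjust it to your environment.

# map each node ip to the hostname it should get
declare -A node_names=( [172.19.0.2]=k8s-m1 [172.19.0.3]=k8s-m2 [172.19.0.4]=k8s-m3 [172.19.0.5]=k8s-n1 )
for ip in "${!node_names[@]}";do
  ssh "$ip" "hostnamectl set-hostname ${node_names[$ip]}"
done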

Reboot the system

reboot

Install docker

  • Check whether the kernel and its modules are suitable for running docker (Linux only). The script may be unreachable because of the GFW; drop the redirection first to see whether you can fetch it at all
    curl -s https://raw.githubusercontent.com/docker/docker/master/contrib/check-config.sh > check-config.sh
    bash ./check-config.sh

The docker storage driver should be overlay2 nowadays (do not use devicemapper, it is full of pitfalls), so what we mainly check is that the overlay2-related items are green

Here we use the year-versioned docker-ce. Say we want to install k8s v1.16.3: go to https://github.com/kubernetes/kubernetes, open CHANGELOG-1.16.md for that version and search for "The list of validated docker versions remain" to find the validated docker versions. The docker version doesn't strictly have to be on that list; 19.03 works fine in practice. Here we install docker with docker's official install script (it supports CentOS and Ubuntu)

export VERSION=19.03
curl -fsSL "https://get.docker.com/" | bash -s -- --mirror Aliyun

  • On all machines configure registry mirrors and switch docker's cgroup driver to systemd, which is the official recommendation, see https://kubernetes.io/docs/setup/cri/
mkdir -p /etc/docker/
cat>/etc/docker/daemon.json<<EOF
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "registry-mirrors": [
    "https://fz5yth0r.mirror.aliyuncs.com",
    "http://hub-mirror.c.163.com/",
    "https://docker.mirrors.ustc.edu.cn/",
    "https://registry.docker-cn.com"
  ],
  "storage-driver": "overlay2",
  "storage-opts": [
    "overlay2.override_kernel_check=true"
  ],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m",
    "max-file": "3"
  }
}
EOF

Never enable Live Restore: in some extreme cases containers stuck in the Dead state can only be fixed by restarting the docker daemon, and with live restore on, the only fix left is rebooting the machine

  • Enable docker at boot. On CentOS, docker command completion has to be set up manually after installation:
yum install -y epel-release bash-completion

On apt-based systems, instead uncomment the following lines in /etc/bash.bashrc

# enable bash completion in interactive shells
#if ! shopt -oq posix; then
# if [ -f /usr/share/bash-completion/bash_completion ]; then
# . /usr/share/bash-completion/bash_completion
# elif [ -f /etc/bash_completion ]; then
# . /etc/bash_completion
# fi
#fi

Copy the completion script

cp /usr/share/bash-completion/completions/docker /etc/bash_completion.d/

  • To keep a FORWARD DROP policy from breaking forwarding, add the following parameters to the docker daemon as a fix; the blunt alternative is iptables -P FORWARD ACCEPT
    mkdir -p /etc/systemd/system/docker.service.d/
    cat>/etc/systemd/system/docker.service.d/10-docker.conf<<EOF
    [Service]
    ExecStartPost=/sbin/iptables -I FORWARD -s 0.0.0.0/0 -j ACCEPT
    ExecStopPost=/bin/bash -c '/sbin/iptables -D FORWARD -s 0.0.0.0/0 -j ACCEPT &> /dev/null || :'
    EOF

Start docker and check that its info looks sane

systemctl enable --now docker
docker info
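As a quick and purely optional sanity check, you can grep the two fields we configured above out of docker info; they should read overlay2 and systemd:

docker info 2>/dev/null | grep -iE 'storage driver|cgroup driver'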

If enabling docker errors out, turn on debug mode; see https://github.com/zhangguanzhang/Kubernetes-ansible/wiki/systemctl-running-debug for how

Deploying with kubeadm

Install the kubeadm packages

The default repositories can't be reached from inside China, so we use a domestic mirror; do this on every machine
yum:

cat <<EOF >/etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=0
EOF

apt:

curl https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg | apt-key add -
cat>/etc/apt/sources.list.d/kubernetes.list<<EOF
deb https://mirrors.aliyun.com/kubernetes/apt/ kubernetes-xenial main
EOF
apt-get update

The masters

A k8s node is just kubelet plus a CRI (usually docker). kubectl is a client that reads a kubeconfig and talks to kube-apiserver to operate the cluster, and kubeadm does the deployment, so masters need all three installed while nodes generally don't need kubectl
Install the packages
yum:

yum install -y \
kubeadm-1.16.3 \
kubectl-1.16.3 \
kubelet-1.16.3 \
--disableexcludes=kubernetes && \
systemctl enable kubelet

apt:

apt-get install -y \
kubeadm=1.16.3-00 \
kubectl=1.16.3-00 \
kubelet=1.16.3-00

node

yum install -y \
kubeadm-1.16.3 \
kubelet-1.16.3 \
--disableexcludes=kubernetes && \
systemctl enable kubelet
apt-get install -y \
kubeadm=1.16.3-00 \
kubelet=1.16.3-00

How to pass extra parameters to kubelet (if you need to)

Look at kubelet's systemd unit

systemctl cat kubelet

We can see the EnvironmentFile /etc/sysconfig/kubelet, whose comments explain that we should put KUBELET_EXTRA_ARGS in that file to pass extra runtime flags to kubelet. Below is an example; for the actual flags, see kubelet --help

cat >/etc/sysconfig/kubelet<<EOF
KUBELET_EXTRA_ARGS="--xxx=yyy --aaa=bbb"
EOF

The file /var/lib/kubelet/kubeadm-flags.env works the same way
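As a concrete but purely illustrative example, raising the per-node pod limit could look like the following; --max-pods is a real kubelet flag, the value here is arbitrary, and on apt-based systems the EnvironmentFile path shown by systemctl cat kubelet may differ:

cat >/etc/sysconfig/kubelet<<EOF
KUBELET_EXTRA_ARGS="--max-pods=200"
EOF
systemctl restart kubelet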

Configure HA, on all machines (skip this step with a single master)

My post https://zhangguanzhang.github.io/2019/03/11/k8s-ha/ explains HA in detail. Here I use nginx as a local proxy: since the local proxy lives on every machine, there is no need for an SLB and no problem with VPCs where a VIP can't be used, at the cost of running nginx on each machine
Configure hosts on every machine

cat >>/etc/hosts << EOF
127.0.0.1 apiserver.k8s.local
172.19.0.2 apiserver01.k8s.local
172.19.0.3 apiserver02.k8s.local
172.19.0.4 apiserver03.k8s.local
EOF

Generate the config file on every machine. The three hosts entries above can be skipped if you write IPs instead of domain names in the config below, but then changing an IP means editing the config and reloading

mkdir -p /etc/kubernetes
cat > /etc/kubernetes/nginx.conf << 'EOF'
user nginx nginx;
worker_processes auto;
events {
    worker_connections 20240;
    use epoll;
}
error_log /var/log/nginx_error.log info;

stream {
    upstream kube-servers {
        hash $remote_addr consistent;
        server apiserver01.k8s.local:6443 weight=5 max_fails=1 fail_timeout=3s;
        server apiserver02.k8s.local:6443 weight=5 max_fails=1 fail_timeout=3s;
        server apiserver03.k8s.local:6443 weight=5 max_fails=1 fail_timeout=3s;
    }

    server {
        listen 8443 reuseport;
        proxy_connect_timeout 3s;
        # generous timeout
        proxy_timeout 3000s;
        proxy_pass kube-servers;
    }
}
EOF

Because the local proxy runs on every machine, it avoids the SLB requirement and the VPC restriction on VIPs. Here I run nginx as a plain container; you could also write it as a static Pod yaml and drop it into the manifests directory around init time (a sketch of that follows the docker run below)

docker run --restart=always \
-v /etc/kubernetes/nginx.conf:/etc/nginx/nginx.conf \
-v /etc/localtime:/etc/localtime:ro \
--name k8s \
--net host \
-d \
nginx:alpine
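For the static Pod alternative mentioned above, a minimal sketch might look like the following. The file name and pod name are my own choices, the default static Pod path is /etc/kubernetes/manifests, and note that kubeadm init expects that directory to be empty, so on the first master you would add it after init (or ignore the preflight error):

cat > /etc/kubernetes/manifests/nginx-proxy.yaml << 'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: nginx-proxy
  namespace: kube-system
spec:
  hostNetwork: true
  containers:
  - name: nginx-proxy
    image: nginx:alpine
    volumeMounts:
    - name: conf
      mountPath: /etc/nginx/nginx.conf
      readOnly: true
  volumes:
  - name: conf
    hostPath:
      path: /etc/kubernetes/nginx.conf
      type: File
EOF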

Build the cluster configuration (on the first master)

  • Print the default init configuration
    kubeadm config print init-defaults > initconfig.yaml

Let's look at the default init parameters

apiVersion: kubeadm.k8s.io/v1beta2
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 1.2.3.4
  bindPort: 6443
nodeRegistration:
  criSocket: /var/run/dockershim.sock
  name: k8s-m1
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
---
apiServer:
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta2
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns:
  type: CoreDNS
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: k8s.gcr.io
kind: ClusterConfiguration
kubernetesVersion: v1.16.0
networking:
  dnsDomain: cluster.local
  serviceSubnet: 10.96.0.0/12
scheduler: {}

We only care about the ClusterConfiguration section, so keep just that part and edit it. Refer to the v1beta2 docs below; older versions may use v1beta1, whose fields differ in places from the new API, look them up on godoc yourself
https://godoc.org/k8s.io/kubernetes/cmd/kubeadm/app/apis/kubeadm/v1beta2#hdr-Basics
https://godoc.org/k8s.io/kubernetes/cmd/kubeadm/app/apis/kubeadm/v1beta2
https://godoc.org/k8s.io/kubernetes/cmd/kubeadm/app/apis/kubeadm/v1beta2#pkg-constants
https://godoc.org/k8s.io/kubernetes/cmd/kubeadm/app/apis/kubeadm/v1beta2#ClusterConfiguration
Change the IPs and so on to match your own environment; if you don't know how to compute a CIDR, don't touch those. controlPlaneEndpoint should be a domain name (with no internal DNS, hosts entries on every machine work too), an SLB, or a VIP; the reasoning and caveats are in https://zhangguanzhang.github.io/2019/03/11/k8s-ha/ where I've explained HA thoroughly, so please don't keep asking me about it. The final yaml is below

apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
imageRepository: gcr.azk8s.cn/google_containers
kubernetesVersion: v1.16.3
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
networking: #https://godoc.org/k8s.io/kubernetes/cmd/kubeadm/app/apis/kubeadm/v1beta2#Networking
  dnsDomain: cluster.local
  serviceSubnet: 10.96.0.0/12
  podSubnet: 10.244.0.0/16
controlPlaneEndpoint: apiserver.k8s.local:8443 # with a single master, write that master's ip or leave this out
apiServer: # https://godoc.org/k8s.io/kubernetes/cmd/kubeadm/app/apis/kubeadm/v1beta2#APIServer
  timeoutForControlPlane: 4m0s
  extraArgs:
    authorization-mode: "Node,RBAC"
    enable-admission-plugins: "NamespaceLifecycle,LimitRanger,ServiceAccount,PersistentVolumeClaimResize,DefaultStorageClass,DefaultTolerationSeconds,NodeRestriction,MutatingAdmissionWebhook,ValidatingAdmissionWebhook,ResourceQuota,Priority,PodPreset"
    runtime-config: api/all,settings.k8s.io/v1alpha1=true
    storage-backend: etcd3
    etcd-servers: https://172.19.0.2:2379,https://172.19.0.3:2379,https://172.19.0.4:2379
  certSANs:
  - 10.96.0.1 # first ip of the service cidr
  - 127.0.0.1 # lets you fall back to localhost for debugging if the load balancer breaks with multiple masters
  - localhost
  - apiserver.k8s.local # load-balancer domain name or vip
  - 172.19.0.2
  - 172.19.0.3
  - 172.19.0.4
  - apiserver01.k8s.local
  - apiserver02.k8s.local
  - apiserver03.k8s.local
  - master
  - kubernetes
  - kubernetes.default
  - kubernetes.default.svc
  - kubernetes.default.svc.cluster.local
  extraVolumes:
  - hostPath: /etc/localtime
    mountPath: /etc/localtime
    name: localtime
    readOnly: true
controllerManager: # https://godoc.org/k8s.io/kubernetes/cmd/kubeadm/app/apis/kubeadm/v1beta2#ControlPlaneComponent
  extraArgs:
    bind-address: "0.0.0.0"
  extraVolumes:
  - hostPath: /etc/localtime
    mountPath: /etc/localtime
    name: localtime
    readOnly: true
scheduler:
  extraArgs:
    bind-address: "0.0.0.0"
  extraVolumes:
  - hostPath: /etc/localtime
    mountPath: /etc/localtime
    name: localtime
    readOnly: true
dns: # https://godoc.org/k8s.io/kubernetes/cmd/kubeadm/app/apis/kubeadm/v1beta2#DNS
  type: CoreDNS # or kube-dns
  # imageRepository: coredns/coredns
  imageTag: 1.6.3
etcd: # https://godoc.org/k8s.io/kubernetes/cmd/kubeadm/app/apis/kubeadm/v1beta2#Etcd
  local:
    imageRepository: quay.azk8s.cn/coreos
    imageTag: v3.3.17
    dataDir: /var/lib/etcd
    serverCertSANs: # localhost, 127.0.0.1 and ::1 are included by default for both server and peer, no need to list them
    - master
    - 172.19.0.2
    - 172.19.0.3
    - 172.19.0.4
    - etcd01.k8s.local
    - etcd02.k8s.local
    - etcd03.k8s.local
    peerCertSANs:
    - master
    - 172.19.0.2
    - 172.19.0.3
    - 172.19.0.4
    - etcd01.k8s.local
    - etcd02.k8s.local
    - etcd03.k8s.local
    extraArgs: # there is no extraVolumes for etcd yet
      auto-compaction-retention: "1h"
      max-request-bytes: "33554432"
      quota-backend-bytes: "8589934592"
      enable-v2: "false" # disable etcd v2 api
  # external: # configure like this for an external etcd, see https://godoc.org/k8s.io/kubernetes/cmd/kubeadm/app/apis/kubeadm/v1beta2#Etcd
  #   endpoints:
  #   - "https://172.19.0.2:2379"
  #   - "https://172.19.0.3:2379"
  #   - "https://172.19.0.4:2379"
  #   caFile: "/etc/kubernetes/pki/etcd/ca.crt"
  #   certFile: "/etc/kubernetes/pki/etcd/etcd.crt"
  #   keyFile: "/etc/kubernetes/pki/etcd/etcd.key"
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration # https://godoc.org/k8s.io/kube-proxy/config/v1alpha1#KubeProxyConfiguration
mode: ipvs # or iptables
ipvs:
  excludeCIDRs: null
  minSyncPeriod: 0s
  scheduler: "rr" # scheduling algorithm
  strictARP: false
  syncPeriod: 15s
iptables:
  masqueradeAll: true
  masqueradeBit: 14
  minSyncPeriod: 0s
  syncPeriod: 30s
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration # https://godoc.org/k8s.io/kubelet/config/v1beta1#KubeletConfiguration
cgroupDriver: systemd
failSwapOn: true # set this to false if swap stays enabled

For swap, see the last line; the apiServer extraArgs are there to enable PodPreset; with a single master, change the controlPlaneEndpoint value to the first master's ip
The etcd versions kubeadm supports can be looked up in the code: https://github.com/kubernetes/kubernetes/blob/master/cmd/kubeadm/app/constants/constants.go#L421-L427
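Since those apiServer extraArgs exist purely to switch on PodPreset, here is a minimal sketch of what that enables once the cluster is up; the names and the injected variable are made up for illustration. Pods created in the default namespace with the label app: demo would get the TZ environment variable injected at admission time:

apiVersion: settings.k8s.io/v1alpha1
kind: PodPreset
metadata:
  name: inject-tz
  namespace: default
spec:
  selector:
    matchLabels:
      app: demo
  env:
  - name: TZ
    value: Asia/Shanghai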

  • Check the file for mistakes. Ignore warnings; real problems are raised as errors, and when everything is fine the output ends with something containing kubeadm join xxx

    kubeadm init --config initconfig.yaml --dry-run
  • Check that the images are correct

    kubeadm config images list --config initconfig.yaml
  • Pre-pull the images

kubeadm config images pull --config initconfig.yaml # output below
    [config/images] Pulled gcr.azk8s.cn/google_containers/kube-apiserver:v1.16.3
    [config/images] Pulled gcr.azk8s.cn/google_containers/kube-controller-manager:v1.16.3
    [config/images] Pulled gcr.azk8s.cn/google_containers/kube-scheduler:v1.16.3
    [config/images] Pulled gcr.azk8s.cn/google_containers/kube-proxy:v1.16.3
    [config/images] Pulled gcr.azk8s.cn/google_containers/pause:3.1
    [config/images] Pulled quay.azk8s.cn/coreos/etcd:v3.3.17
    [config/images] Pulled coredns/coredns:1.6.3

kubeadm init

Run the init below only on the first master

kubeadm init --config initconfig.yaml

If it times out, check whether kubelet failed to come up; for debugging see https://github.com/zhangguanzhang/Kubernetes-ansible/wiki/systemctl-running-debug

Save the token printed at the end of init, then copy the kubeconfig for kubectl; kubectl's default kubeconfig path is ~/.kube/config

mkdir -p $HOME/.kube
sudo \cp /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

The init yaml is actually stored in a configmap in the cluster, so we can look at it at any time; it is also used when other nodes and masters join

kubectl -n kube-system get cm kubeadm-config -o yaml

If you run a single master and don't plan to add worker nodes, remove the taint from the master; skip this if you're doing the multi-master steps that follow

kubectl taint nodes --all node-role.kubernetes.io/master-

Set up the control-plane components on the other masters

From the first master, copy the CA certificates to the other masters. Since scp prompts for a password, we install sshpass; zhangguanzhang is the root password here

yum install sshpass -y
alias ssh='sshpass -p zhangguanzhang ssh -o StrictHostKeyChecking=no'
alias scp='sshpass -p zhangguanzhang scp -o StrictHostKeyChecking=no'

Copy the CA certificates to the other masters

for node in 172.19.0.3 172.19.0.4;do
ssh $node 'mkdir -p /etc/kubernetes/pki/etcd'
scp -r /etc/kubernetes/pki/ca.* $node:/etc/kubernetes/pki/
scp -r /etc/kubernetes/pki/sa.* $node:/etc/kubernetes/pki/
scp -r /etc/kubernetes/pki/front-proxy-ca.* $node:/etc/kubernetes/pki/
scp -r /etc/kubernetes/pki/etcd/ca.* $node:/etc/kubernetes/pki/etcd/
done

Join the other masters

kubeadm join apiserver.k8s.local:8443 \
--token xxx.zzzzzzzzz \
--discovery-token-ca-cert-hash sha256:xxxxxxxxxxx --control-plane

If you forgot the token, kubeadm token list shows it and kubeadm token create makes a new one
The sha256 value can be obtained with the following command

openssl x509 -pubkey -in \
/etc/kubernetes/pki/ca.crt | \
openssl rsa -pubin -outform der 2>/dev/null | \
openssl dgst -sha256 -hex | sed 's/^.* //'
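If you'd rather not compute the hash by hand, kubeadm (1.16 included) can print a complete join command in one go; for a control-plane join you still append --control-plane and copy the certificates as above:

kubeadm token create --print-join-command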

Configure kubectl on all masters

Prepare kubectl's kubeconfig

mkdir -p $HOME/.kube
sudo \cp /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

Set up kubectl's completion script

kubectl completion bash > /etc/bash_completion.d/kubectl

Configure etcdctl on all masters

Copy etcdctl out of the container

docker cp `docker ps -a | awk '/k8s_etcd/{print $1}'`:/usr/local/bin/etcdctl /usr/local/bin/etcdctl

Starting with 1.13 or thereabouts, k8s talks to etcd with the v3 API by default, so let's set up etcdctl's parameters accordingly

cat >/etc/profile.d/etcd.sh<<'EOF'
ETCD_CERT_DIR=/etc/kubernetes/pki/etcd/
ETCD_CA_FILE=ca.crt
ETCD_KEY_FILE=healthcheck-client.key
ETCD_CERT_FILE=healthcheck-client.crt
ETCD_EP=https://172.19.0.2:2379,https://172.19.0.3:2379,https://172.19.0.4:2379

alias etcd_v2="etcdctl --cert-file ${ETCD_CERT_DIR}/${ETCD_CERT_FILE} \
--key-file ${ETCD_CERT_DIR}/${ETCD_KEY_FILE} \
--ca-file ${ETCD_CERT_DIR}/${ETCD_CA_FILE} \
--endpoints $ETCD_EP"

alias etcd_v3="ETCDCTL_API=3 \
etcdctl \
--cert ${ETCD_CERT_DIR}/${ETCD_CERT_FILE} \
--key ${ETCD_CERT_DIR}/${ETCD_KEY_FILE} \
--cacert ${ETCD_CERT_DIR}/${ETCD_CA_FILE} \
--endpoints $ETCD_EP"
EOF

Re-ssh, or load the environment manually with . /etc/profile.d/etcd.sh

etcd_v3 endpoint status --write-out=table # output below
+-------------------------+------------------+---------+---------+-----------+-----------+------------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+-------------------------+------------------+---------+---------+-----------+-----------+------------+
| https://172.19.0.2:2379 | d7380397c3ec4b90 | 3.3.17 | 2.1 MB | true | 9 | 85397 |
| https://172.19.0.3:2379 | f776f8545c82d916 | 3.3.17 | 2.1 MB | false | 9 | 85405 |
| https://172.19.0.4:2379 | ead42f3e6c9bb295 | 3.3.17 | 2.1 MB | false | 9 | 85406 |
+-------------------------+------------------+---------+---------+-----------+-----------+------------+

Set up an etcd backup script

mkdir -p /opt/etcd
cat>/opt/etcd/etcd_cron.sh<<'EOF'
#!/bin/bash
set -e

export PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin

: ${bak_dir:=/root/} # default backup directory, change it to an existing one
: ${cert_dir:=/etc/kubernetes/pki/etcd/}
: ${endpoints:=https://172.19.0.2:2379,https://172.19.0.3:2379,https://172.19.0.4:2379}

bak_prefix='etcd-'
cmd_suffix='date +%Y-%m-%d-%H:%M'
bak_suffix='.db'

# assign the normalized command-line arguments to the positional parameters ($1,$2,...)
temp=`getopt -n $0 -o c:d: -u -- "$@"`

[ $? != 0 ] && {
echo '
Examples:
# just save once
bash $0 /tmp/etcd.db
# save in contab and keep 5
bash $0 -c 5
'
exit 1
}
set -- $temp


# -c number of backup copies to keep
# -d directory to store the backups in
while true;do
case "$1" in
-c)
[ -z "$bak_count" ] && bak_count=$2
printf -v null %d "$bak_count" &>/dev/null || \
{ echo 'the value of the -c must be number';exit 1; }
shift 2
;;
-d)
[ ! -d "$2" ] && mkdir -p $2
bak_dir=$2
shift 2
;;
*)
[[ -z "$1" || "$1" == '--' ]] && { shift;break; }
echo "Internal error!"
exit 1
;;
esac
done


function etcd_v2(){

etcdctl --cert-file $cert_dir/healthcheck-client.crt \
--key-file $cert_dir/healthcheck-client.key \
--ca-file $cert_dir/ca.crt \
--endpoints $endpoints $@
}

function etcd_v3(){

ETCDCTL_API=3 etcdctl \
--cert $cert_dir/healthcheck-client.crt \
--key $cert_dir/healthcheck-client.key \
--cacert $cert_dir/ca.crt \
--endpoints $endpoints $@
}

etcd::cron::save(){
cd $bak_dir/
etcd_v3 snapshot save $bak_prefix$($cmd_suffix)$bak_suffix
rm_files=`ls -t $bak_prefix*$bak_suffix | tail -n +$[bak_count+1]`
if [ -n "$rm_files" ];then
rm -f $rm_files
fi
}

main(){
[ -n "$bak_count" ] && etcd::cron::save || etcd_v3 snapshot save $@
}

main $@
EOF

Add the following to crontab -e, prefixed with a schedule of your choosing, to keep four backup copies automatically

bash /opt/etcd/etcd_cron.sh  -c 4 -d /opt/etcd/ &>/dev/null
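For reference, a complete crontab line could look like this; the schedule here (every 6 hours) is only an example, pick whatever interval you want:

0 */6 * * * bash /opt/etcd/etcd_cron.sh -c 4 -d /opt/etcd/ &>/dev/null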

node

Same as before:

  • apply the system-level settings
  • set the hostname
  • install docker-ce
  • set up hosts and nginx
  • configure the package source and install kubeadm and kubelet

Joining is the same as for a master: prepare the environment and docker in advance, then join without --control-plane; with a single master, the address you join is the controlPlaneEndpoint value (a command sketch follows)
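Schematically, with the placeholder token and hash from earlier, the worker join looks like this; once it has joined, the node shows up in kubectl get node:

kubeadm join apiserver.k8s.local:8443 \
--token xxx.zzzzzzzzz \
--discovery-token-ca-cert-hash sha256:xxxxxxxxxxx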

$ kubectl get node -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
k8s-m1 NotReady master 24m v1.16.3 172.19.0.2 <none> CentOS Linux 7 (Core) 3.10.0-957.el7.x86_64 docker://19.3.5
k8s-m2 NotReady master 19m v1.16.3 172.19.0.3 <none> CentOS Linux 7 (Core) 3.10.0-957.el7.x86_64 docker://19.3.5
k8s-m3 NotReady master 22m v1.16.3 172.19.0.4 <none> CentOS Linux 7 (Core) 3.10.0-957.el7.x86_64 docker://19.3.5
k8s-n1 NotReady <none> 12s v1.16.3 172.19.0.5 <none> CentOS Linux 7 (Core) 3.10.0-957.el7.x86_64 docker://19.3.5

The role is just a label; you can set it yourself, and whatever you want displayed goes after node-role.kubernetes.io/xxxx

kubectl label node k8s-n1 node-role.kubernetes.io/node=""

addons (from this chapter to the end, run on any one master)

The container network isn't set up yet, so coredns can't get an IP and stays Pending. I deploy flannel here; if you understand BGP you can use calico instead
The yaml comes from flannel's official github: https://github.com/coreos/flannel/tree/master/Documentation

wget https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml

Modifications

  • If you use PSP on a version before 1.16, policy/v1beta1 has to be changed to extensions/v1beta1

    apiVersion: policy/v1beta1
    kind: PodSecurityPolicy
  • Change the rbac apiVersion to plain v1 (stop using v1beta1) with the following command

    sed -ri '/apiVersion: rbac/s#v1.+#v1#' kube-flannel.yml
  • The official yaml ships daemonsets for four architectures; delete everything except amd64, roughly line 227 to the end

    sed -ri '227,$d' kube-flannel.yml
  • If you changed the pod cidr, change it here as well. If all nodes are on the same layer-2 network you can swap vxlan for the faster host-gw backend (a host-gw variant is sketched after this list); with vxlan, the security group must allow udp port 8472

net-conf.json: |
  {
    "Network": "10.244.0.0/16",
    "Backend": {
      "Type": "vxlan"
    }
  }
  • Change the image with the following command

    sed -ri '/image/s#quay.io#quay.azk8s.cn#' kube-flannel.yml
  • Adjust the limits; they must be larger than the requests

limits:
  cpu: "200m"
  memory: "100Mi"
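As mentioned in the backend bullet above, on a flat layer-2 network the only change for host-gw is the Type field; a sketch of the same net-conf.json:

net-conf.json: |
  {
    "Network": "10.244.0.0/16",
    "Backend": {
      "Type": "host-gw"
    }
  }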

Deploy flannel

Since 1.15 the node cidr is an array (podCIDRs) rather than a single value, so flannel 0.11 and earlier fail with the error below; see the docs
https://github.com/kubernetes/kubernetes/blob/v1.15.0/staging/src/k8s.io/api/core/v1/types.go#L3890-L3893
https://github.com/kubernetes/kubernetes/blob/v1.16.3/staging/src/k8s.io/api/core/v1/types.go#L4206-L4216

Error registering network: failed to acquire lease: node "xxx" pod cidr not assigned

Patch it by hand; remember to do the same for nodes you add later

nodes=`kubectl get node --no-headers | awk '{print $1}'`
for node in $nodes;do
cidr=`kubectl get node "$node" -o jsonpath='{.spec.podCIDRs[0]}'`
[ -z "$(kubectl get node $node -o jsonpath='{.spec.podCIDR}')" ] && {
kubectl patch node "$node" -p '{"spec":{"podCIDR":"'"$cidr"'"}}'
}
done

kubectl apply -f kube-flannel.yml

Verify the cluster works

kubectl -n kube-system get pod -o wide

Once every pod in the kube-system namespace is Running, let's test that the cluster actually works

cat<<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - image: nginx:alpine
        name: nginx
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: nginx
spec:
  selector:
    app: nginx
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80
---
apiVersion: v1
kind: Pod
metadata:
  name: busybox
  namespace: default
spec:
  containers:
  - name: busybox
    image: zhangguanzhang/centos
    command:
    - sleep
    - "3600"
    imagePullPolicy: IfNotPresent
  restartPolicy: Always
EOF

Wait for the pods to be Running

$ kubectl get po,svc -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod/busybox 1/1 Running 0 4m4s 10.244.2.18 k8s-n1 <none> <none>
pod/nginx-5c559d5697-2ctxh 1/1 Running 0 4m4s 10.244.2.16 k8s-n1 <none> <none>

NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 12m <none>
service/nginx ClusterIP 10.100.39.101 <none> 80/TCP 4m4s app=nginx

Verify cluster DNS

$ kubectl exec -ti busybox -- nslookup kubernetes
Server: 10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local

Name: kubernetes
Address 1: 10.96.0.1 kubernetes.default.svc.cluster.local

On a master, curl the nginx service IP; if the nginx index page comes back, the cluster is working. For example, my nginx svc ip is 10.100.39.101

$ curl -s 10.100.39.101
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
body {
width: 35em;
margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif;
}
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>

Verify from inside a pod that the cluster domain name reaches the pod

$ kubectl exec -ti busybox -- curl nginx
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
body {
width: 35em;
margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif;
}
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>

Notes and personal advice on using kubeadm

  • Beginners, don't rush to deploy everything on top, like dashboard and helm; there's no need, learn the kubectl command line and the basics first
  • The default certificates are only valid for one year; you can modify the source code yourself to change that
  • Read the Concepts and Tasks sections of the official docs first; the books and tutorials on the market essentially just cover those two sections
  • If you don't understand networking, go find some CCIE material and study it
  • systemd, docker and Linux fundamentals all matter a lot; it's astonishing how many people can't even find options with --help
  • yaml: its structure is nothing more than a mix of strings, numbers, objects and arrays; try converting a piece of yaml into json in your head, otherwise you won't understand yaml's structure and won't learn k8s. Many of the levels follow the logic of the objects: a pod can have multiple containers, so pod.spec.containers is an array of objects, and because a pod shares a network namespace, hostNetwork naturally sits at the same level as containers
  • a kubeconfig really stores three things: clusters, user credentials (both can appear multiple times, hence the yaml arrays starting with -), and the current context saying which cluster is paired with which credentials. Same idea as JWT on the web
  • anti-affinity and taints are basic knowledge too, also in the Concepts and Tasks sections; in production, replicas of a pod should obviously repel each other and spread out, just like in the pre-container era every node ran one copy of the service so losing one node didn't take the business down
  • kubectl is not the only way to read and manipulate cluster objects; curl with the right certificates can operate the cluster too (see the sketch after this list)
  • adding -v=<number> to kubectl subcommands turns on verbose output, which is often a useful troubleshooting tool
  • if you don't feel like opening a web page to look up what pod.spec.hostNetwork means, use kubectl explain pod.spec.hostNetwork
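To illustrate the curl point above, here is a rough sketch you could run on a master where admin.conf and kubectl are already set up as described earlier; it pulls the embedded admin client certificate and key out of the kubeconfig and then talks to the API directly (paths are the kubeadm defaults):

# extract the admin client cert/key that are base64-embedded in admin.conf
kubectl config view --raw -o jsonpath='{.users[0].user.client-certificate-data}' | base64 -d > /tmp/admin.crt
kubectl config view --raw -o jsonpath='{.users[0].user.client-key-data}' | base64 -d > /tmp/admin.key
# list pods in the default namespace straight from kube-apiserver
curl -s --cacert /etc/kubernetes/pki/ca.crt \
  --cert /tmp/admin.crt --key /tmp/admin.key \
  https://apiserver.k8s.local:8443/api/v1/namespaces/default/pods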

A kubeadm master is built from static Pods. kubelet can create static Pods without being able to reach kube-apiserver, so although kubelet's log keeps complaining that it can't connect to kube-apiserver, it is at the same time creating the static Pods for the control-plane components and etcd; once kube-apiserver comes up we can operate the cluster. Static Pods are basic knowledge too; I've seen plenty of people who apparently never read the official docs still asking what a static Pod is

For the kubeadm internals and more detailed options, see the following article
