zhangguanzhang's Blog

Notes on running flannel as a binary without the etcd v2 API

2019/03/15

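      Over the past few days I have been building a production k8s cluster. Most of the files and systemd parameters were lifted from kubeadm's static Pod YAML manifests, and at first everything except flanneld was a binary managed by systemd. Because of a script bug, the kube-proxy kubeconfig was generated without running use-context, so kube-proxy connected as an anonymous user and could not list node information: it was running but could not maintain Service networking. flanneld, running as a pod, watches nodes through the kubernetes Service, so its connection failed and the pod went into an exited state. kubelet will not create a new flannel pod until the exited flannel container is cleaned up by hand, and the GC time for exited containers is a global setting that I did not want to change. That revived an old unfinished project of mine: pulling flanneld out to run as a binary.
      The Chinese-language guides and personal blogs I have seen all have flanneld write the pod subnet through etcdctl v2 (not to mention skipping the CNI plugins and rewriting docker0's subnet instead). That approach struck me as crude. After a few days of digging I got it working, so here is a record of the process.
      First, my earlier attempt: I copied the kube-flannel.yml file and its arguments straight into a systemd unit, created the ClusterRole, ServiceAccount and so on from the YAML, and generated a kubeconfig for flannel from the ServiceAccount token. It would not start, failing with: env variables POD_NAME and POD_NAMESPACE must be set
      Today I started by Googling the error message and found this issue https://github.com/coreos/flannel/issues/932 , which suggests adding those two variables by hand to fool the cluster. The developer's answer, however, was that you only need to set the environment variable NODE_NAME to the node's name on each node. The issue also quotes the corresponding check in the source: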
https://github.com/coreos/flannel/blob/master/subnet/kube/kube.go#L94

nodeName := os.Getenv("NODE_NAME")
if nodeName == "" {
	podName := os.Getenv("POD_NAME")
	podNamespace := os.Getenv("POD_NAMESPACE")
	if podName == "" || podNamespace == "" {
		return nil, fmt.Errorf("env variables POD_NAME and POD_NAMESPACE must be set")
	}

	pod, err := c.Pods(podNamespace).Get(podName, metav1.GetOptions{})
	if err != nil {
		return nil, fmt.Errorf("error retrieving pod spec for '%s/%s': %v", podNamespace, podName, err)
	}
	nodeName = pod.Spec.NodeName
	if nodeName == "" {
		return nil, fmt.Errorf("node name not present in pod spec '%s/%s'", podNamespace, podName)
	}
}
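
To make the branch order in that excerpt concrete, here is a minimal standalone sketch; it is not flannel's actual code. The resolveNodeName function, the podNodeLookup hook and the node names node-a and node-b are made up for illustration. The hook stands in for the real API call that reads the pod spec, so the sketch runs without a cluster.

```go
package main

import (
	"errors"
	"fmt"
)

// podNodeLookup is a hypothetical hook standing in for the API call that
// reads pod.Spec.NodeName, so this sketch can run offline.
type podNodeLookup func(namespace, name string) (string, error)

// resolveNodeName follows the same branch order as the excerpt: NODE_NAME
// wins; otherwise POD_NAME and POD_NAMESPACE are required for the lookup.
func resolveNodeName(env map[string]string, lookup podNodeLookup) (string, error) {
	if nodeName := env["NODE_NAME"]; nodeName != "" {
		return nodeName, nil
	}
	podName, podNamespace := env["POD_NAME"], env["POD_NAMESPACE"]
	if podName == "" || podNamespace == "" {
		return "", errors.New("env variables POD_NAME and POD_NAMESPACE must be set")
	}
	nodeName, err := lookup(podNamespace, podName)
	if err != nil {
		return "", fmt.Errorf("error retrieving pod spec for '%s/%s': %v", podNamespace, podName, err)
	}
	if nodeName == "" {
		return "", fmt.Errorf("node name not present in pod spec '%s/%s'", podNamespace, podName)
	}
	return nodeName, nil
}

func main() {
	// Fake lookup: pretend every pod is scheduled on "node-a" (made-up name).
	lookup := func(namespace, name string) (string, error) { return "node-a", nil }

	cases := []struct {
		label string
		env   map[string]string
	}{
		{"inside the pod", map[string]string{"POD_NAME": "kube-flannel-ds-amd64-gmzsf", "POD_NAMESPACE": "kube-system"}},
		{"bare systemd unit", map[string]string{}},
		{"systemd unit with NODE_NAME", map[string]string{"NODE_NAME": "node-b"}},
	}
	for _, c := range cases {
		name, err := resolveNodeName(c.env, lookup)
		fmt.Printf("%-28s node=%q err=%v\n", c.label, name, err)
	}
}
```

The three cases correspond to running inside the DaemonSet pod, running from a systemd unit with no extra environment, and running from a unit that sets NODE_NAME.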

Here are the default environment variables of a flannel pod:

"Env": [
  "POD_NAME=kube-flannel-ds-amd64-gmzsf",
  "POD_NAMESPACE=kube-system",
  "KUBE_DNS_SERVICE_PORT_DNS=53",
  "KUBE_DNS_SERVICE_PORT_DNS_TCP=53",
  "KUBE_DNS_PORT_53_UDP_PORT=53",
  "KUBE_DNS_PORT_53_TCP_PORT=53",
  "KUBE_DNS_PORT_9153_TCP_ADDR=10.96.0.10",
  "KUBERNETES_SERVICE_HOST=10.96.0.1",
  "KUBERNETES_PORT_443_TCP_PROTO=tcp",
  "KUBERNETES_PORT_443_TCP_PORT=443",
  "KUBE_DNS_SERVICE_HOST=10.96.0.10",
  "KUBE_DNS_SERVICE_PORT=53",
  "KUBE_DNS_SERVICE_PORT_METRICS=9153",
  "KUBE_DNS_PORT_53_TCP=tcp://10.96.0.10:53",
  "KUBE_DNS_PORT_9153_TCP_PROTO=tcp",
  "KUBERNETES_PORT_443_TCP_ADDR=10.96.0.1",
  "KUBE_DNS_PORT=udp://10.96.0.10:53",
  "KUBE_DNS_PORT_53_UDP_ADDR=10.96.0.10",
  "KUBE_DNS_PORT_53_TCP_PROTO=tcp",
  "KUBE_DNS_PORT_9153_TCP_PORT=9153",
  "KUBERNETES_SERVICE_PORT=443",
  "KUBERNETES_SERVICE_PORT_HTTPS=443",
  "KUBERNETES_PORT_443_TCP=tcp://10.96.0.1:443",
  "KUBE_DNS_PORT_53_UDP=udp://10.96.0.10:53",
  "KUBE_DNS_PORT_53_UDP_PROTO=udp",
  "KUBE_DNS_PORT_53_TCP_ADDR=10.96.0.10",
  "KUBE_DNS_PORT_9153_TCP=tcp://10.96.0.10:9153",
  "KUBERNETES_PORT=tcp://10.96.0.1:443",
  "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
  "FLANNEL_ARCH=amd64"
]

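There is no NODE_NAME environment variable at all, but POD_NAME and POD_NAMESPACE are set, so inside the pod the code enters the if branch. Under systemd none of the three variables exist, so flanneld raises the error and exits. If we set NODE_NAME, it should run as the developer described. Working through kube-flannel.yml, the logic is: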

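  • net-conf.json, which holds flanneld's CIDR settings, is mounted in /etc/kube-flannel/; create the directory and place the file there by hand. The source at https://github.com/coreos/flannel/blob/master/subnet/kube/kube.go#L54 shows that the file name is hard-coded.
  • cni-conf.json is mounted in /etc/kube-flannel/ as well, but the initContainer copies it to /etc/cni/net.d/10-flannel.conflist. A systemd unit could imitate that with an ExecStartPre step or similar; I took the lazy route and put the file there directly under the new name.
  • The flanneld pod writes a file with its allocated CIDR and MTU under the host directory /run/flannel. The file is generated after flanneld connects to the apiserver, so we only need to create the directory.
  • Create the ClusterRole, ServiceAccount and so on from kube-flannel.yml, then use the ServiceAccount token to generate a kubeconfig for flannel.
  • Prepare the binary, then write the systemd unit: state explicitly which interface to bind and its IP, plus the IP and port the health check binds to, which monitoring can use. One gripe: flannel came out of CoreOS, a major Prometheus contributor, yet at the time of writing it still has no metrics endpoint.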
    # The {{ ... }} placeholders are template variables; substitute real values before running.
    # The quoted 'EOF' keeps the trailing backslashes in ExecStart intact.
    cat <<'EOF' > /usr/lib/systemd/system/flanneld.service
    [Unit]
    Description=Network fabric for containers
    Documentation=https://github.com/coreos/flannel
    After=network.target
    After=network-online.target
    Wants=network-online.target

    [Service]
    Type=notify
    Restart=always
    RestartSec=5
    # Kubernetes knows the nodes by their FQDN so we have to use the FQDN
    #Environment=NODE_NAME=my-node.foo.bar.com
    # Note that we don't specify any etcd option. This is because we want to talk
    # to the apiserver instead. The apiserver then talks to etcd on flannel's
    # behalf.
    Environment=NODE_NAME={{ nodename }}
    ExecStart=/usr/local/bin/flanneld \
      --kube-subnet-mgr=true \
      --kubeconfig-file=/etc/kubernetes/flanneld.kubeconfig \
      --ip-masq=true \
      --iface={{ INTERFACE_NAME }} \
      --public-ip {{ inventory_hostname }} \
      --healthz-ip {{ inventory_hostname }} \
      --healthz-port {{ flanneld_healthz_port }} \
      --v=2
    EOF


The deployment above is already integrated into the Ansible playbooks I wrote: https://github.com/zhangguanzhang/Kubernetes-ansible
