Kubeadm Series - 07 - CNI

Overview

Flannel is the network plugin adopted by default, and under default conditions it runs in vxlan mode. In private, on-premises deployments, if you are certain that all of the customer's hosts sit in the same subnet, you can switch to host-gw mode for better network performance; see the Flannel chapter of the docs for details.

Installation

Flannel's installation logic is as follows: the install YAML contains two initContainers dedicated to installing the CNI plugin and the Flannel configuration, which is why they are named install-cni-plugin and install-cni.

So how do these two containers actually perform the installation? It is very simple: look at the args field, and you will see that they just use cp to copy the flannel binary, plus cni-conf.json (installed as 10-flannel.conflist), into the designated directories.

initContainers:
- name: install-cni-plugin
 #image: flannelcni/flannel-cni-plugin:v1.1.0 for ppc64le and mips64le (dockerhub limitations may apply)
  image: rancher/mirrored-flannelcni-flannel-cni-plugin:v1.1.0
  command:
  - cp
  args:
  - -f
  - /flannel
  - /opt/cni/bin/flannel
  volumeMounts:
  - name: cni-plugin
    mountPath: /opt/cni/bin
- name: install-cni
 #image: flannelcni/flannel:v0.18.1 for ppc64le and mips64le (dockerhub limitations may apply)
  image: rancher/mirrored-flannelcni-flannel:v0.18.1
  command:
  - cp
  args:
  - -f
  - /etc/kube-flannel/cni-conf.json
  - /etc/cni/net.d/10-flannel.conflist
  volumeMounts:
  - name: cni
    mountPath: /etc/cni/net.d
  - name: flannel-cfg
    mountPath: /etc/kube-flannel/
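
After both initContainers have finished on a node, the result can be checked directly on the host; the paths below are the ones used in the manifest above:

# ls -l /opt/cni/bin/flannel
# cat /etc/cni/net.d/10-flannel.conflist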

Where do these config files come from? They actually come from a ConfigMap:

kind: ConfigMap
apiVersion: v1
metadata:
  name: kube-flannel-cfg
  namespace: kube-system
  labels:
    tier: node
    app: flannel
data:
  cni-conf.json: |
    {
      "name": "cbr0",
      "cniVersion": "0.3.1",
      "plugins": [
        {
          "type": "flannel",
          "delegate": {
            "hairpinMode": true,
            "isDefaultGateway": true
          }
        },
        {
          "type": "portmap",
          "capabilities": {
            "portMappings": true
          }
        }
      ]
    }    
  net-conf.json: |
    {
      "Network": "10.244.0.0/16",
      "Backend": {
        "Type": "vxlan"
      }
    }    
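
On a running cluster the same data can be read back from the API; for example, to dump the whole ConfigMap or just net-conf.json (namespace and name as in the manifest above):

# kubectl -n kube-system get configmap kube-flannel-cfg -o yaml
# kubectl -n kube-system get configmap kube-flannel-cfg -o jsonpath='{.data.net-conf\.json}'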

Unlike what the initContainers do, these config files are not dropped onto the host; they are handed to the container that runs the Flannel binary through a volumeMount. As a result, you will not find them under /etc/kube-flannel/ on the host, and can only see them from inside the Flannel container.

# kiexec
Namespace: kube-system | Pod: ✔ kube-flannel-ds-82mww
/ # ls /etc/kube-flannel/
cni-conf.json  net-conf.json
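
The same check also works with a plain kubectl exec if the kiexec helper is not installed (the pod name is the one shown above; substitute your own):

# kubectl -n kube-system exec kube-flannel-ds-82mww -- ls /etc/kube-flannel/
cni-conf.json  net-conf.json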

Default configuration

vxlan is the mode Flannel uses by default. The node routes in this mode look like this:

# ip r
default via 172.22.0.1 dev eth0
10.244.1.0/24 via 10.244.1.0 dev flannel.1 onlink
10.244.2.0/24 via 10.244.2.0 dev flannel.1 onlink
10.244.3.0/24 via 10.244.3.0 dev flannel.1 onlink
10.244.4.0/24 via 10.244.4.0 dev flannel.1 onlink
10.244.5.0/24 via 10.244.5.0 dev flannel.1 onlink
169.254.0.0/16 dev eth0 scope link metric 1002
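
The flannel.1 device that these routes point to is a VXLAN interface. On a node with the standard iproute2 tools, its details (VNI, UDP port, underlying device) and the forwarding entries Flannel programs for it can be inspected with:

# ip -d link show flannel.1
# bridge fdb show dev flannel.1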

By modifying the configuration (a sketch of the procedure follows the route listing below), Flannel can also be switched to host-gw, and the node routes then become:

# ip r
default via 172.22.0.1 dev eth0
10.4.0.0/24 dev nerdctl0 proto kernel scope link src 10.4.0.1
10.244.0.0/24 dev cni0 proto kernel scope link src 10.244.0.1
10.244.1.0/24 via 172.22.1.176 dev eth0
10.244.2.0/24 via 172.22.0.117 dev eth0
10.244.3.0/24 via 172.22.0.76 dev eth0
10.244.4.0/24 via 172.22.0.212 dev eth0
10.244.5.0/24 via 172.22.0.64 dev eth0
169.254.0.0/16 dev eth0 scope link metric 1002
172.22.0.0/20 dev eth0 proto kernel scope link src 172.22.0.239
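
For reference, a minimal way to make this switch on a cluster installed from the stock manifest is to change the backend type in net-conf.json and then recreate the Flannel pods; the DaemonSet name below matches the pod shown earlier, but verify it on your cluster first:

# kubectl -n kube-system edit configmap kube-flannel-cfg
#   in net-conf.json, change "Type": "vxlan" to "Type": "host-gw"
# kubectl -n kube-system rollout restart daemonset kube-flannel-ds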

After the switch, the Flannel logs look like this:

I0826 03:22:37.551391       1 main.go:463] Found network config - Backend type: host-gw
I0826 03:22:37.551432       1 match.go:195] Determining IP address of default interface
I0826 03:22:37.551838       1 match.go:248] Using interface with name eth0 and address 172.22.1.176
I0826 03:22:37.551860       1 match.go:270] Defaulting external address to interface address (172.22.1.176)
I0826 03:22:37.569614       1 kube.go:351] Setting NodeNetworkUnavailable
I0826 03:22:37.579433       1 main.go:341] Setting up masking rules
I0826 03:22:37.758215       1 main.go:362] Changing default FORWARD chain policy to ACCEPT
I0826 03:22:37.758315       1 main.go:375] Wrote subnet file to /run/flannel/subnet.env
I0826 03:22:37.758326       1 main.go:379] Running backend.
I0826 03:22:37.758343       1 main.go:400] Waiting for all goroutines to exit
I0826 03:22:37.761081       1 route_network.go:55] Watching for new subnet leases
I0826 03:22:37.761153       1 route_network.go:92] Subnet added: 10.244.0.0/24 via 172.22.0.239
W0826 03:22:37.761524       1 route_network.go:151] Replacing existing route to {Ifindex: 5 Dst: 10.244.0.0/24 Src: <nil> Gw: 10.244.0.0 Flags: [onlink] Table: 254 Realm: 0} with {Ifindex: 2 Dst: 10.244.0.0/24 Src: <nil> Gw: 172.22.0.239 Flags: [] Table: 0 Realm: 0}
I0826 03:22:37.848961       1 route_network.go:92] Subnet added: 10.244.2.0/24 via 172.22.0.117
W0826 03:22:37.849059       1 route_network.go:151] Replacing existing route to {Ifindex: 5 Dst: 10.244.2.0/24 Src: <nil> Gw: 10.244.2.0 Flags: [onlink] Table: 254 Realm: 0} with {Ifindex: 2 Dst: 10.244.2.0/24 Src: <nil> Gw: 172.22.0.117 Flags: [] Table: 0 Realm: 0}
I0826 03:22:37.849360       1 route_network.go:92] Subnet added: 10.244.3.0/24 via 172.22.0.76
W0826 03:22:37.849454       1 route_network.go:151] Replacing existing route to {Ifindex: 5 Dst: 10.244.3.0/24 Src: <nil> Gw: 10.244.3.0 Flags: [onlink] Table: 254 Realm: 0} with {Ifindex: 2 Dst: 10.244.3.0/24 Src: <nil> Gw: 172.22.0.76 Flags: [] Table: 0 Realm: 0}
I0826 03:22:37.850273       1 route_network.go:92] Subnet added: 10.244.4.0/24 via 172.22.0.212
W0826 03:22:37.850377       1 route_network.go:151] Replacing existing route to {Ifindex: 5 Dst: 10.244.4.0/24 Src: <nil> Gw: 10.244.4.0 Flags: [onlink] Table: 254 Realm: 0} with {Ifindex: 2 Dst: 10.244.4.0/24 Src: <nil> Gw: 172.22.0.212 Flags: [] Table: 0 Realm: 0}
I0826 03:22:37.850675       1 route_network.go:92] Subnet added: 10.244.5.0/24 via 172.22.0.64
W0826 03:22:37.850758       1 route_network.go:151] Replacing existing route to {Ifindex: 5 Dst: 10.244.5.0/24 Src: <nil> Gw: 10.244.5.0 Flags: [onlink] Table: 254 Realm: 0} with {Ifindex: 2 Dst: 10.244.5.0/24 Src: <nil> Gw: 172.22.0.64 Flags: [] Table: 0 Realm: 0}

The log line Subnet added: 10.244.0.0/24 via 172.22.0.239 already says it very clearly: the route being adjusted uses a node's IP as the gateway for a pod subnet, so packets can be routed straight to that node without any encapsulation. And since host-gw needs no encapsulation or decapsulation, Flannel automatically raises the MTU to 1500.

# cat /run/flannel/subnet.env
FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.0.1/24
FLANNEL_MTU=1500
FLANNEL_IPMASQ=true
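
For comparison, vxlan mode reports an MTU of 1450, leaving 50 bytes for the VXLAN overhead (the benchmark output below also shows this as the discovered MTU). One way to cross-check the value a node is actually using is the cni0 bridge, assuming it is named cni0 as in the routes above:

# ip link show cni0 | grep -o 'mtu [0-9]*'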

Do other containers need to be restarted after the configuration change? Normally not: a container's network stack only delivers its packets to the cni0 device, and whether the traffic then travels over vxlan or host-gw is decided entirely by the routing configuration. Still, some components may be sensitive to changes in routes or in the networking scheme, so test thoroughly before rolling out the change. Also note that although host-gw performs better, it has prerequisites; most importantly, the worker nodes must be in the same subnet, i.e. able to communicate at layer 2.
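
A quick way to confirm that two workers really are on the same subnet (and therefore eligible for host-gw) is to ask the kernel how a peer node is reached; the node IP below is taken from the route table above:

# ip route get 172.22.0.117
#   a "via <router>" in the output would mean the peer is only reachable through a layer-3 hop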

Performance comparison

The benchmark tool is k8s-bench-suite, run as knb --verbose --client-node node2 --server-node node3. Tested on the same machines, vxlan mode showed roughly 10% extra overhead compared with host-gw mode (the exact numbers depend on hardware and network quality).
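
knb itself is a standalone bash script from the k8s-bench-suite project; a typical way to fetch and run it is sketched below (the GitHub location is assumed to be the upstream repository; adjust if the project has moved):

# git clone https://github.com/InfraBuilder/k8s-bench-suite.git
# cd k8s-bench-suite
# ./knb --verbose --client-node node2 --server-node node3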

vxlan mode

=========================================================
 Benchmark Results
=========================================================
 Name            : knb-12885
 Date            : 2022-08-26 07:11:41 UTC
 Generator       : knb
 Version         : 1.5.0
 Server          : node2
 Client          : node3
 UDP Socket size : auto
=========================================================
  Discovered CPU         : Intel Xeon Processor (Skylake, IBRS)
  Discovered Kernel      : 5.4.127-1.el7.elrepo.x86_64
  Discovered k8s version : v1.21.7
  Discovered MTU         : 1450
  Idle :
    bandwidth = 0 Mbit/s
    client cpu = total 6.97% (user 2.53%, nice 0.05%, system 4.21%, iowait 0.03%, steal 0.15%)
    server cpu = total 8.09% (user 2.73%, nice 0.05%, system 5.18%, iowait 0.00%, steal 0.13%)
    client ram = 1233 MB
    server ram = 1198 MB
  Pod to pod :
    TCP :
      bandwidth = 845 Mbit/s
      client cpu = total 5.06% (user 1.35%, nice 0.05%, system 3.49%, iowait 0.07%, steal 0.10%)
      server cpu = total 10.78% (user 1.76%, nice 0.02%, system 8.98%, iowait 0.02%, steal 0.00%)
      client ram = 1235 MB
      server ram = 1197 MB
    UDP :
      bandwidth = 877 Mbit/s
      client cpu = total 26.54% (user 2.83%, nice 0.05%, system 23.57%, iowait 0.07%, steal 0.02%)
      server cpu = total 13.43% (user 3.74%, nice 0.03%, system 9.56%, iowait 0.00%, steal 0.10%)
      client ram = 1234 MB
      server ram = 1198 MB
  Pod to Service :
    TCP :
      bandwidth = 856 Mbit/s
      client cpu = total 5.25% (user 1.40%, nice 0.05%, system 3.68%, iowait 0.05%, steal 0.07%)
      server cpu = total 10.31% (user 1.92%, nice 0.02%, system 8.37%, iowait 0.00%, steal 0.00%)
      client ram = 1233 MB
      server ram = 1199 MB
    UDP :
      bandwidth = 835 Mbit/s
      client cpu = total 27.90% (user 2.94%, nice 0.02%, system 24.82%, iowait 0.07%, steal 0.05%)
      server cpu = total 13.29% (user 3.74%, nice 0.03%, system 9.49%, iowait 0.00%, steal 0.03%)
      client ram = 1236 MB
      server ram = 1203 MB
=========================================================

host-gw mode

=========================================================
 Benchmark Results
=========================================================
 Name            : knb-8657
 Date            : 2022-08-26 07:08:07 UTC
 Generator       : knb
 Version         : 1.5.0
 Server          : node2
 Client          : node3
 UDP Socket size : auto
=========================================================
  Discovered CPU         : Intel Xeon Processor (Skylake, IBRS)
  Discovered Kernel      : 5.4.127-1.el7.elrepo.x86_64
  Discovered k8s version : v1.21.7
  Discovered MTU         : 1500
  Idle :
    bandwidth = 0 Mbit/s
    client cpu = total 3.35% (user 1.56%, nice 0.02%, system 1.70%, iowait 0.07%, steal 0.00%)
    server cpu = total 2.45% (user 1.14%, nice 0.09%, system 1.22%, iowait 0.00%, steal 0.00%)
    client ram = 1258 MB
    server ram = 1194 MB
  Pod to pod :
    TCP :
      bandwidth = 875 Mbit/s
      client cpu = total 4.53% (user 1.37%, nice 0.00%, system 3.00%, iowait 0.09%, steal 0.07%)
      server cpu = total 7.61% (user 1.49%, nice 0.07%, system 5.98%, iowait 0.02%, steal 0.05%)
      client ram = 1250 MB
      server ram = 1197 MB
    UDP :
      bandwidth = 944 Mbit/s
      client cpu = total 34.08% (user 4.70%, nice 0.03%, system 28.94%, iowait 0.03%, steal 0.38%)
      server cpu = total 18.45% (user 4.81%, nice 0.02%, system 13.11%, iowait 0.02%, steal 0.49%)
      client ram = 1245 MB
      server ram = 1197 MB
  Pod to Service :
    TCP :
      bandwidth = 931 Mbit/s
      client cpu = total 4.01% (user 1.25%, nice 0.05%, system 2.62%, iowait 0.09%, steal 0.00%)
      server cpu = total 8.14% (user 1.59%, nice 0.02%, system 6.48%, iowait 0.00%, steal 0.05%)
      client ram = 1242 MB
      server ram = 1197 MB
    UDP :
      bandwidth = 896 Mbit/s
      client cpu = total 26.61% (user 2.79%, nice 0.02%, system 23.73%, iowait 0.07%, steal 0.00%)
      server cpu = total 11.16% (user 3.18%, nice 0.03%, system 7.89%, iowait 0.00%, steal 0.06%)
      client ram = 1236 MB
      server ram = 1197 MB
=========================================================
