cilium-chain and Contiv

Overview

This post tests whether, in a cluster running Contiv Netplugin, cilium-chain can be used to add network policy and traffic visibility capabilities.

Test Environment

In the test cluster, a few nodes run a kernel newer than 5.10, which meets the requirements for installing Cilium.

# kubectl get node --show-labels|grep -i open
10.189.212.124  Ready 209d  v1.20.3-vip.2  beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,fat=true,kubernetes.io/arch=amd64,kubernetes.io/hostname=10.189.212.124,kubernetes.io/os=linux,nansha=true,os-type=openeuler,status=online
10.189.212.125  Ready 195d  v1.20.3-vip.2  beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=10.189.212.125,kubernetes.io/os=linux,machine-type=A1-1,nansha=true,os-type=openeuler,osp-proxy=true,production=true,status=online
10.189.222.60   Ready 191d  v1.20.3-vip.2  beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=10.189.222.60,kubernetes.io/os=linux,os-type=openeuler,usetype=jenkins-openeuler

10.189.212.124 and 10.189.212.125 are chosen as the nodes where cilium-agent will be deployed.
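
As a quick sanity check on the 5.10+ kernel requirement mentioned above, the kernel version of the candidate nodes can also be read straight from the node info; a minimal example:

# kubectl get node 10.189.212.124 10.189.212.125 -o custom-columns=NAME:.metadata.name,KERNEL:.status.nodeInfo.kernelVersion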

Cilium Deployment

Since the company's internal clusters use Contiv Netplugin as the CNI and have no native Service capability, each component needs the KUBERNETES_SERVICE_HOST environment variable added when deploying with Helm, plus a few settings such as nodeSelector. The diff below shows where the Helm values need to deviate from the defaults (lines prefixed with `<` are the modified values, lines prefixed with `>` are the chart defaults).

<   useDigest: false
---
>   useDigest: true
164d163
<   kubernetes.io/hostname: 10.189.212.124
189,191c188
< extraEnv:
<   - name: KUBERNETES_SERVICE_HOST
<     value: hh-k8s-noah-sc-staging001-master.api.vip.com
---
> extraEnv: []
218c215
< podSecurityContext: {}
---
> podSecurityContext:
541c538
<   chainingMode: generic-veth
---
>   chainingMode: none
562c559
<   customConf: true
---
>   customConf: false
580c577
<   configMap: cni-configuration
---
>   # configMap: cni-configuration
943c940
<     useDigest: false
---
>     useDigest: true
1062c1059
<     enabled: false
---
>     enabled: true
1108c1105
<     enabled: true
---
>     enabled: false
1120c1117
<       useDigest: false
---
>       useDigest: true
1148d1144
<       kubernetes.io/hostname: 10.189.212.124
1155,1157c1151
<     extraEnv:
<       - name: KUBERNETES_SERVICE_HOST
<         value: hh-k8s-noah-sc-staging001-master.api.vip.com
---
>     extraEnv: []
1299c1293
<     enabled: true
---
>     enabled: false
1344c1338
<         useDigest: false
---
>         useDigest: true
1351,1353c1345
<       extraEnv:
<         - name: KUBERNETES_SERVICE_HOST
<           value: hh-k8s-noah-sc-staging001-master.api.vip.com
---
>       extraEnv: []
1377c1369
<         useDigest: false
---
>         useDigest: true
1384,1386c1376
<       extraEnv:
<         - name: KUBERNETES_SERVICE_HOST
<           value: hh-k8s-noah-sc-staging001-master.api.vip.com
---
>       extraEnv: []
1405c1395
<           enabled: false
---
>           enabled: true
1440d1429
<       kubernetes.io/hostname: 10.189.212.124
1636c1625
< enableIPv4Masquerade: false
---
> enableIPv4Masquerade: true
1866c1855
<     useDigest: false
---
>     useDigest: true
1875,1877c1864
<   extraEnv:
<     - name: KUBERNETES_SERVICE_HOST
<       value: hh-k8s-noah-sc-staging001-master.api.vip.com
---
>   extraEnv: []
1984d1970
<     kubernetes.io/hostname: 10.189.212.124
2134c2120
< routingMode: "native"
---
> routingMode: ""
2160c2146
<     useDigest: false
---
>     useDigest: true
2194d2179
<     kubernetes.io/hostname: 10.189.212.124
2273c2258
<     useDigest: false
---
>     useDigest: true
2278c2263
<   replicas: 1
---
>   replicas: 2
2313d2297
<     kubernetes.io/hostname: 10.189.212.124
2328,2330c2312
<   extraEnv:
<     - name: KUBERNETES_SERVICE_HOST
<       value: hh-k8s-noah-sc-staging001-master.api.vip.com
---
>   extraEnv: []
2476,2478c2458
<   extraEnv:
<     - name: KUBERNETES_SERVICE_HOST
<       value: hh-k8s-noah-sc-staging001-master.api.vip.com
---
>   extraEnv: []
2493d2472
<     kubernetes.io/hostname: 10.189.212.124
2560c2539
<     useDigest: false
---
>     useDigest: true
2571,2573c2550
<   extraEnv:
<     - name: KUBERNETES_SERVICE_HOST
<       value: hh-k8s-noah-sc-staging001-master.api.vip.com
---
>   extraEnv: []
2594d2570
<     kubernetes.io/hostname: 10.189.212.124
2713c2689
<       useDigest: false
---
>       useDigest: true
2723c2699
<         useDigest: false
---
>         useDigest: true
2760c2736
<         useDigest: false
---
>         useDigest: true
2767,2769c2743
<       extraEnv:
<         - name: KUBERNETES_SERVICE_HOST
<           value: hh-k8s-noah-sc-staging001-master.api.vip.com
---
>       extraEnv: []
2824,2826c2798
<     extraEnv:
<       - name: KUBERNETES_SERVICE_HOST
<         value: hh-k8s-noah-sc-staging001-master.api.vip.com
---
>     extraEnv: []
2893d2864
<       kubernetes.io/hostname: 10.189.212.124
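
For readability, the key changes scattered through the diff above can be collected into a minimal values override, roughly as follows. This is only a sketch: the file name, chart reference and release name are illustrative, and the operator/hubble toggles are inferred from the deployment result shown later in this post.

# values-contiv.yaml (sketch of the non-default values, not the complete file)
cni:
  chainingMode: generic-veth   # chain cilium-cni behind contivk8s.bin
  customConf: true             # use our own CNI config instead of the generated one
  configMap: cni-configuration # the ConfigMap shown below
routingMode: "native"
enableIPv4Masquerade: false
extraEnv:                      # the cluster has no Service capability
  - name: KUBERNETES_SERVICE_HOST
    value: hh-k8s-noah-sc-staging001-master.api.vip.com
nodeSelector:
  kubernetes.io/hostname: 10.189.212.124
operator:
  replicas: 1
hubble:
  relay:
    enabled: true
  ui:
    enabled: true

# deploy with, for example:
# helm upgrade --install cilium cilium/cilium -n kube-system -f values-contiv.yaml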

Note that when configuring cni-configuration, the CNI spec version used by the internal cluster is 0.1.0, lower than the 0.3.1 used in the official example, so the configuration has to be adjusted as follows.

apiVersion: v1
kind: ConfigMap
metadata:
  name: cni-configuration
  namespace: kube-system
data:
  cni-config: |-
    {
      "name": "generic-veth",
      "cniVersion": "0.1.0",
      "plugins": [
        {
          "type": "contivk8s.bin"
        },
        {
          "type": "cilium-cni"
        }
      ]
    }    

In addition, a DaemonSet of Pods containing a rich set of network tools is deployed to verify network policy behavior under cilium-chain.

apiVersion: apps/v1
kind: DaemonSet
metadata:
  namespace: kube-system
  name: nm
spec:
  selector:
    matchLabels:
      app: network-multitool
  template:
    metadata:
      labels:
        app: network-multitool
    spec:
      nodeSelector:
        kubernetes.io/os: linux
        os-type: openeuler
      containers:
        - name: network-multitool
          image: runzhliu/network-multitool:latest
          command: ["/bin/bash", "-c", "sleep infinity"]
          securityContext:
            privileged: true

One thing to watch for in the Hubble deployment: the cluster has no Service capability, but in the default configuration hubble-relay creates a gRPC client through the hubble-peer Service to reach port 4244 on cilium-agent and fetch the flow data. In this test deployment the ConfigMap therefore has to be edited manually, as follows.

apiVersion: v1
kind: ConfigMap
metadata:
  name: hubble-relay-config
  namespace: kube-system
data:
  config.yaml: |
    cluster-name: default
    # change this to the podIP:port of one of the cilium-agent instances
    # peer-service: "hubble-peer.kube-system.svc.cluster.local:443"
    peer-service: "10.189.212.124:4244"
    listen-address: :4245
    gops: true
    gops-port: "9893"
    dial-timeout:
    retry-timeout:
    sort-buffer-len-max:
    sort-buffer-drain-timeout:    
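
After editing the ConfigMap, hubble-relay has to reload it; restarting the Deployment and checking its logs is one straightforward way to do that (the commands below are illustrative):

# kubectl -n kube-system rollout restart deployment hubble-relay
# kubectl -n kube-system logs deploy/hubble-relay --tail=20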

The final deployment looks like this: both cilium-operator and cilium-agent came up successfully, and Hubble, the traffic visibility component, is deployed as well.

# k get pods -n kube-system -o wide
NAME                               READY   STATUS        RESTARTS   AGE   IP               NODE             
cilium-j72zq                       1/1     Running       0          13h   10.189.212.124   10.189.212.124   
cilium-operator-656ccd67df-25zk4   1/1     Running       1          13h   10.189.212.124   10.189.212.124   
hubble-relay-6cb98b55b8-x77t6      1/1     Running       0          11h   10.189.50.250    10.189.212.124   
hubble-ui-c95d56bff-r8d4m          2/2     Running       0          11h   10.189.51.40     10.189.212.124   
nm-4fpwk                           1/1     Running       0          13h   10.189.48.149    10.189.212.125   
nm-s6d64                           1/1     Running       0          13h   10.189.48.114    10.189.212.124   
nm-smvgs                           1/1     Running       0          13h   10.189.48.106    10.189.222.60    
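
The agent's own view of its health can also be checked from inside the DaemonSet, for example:

# kubectl -n kube-system exec ds/cilium -- cilium status --brief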

Because 10.189.212.124 is a node running cilium-agent, the NICs of Pods created on this node get eBPF programs attached, as shown below.

[root@ns-k8s-noah-staging001-node-s0092 ~]# tc filter show dev vvport19736 egress
filter protocol all pref 1 bpf chain 0
filter protocol all pref 1 bpf chain 0 handle 0x1 cil_to_container-vvport19736 direct-action not_in_hw id 1953 tag 2c99aaca58eec1f6 jited
[root@ns-k8s-noah-staging001-node-s0092 ~]# tc filter show dev vvport19736 ingress
filter protocol all pref 1 bpf chain 0
filter protocol all pref 1 bpf chain 0 handle 0x1 cil_from_container-vvport19736 direct-action not_in_hw id 1956 tag aa33d0c0bc33ceb9 jited
# kubectl -n kube-system exec ds/cilium -- bpftool net show dev vvport8702
xdp:

tc:
vvport8702(3898) clsact/ingress cil_from_container-vvport8702 id 578
vvport8702(3898) clsact/egress cil_to_container-vvport8702 id 579

flow_dissector:

# kubectl -n kube-system exec ds/cilium -- cilium map get cilium_lxc
Key               Value                                                                                             State   Error
10.189.55.252:0   id=3624  sec_id=1026  flags=0x0000 ifindex=3900 mac=02:02:0A:BD:37:FC nodemac=1A:BF:8B:0F:AB:A6   sync
10.189.54.157:0   id=3020  sec_id=6175  flags=0x0000 ifindex=3898 mac=02:02:0A:BD:36:9D nodemac=E2:FC:F6:A9:25:63   sync
10.189.82.106:0   id=2086  sec_id=32258 flags=0x0000 ifindex=3912 mac=02:02:0A:BD:52:6A nodemac=22:81:F8:3D:6A:7B   sync
10.189.83.12:0    id=3670  sec_id=42792 flags=0x0000 ifindex=3914 mac=02:02:0A:BD:53:0C nodemac=7A:5C:C0:FE:53:48   sync
10.189.83.14:0    id=1298  sec_id=62616 flags=0x0000 ifindex=3916 mac=02:02:0A:BD:53:0E nodemac=22:B5:40:34:B7:63   sync
# kubectl -n kube-system exec ds/cilium -- bpftool net show dev bond0.212
xdp:

tc:
bond0.212(7) clsact/ingress cil_from_netdev-bond0.212 id 636
bond0.212(7) clsact/egress cil_to_netdev-bond0.212 id 633

flow_dissector:

By contrast, on 10.189.212.125, a node without cilium-agent, the Pod NICs have no eBPF programs attached at all.

[root@ns-k8s-noah-staging001-node-s0093 ~]# tc filter show dev vvport8692 ingress
[root@ns-k8s-noah-staging001-node-s0093 ~]# tc filter show dev vvport8692 egress

Next, take a look at the CNI configuration on the node.

[root@ns-k8s-noah-staging001-node-s0092 ~]# ls /etc/cni/net.d/
05-cilium.conflist  87-podman-bridge.conflist  contiv_cni.conf
[root@ns-k8s-noah-staging001-node-s0092 ~]# cat /etc/cni/net.d/05-cilium.conflist
{
  "name": "generic-veth",
  "cniVersion": "0.1.0",
  "plugins": [
    {
      "type": "contivk8s.bin"
    },
    {
      "type": "cilium-cni"
    }
  ]
}

Network Policies

With no network policy in place, Cilium allows all traffic by default; let's verify that.

# k -n kube-system exec -it nm-s6d64 -- ping -I 10.189.48.114 -c 1 10.189.48.149
PING 10.189.48.149 (10.189.48.149) from 10.189.48.114 : 56(84) bytes of data.
64 bytes from 10.189.48.149: icmp_seq=1 ttl=64 time=1.43 ms

--- 10.189.48.149 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 1.434/1.434/1.434/0.000 ms

The corresponding traffic is also visible in the Hubble UI, and this experiment confirms connectivity between the NICs wired up through Cilium chaining and native Contiv Netplugin.

/cilium-chain%E5%92%8Ccontiv/img.png
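
Besides the UI, the same flows can be queried with the hubble CLI shipped inside the agent Pod; the IP filters here are just an example:

# kubectl -n kube-system exec ds/cilium -- hubble observe --from-ip 10.189.48.114 --to-ip 10.189.48.149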

Now create a network policy intended to block traffic from 10.189.48.114 to 10.189.48.149.

apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: "icmp-rule"
  namespace: kube-system
spec:
  endpointSelector:
    matchLabels:
      app: network-multitool
  egress:
    - toCIDRSet:
        - cidr: 10.0.0.0/8
          except:
            - 10.189.48.149/32                
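
Before re-running the ping, it is worth confirming that the policy has actually been accepted by the agent, for example:

# kubectl -n kube-system get cnp icmp-rule
# kubectl -n kube-system exec ds/cilium -- cilium policy get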

The effect on the ping test is as follows.

[root@hh-k8s-noah-staging001-master-s1001 ~]#  k -n kube-system exec -it nm-s6d64 -- ping -I 10.189.48.114 -c 1 10.189.48.149
PING 10.189.48.149 (10.189.48.149) from 10.189.48.114 : 56(84) bytes of data.

--- 10.189.48.149 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms

command terminated with exit code 1

The dropped traffic can be seen in the Hubble UI.

/cilium-chain%E5%92%8Ccontiv/img_1.png
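
The same drops can also be filtered on the command line with the verdict filter, for example:

# kubectl -n kube-system exec ds/cilium -- hubble observe --verdict DROPPED --from-ip 10.189.48.114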

Next, test connectivity across subnets: 10.199.136.24 is a node of a cluster in another test environment, used previously for co-location testing.

#  k -n kube-system exec -it nm-s6d64 -- ping -I 10.189.48.114 -c 1 10.199.136.24
PING 10.199.136.24 (10.199.136.24) from 10.189.48.114 : 56(84) bytes of data.
64 bytes from 10.199.136.24: icmp_seq=1 ttl=63 time=0.981 ms

--- 10.199.136.24 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.981/0.981/0.981/0.000 ms

Create a network policy to block traffic from 10.189.48.114 to 10.199.136.24.

apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: "icmp-rule"
  namespace: kube-system
spec:
  endpointSelector:
    matchLabels:
      app: network-multitool
  egress:
    - toCIDRSet:
        - cidr: 10.0.0.0/8
          except:
            - 10.199.136.24/32                

The result is as follows.

#  k -n kube-system exec -it nm-s6d64 -- ping -I 10.189.48.114 -c 1 10.199.136.24
PING 10.199.136.24 (10.199.136.24) from 10.189.48.114 : 56(84) bytes of data.

--- 10.199.136.24 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms

command terminated with exit code 1

Summary

From the results on the test cluster we can conclude that the cilium-chain mode is compatible with the company's internal Contiv Netplugin clusters and adds the network policy and observability capabilities they previously lacked. Writing network policies by hand is fairly complex and error-prone, so the platform could take editor.networkpolicy as a reference and build create/read/update/delete tooling for network policies.
