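Overview
The metrics collected here include those emitted by the Cilium Operator, by Cilium itself, and by Hubble.
Installation
Cilium is deployed with Helm. Many parameter values were changed to fit the Staging cluster environment, mainly taint tolerations, the scheduling NodeSelector, and the SecurityContext. The metrics-related parameters also need to be enabled. The differences between the modified configuration file and the default one are shown below.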
|
# diff values.yaml vip-proxy-metrics.yaml
150c150
< useDigest: true
---
> useDigest: false
163a164
> kubernetes.io/hostname: 10.189.212.125
168,172c169,172
< - operator: Exists
< # - key: "key"
< # operator: "Equal|Exists"
< # value: "value"
< # effect: "NoSchedule|PreferNoSchedule|NoExecute(1.6 only)"
---
> - operator: Equal
> key: "key"
> value: "cilium"
> effect: "NoExecute"
188c188,190
< extraEnv: []
---
> extraEnv:
> - name: KUBERNETES_SERVICE_HOST
> value: hh-k8s-noah-sc-staging001-master.api.vip.com
215c217
< podSecurityContext:
---
> podSecurityContext: {}
237c239
< privileged: false
---
> privileged: true
538c540
< chainingMode: none
---
> chainingMode: generic-veth
559c561
< customConf: false
---
> customConf: true
577c579
< # configMap: cni-configuration
---
> configMap: cni-configuration
940c942
< useDigest: true
---
> useDigest: false
952c954,958
< tolerations: []
---
> tolerations:
> - operator: Equal
> key: "key"
> value: "cilium"
> effect: "NoExecute"
982,988c988,994
< # enabled:
< # - dns:query;ignoreAAAA
< # - drop
< # - tcp
< # - flow
< # - icmp
< # - http
---
> enabled:
> - dns:query;ignoreAAAA
> - drop
> - tcp
> - flow
> - icmp
> - http
994d999
< enabled: ~
1059c1064
< enabled: true
---
> enabled: false
1105c1110
< enabled: false
---
> enabled: true
1117c1122
< useDigest: true
---
> useDigest: false
1144a1150
> kubernetes.io/hostname: 10.189.212.125
1148c1154,1158
< tolerations: []
---
> tolerations:
> - operator: Equal
> key: "key"
> value: "cilium"
> effect: "NoExecute"
1151c1161,1163
< extraEnv: []
---
> extraEnv:
> - name: KUBERNETES_SERVICE_HOST
> value: hh-k8s-noah-sc-staging001-master.api.vip.com
1257c1269
< enabled: false
---
> enabled: true
1293c1305
< enabled: false
---
> enabled: true
1338c1350
< useDigest: true
---
> useDigest: false
1342c1354,1356
< securityContext: {}
---
> securityContext:
> privileged: true
>
1345c1359,1361
< extraEnv: []
---
> extraEnv:
> - name: KUBERNETES_SERVICE_HOST
> value: hh-k8s-noah-sc-staging001-master.api.vip.com
1369c1385
< useDigest: true
---
> useDigest: false
1373c1389,1391
< securityContext: {}
---
> securityContext:
> privileged: true
>
1376c1394,1396
< extraEnv: []
---
> extraEnv:
> - name: KUBERNETES_SERVICE_HOST
> value: hh-k8s-noah-sc-staging001-master.api.vip.com
1395c1415
< enabled: true
---
> enabled: false
1429a1450
> kubernetes.io/hostname: 10.189.212.125
1433c1454,1458
< tolerations: []
---
> tolerations:
> - operator: Equal
> key: "key"
> value: "cilium"
> effect: "NoExecute"
1585c1610
< # kubeProxyReplacement: "true"
---
> kubeProxyReplacement: "true"
1625c1650
< enableIPv4Masquerade: true
---
> enableIPv4Masquerade: false
1773c1798
< enabled: false
---
> enabled: true
1855c1880
< useDigest: true
---
> useDigest: false
1864c1889,1891
< extraEnv: []
---
> extraEnv:
> - name: KUBERNETES_SERVICE_HOST
> value: hh-k8s-noah-sc-staging001-master.api.vip.com
1970a1998
> kubernetes.io/hostname: 10.189.212.125
1975,1979c2003,2006
< - operator: Exists
< # - key: "key"
< # operator: "Equal|Exists"
< # value: "value"
< # effect: "NoSchedule|PreferNoSchedule|NoExecute(1.6 only)"
---
> - operator: Equal
> key: "key"
> value: "cilium"
> effect: "NoExecute"
2120c2147
< routingMode: ""
---
> routingMode: "native"
2146c2173
< useDigest: true
---
> useDigest: false
2164,2168c2191,2194
< - operator: Exists
< # - key: "key"
< # operator: "Equal|Exists"
< # value: "value"
< # effect: "NoSchedule|PreferNoSchedule|NoExecute(1.6 only)"
---
> - operator: Equal
> key: "key"
> value: "cilium"
> effect: "NoExecute"
2179a2206
> kubernetes.io/hostname: 10.189.212.125
2258c2285
< useDigest: true
---
> useDigest: false
2263c2290
< replicas: 2
---
> replicas: 1
2297a2325
> kubernetes.io/hostname: 10.189.212.125
2302,2306c2330,2333
< - operator: Exists
< # - key: "key"
< # operator: "Equal|Exists"
< # value: "value"
< # effect: "NoSchedule|PreferNoSchedule|NoExecute(1.6 only)"
---
> - operator: Equal
> key: "key"
> value: "cilium"
> effect: "NoExecute"
2312c2339,2341
< extraEnv: []
---
> extraEnv:
> - name: KUBERNETES_SERVICE_HOST
> value: hh-k8s-noah-sc-staging001-master.api.vip.com
2389c2418
< enabled: false
---
> enabled: true
2434c2463
< restart: true
---
> restart: false
2458c2487,2489
< extraEnv: []
---
> extraEnv:
> - name: KUBERNETES_SERVICE_HOST
> value: hh-k8s-noah-sc-staging001-master.api.vip.com
2472a2504
> kubernetes.io/hostname: 10.189.212.125
2477,2481c2509,2512
< - operator: Exists
< # - key: "key"
< # operator: "Equal|Exists"
< # value: "value"
< # effect: "NoSchedule|PreferNoSchedule|NoExecute(1.6 only)"
---
> - operator: Equal
> key: "key"
> value: "cilium"
> effect: "NoExecute"
2498c2529
< privileged: false
---
> privileged: true
2539c2570
< useDigest: true
---
> useDigest: false
2550c2581,2583
< extraEnv: []
---
> extraEnv:
> - name: KUBERNETES_SERVICE_HOST
> value: hh-k8s-noah-sc-staging001-master.api.vip.com
2570a2604
> kubernetes.io/hostname: 10.189.212.125
2586,2589c2620,2623
< # - key: "key"
< # operator: "Equal|Exists"
< # value: "value"
< # effect: "NoSchedule|PreferNoSchedule|NoExecute(1.6 only)"
---
> - operator: Equal
> key: "key"
> value: "cilium"
> effect: "NoExecute"
2689c2723
< useDigest: true
---
> useDigest: false
2699c2733
< useDigest: true
---
> useDigest: false
2736c2770
< useDigest: true
---
> useDigest: false
2743c2777,2779
< extraEnv: []
---
> extraEnv:
> - name: KUBERNETES_SERVICE_HOST
> value: hh-k8s-noah-sc-staging001-master.api.vip.com
2798c2834,2836
< extraEnv: []
---
> extraEnv:
> - name: KUBERNETES_SERVICE_HOST
> value: hh-k8s-noah-sc-staging001-master.api.vip.com
2864a2903
> kubernetes.io/hostname: 10.189.212.125
2868c2907,2911
< tolerations: []
---
> tolerations:
> - operator: Equal
> key: "key"
> value: "cilium"
> effect: "NoExecute"
3138c3181,3185
< tolerations: []
---
> tolerations:
> - operator: Equal
> key: "key"
> value: "cilium"
> effect: "NoExecute"
3167c3214,3218
< tolerations: []
---
> tolerations:
> - operator: Equal
> key: "key"
> value: "cilium"
> effect: "NoExecute"
|
Grafana and Prometheus also need to be deployed to verify that the metrics are being collected.
|
k apply -f https://raw.githubusercontent.com/cilium/cilium/HEAD/examples/kubernetes/addons/prometheus/monitoring-example.yaml
|
The final deployment looks like this.
|
# k get pods -n kube-system -o wide
NAME READY STATUS RESTARTS AGE IP NODE
cilium-c987r 1/1 Running 0 3h22m 10.189.212.125 10.189.212.125
cilium-operator-7df8cb69b8-2h4gm 1/1 Running 0 3h22m 10.189.212.125 10.189.212.125
hubble-ui-7b4bcf6bcf-d4fpb 2/2 Running 0 3h22m 10.189.82.106 10.189.212.125
# k get po -n cilium-monitoring -o wide
NAME READY STATUS RESTARTS AGE IP NODE
grafana-7457fdc76-xhg8l 1/1 Running 0 3h7m 10.189.83.14 10.189.212.125
prometheus-547b7d9856-zl8lp 1/1 Running 0 3h7m 10.189.83.12 10.189.212.125
|
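Note that if the option --set prometheus.enabled=true was not enabled through Helm at install time so that cilium-agent exposes Prometheus metrics, and you configure it manually instead, you must add the relevant annotations to the PodTemplate. Then check in Prometheus whether the metrics show up; if they do not, some part of the configuration is almost certainly wrong.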
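As a rough illustration of the manual route, the sketch below builds a patch that adds scrape annotations to a pod template. It rests on assumptions not stated in the article: the annotation keys `prometheus.io/scrape` and `prometheus.io/port` are the ones a typical annotation-based Prometheus scrape job looks for, and 9962 is taken as the agent metrics port. Check both against your own Prometheus scrape configuration and values.yaml. The function name `scrape_annotation_patch` is hypothetical.

```python
import json

# Assumption: 9962 is the cilium-agent metrics port in your chart version.
# Confirm it against `prometheus.port` in your own values.yaml.
AGENT_METRICS_PORT = 9962


def scrape_annotation_patch(port: int) -> dict:
    """Build a patch that sets scrape annotations on a workload's pod template."""
    return {
        "spec": {
            "template": {
                "metadata": {
                    "annotations": {
                        # Kubernetes annotation values must be strings.
                        "prometheus.io/scrape": "true",
                        "prometheus.io/port": str(port),
                    }
                }
            }
        }
    }


if __name__ == "__main__":
    # Print the patch as JSON so it can be reviewed before being applied.
    print(json.dumps(scrape_annotation_patch(AGENT_METRICS_PORT), indent=2))
```

The printed JSON could be reviewed and then passed to `kubectl patch` against the cilium DaemonSet. Annotations only tell Prometheus where to scrape; the agent itself still has to be configured to serve metrics on that port.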
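Viewing the Dashboard
Metric Labels
Without suitable labels, a metric cannot express precisely what it measures. A large number of labels, however, increases storage requirements, so the label set should be designed according to actual needs.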
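To make the storage trade-off concrete, here is a small back-of-the-envelope sketch. The number of time series for one metric is bounded by the product of the number of distinct values each label can take. The label names and counts below are hypothetical figures chosen for illustration, not measurements from the Staging cluster.

```python
from math import prod


def estimated_series(label_cardinalities: dict) -> int:
    """Upper bound on time series for one metric.

    Every distinct combination of label values is its own series, so the
    bound is the product of the per-label distinct-value counts.
    """
    return prod(label_cardinalities.values())


# Hypothetical cardinalities for a flow-style metric.
base = {"protocol": 3, "type": 4, "verdict": 3}
# Adding a per-destination label multiplies the bound by the number of destinations.
with_destination = {**base, "destination": 500}

print(estimated_series(base))              # 36
print(estimated_series(with_destination))  # 18000
```

The real series count is usually lower than this bound because not every combination occurs, but the multiplication shows why a single high-cardinality label such as a destination IP dominates the storage cost.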
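With the default installation method, Hubble metrics are configured mainly in the section shown below. Besides enabling metrics such as dns, drop, and tcp, attaching traffic context to a metric requires some extra options; see Hubble Metrics for details.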
|
hubble:
  enabled: true
  metrics:
    enabled:
      - dns:query;ignoreAAAA
      - drop
      - tcp
      - flow:destinationContext=dns|ip
      - icmp
      - http
|
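With the configuration above, flow:destinationContext=dns|ip adds destination context to the flow metric: the domain name is used when one is available, otherwise the IP address. The resulting metrics look like this.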
|
hubble_flows_processed_total{destination="10.189.94.59",protocol="TCP",subtype="to-stack",type="Trace",verdict="FORWARDED"} 159
hubble_flows_processed_total{destination="10.190.135.235",protocol="TCP",subtype="to-stack",type="Trace",verdict="FORWARDED"} 159
hubble_flows_processed_total{destination="10.190.56.61",protocol="TCP",subtype="to-stack",type="Trace",verdict="FORWARDED"} 1
|
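The sample output above can be checked programmatically. The sketch below is a deliberately simplified parser for lines in the Prometheus text format, not a full implementation: it ignores escaped quotes inside label values and skips anything that does not look like `name{labels} value`. It sums `hubble_flows_processed_total` per `destination` label. The function names are hypothetical.

```python
import re
from collections import defaultdict

# The three sample lines shown in the article.
SAMPLE = """\
hubble_flows_processed_total{destination="10.189.94.59",protocol="TCP",subtype="to-stack",type="Trace",verdict="FORWARDED"} 159
hubble_flows_processed_total{destination="10.190.135.235",protocol="TCP",subtype="to-stack",type="Trace",verdict="FORWARDED"} 159
hubble_flows_processed_total{destination="10.190.56.61",protocol="TCP",subtype="to-stack",type="Trace",verdict="FORWARDED"} 1
"""

LINE_RE = re.compile(r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>.*)\}\s+(?P<value>\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_line(line):
    """Return (metric name, labels dict, value) or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    labels = dict(LABEL_RE.findall(match.group("labels")))
    return match.group("name"), labels, float(match.group("value"))


def flows_by_destination(text):
    """Sum hubble_flows_processed_total samples per destination label."""
    totals = defaultdict(float)
    for line in text.splitlines():
        parsed = parse_line(line)
        if parsed is None:
            continue  # comments, blank lines, and metrics without labels
        name, labels, value = parsed
        if name == "hubble_flows_processed_total":
            totals[labels.get("destination", "")] += value
    return dict(totals)


if __name__ == "__main__":
    for destination, count in flows_by_destination(SAMPLE).items():
        print(destination, count)
```

In practice the same aggregation is normally done with a PromQL query in Prometheus or Grafana; the script is only meant to show which label carries the destination context.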
Service Map
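The official Hubble Grafana plugin is a paid product, so building a Service Map requires developing a converter plugin that turns Hubble metrics into the format expected by the Node Graph plugin.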
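The core of such a converter could look roughly like the sketch below. It is an illustration under several assumptions, not the design of any existing plugin. It assumes the flow metric is configured with both a source and a destination context, so that each sample carries `source` and `destination` labels; the samples shown earlier carry only `destination`. It also assumes the Node Graph panel accepts a nodes table with `id` and `title` and an edges table with `id`, `source`, `target`, and `mainstat`; verify the field names against the Grafana version in use. The workload names in the example are made up.

```python
def to_node_graph(samples):
    """Convert (labels, value) samples into Node Graph style node and edge rows.

    Samples missing a source or destination label are skipped, because an
    edge needs both endpoints.
    """
    node_names = set()
    edge_totals = {}
    for labels, value in samples:
        source = labels.get("source")
        destination = labels.get("destination")
        if not source or not destination:
            continue
        node_names.update((source, destination))
        key = (source, destination)
        edge_totals[key] = edge_totals.get(key, 0.0) + value

    nodes = [{"id": name, "title": name} for name in sorted(node_names)]
    edges = [
        {"id": f"{src}->{dst}", "source": src, "target": dst, "mainstat": total}
        for (src, dst), total in sorted(edge_totals.items())
    ]
    return nodes, edges


if __name__ == "__main__":
    # Hypothetical samples; real ones would come from a Prometheus query.
    example = [
        ({"source": "frontend", "destination": "backend", "verdict": "FORWARDED"}, 1.0),
        ({"source": "frontend", "destination": "backend", "verdict": "DROPPED"}, 2.0),
        ({"destination": "10.190.56.61"}, 5.0),  # no source label, skipped
    ]
    nodes, edges = to_node_graph(example)
    print(nodes)
    print(edges)
```

Aggregating by the (source, destination) pair keeps one edge per direction of traffic, and the summed counter gives the panel a number to display on each edge.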
References
- Monitoring & Metrics
- Hubble Service Map
- Cilium Hubble Series (Part 3): Hubble and Grafana Better Together
Warning
This article was last updated on November 12, 2023. Its content may be out of date, so use it with caution.