
Recovering CephFS Dynamic Volumes Across Kubernetes Clusters

Overview

This article analyzes whether a dynamic volume created by the CephFS CSI driver in Kubernetes can be restored in another cluster by backing up and re-applying the PV/PVC YAML manifests.

Backup and Restore Procedure

Following the script below, back up the PV/PVC/Pod from cluster A, then switch to cluster B and restore them. After the restore, the PVC binds to the PV normally and the Pod can mount the PVC.

# Back up the PV/PVC/Pod from cluster A
kubectl get cm -o yaml > cm.yaml
kubectl get secret csi-cephfs-secret -o yaml > secret.yaml
kubectl get sc -o yaml > sc.yaml
kubectl get pvc claim-oscar01-2eliu -o yaml > pvc.yaml
kubectl get pv pvc-c13aa21a-ad98-4fd7-a766-58edbf8ec698 -o yaml > pv.yaml
kubectl get po jupyter-oscar01-2eliu -o yaml > pod.yaml

# Restore on cluster B
kubectl apply -f cm-B.yaml
kubectl apply -f secret-B.yaml
kubectl apply -f sc.yaml
kubectl apply -f pv-B.yaml
kubectl apply -f pvc-B.yaml
kubectl apply -f pod-B.yaml
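
The -B suffixed files are presumably copies of the backups with cluster-local fields adjusted for cluster B. Below is a minimal sketch of the kind of cleanup that is typically needed before applying them; the yq (v4) invocation and the exact field list are assumptions, and manual editing works just as well:

# Strip server-assigned metadata and status from the backed-up objects
yq 'del(.metadata.uid) | del(.metadata.resourceVersion) | del(.metadata.creationTimestamp) | del(.status)' pv.yaml > pv-B.yaml
yq 'del(.metadata.uid) | del(.metadata.resourceVersion) | del(.metadata.creationTimestamp) | del(.status)' pvc.yaml > pvc-B.yaml
# In pv-B.yaml, also clear spec.claimRef.uid (and resourceVersion) so the PVC
# recreated in cluster B can bind to the PV, and make sure the clusterID under
# spec.csi.volumeAttributes matches the ceph-csi ConfigMap applied above (cm-B.yaml).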

NodePlugin Logs

Only an excerpt of the logs is shown here. Judging from the NodePlugin logs, the CephFS CSI driver in cluster B correctly mounted the CephFS directory into the Pod based on the PV/PVC restored from the backup.

  1. NodeStageVolume: using the Ceph configuration, the driver locates the subvolume llms8b5b0415-021f-48d7-951d-ca2dc5315997 in the cluster and creates the global mountpoint on the node the Pod is scheduled to
  2. NodeGetCapabilities: reports the capabilities supported by the node plugin
  3. NodePublishVolume: creates the bind-mount directory for the Pod
# kubectl logs B-ai-001-cephcsi-ceph-csi-cephfs-nodeplugin-6kb2g -c csi-cephfsplugin
I0730 06:13:21.732418 2699608 utils.go:212] ID: 3 GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":5}}}]}
I0730 06:13:21.735869 2699608 utils.go:195] ID: 4 Req-ID: xxx GRPC call: /csi.v1.Node/NodeStageVolume
I0730 06:13:21.736037 2699608 utils.go:206] ID: 4 Req-ID: xxx GRPC request: {"secrets":"***stripped***","staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-c13aa21a-ad98-4fd7-a766-58edbf8ec698/globalmount","volume_capability":{"AccessType":{"Mount":{}},"access_mode":{"mode":1}},"volume_context":{"clusterID":"a1d0e45f-0e03-452f-87f3-1704a4555220","fsName":"k8s_fs","fuseMountOptions":"debug","pool":"k8s-datapool","storage.kubernetes.io/csiProvisionerIdentity":"1700547718856-8081-cephfs.csi.ceph.com","subvolumeName":"llms8b5b0415-021f-48d7-951d-ca2dc5315997","subvolumePath":"/volumes/csi/llms8b5b0415-021f-48d7-951d-ca2dc5315997/1a1efd42-3a7f-46c8-8aef-f6d199276fd6","volumeNamePrefix":"llms"},"volume_id":"xxx"}
I0730 06:13:21.790324 2699608 omap.go:88] ID: 4 Req-ID: xxx got omap values: (pool="k8s-metadata", namespace="csi", name="csi.volume.8b5b0415-021f-48d7-951d-ca2dc5315997"): map[csi.imagename:llms8b5b0415-021f-48d7-951d-ca2dc5315997 csi.volname:pvc-c13aa21a-ad98-4fd7-a766-58edbf8ec698 csi.volume.owner:jupyterhub]
I0730 06:13:22.367958 2699608 volumemounter.go:126] requested mounter: , chosen mounter: kernel
I0730 06:13:22.368001 2699608 nodeserver.go:293] ID: 4 Req-ID: xxx cephfs: mounting volume xxx with Ceph kernel client
I0730 06:13:22.380488 2699608 cephcmds.go:105] ID: 4 Req-ID: xxx command succeeded: modprobe [ceph]
I0730 06:13:22.439712 2699608 cephcmds.go:105] ID: 4 Req-ID: xxx command succeeded: mount [-t ceph 10.214.166.21:6789,10.214.228.25:6789,10.214.230.21:6789:/volumes/csi/llms8b5b0415-021f-48d7-951d-ca2dc5315997/1a1efd42-3a7f-46c8-8aef-f6d199276fd6 /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-c13aa21a-ad98-4fd7-a766-58edbf8ec698/globalmount -o name=admin,secretfile=/tmp/csi/keys/keyfile-543738981,mds_namespace=k8s_fs,_netdev]
I0730 06:13:22.439745 2699608 nodeserver.go:248] ID: 4 Req-ID: xxx cephfs: successfully mounted volume xxx to /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-c13aa21a-ad98-4fd7-a766-58edbf8ec698/globalmount
I0730 06:13:22.439792 2699608 utils.go:212] ID: 4 Req-ID: xxx GRPC response: {}
I0730 06:13:22.442691 2699608 utils.go:195] ID: 5 GRPC call: /csi.v1.Node/NodeGetCapabilities
I0730 06:13:22.442702 2699608 utils.go:206] ID: 5 GRPC request: {}
I0730 06:13:22.442764 2699608 utils.go:212] ID: 5 GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":5}}}]}
I0730 06:13:22.443339 2699608 utils.go:195] ID: 6 Req-ID: xxx GRPC call: /csi.v1.Node/NodePublishVolume
I0730 06:13:22.443465 2699608 utils.go:206] ID: 6 Req-ID: xxx GRPC request: {"staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-c13aa21a-ad98-4fd7-a766-58edbf8ec698/globalmount","target_path":"/var/lib/kubelet/pods/279aeeaf-29ce-4c97-a696-4de6df8f4394/volumes/kubernetes.io~csi/pvc-c13aa21a-ad98-4fd7-a766-58edbf8ec698/mount","volume_capability":{"AccessType":{"Mount":{}},"access_mode":{"mode":1}},"volume_context":{"clusterID":"a1d0e45f-0e03-452f-87f3-1704a4555220","fsName":"k8s_fs","fuseMountOptions":"debug","pool":"k8s-datapool","storage.kubernetes.io/csiProvisionerIdentity":"1700547718856-8081-cephfs.csi.ceph.com","subvolumeName":"llms8b5b0415-021f-48d7-951d-ca2dc5315997","subvolumePath":"/volumes/csi/llms8b5b0415-021f-48d7-951d-ca2dc5315997/1a1efd42-3a7f-46c8-8aef-f6d199276fd6","volumeNamePrefix":"llms"},"volume_id":"xxx"}
I0730 06:13:22.445108 2699608 cephcmds.go:105] ID: 6 Req-ID: xxx command succeeded: mount [-o bind,_netdev /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-c13aa21a-ad98-4fd7-a766-58edbf8ec698/globalmount /var/lib/kubelet/pods/279aeeaf-29ce-4c97-a696-4de6df8f4394/volumes/kubernetes.io~csi/pvc-c13aa21a-ad98-4fd7-a766-58edbf8ec698/mount]
I0730 06:13:22.445126 2699608 nodeserver.go:523] ID: 6 Req-ID: xxx cephfs: successfully bind-mounted volume xxx to /var/lib/kubelet/pods/279aeeaf-29ce-4c97-a696-4de6df8f4394/volumes/kubernetes.io~csi/pvc-c13aa21a-ad98-4fd7-a766-58edbf8ec698/mount
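
For cross-checking, the subvolume referenced in these logs can also be inspected directly on the Ceph side and on the node. This is a sketch; it assumes admin access to the Ceph cluster, and the group name csi is inferred from the /volumes/csi/... path in the logs:

# List the CSI-managed subvolumes of the k8s_fs filesystem
ceph fs subvolume ls k8s_fs --group_name csi
# Resolve the path of the subvolume seen in NodeStageVolume
ceph fs subvolume getpath k8s_fs llms8b5b0415-021f-48d7-951d-ca2dc5315997 --group_name csi
# On the node the Pod was scheduled to, confirm the kernel mount and the bind mount
mount | grep pvc-c13aa21a-ad98-4fd7-a766-58edbf8ec698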

Summary

From a backup-and-restore point of view, dynamically provisioned volumes can indeed be backed up and restored across clusters, provided that the CephFS CSI related ConfigMap and Secret, the PV/PVC objects themselves, and the YAML of the workload that mounts them are all backed up.
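
As a final check after the restore, the binding and the mount inside the Pod can be verified with standard kubectl commands. The object names below are taken from the backup script above; adjust the namespace as needed:

kubectl get pvc claim-oscar01-2eliu            # STATUS should be Bound
kubectl get pod jupyter-oscar01-2eliu          # STATUS should be Running
kubectl exec jupyter-oscar01-2eliu -- df -h    # the CephFS mount should appear inside the container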