Contents

GPU Server Management

Overview

Docker Configuration

# cat .docker/config.json
{
 "auths": {
  "https://hub.oa.com": {
   "auth": "Q3JzREhBZG1pbjppOUN5MlQ0b3Q3R3ZCcTE="
  }
 }
}
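
The `auth` field in a Docker `config.json` is, by Docker's convention, the base64 encoding of `username:password`. The sketch below shows how such an entry could be built and decoded. The credentials `alice` / `s3cret` and the helper names are hypothetical placeholders, not values from this setup.

```python
import base64
import json


def docker_auth_entry(registry, username, password):
    """Build a config.json-style "auths" mapping for one registry."""
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"auths": {registry: {"auth": token}}}


def decode_auth(token):
    """Split a base64 auth token back into (username, password)."""
    text = base64.b64decode(token).decode("utf-8")
    username, _, password = text.partition(":")
    return username, password


if __name__ == "__main__":
    # Hypothetical credentials, for illustration only.
    config = docker_auth_entry("https://hub.oa.com", "alice", "s3cret")
    print(json.dumps(config, indent=1))
```

Because base64 is an encoding rather than encryption, anyone who can read the file can recover the password, so the file should be treated as a secret.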
# docker info
Containers: 5
Images: 216
Storage Driver: overlay
 Backing Filesystem: xfs
Execution Driver: native-0.2
Kernel Version: 3.10.104-1-tlinux2-0041.tl1
Operating System: <unknown>
# docker version
Client version: 1.3.14
Client API version: 1.15
Go version (client): go1.3.3
Git commit (client): cc33521-dirty
OS/Arch (client): linux/amd64
Server version: 1.3.14
Server API version: 1.15
Go version (server): go1.3.3
Git commit (server): cc33521-dirty
# kubectl get node --show-labels=true
NAME              STATUS                        ROLES     AGE       VERSION                              LABELS
10.140.41.18      Ready                         <none>    1y        v1.8.12-56+dcb696974a7b1f            beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=10.140.41.18,tencent.cr/deviceClass=M1,tencent.cr/equipment=16062,tencent.cr/nic-speed=1Gbps,tencent.cr/role=normal,tencent.cr/sriov=0,tencent.cr/szoneID=24846
10.140.41.19      Ready                         <none>    1y        v1.8.12-56+dcb696974a7b1f            beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=10.140.41.19,tencent.cr/deviceClass=M1,tencent.cr/equipment=16062,tencent.cr/nic-speed=1Gbps,tencent.cr/role=normal,tencent.cr/sriov=0,tencent.cr/szoneID=24846
10.140.41.49      Ready                         <none>    1y        v1.8.12-56+dcb696974a7b1f            beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=10.140.41.49,tencent.cr/deviceClass=M1,tencent.cr/equipment=16062,tencent.cr/nic-speed=1Gbps,tencent.cr/role=normal,tencent.cr/sriov=0,tencent.cr/szoneID=24846
10.165.10.100     Ready                         <none>    1y        v1.8.12-56+dcb696974a7b1f            beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=10.165.10.100,tencent.cr/deviceClass=M10C,tencent.cr/equipment=152731,tencent.cr/nic-speed=10Gbps,tencent.cr/role=normal,tencent.cr/sriov=0,tencent.cr/szoneID=113293
10.165.10.106     Ready                         <none>    1y        v1.8.12-56+dcb696974a7b1f            beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=10.165.10.106,tencent.cr/deviceClass=M10C,tencent.cr/equipment=152731,tencent.cr/nic-speed=10Gbps,tencent.cr/role=normal,tencent.cr/sriov=0,tencent.cr/szoneID=113293
10.165.10.108     Ready                         <none>    1y        v1.8.12-56+dcb696974a7b1f            beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=10.165.10.108,tencent.cr/deviceClass=M10C,tencent.cr/equipment=152731,tencent.cr/nic-speed=10Gbps,tencent.cr/role=normal,tencent.cr/sriov=0,tencent.cr/szoneID=113293
10.165.10.110     Ready                         <none>    74d       v1.8.12-56+dcb696974a7b1f            beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=10.165.10.110,tencent.cr/deviceClass=M10C,tencent.cr/equipment=152731,tencent.cr/nic-speed=10Gbps,tencent.cr/role=normal,tencent.cr/sriov=0,tencent.cr/szoneID=193
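
The LABELS column is a comma-separated list of `key=value` pairs, which makes it easy to pick out nodes of a given class. The following is a minimal sketch that filters rows of this listing by one label. The function names are hypothetical, and the sample rows are shortened versions of the output above.

```python
def parse_labels(column):
    """Turn 'k1=v1,k2=v2' into {'k1': 'v1', 'k2': 'v2'}."""
    labels = {}
    for item in column.split(","):
        key, _, value = item.partition("=")
        labels[key] = value
    return labels


def nodes_with_label(rows, key, value):
    """Return names of nodes whose label `key` equals `value`.

    `rows` are data lines (header excluded) in the six-column layout
    printed by `kubectl get node --show-labels`.
    """
    selected = []
    for row in rows:
        fields = row.split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        if parse_labels(fields[-1]).get(key) == value:
            selected.append(fields[0])
    return selected


if __name__ == "__main__":
    # Abbreviated sample rows; the real label lists are longer.
    rows = [
        "10.140.41.18 Ready <none> 1y v1.8.12-56+dcb696974a7b1f "
        "tencent.cr/deviceClass=M1,tencent.cr/nic-speed=1Gbps",
        "10.165.10.100 Ready <none> 1y v1.8.12-56+dcb696974a7b1f "
        "tencent.cr/deviceClass=M10C,tencent.cr/nic-speed=10Gbps",
    ]
    print(nodes_with_label(rows, "tencent.cr/deviceClass", "M10C"))
```

On a live cluster the same filter is available directly through a label selector, for example `kubectl get node -l tencent.cr/deviceClass=M10C`.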

Test Image and Debugging Commands

apiVersion: v1
kind: Pod
metadata:
    name: cuda-vector-add
spec:
    restartPolicy: OnFailure
    containers:
        - name: cuda-vector-add
          image: "nvidia/cuda-vector-add:v0.1"
          # resources belongs to the container entry, not the top level
          resources:
              limits:
                  nvidia.com/gpu: 1  # request one GPU
                  memory: "512Mi"
                  cpu: "250m"
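
In a Pod spec, `resources` is a field of each container entry, so a GPU limit has to sit inside the `containers` list item. The sketch below builds an equivalent manifest as a Python dictionary and prints it as JSON, a format `kubectl` also accepts. The helper name `gpu_pod` and its parameters are hypothetical.

```python
import json


def gpu_pod(name, image, gpus=1, memory="512Mi", cpu="250m"):
    """Return a Pod manifest with one container that requests `gpus` GPUs."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "OnFailure",
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # Limits are set per container, not per Pod.
                    "resources": {
                        "limits": {
                            "nvidia.com/gpu": gpus,
                            "memory": memory,
                            "cpu": cpu,
                        }
                    },
                }
            ],
        },
    }


if __name__ == "__main__":
    pod = gpu_pod("cuda-vector-add", "nvidia/cuda-vector-add:v0.1")
    print(json.dumps(pod, indent=2))
```

Saved to a file such as `pod.json` (an assumed name), the output could be submitted with `kubectl apply -f pod.json` and the result inspected with `kubectl logs cuda-vector-add`.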
Warning
This article was last updated on February 1, 2017. Its content may be outdated, so use it with caution.