Nightingale Series-00-Container Deployment

runzhliu published on 2017-02-01, updated on 2017-02-01, filed under Kubernetes and Containers

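Overview: I started following the Nightingale monitoring system at version 5.0. Beyond the monitoring features the project describes, my team mainly uses Nightingale as a replacement for Alertmanager. The reason is simple: although Prometheus provides alerting…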

Number-of-Partitions-for-groupBy-Aggregation

runzhliu published on 2017-02-01, updated on 2017-02-01, filed under Big Data and Machine Learning

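Overview: This post translates and re-typesets the original article; readers comfortable with English can go straight to the original at https://jaceklaskowski.gitbooks.io/mastering-spark-sql/spark-sql-performance-tuning-groupBy-aggregation.html. The goal of this case study is to tune the number of partitions used by a groupBy aggregation in Spark SQL. Create…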

NV-docker Test Series

runzhliu published on 2017-02-01, updated on 2017-02-01, filed under Kubernetes and Containers

Overview

nvidia-docker Installation Series

runzhliu published on 2017-02-01, updated on 2017-02-01, filed under Kubernetes and Containers

Overview: Following the official documentation, installation can be done with the commands below.

    distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
      && curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.repo | sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo

    yum-config-manager --enable libnvidia-container-experimental

    # Verify
    nvidia-docker run --rm nvidia/cuda nvidia-smi

Kubernetes GPU plugin installation: https://github.com/NVIDIA/k8s-device-plugin#deployment-via-helm

nvidia-smi Is Slow

runzhliu published on 2017-02-01, updated on 2017-02-01, filed under Big Data and Machine Learning

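Overview: A colleague reported that nvidia-smi on a GPU machine was very slow and would sometimes hang. Analysis: As usual, the first step was to look at it with strace, which showed the open system call blocking for a while. See persistence-mode…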

NVIDIA-TensorRT-Inference-Server-on-Kubernetes

runzhliu published on 2017-02-01, updated on 2017-02-01, filed under Big Data and Machine Learning

Overview: NVIDIA TensorRT Inference Server is an optimized inference engine from NVIDIA for use on NVIDIA GPUs. TensorRT has the following features. It supports models from multiple frameworks, including TensorFlow GraphDef, TensorFlow…