1 Overview
This note investigates how to access Kerberized HDFS from Spark 2.2 on K8S. The approach for Spark 2.3/2.4 should differ very little.
2 Practice
2.1 Prerequisites
- Kerberized HDFS: refer to the internal HDFS Kerberos client-usage guide; the key artifacts are hdfs.keytab and krb5.conf (a sketch of getting the keytab into Pods follows this list)
- Spark Driver/Executor/Init/Base images
- An installed and running Kubernetes cluster
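The steps below assume hdfs.keytab is available inside the test Pod at /etc/keytab/hdfs.keytab. One minimal, hypothetical way to get it there is a Kubernetes Secret; the Secret name and mount path here are illustrative, not taken from the original setup:
```bash
## Hypothetical sketch: package the keytab as a Secret so Pods can mount it
kubectl -n runzhliu create secret generic hdfs-keytab \
  --from-file=hdfs.keytab=./hdfs.keytab
## The Deployment in section 2.3 would then mount the Secret as a volume
## at /etc/keytab, making /etc/keytab/hdfs.keytab visible to kinit
```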
2.2 Build & Push Images
To keep the experiment environment clean, everything is built from the upstream source: package the distribution, build the images, and push them to hub.oa.com.
```bash
git clone https://github.com/apache-spark-on-k8s/spark.git
cd spark
git checkout branch-2.2-kubernetes
build/mvn install -Pkubernetes -pl resource-managers/kubernetes/core -am -DskipTests
build/mvn compile -Pkubernetes -pl resource-managers/kubernetes/core -am -DskipTests
dev/make-distribution.sh --tgz -Phadoop-2.7 -Pkubernetes
tar -xvf spark-2.2.0-k8s-0.5.0-bin-2.7.3.tgz
cd spark-2.2.0-k8s-0.5.0-bin-2.7.3
## When building the images, change the FROM in the spark-base Dockerfile to a Linux base image that the internal network can pull (although some tooling, e.g. apt, may then be unusable)
## Watch out for JAVA_HOME when building the driver and executor images
docker build -t hub.oa.com/runzhliu/spark-base-runzhliu:<tag> -f dockerfiles/spark-base/Dockerfile .
docker build -t hub.oa.com/runzhliu/spark-driver-runzhliu:<tag> -f dockerfiles/driver/Dockerfile .
docker build -t hub.oa.com/runzhliu/spark-executor-runzhliu:<tag> -f dockerfiles/executor/Dockerfile .
docker build -t hub.oa.com/runzhliu/spark-init-runzhliu:<tag> -f dockerfiles/init-container/Dockerfile .
## docker login may be required first
docker push hub.oa.com/runzhliu/spark-base-runzhliu:<tag>
docker push hub.oa.com/runzhliu/spark-driver-runzhliu:<tag>
docker push hub.oa.com/runzhliu/spark-executor-runzhliu:<tag>
docker push hub.oa.com/runzhliu/spark-init-runzhliu:<tag>
## This is the test container image (built from the Dockerfile in section 2.3), used to run spark-submit from inside the cluster
docker push hub.oa.com/runzhliu/kerberos-test-runzhliu:<tag>
```
2.3 Experiment Environment
The idea is to bring up a Pod in Running state and exec into the container to test spark-submit access to HDFS.
```dockerfile
## Test image
FROM hub.oa.com/runzhliu/spark-base-runzhliu:<tag>
COPY core-site.xml /opt/spark/hconf/
COPY spark-examples_2.11-2.2.0-k8s-0.5.0.jar /opt/spark/jars
COPY yarn-site.xml /opt/spark/hconf/
COPY hdfs-site.xml /opt/spark/hconf/
COPY ssl-server.xml /opt/spark/hconf/
COPY krb5.conf /etc/krb5.conf
COPY test-env.sh .
CMD sleep 3600
```
```yaml
## Test Deployment
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: kerberos-test
  namespace: runzhliu
spec:
  replicas: 1
  template:
    metadata:
      labels:
        name: kerberos-test
    spec:
      containers:
      - name: kerberos-test
        image: hub.oa.com/runzhliu/kerberos-test-runzhliu:<tag>
        imagePullPolicy: IfNotPresent
```
Create the Pod; the result looks like this:
```bash
# kubectl -n runzhliu get pods
NAME READY STATUS RESTARTS AGE
kerberos-test-5c4458f757-rsjrl 1/1 Running 0 30m
```
Then exec into the container, initialize the credentials from the keytab, and inspect the ticket with klist:
```bash
# kubectl -n runzhliu exec -it kerberos-test-5c4458f757-rsjrl -- /bin/bash
[root@kerberos-test-5c4458f757-rsjrl /opt/spark/work-dir]#cd ..
[root@kerberos-test-5c4458f757-rsjrl /opt/spark]#ls
app_python_deps bin conf hconf jars RELEASE sbin work-dir
[root@kerberos-test-5c4458f757-rsjrl /opt/spark]### Initialize from the keytab; the TGT should now be cached. It expires in 24 hours, so does it need to be renewed?
[root@kerberos-test-5c4458f757-rsjrl /opt/spark]#kinit -kt /etc/keytab/hdfs.keytab hdfs/cdh1@IEGBACKUP.COM
[root@kerberos-test-5c4458f757-rsjrl /opt/spark]#klist
Ticket cache: FILE:/tmp/krb5cc_0
Default principal: hdfs/cdh1@IEGBACKUP.COM
Valid starting       Expires              Service principal
06/04/19 18:43:45    06/05/19 18:43:45    krbtgt/IEGBACKUP.COM@IEGBACKUP.COM
        renew until 06/11/19 18:43:45
```
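The klist output above shows a 24-hour ticket lifetime with a 7-day renewable window, which raises the renewal question from the transcript. A minimal sketch of keeping the TGT fresh in a long-running container, assuming the keytab stays mounted at /etc/keytab/hdfs.keytab (the interval is arbitrary):
```bash
## Renew the cached TGT periodically; if renewal fails (e.g. the renewable
## lifetime is exhausted), fall back to a fresh kinit from the keytab.
while true; do
  kinit -R || kinit -kt /etc/keytab/hdfs.keytab hdfs/cdh1@IEGBACKUP.COM
  sleep 21600  ## every 6 hours, well inside the 24h ticket lifetime
done &
```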
Then run the submit script, using the example class HdfsTest and pointing it at a Kerberized HDFS path:
```bash
## Submit script
## Enable JAAS debug output for the Kerberos login
export HADOOP_JAAS_DEBUG=true
## Hadoop root log level
export HADOOP_ROOT_LOGGER=DEBUG,console
## Point Hadoop at the mounted configuration files
export HADOOP_CONF_DIR=/opt/spark/hconf
## Note: after the initial keytab login, driver/executor interactions use a
## delegation token. The final argument is an HDFS path that requires
## Kerberos authentication. (Comments must not sit inside the backslash
## continuations, or the command breaks.)
/opt/spark/bin/spark-submit \
--deploy-mode cluster \
--class org.apache.spark.examples.HdfsTest \
--master=k8s://https://kubernetes.default.svc \
--conf spark.executor.instances=1 \
--conf spark.app.name=spark-hdfs \
--conf spark.kubernetes.namespace=runzhliu \
--conf spark.driver.extraClassPath=/opt/spark/hconf/core-site.xml:/opt/spark/hconf/hdfs-site.xml:/opt/spark/hconf/yarn-site.xml:/etc/krb5.conf \
--conf spark.kubernetes.driver.docker.image=hub.oa.com/runzhliu/spark-driver-runzhliu:0.0.2 \
--conf spark.kubernetes.executor.docker.image=hub.oa.com/runzhliu/spark-executor-runzhliu:0.0.2 \
--conf spark.kubernetes.initcontainer.docker.image=hub.oa.com/runzhliu/spark-init-runzhliu:0.0.1 \
--conf spark.kubernetes.kerberos.enabled=true \
--conf spark.kubernetes.kerberos.keytab=/etc/keytab/hdfs.keytab \
--conf spark.kubernetes.kerberos.principal=hdfs/cdh1@IEGBACKUP.COM \
--conf=spark.driver.cores=2 \
--conf=spark.driver.memory=4096M \
--conf=spark.executor.cores=2 \
--conf=spark.executor.memory=4096M \
--conf=spark.eventLog.dir=hdfs://sh-spark.hdfs.cr.hdfs.db:9000/yongyu/history \
local:///opt/spark/examples/jars/spark-examples_2.11-2.2.0-k8s-0.5.0.jar \
hdfs://sh.hdfs.cr.ied.com:9000/tdw-transfer-data/runzhliu/20180411102548571/20190212/part-00326
```
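With --deploy-mode cluster the driver itself runs as a Pod in the runzhliu namespace. A hedged way to locate it and follow its output (the generated pod name below is a placeholder):
```bash
## Find the driver Pod created by the submission above and tail its log;
## <driver-pod> stands for the generated name containing "spark-hdfs"
kubectl -n runzhliu get pods | grep spark-hdfs
kubectl -n runzhliu logs -f <driver-pod>
```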
The Pod log contains lines showing whether authentication succeeded or failed:
```
2019-06-04 18:44:50 INFO HadoopStepsOrchestrator:54 - Hadoop Conf directory: /opt/spark/hconf
2019-06-04 18:44:50 INFO HadoopConfBootstrapImpl:54 - HADOOP_CONF_DIR defined. Mounting Hadoop specific files
## The following lines show that Kerberos authentication succeeded
2019-06-04 18:44:50 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Debug is true storeKey true useTicketCache false useKeyTab true doNotPrompt true ticketCache is null isInitiator true KeyTab is /etc/keytab/hdfs.keytab refreshKrb5Config is true principal is hdfs/cdh1@IEGBACKUP.COM tryFirstPass is false useFirstPass is false storePass is false clearPass is false
Refreshing Kerberos configuration
principal is hdfs/cdh1@IEGBACKUP.COM
Will use keytab
Commit Succeeded
```
3 Summary
This simple test verifies that Spark on K8S can access Kerberized HDFS. The next step is to customize the Driver and Executor images around TenC's requirements; the key issues are keytab handling and possible token expiry for long-running jobs.
4 Test
```bash
## Export the locally built images, to be loaded, tagged, and pushed below
docker save spark-driver:latest -o spark-driver.tgz
docker save spark-executor:latest -o spark-executor.tgz
docker save spark-init:latest -o spark-init.tgz
docker load -i spark-driver.tgz
docker load -i spark-executor.tgz
docker load -i spark-init.tgz
docker tag spark-driver:latest hub.oa.com/runzhliu/spark-driver-runzhliu:1.0.1
docker tag spark-executor:latest hub.oa.com/runzhliu/spark-executor-runzhliu:1.0.1
docker tag spark-init:latest hub.oa.com/runzhliu/spark-init-runzhliu:1.0.1
docker push hub.oa.com/runzhliu/spark-init-runzhliu:1.0.1
docker push hub.oa.com/runzhliu/spark-driver-runzhliu:1.0.1
docker push hub.oa.com/runzhliu/spark-executor-runzhliu:1.0.1
```
```bash
## Map the CDH and KDC hostnames of the Kerberized cluster
echo 100.96.166.119 cdh1 >> /etc/hosts
echo 100.96.154.84 cdh2 >> /etc/hosts
echo 100.109.210.219 cdh3 >> /etc/hosts
echo 100.96.168.197 cdh4 >> /etc/hosts
echo 100.96.150.107 cdh5 >> /etc/hosts
echo 100.96.154.125 cdh6 >> /etc/hosts
echo 100.96.166.54 kdc >> /etc/hosts
## Uncomment krb5.conf settings, but keep default_ccache_name commented out
sed -i -e 's/#//' -e 's/default_ccache_name/## default_ccache_name/' /etc/krb5.conf
## Kerberos and Hadoop debug settings
export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true -Dsun.security.krb5.debug=true"
export HADOOP_JAAS_DEBUG=true
export HADOOP_ROOT_LOGGER=DEBUG,console
export HADOOP_CONF_DIR=/opt/spark/hconf
mkdir -p /etc/krb5.conf.d
mkdir /var/keytabs
cp hdfs.keytab /var/keytabs/
## Retry kinit until the KDC is reachable
until /usr/bin/kinit -kt /var/keytabs/hdfs.keytab hdfs/cdh1@IEGBACKUP.COM; do sleep 15; done
/opt/spark/bin/spark-submit \
--deploy-mode cluster \
--class org.apache.spark.examples.HdfsTest \
--master=k8s://https://kubernetes.default.svc \
--conf spark.kubernetes.namespace=runzhliu \
--conf spark.executor.instances=1 \
--conf spark.app.name=spark-hdfs \
--conf spark.driver.extraClassPath=/opt/spark/hconf/core-site.xml:/opt/spark/hconf/hdfs-site.xml:/opt/spark/hconf/yarn-site.xml:/etc/krb5.conf \
--conf spark.kubernetes.driver.docker.image=hub.oa.com/runzhliu/spark-driver-runzhliu:1.0.0 \
--conf spark.kubernetes.executor.docker.image=hub.oa.com/runzhliu/spark-executor-runzhliu:1.0.0 \
--conf spark.kubernetes.initcontainer.docker.image=hub.oa.com/runzhliu/spark-init-runzhliu:1.0.0 \
--conf spark.kubernetes.kerberos.enabled=true \
--conf spark.kubernetes.kerberos.keytab=hdfs.keytab \
--conf spark.kubernetes.kerberos.principal=hdfs/cdh1@IEGBACKUP.COM \
--conf spark.kubernetes.driver.extraJavaOptions=-Dlog4j.configuration=file:///opt/spark/conf/log4j.properties \
--conf spark.kubernetes.executor.extraJavaOptions=-Dlog4j.configuration=file:///opt/spark/conf/log4j.properties \
local:///opt/spark/examples/jars/spark-examples_2.11-2.2.0-k8s-0.5.0.jar \
hdfs://sh-test.kerberos.hdfs.db:9000/
## Variant using the 1.0.1 images
/opt/spark/bin/spark-submit \
--deploy-mode cluster \
--class org.apache.spark.examples.HdfsTest \
--master=k8s://https://kubernetes.default.svc \
--conf spark.kubernetes.namespace=runzhliu \
--conf spark.executor.instances=1 \
--conf spark.app.name=spark-hdfs \
--conf spark.driver.extraClassPath=/opt/spark/hconf/core-site.xml:/opt/spark/hconf/hdfs-site.xml:/opt/spark/hconf/yarn-site.xml:/etc/krb5.conf \
--conf spark.kubernetes.driver.docker.image=hub.oa.com/runzhliu/spark-driver-runzhliu:1.0.1 \
--conf spark.kubernetes.executor.docker.image=hub.oa.com/runzhliu/spark-executor-runzhliu:1.0.1 \
--conf spark.kubernetes.initcontainer.docker.image=hub.oa.com/runzhliu/spark-init-runzhliu:1.0.1 \
--conf spark.kubernetes.kerberos.enabled=true \
--conf spark.kubernetes.kerberos.keytab=hdfs.keytab \
--conf spark.kubernetes.kerberos.principal=hdfs/cdh1@IEGBACKUP.COM \
local:///opt/spark/examples/jars/spark-examples_2.11-2.2.0-k8s-0.5.0.jar \
hdfs://sh-test.kerberos.hdfs.db:9000/
## Variant using the 0.0.2 images
/opt/spark/bin/spark-submit \
--deploy-mode cluster \
--class org.apache.spark.examples.HdfsTest \
--master=k8s://https://kubernetes.default.svc \
--conf spark.kubernetes.namespace=runzhliu \
--conf spark.executor.instances=1 \
--conf spark.app.name=spark-hdfs \
--conf spark.driver.extraClassPath=/opt/spark/hconf/core-site.xml:/opt/spark/hconf/hdfs-site.xml:/opt/spark/hconf/yarn-site.xml:/etc/krb5.conf \
--conf spark.kubernetes.driver.docker.image=hub.oa.com/runzhliu/spark-driver-runzhliu:0.0.2 \
--conf spark.kubernetes.executor.docker.image=hub.oa.com/runzhliu/spark-executor-runzhliu:0.0.2 \
--conf spark.kubernetes.initcontainer.docker.image=hub.oa.com/runzhliu/spark-init-runzhliu:0.0.1 \
--conf spark.kubernetes.kerberos.enabled=true \
--conf spark.kubernetes.kerberos.keytab=hdfs.keytab \
--conf spark.kubernetes.kerberos.principal=hdfs/cdh1@IEGBACKUP.COM \
local:///opt/spark/examples/jars/spark-examples_2.11-2.2.0-k8s-0.5.0.jar \
hdfs://sh-test.kerberos.hdfs.db:9000/
```
```bash
/opt/spark/bin/spark-submit \
--deploy-mode cluster \
--class org.apache.spark.examples.HdfsTest \
--master=k8s://https://kubernetes.default.svc \
--kubernetes-namespace ${NAMESPACE} \
--conf spark.executor.instances=1 \
--conf spark.app.name=spark-hdfs \
--conf spark.driver.extraClassPath=/opt/spark/hconf/core-site.xml:/opt/spark/hconf/hdfs-site.xml:/opt/spark/hconf/yarn-site.xml:/etc/krb5.conf \
--conf spark.kubernetes.driver.docker.image=spark-driver:latest \
--conf spark.kubernetes.executor.docker.image=spark-executor:latest \
--conf spark.kubernetes.initcontainer.docker.image=spark-init:latest \
--conf spark.kubernetes.kerberos.enabled=true \
--conf spark.kubernetes.kerberos.keytab=/var/keytabs/hdfs.keytab \
--conf spark.kubernetes.kerberos.principal=hdfs/nn.${NAMESPACE}.svc.cluster.local@CLUSTER.LOCAL \
local:///opt/spark/examples/jars/spark-examples_2.11-2.2.0-k8s-0.5.0.jar \
hdfs://nn.${NAMESPACE}.svc.cluster.local:9000/user/ifilonenko/people.txt
```
References
- http://tapd.oa.com/HbaseLog/markdown_wikis/view/#1010146191008536081
- https://github.com/ifilonenko/hadoop-kerberos-helm
- https://github.com/ifilonenko/secure-hdfs-test
- https://github.com/apache-spark-on-k8s/kubernetes-HDFS/issues/23
- https://docs.google.com/document/d/1RBnXD9jMDjGonOdKJ2bA1lN4AAV_1RwpU_ewFuCNWKg/edit#heading=h.verdza2f4fyd