
Spark-with-Kerberized-HDFS

1 Overview

This note investigates how Spark 2.2 on K8S can access Kerberized HDFS. The approach for Spark 2.3/2.4 should not differ much.

2 Practice

2.1 Prerequisites

  1. Kerberized HDFS: refer to the internal guide on using the HDFS Kerberos client; the key artifacts are hdfs.keytab and krb5.conf (a quick sanity check is sketched right after this list)
  2. Spark Driver/Executor/Init/Base images
  3. An installed and running Kubernetes cluster
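
Before building anything it is worth confirming that the keytab actually works. A minimal sketch, assuming hdfs.keytab is in the current directory and krb5.conf is already installed at /etc/krb5.conf:

## List the principals contained in the keytab
klist -kt hdfs.keytab
## Non-interactive login from the keytab, then show the cached TGT
kinit -kt hdfs.keytab hdfs/cdh1@IEGBACKUP.COM && klist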

2.2 Build & Push Images

To start from a clean experiment environment, the official source is used: build the distribution, build the images, and finally push them to hub.oa.com.

git clone https://github.com/apache-spark-on-k8s/spark.git
cd spark
git checkout branch-2.2-kubernetes
build/mvn install -Pkubernetes -pl resource-managers/kubernetes/core -am -DskipTests
build/mvn compile -Pkubernetes -pl resource-managers/kubernetes/core -am -DskipTests
dev/make-distribution.sh --tgz -Phadoop-2.7 -Pkubernetes
tar -xvf spark-2.2.0-k8s-0.5.0-bin-2.7.3.tgz
cd spark-2.2.0-k8s-0.5.0-bin-2.7.3

## When building the images, change FROM in the spark-base Dockerfile to a Linux base image
## that can be pulled on the internal network (though some tooling, e.g. apt, may then be unusable)
## Building the driver and executor images needs care around JAVA_HOME
docker build -t hub.oa.com/runzhliu/spark-base-runzhliu:<tag> -f dockerfiles/spark-base/Dockerfile .
docker build -t hub.oa.com/runzhliu/spark-driver-runzhliu:<tag> -f dockerfiles/driver/Dockerfile .
docker build -t hub.oa.com/runzhliu/spark-executor-runzhliu:<tag> -f dockerfiles/executor/Dockerfile .
docker build -t hub.oa.com/runzhliu/spark-init-runzhliu:<tag> -f dockerfiles/init-container/Dockerfile .

## docker login may be needed first
docker push hub.oa.com/runzhliu/spark-base-runzhliu:<tag>
docker push hub.oa.com/runzhliu/spark-driver-runzhliu:<tag>
docker push hub.oa.com/runzhliu/spark-executor-runzhliu:<tag>
docker push hub.oa.com/runzhliu/spark-init-runzhliu:<tag>

## This is the test image used to run the spark-submit script inside the cluster
docker push hub.oa.com/runzhliu/kerberos-test-runzhliu:<tag>
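
To confirm the images were built and tagged as expected before pushing, docker images accepts a wildcard reference filter; a quick check:

docker images "hub.oa.com/runzhliu/*"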

2.3 Experiment Environment

The plan is to bring up a Pod in the Running state and test spark-submit access to HDFS from inside the container.

## Dockerfile for the test image
FROM hub.oa.com/runzhliu/spark-base-runzhliu:<tag>

COPY core-site.xml /opt/spark/hconf/
COPY spark-examples_2.11-2.2.0-k8s-0.5.0.jar /opt/spark/jars
COPY yarn-site.xml /opt/spark/hconf/
COPY hdfs-site.xml /opt/spark/hconf/
COPY ssl-server.xml /opt/spark/hconf/
COPY krb5.conf /etc/krb5.conf
COPY test-env.sh .
CMD sleep 3600
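
A sketch of building and pushing this test image (assuming the Dockerfile above is saved as Dockerfile next to the copied files; the tag is a placeholder as before):

docker build -t hub.oa.com/runzhliu/kerberos-test-runzhliu:<tag> .
docker push hub.oa.com/runzhliu/kerberos-test-runzhliu:<tag>
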
## Deployment for the test container
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: kerberos-test
  namespace: runzhliu
spec:
  replicas: 1
  template:
    metadata:
      labels:
        name: kerberos-test
    spec:
      containers:
      - name: kerberos-test
        image: hub.oa.com/runzhliu/kerberos-test-runzhliu:<tag>
        imagePullPolicy: IfNotPresent
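
The exec session below expects the keytab at /etc/keytab/hdfs.keytab, which this manifest alone does not provide. A sketch of wiring it up and creating the Deployment; the Secret name hdfs-keytab, its volumeMount, and the manifest filename kerberos-test.yaml are all assumptions:

## Hypothetical: store the keytab in a Secret (the manifest would still need a matching volume and volumeMount at /etc/keytab)
kubectl -n runzhliu create secret generic hdfs-keytab --from-file=hdfs.keytab
## Create the test Deployment from the manifest above
kubectl -n runzhliu apply -f kerberos-test.yaml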

Once the Pod is created, the result looks like this:

# kubectl -n runzhliu get pods
NAME                                       READY     STATUS                  RESTARTS   AGE
kerberos-test-5c4458f757-rsjrl             1/1       Running                 0          30m

Then exec into the container, log in from the keytab, and check the ticket state with klist:

# kubectl -n runzhliu exec -it kerberos-test-5c4458f757-rsjrl -- /bin/bash
[root@kerberos-test-5c4458f757-rsjrl /opt/spark/work-dir]#cd ..
[root@kerberos-test-5c4458f757-rsjrl /opt/spark]#ls
app_python_deps  bin  conf  hconf  jars  RELEASE  sbin	work-dir
[root@kerberos-test-5c4458f757-rsjrl /opt/spark]### Log in from the keytab; the TGT is then cached. It expires after 24 hours. Does it need renewing?
[root@kerberos-test-5c4458f757-rsjrl /opt/spark]#kinit -kt /etc/keytab/hdfs.keytab hdfs/cdh1@IEGBACKUP.COM
[root@kerberos-test-5c4458f757-rsjrl /opt/spark]#klist
Ticket cache: FILE:/tmp/krb5cc_0
Default principal: hdfs/cdh1@IEGBACKUP.COM

Valid starting     Expires            Service principal
06/04/19 18:43:45  06/05/19 18:43:45  krbtgt/IEGBACKUP.COM@IEGBACKUP.COM
	renew until 06/11/19 18:43:45
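
The ticket above is renewable until 06/11/19, i.e. for seven days. Within that window the cached TGT can be extended without touching the keytab; this is standard MIT Kerberos usage:

## Renew the cached TGT, then confirm the new expiry
kinit -R
klist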

Then run the submission script, which runs the HdfsTest class against a Kerberized HDFS address:

## Submission script
## Enable JAAS debug output for the Kerberos login
export HADOOP_JAAS_DEBUG=true
## Raise the Hadoop log level to DEBUG on the console
export HADOOP_ROOT_LOGGER=DEBUG,console
## Point Hadoop at the mounted configuration files
export HADOOP_CONF_DIR=/opt/spark/hconf

## After the initial keytab login, subsequent interaction uses a delegation token.
## The final argument is an HDFS path that requires Kerberos authentication.
/opt/spark/bin/spark-submit \
      --deploy-mode cluster \
      --class org.apache.spark.examples.HdfsTest \
      --master=k8s://https://kubernetes.default.svc \
      --conf spark.executor.instances=1 \
      --conf spark.app.name=spark-hdfs \
      --conf spark.kubernetes.namespace=runzhliu \
      --conf spark.driver.extraClassPath=/opt/spark/hconf/core-site.xml:/opt/spark/hconf/hdfs-site.xml:/opt/spark/hconf/yarn-site.xml:/etc/krb5.conf \
      --conf spark.kubernetes.driver.docker.image=hub.oa.com/runzhliu/spark-driver-runzhliu:0.0.2 \
      --conf spark.kubernetes.executor.docker.image=hub.oa.com/runzhliu/spark-executor-runzhliu:0.0.2 \
      --conf spark.kubernetes.initcontainer.docker.image=hub.oa.com/runzhliu/spark-init-runzhliu:0.0.1 \
      --conf spark.kubernetes.kerberos.enabled=true \
      --conf spark.kubernetes.kerberos.keytab=/etc/keytab/hdfs.keytab \
      --conf spark.kubernetes.kerberos.principal=hdfs/cdh1@IEGBACKUP.COM \
      --conf=spark.driver.cores=2 \
      --conf=spark.driver.memory=4096M \
      --conf=spark.executor.cores=2 \
      --conf=spark.executor.memory=4096M \
      --conf=spark.eventLog.dir=hdfs://sh-spark.hdfs.cr.hdfs.db:9000/yongyu/history \
      local:///opt/spark/examples/jars/spark-examples_2.11-2.2.0-k8s-0.5.0.jar \
      hdfs://sh.hdfs.cr.ied.com:9000/tdw-transfer-data/runzhliu/20180411102548571/20190212/part-00326
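
In cluster deploy mode the driver runs in its own Pod, so the log worth reading is the driver's. A sketch for locating and tailing it (grepping by spark.app.name is an assumption about the generated Pod name):

kubectl -n runzhliu get pods | grep spark-hdfs
kubectl -n runzhliu logs -f <driver-pod-name>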

The Pod log contains lines indicating whether authentication succeeded or failed.

2019-06-04 18:44:50 INFO HadoopStepsOrchestrator:54 - Hadoop Conf directory: /opt/spark/hconf
2019-06-04 18:44:50 INFO HadoopConfBootstrapImpl:54 - HADOOP_CONF_DIR defined. Mounting Hadoop specific files
## The following lines show that the Kerberos login succeeded
2019-06-04 18:44:50 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Debug is true storeKey true useTicketCache false useKeyTab true doNotPrompt true ticketCache is null isInitiator true KeyTab is /etc/keytab/hdfs.keytab refreshKrb5Config is true principal is hdfs/cdh1@IEGBACKUP.COM tryFirstPass is false useFirstPass is false storePass is false clearPass is false
Refreshing Kerberos configuration
principal is hdfs/cdh1@IEGBACKUP.COM
Will use keytab
Commit Succeeded

3 Summary

This simple test verifies that Spark on K8S can access Kerberized HDFS. The next step is to customize the Driver and Executor images around TenC's requirements; the key points are keytab delivery and the possibility of token expiry for long-running jobs (see the sketch below).
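
On the TGT side, one simple mitigation, sketched here as an idea rather than a tested setup, is a background loop that re-authenticates from the keytab well before the 24-hour expiry. Note this only keeps the client TGT fresh; renewing Spark's HDFS delegation tokens is a separate problem:

## Hypothetical renewal loop: fetch a fresh TGT from the keytab every 6 hours
while true; do
  kinit -kt /etc/keytab/hdfs.keytab hdfs/cdh1@IEGBACKUP.COM
  sleep 21600
done &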

4 Test

Scratch notes from the test runs. First, moving locally built images into the internal registry:

## Export the locally built images so they can be moved without a registry
docker save spark-driver:latest -o spark-driver.tgz
docker save spark-executor:latest -o spark-executor.tgz
docker save spark-init:latest -o spark-init.tgz

## On the target host, load the images back in
docker load -i spark-driver.tgz
docker load -i spark-executor.tgz
docker load -i spark-init.tgz

## Re-tag for the internal registry and push
docker tag spark-driver:latest hub.oa.com/runzhliu/spark-driver-runzhliu:1.0.1
docker tag spark-executor:latest hub.oa.com/runzhliu/spark-executor-runzhliu:1.0.1
docker tag spark-init:latest hub.oa.com/runzhliu/spark-init-runzhliu:1.0.1

docker push hub.oa.com/runzhliu/spark-init-runzhliu:1.0.1
docker push hub.oa.com/runzhliu/spark-driver-runzhliu:1.0.1
docker push hub.oa.com/runzhliu/spark-executor-runzhliu:1.0.1

## Full test script run inside the container
## Make the CDH and KDC hosts resolvable
echo 100.96.166.119  cdh1 >> /etc/hosts
echo 100.96.154.84   cdh2 >> /etc/hosts
echo 100.109.210.219 cdh3 >> /etc/hosts
echo 100.96.168.197  cdh4 >> /etc/hosts
echo 100.96.150.107  cdh5 >> /etc/hosts
echo 100.96.154.125  cdh6 >> /etc/hosts
echo 100.96.166.54   kdc  >> /etc/hosts

## Uncomment every setting in krb5.conf, but keep default_ccache_name commented out
sed -i -e 's/#//' -e 's/default_ccache_name/## default_ccache_name/' /etc/krb5.conf
## Turn on Kerberos and Hadoop debug output
export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true -Dsun.security.krb5.debug=true"
export HADOOP_JAAS_DEBUG=true
export HADOOP_ROOT_LOGGER=DEBUG,console
export HADOOP_CONF_DIR=/opt/spark/hconf
mkdir -p /etc/krb5.conf.d
mkdir /var/keytabs
cp hdfs.keytab /var/keytabs/
## Retry kinit until the KDC is reachable
until /usr/bin/kinit -kt /var/keytabs/hdfs.keytab hdfs/cdh1@IEGBACKUP.COM; do sleep 15; done

/opt/spark/bin/spark-submit \
      --deploy-mode cluster \
      --class org.apache.spark.examples.HdfsTest \
      --master=k8s://https://kubernetes.default.svc \
      --conf spark.kubernetes.namespace=runzhliu \
      --conf spark.executor.instances=1 \
      --conf spark.app.name=spark-hdfs \
      --conf spark.driver.extraClassPath=/opt/spark/hconf/core-site.xml:/opt/spark/hconf/hdfs-site.xml:/opt/spark/hconf/yarn-site.xml:/etc/krb5.conf \
      --conf spark.kubernetes.driver.docker.image=hub.oa.com/runzhliu/spark-driver-runzhliu:1.0.0 \
      --conf spark.kubernetes.executor.docker.image=hub.oa.com/runzhliu/spark-executor-runzhliu:1.0.0 \
      --conf spark.kubernetes.initcontainer.docker.image=hub.oa.com/runzhliu/spark-init-runzhliu:1.0.0 \
      --conf spark.kubernetes.kerberos.enabled=true \
      --conf spark.kubernetes.kerberos.keytab=hdfs.keytab \
      --conf spark.kubernetes.kerberos.principal=hdfs/cdh1@IEGBACKUP.COM \
      --conf spark.kubernetes.driver.extraJavaOptions=-Dlog4j.configuration=file:///opt/spark/conf/log4j.properties \
      --conf spark.kubernetes.executor.extraJavaOptions=-Dlog4j.configuration=file:///opt/spark/conf/log4j.properties \
      local:///opt/spark/examples/jars/spark-examples_2.11-2.2.0-k8s-0.5.0.jar \
      hdfs://sh-test.kerberos.hdfs.db:9000/
      
## 1.0.1
/opt/spark/bin/spark-submit \
      --deploy-mode cluster \
      --class org.apache.spark.examples.HdfsTest \
      --master=k8s://https://kubernetes.default.svc \
      --conf spark.kubernetes.namespace=runzhliu \
      --conf spark.executor.instances=1 \
      --conf spark.app.name=spark-hdfs \
      --conf spark.driver.extraClassPath=/opt/spark/hconf/core-site.xml:/opt/spark/hconf/hdfs-site.xml:/opt/spark/hconf/yarn-site.xml:/etc/krb5.conf \
      --conf spark.kubernetes.driver.docker.image=hub.oa.com/runzhliu/spark-driver-runzhliu:1.0.1 \
      --conf spark.kubernetes.executor.docker.image=hub.oa.com/runzhliu/spark-executor-runzhliu:1.0.1 \
      --conf spark.kubernetes.initcontainer.docker.image=hub.oa.com/runzhliu/spark-init-runzhliu:1.0.1 \
      --conf spark.kubernetes.kerberos.enabled=true \
      --conf spark.kubernetes.kerberos.keytab=hdfs.keytab \
      --conf spark.kubernetes.kerberos.principal=hdfs/cdh1@IEGBACKUP.COM \
      local:///opt/spark/examples/jars/spark-examples_2.11-2.2.0-k8s-0.5.0.jar \
      hdfs://sh-test.kerberos.hdfs.db:9000/
      
## 0.0.2
/opt/spark/bin/spark-submit \
      --deploy-mode cluster \
      --class org.apache.spark.examples.HdfsTest \
      --master=k8s://https://kubernetes.default.svc \
      --conf spark.kubernetes.namespace=runzhliu \
      --conf spark.executor.instances=1 \
      --conf spark.app.name=spark-hdfs \
      --conf spark.driver.extraClassPath=/opt/spark/hconf/core-site.xml:/opt/spark/hconf/hdfs-site.xml:/opt/spark/hconf/yarn-site.xml:/etc/krb5.conf \
      --conf spark.kubernetes.driver.docker.image=hub.oa.com/runzhliu/spark-driver-runzhliu:0.0.2 \
      --conf spark.kubernetes.executor.docker.image=hub.oa.com/runzhliu/spark-executor-runzhliu:0.0.2 \
      --conf spark.kubernetes.initcontainer.docker.image=hub.oa.com/runzhliu/spark-init-runzhliu:0.0.1 \
      --conf spark.kubernetes.kerberos.enabled=true \
      --conf spark.kubernetes.kerberos.keytab=hdfs.keytab \
      --conf spark.kubernetes.kerberos.principal=hdfs/cdh1@IEGBACKUP.COM \
      local:///opt/spark/examples/jars/spark-examples_2.11-2.2.0-k8s-0.5.0.jar \
      hdfs://sh-test.kerberos.hdfs.db:9000/

## For reference: the upstream secure-HDFS example (cluster-local KDC; see references)
/opt/spark/bin/spark-submit \
      --deploy-mode cluster \
      --class org.apache.spark.examples.HdfsTest \
      --master=k8s://https://kubernetes.default.svc \
      --kubernetes-namespace ${NAMESPACE} \
      --conf spark.executor.instances=1 \
      --conf spark.app.name=spark-hdfs \
      --conf spark.driver.extraClassPath=/opt/spark/hconf/core-site.xml:/opt/spark/hconf/hdfs-site.xml:/opt/spark/hconf/yarn-site.xml:/etc/krb5.conf \
      --conf spark.kubernetes.driver.docker.image=spark-driver:latest \
      --conf spark.kubernetes.executor.docker.image=spark-executor:latest \
      --conf spark.kubernetes.initcontainer.docker.image=spark-init:latest \
      --conf spark.kubernetes.kerberos.enabled=true \
      --conf spark.kubernetes.kerberos.keytab=/var/keytabs/hdfs.keytab \
      --conf spark.kubernetes.kerberos.principal=hdfs/nn.${NAMESPACE}.svc.cluster.local@CLUSTER.LOCAL \
      local:///opt/spark/examples/jars/spark-examples_2.11-2.2.0-k8s-0.5.0.jar \
      hdfs://nn.${NAMESPACE}.svc.cluster.local:9000/user/ifilonenko/people.txt

References

  1. http://tapd.oa.com/HbaseLog/markdown_wikis/view/#1010146191008536081
  2. https://github.com/ifilonenko/hadoop-kerberos-helm
  3. https://github.com/ifilonenko/secure-hdfs-test
  4. https://github.com/apache-spark-on-k8s/kubernetes-HDFS/issues/23
  5. https://docs.google.com/document/d/1RBnXD9jMDjGonOdKJ2bA1lN4AAV_1RwpU_ewFuCNWKg/edit#heading=h.verdza2f4fyd