Deploying Spark History Server on k8s, Part 2

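Overview

Our team previously ran a fork of Spark 2.2 on k8s in production. Deploying on K8S requires at least a Dockerfile. We now plan to upgrade to the 3.0.0 snapshot branch, so I am taking the opportunity to write down some notes.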

History Server => HS

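Startup

Since 2.3.0, Spark has shipped an official Dockerfile that you can build yourself to suit your production requirements. So here I look into whether that Dockerfile can run an HS process directly. Here is the Dockerfile (with some comments removed).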

FROM openjdk:8-alpine

ARG spark_uid=185

RUN set -ex && \
    apk upgrade --no-cache && \
    ln -s /lib /lib64 && \
    apk add --no-cache bash tini libc6-compat linux-pam krb5 krb5-libs nss && \
    mkdir -p /opt/spark && \
    mkdir -p /opt/spark/examples && \
    mkdir -p /opt/spark/work-dir && \
    touch /opt/spark/RELEASE && \
    rm /bin/sh && \
    ln -sv /bin/bash /bin/sh && \
    echo "auth required pam_wheel.so use_uid" >> /etc/pam.d/su && \
    chgrp root /etc/passwd && chmod ug+rw /etc/passwd

COPY jars /opt/spark/jars
COPY bin /opt/spark/bin
COPY sbin /opt/spark/sbin
COPY kubernetes/dockerfiles/spark/entrypoint.sh /opt/
COPY examples /opt/spark/examples
COPY kubernetes/tests /opt/spark/tests
COPY data /opt/spark/data

ENV SPARK_HOME /opt/spark

WORKDIR /opt/spark/work-dir
RUN chmod g+w /opt/spark/work-dir

ENTRYPOINT [ "/opt/entrypoint.sh" ]

# Specify the User that the actual main process will run as
USER ${spark_uid}

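As you can see, whether it is a Driver or an Executor, what an image built from this Dockerfile actually runs is decided by the final script, entrypoint.sh. Here is the key part of entrypoint.sh.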

case "$1" in
  driver)
    shift 1
    CMD=(
      "$SPARK_HOME/bin/spark-submit"
      --conf "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS"
      --deploy-mode client
      "$@"
    )
    ;;
  executor)
    shift 1
    CMD=(
      ${JAVA_HOME}/bin/java
      "${SPARK_EXECUTOR_JAVA_OPTS[@]}"
      -Xms$SPARK_EXECUTOR_MEMORY
      -Xmx$SPARK_EXECUTOR_MEMORY
      -cp "$SPARK_CLASSPATH"
      org.apache.spark.executor.CoarseGrainedExecutorBackend
      --driver-url $SPARK_DRIVER_URL
      --executor-id $SPARK_EXECUTOR_ID
      --cores $SPARK_EXECUTOR_CORES
      --app-id $SPARK_APPLICATION_ID
      --hostname $SPARK_EXECUTOR_POD_IP
    )
    ;;

  *)
    echo "Non-spark-on-k8s command provided, proceeding in pass-through mode..."
    CMD=("$@")
    ;;
esac

Note that a container built from this Dockerfile expects an argument when it is run: with driver it starts a Driver process, and with executor it starts an Executor process.
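The dispatch can be tried out without building an image. The following is a minimal sketch, not the real entrypoint.sh: resolve_cmd is a hypothetical helper that only prints the command each branch would build, and the executor branch is shortened to a placeholder.

```shell
#!/bin/sh
# Simplified sketch of the entrypoint.sh dispatch. It prints the command
# instead of exec-ing it. resolve_cmd is a hypothetical name.
SPARK_HOME=/opt/spark

resolve_cmd() {
  case "$1" in
    driver)
      shift 1
      set -- "$SPARK_HOME/bin/spark-submit" --deploy-mode client "$@"
      ;;
    executor)
      shift 1
      # Placeholder: the real script assembles a full java command line here.
      set -- java org.apache.spark.executor.CoarseGrainedExecutorBackend
      ;;
    *)
      # Pass-through: any other command line is kept unchanged.
      :
      ;;
  esac
  printf '%s\n' "$*"
}

resolve_cmd driver --class org.example.App app.jar
resolve_cmd /opt/spark/bin/spark-class org.apache.spark.deploy.history.HistoryServer
```

The second call falls into the last branch, so the command comes back exactly as it was given.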

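So what about running a service process such as HS? The last branch is clearly the fallback: it passes the given command through unchanged, so you can run the start-history-server.sh script that Spark ships. After building the image the official way, you can give it a try.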

./bin/docker-image-tool.sh -t v3.0.0 build

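The next step would be to run start-history-server.sh. On closer inspection, though, that script starts HS as a daemon, and Docker does not cope well with a command that only forks a background process, because the container stops as soon as its foreground process exits (this is a simplification, but it will do for now). What HS really does is run the main class org.apache.spark.deploy.history.HistoryServer, so run it with the command below.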

docker run -it spark:v3.0.0 /opt/spark/bin/spark-class org.apache.spark.deploy.history.HistoryServer
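As an aside on the daemon point: the difference between a launcher that backgrounds its service and one that stays in the foreground can be reproduced in plain shell. This is only an illustrative sketch. launch is a hypothetical helper, and sleep stands in for the real service.

```shell
#!/bin/sh
# Sketch of a launcher that either backgrounds a service or runs it in the
# foreground. launch is a hypothetical helper, not part of Spark.
launch() {
  if [ -z "${SPARK_NO_DAEMONIZE+set}" ]; then
    # Daemon mode: start the service in the background and return at once.
    # If this launcher were a container's main process, the container
    # would stop right after this function returned.
    "$@" >/dev/null 2>&1 &
    echo "launcher returned; service left in background (pid $!)"
  else
    # Foreground mode: block until the service itself exits.
    "$@"
    echo "service exited with status $?"
  fi
}

launch sleep 1                             # returns immediately
( SPARK_NO_DAEMONIZE=1; launch sleep 1 )   # blocks for about one second
```

Spark's sbin/spark-daemon.sh has a comparable switch driven by the SPARK_NO_DAEMONIZE environment variable, so passing it with docker run -e may be an alternative to calling spark-class directly. That route was not tried in this post.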

Running the spark-class command with docker run then produces an error:

# docker run -it spark:v3.0.0 /opt/spark/bin/spark-class org.apache.spark.deploy.history.HistoryServer
++ id -u
+ myuid=0
++ id -g
+ mygid=0
+ set +e
++ getent passwd 0
+ uidentry=root:x:0:0:root:/root:/bin/ash
+ set -e
+ '[' -z root:x:0:0:root:/root:/bin/ash ']'
+ SPARK_CLASSPATH=':/opt/spark/jars/*'
+ env
+ grep SPARK_JAVA_OPT_
+ sort -t_ -k4 -n
+ sed 's/[^=]*=\(.*\)/\1/g'
+ readarray -t SPARK_EXECUTOR_JAVA_OPTS
+ '[' -n '' ']'
+ '[' '' == 2 ']'
+ '[' '' == 3 ']'
+ '[' -z ']'
+ case "$1" in
+ echo 'Non-spark-on-k8s command provided, proceeding in pass-through mode...'
Non-spark-on-k8s command provided, proceeding in pass-through mode...
+ CMD=("$@")
+ exec /sbin/tini -s -- /opt/spark/bin/spark-class org.apache.spark.deploy.history.HistoryServer
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
19/07/09 03:59:22 INFO HistoryServer: Started daemon with process name: 14@df0f7b9fd0cf
19/07/09 03:59:22 INFO SignalUtils: Registered signal handler for TERM
19/07/09 03:59:22 INFO SignalUtils: Registered signal handler for HUP
19/07/09 03:59:22 INFO SignalUtils: Registered signal handler for INT
19/07/09 03:59:23 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/07/09 03:59:23 INFO SecurityManager: Changing view acls to: root
19/07/09 03:59:23 INFO SecurityManager: Changing modify acls to: root
19/07/09 03:59:23 INFO SecurityManager: Changing view acls groups to:
19/07/09 03:59:23 INFO SecurityManager: Changing modify acls groups to:
19/07/09 03:59:23 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
19/07/09 03:59:23 INFO FsHistoryProvider: History server ui acls disabled; users with admin permissions: ; groups with admin permissions
Exception in thread "main" java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.apache.spark.deploy.history.HistoryServer$.main(HistoryServer.scala:278)
	at org.apache.spark.deploy.history.HistoryServer.main(HistoryServer.scala)
Caused by: java.io.FileNotFoundException: Log directory specified does not exist: file:/tmp/spark-events Did you configure the correct one through spark.history.fs.logDirectory?
	at org.apache.spark.deploy.history.FsHistoryProvider.startPolling(FsHistoryProvider.scala:259)

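This is easy to fix: the default directory from which HS reads Spark event logs simply does not exist. Either create it, or change the event log path in the Spark configuration (spark.history.fs.logDirectory, as the error message suggests). I will not go into the details here.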
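One way to apply both fixes at once is sketched below, under assumptions: the host path, the container path and the published port mapping are example values, and the script only prints the docker command rather than running it. SPARK_HISTORY_OPTS is the environment variable Spark documents for passing spark.history.* options to the HS, and 18080 is the default HS UI port.

```shell
#!/bin/sh
# Sketch: print a docker command that gives the HS an existing event log
# directory. Both paths below are assumed example values.
EVENT_DIR_HOST=/data/spark-events   # hypothetical host directory with event logs
EVENT_DIR=/mnt/spark-events         # hypothetical mount point inside the container

hs_command() {
  # The bind mount makes the directory exist inside the container, and
  # SPARK_HISTORY_OPTS points the HS at it instead of /tmp/spark-events.
  printf '%s ' docker run -it \
    -p 18080:18080 \
    -v "$EVENT_DIR_HOST:$EVENT_DIR" \
    -e "SPARK_HISTORY_OPTS=-Dspark.history.fs.logDirectory=file:$EVENT_DIR" \
    spark:v3.0.0 \
    /opt/spark/bin/spark-class org.apache.spark.deploy.history.HistoryServer
  printf '\n'
}

hs_command
```

If the directory is mounted at /tmp/spark-events instead, the -e option should not be needed, since that is the default path named in the error message.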

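Summary

So running a Spark History Server with Docker is not a problem, and it works more or less out of the box. What matters is the configuration, and making sure the storage that holds the event logs is coordinated with the Spark applications.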
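The coordination mentioned above comes down to the applications writing event logs to the same location the HS reads from. A minimal sketch follows, assuming a shared location reachable from both sides. write_event_log_conf is a hypothetical helper and the HDFS URI is an example value.

```shell
#!/bin/sh
# Sketch: generate a spark-defaults.conf in which the application and the
# HS agree on the event log location. write_event_log_conf is hypothetical.
# $1: shared event log URI, $2: output file
write_event_log_conf() {
  cat > "$2" <<EOF
# Application side: enable event logging and write to the shared location.
spark.eventLog.enabled         true
spark.eventLog.dir             $1
# History Server side: read from the same location.
spark.history.fs.logDirectory  $1
EOF
}

demo_dir=$(mktemp -d)
write_event_log_conf "hdfs://namenode:8020/spark-events" "$demo_dir/spark-defaults.conf"
cat "$demo_dir/spark-defaults.conf"
```

On k8s the shared location would typically be remote storage or a volume mounted into both the application pods and the HS pod. Which one fits depends on the cluster.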

Warning
This article was last updated on October 10, 2019. Its content may be outdated, so use it with caution.