
Elastic Compute Platform - Spark 3.0.0

Overview

TenC Spark currently runs on the Spark 2.2.0 on K8S fork, which is no longer maintained. Meanwhile the community's 3.0.0 (SNAPSHOT) support for K8S has become fairly stable, and adds features such as Hadoop ConfigMap and Secret configuration. It is therefore time for TenC Spark to support Spark 3.0.0. The work involved, and the questions that need discussion, mainly include the following.

  1. TenC Spark needs to integrate Spark 3.0.0 at the bottom layer
  2. Spark Operator support, and how to integrate it with TenC
  3. Kerberos support

In the rest of this document, Spark 2.2.0 on K8S is referred to as Spark2, and Spark 3.0.0 as Spark3.

TenC Spark Integration with Spark3

Image Changes

Starting with Spark 2.3.0, Spark uses a single unified image; that is, it no longer ships separate Driver/Executor/initContainer images the way Spark2 does. Which process a container actually starts is decided by entrypoint.sh.
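To make the dispatch concrete, here is a simplified sketch of how entrypoint.sh selects the process (this is an illustrative reduction, not the verbatim upstream script; the echoed commands stand in for the real exec lines):

```shell
#!/bin/sh
# Simplified sketch of the role dispatch inside Spark's entrypoint.sh:
# the first argument passed to the container selects which process starts.
dispatch_role() {
  case "$1" in
    driver)
      # The driver pod runs spark-submit in client mode inside the cluster.
      echo "spark-submit --deploy-mode client"
      ;;
    executor)
      # Executor pods start the executor backend JVM directly.
      echo "java org.apache.spark.executor.CoarseGrainedExecutorBackend"
      ;;
    *)
      # Any other command is run as-is.
      echo "$@"
      ;;
  esac
}

dispatch_role driver   # prints: spark-submit --deploy-mode client
```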


Parameter Changes

Running Spark requires passing a large number of parameters to the Driver/Executor/Submit. Spark2 is a community fork, and Spark3 has made quite a few changes to the parameters; a typical rename looks like this.

Spark2: kubernetes-namespace
Spark3: spark.kubernetes.namespace
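A migration shim can translate the old Fork-style flags into Spark3 keys before invoking spark-submit. The sketch below is hypothetical and covers only the renames discussed in this document (namespace flag and the docker.image → container.image conf keys); anything already valid passes through unchanged:

```shell
#!/bin/sh
# Hypothetical helper: rewrite a single Spark2 (Fork) argument into its
# Spark3 equivalent. Only the renames mentioned in this document are mapped.
to_spark3_conf() {
  case "$1" in
    --kubernetes-namespace=*)
      echo "--conf=spark.kubernetes.namespace=${1#--kubernetes-namespace=}" ;;
    --conf=spark.kubernetes.driver.docker.image=*)
      echo "--conf=spark.kubernetes.driver.container.image=${1#--conf=spark.kubernetes.driver.docker.image=}" ;;
    --conf=spark.kubernetes.executor.docker.image=*)
      echo "--conf=spark.kubernetes.executor.container.image=${1#--conf=spark.kubernetes.executor.docker.image=}" ;;
    *)
      # Already a valid Spark3 argument: pass through unchanged.
      echo "$1" ;;
  esac
}

to_spark3_conf "--kubernetes-namespace=runzhliu"
# prints: --conf=spark.kubernetes.namespace=runzhliu
```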

Below is my comparison of the parameter differences. Most of the parameters in the list are added by dive-proxy by default.

--deploy-mode=cluster
--master=k8s://https://kubernetes.default.svc
// spark.kubernetes.namespace
--kubernetes-namespace=runzhliu
// spark.kubernetes.driver.pod.name
--conf=spark.kubernetes.driver.pod.name=spark-3a5893d652d9ba5b-driver-fd4pxcjfk7
// spark.kubernetes.driver.container.image | spark.kubernetes.container.image
--conf=spark.kubernetes.driver.docker.image=hub.oa.com/tenc-ecs/spark-2-2-driver:v2.0
// spark.kubernetes.executor.container.image | spark.kubernetes.container.image
--conf=spark.kubernetes.executor.docker.image=hub.oa.com/tenc-ecs/spark-2-2-executor:v2.0
// removed in 3.0.0
--conf=spark.kubernetes.initcontainer.docker.image=hub.oa.com/public/spark-2-2-init:v1.0.1
--conf=spark.app.name=spark-3a5893d652d9ba5b
--conf=spark.app.id=22bee3fa-ab86-11e9-b6b9-0a580610006e
// spark.kubernetes.driver.request.cores takes precedence over spark.driver.cores | spark.kubernetes.driver.limit.cores
--conf=spark.driver.cores=2
// spark.{driver,executor}.memory | spark.kubernetes.memoryOverheadFactor | spark.{driver,executor}.memoryOverhead
--conf=spark.driver.memory=2048M
// spark.kubernetes.executor.request.cores takes precedence over spark.executor.cores | spark.kubernetes.executor.limit.cores
--conf=spark.executor.cores=2
// spark.{driver,executor}.memory | spark.kubernetes.memoryOverheadFactor | spark.{driver,executor}.memoryOverhead
--conf=spark.executor.memory=2048M
--conf=spark.executor.instances=2
--conf=spark.eventLog.dir=hdfs://sh-spark.hdfs.cr.hdfs.db:9000/yongyu/history
--conf=spark.kubernetes.driver.label.appid=312
--conf=spark.kubernetes.executor.label.appid=312
--conf=spark.kubernetes.driver.label.spark-app-id=22bee3fa-ab86-11e9-b6b9-0a580610006e
--conf=spark.kubernetes.executor.label.spark-app-id=22bee3fa-ab86-11e9-b6b9-0a580610006e
--conf=spark.kubernetes.driver.label.tencent.cr/taskid=22bee3fa-ab86-11e9-b6b9-0a580610006e
--conf=spark.kubernetes.executor.label.tencent.cr/taskid=22bee3fa-ab86-11e9-b6b9-0a580610006e
--conf=spark.kubernetes.driver.label.platform=tenflow
--conf=spark.kubernetes.driver.annotation.tencent.cr/containertype=native
--conf=spark.kubernetes.driver.annotation.network.cni/networkname=flannel
--conf=spark.kubernetes.driver.annotation.tencent.cr/storageopt=size_softlimit=100G,size=120G,inode_softlimit=2000000,inode=2200000
--conf=spark.kubernetes.driver.annotation.network.cni/ingress-bandwidth=240Mi
--conf=spark.kubernetes.driver.annotation.network.cni/egress-bandwidth=240Mi
--conf=spark.kubernetes.driver.annotation.tencent.cr/bufferwritebps=16Mi
--conf=spark.kubernetes.executor.annotation.tencent.cr/storageopt=size_softlimit=100G,size=120G,inode_softlimit=2000000,inode=2200000
--conf=spark.kubernetes.executor.annotation.network.cni/ingress-bandwidth=240Mi
--conf=spark.kubernetes.executor.annotation.network.cni/egress-bandwidth=240Mi
--conf=spark.kubernetes.executor.annotation.tencent.cr/bufferwritebps=16Mi
--conf=spark.kubernetes.executor.annotation.tencent.cr/containertype=native
--conf=spark.kubernetes.executor.annotation.network.cni/networkname=flannel
--conf=spark.executor.heartbeatInterval=20s
--conf=spark.network.timeout=3600
--conf=spark.core.connection.ack.wait.timeout=3600
--conf=spark.rpc.askTimeout=3600
--conf=spark.rpc.lookupTimeout=3600
--conf=spark.storage.blockManagerSlaveTimeoutMs=36000000
--conf=spark.kubernetes.executorEnv.TENC_FILELOG_PATHS=/var/log/spark-specified.log
--conf=spark.kubernetes.driverEnv.TENC_FILELOG_PATHS=/var/log/spark-specified.log
--conf=spark.kubernetes.submission.waitAppCompletion=true
--conf=spark.ui.showConsoleProgress=false
// removed in 3.0.0
--conf=spark.kubernetes.client.request.timeout=30000
// removed in 3.0.0
--conf=spark.kubernetes.client.watch.reconnectLimit=10
--conf=spark.kubernetes.allocation.batch.size=20
--conf=spark.worker.cleanup.enabled=true
--conf=spark.eventLog.enabled=true
--conf=spark.eventLog.compress=true
--class=org.apache.spark.examples.SparkPi
http://gateway.msp.ied.com/fileserver/1563691630360/spark-examples_2.11-2.2.0-k8s-0.5.0.jar
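Put together, the same kind of submission expressed with Spark3-style keys would look roughly as follows. This is a sketch, not a tested command: the unified image name/tag and the example jar path are assumptions, and the many labels/annotations above are elided:

```shell
bin/spark-submit \
  --deploy-mode cluster \
  --master k8s://https://kubernetes.default.svc \
  --conf spark.kubernetes.namespace=runzhliu \
  --conf spark.kubernetes.container.image=hub.oa.com/tenc-ecs/spark:v3.0.0 \
  --conf spark.kubernetes.driver.request.cores=2 \
  --conf spark.driver.memory=2048M \
  --conf spark.kubernetes.executor.request.cores=2 \
  --conf spark.executor.memory=2048M \
  --conf spark.executor.instances=2 \
  --class org.apache.spark.examples.SparkPi \
  local:///opt/spark/examples/jars/spark-examples_2.12-3.0.0.jar
```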

Spark Operator + Spark3 Support

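With the operator, a job is declared as a SparkApplication CRD instead of a spark-submit command. A minimal manifest for the GoogleCloudPlatform spark-on-k8s-operator might look like this (a sketch; the namespace, image, and jar path are placeholders, not TenC values):

```yaml
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: runzhliu
spec:
  type: Scala
  mode: cluster
  image: hub.oa.com/tenc-ecs/spark:v3.0.0   # placeholder unified Spark3 image
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.0.0.jar
  sparkVersion: "3.0.0"
  driver:
    cores: 2
    memory: 2048m
  executor:
    instances: 2
    cores: 2
    memory: 2048m
```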

Kerberos

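Spark3 handles Kerberos on K8S natively at submit time. The relevant options are roughly the following fragment (the ConfigMap names, keytab path, and principal here are placeholders):

```shell
# Point the driver/executors at krb5.conf and the Hadoop config via ConfigMaps,
# and let spark-submit obtain delegation tokens from the given keytab.
--conf spark.kubernetes.kerberos.krb5.configMapName=krb5-conf \
--conf spark.kubernetes.hadoop.configMapName=hadoop-conf \
--conf spark.kerberos.keytab=/path/to/user.keytab \
--conf spark.kerberos.principal=user@EXAMPLE.COM
```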
Warning
This article was last updated on February 1, 2017; its contents may be out of date, so refer to it with caution.