
Getting Ahead of Kerberos Pitfalls

Overview

For background on security authentication in the Hadoop ecosystem, including Kerberos and Hadoop Delegation Tokens, see the Cloudera (CDH) blog post Hadoop Delegation Tokens Explained.

Also note this issue regarding Hadoop versions.

Spark Job Authentication

The goal is for spark-submit to authenticate against Kerberos when submitting a job, so that it can submit to YARN, access HDFS, and so on.

For spark-submit jobs, there are two ways to pass Kerberos authentication:

  1. Run kinit -k -t /etc/security/xx.keytab user/host@REALM.COM first, then run spark-submit as usual.
  2. Pass the credentials as parameters to spark-submit: --keytab /etc/security/ieevee.zelda1.keytab --principal ieevee/zelda1@ZELDA.COM. Note that these options must appear among the spark-submit options, before the application jar; anything placed after the jar is treated as arguments to the Spark job itself.

Either way the credential ultimately comes from a keytab; the only difference is whether it is read from the current principal's ticket cache (after kinit) or read from the keytab by spark-submit itself.

1. Add a new principal in kadmin and export its keytab:

```
addprinc -randkey ieevee/zelda1@ZELDA.COM
xst -k ieevee.spark.keytab ieevee/zelda1@ZELDA.COM
```

2. Copy the generated ieevee.spark.keytab file to /etc/security/.
3. kinit, then submit the job:

```
kinit -kt /etc/security/ieevee.spark.keytab ieevee/zelda1
klist  # check the principal ticket cache
```

```
./bin/spark-submit --master yarn --class org.apache.spark.examples.SparkLR --name SparkLR lib/spark-examples-1.6.1-hadoop2.6.0.jar
```

Or skip kinit and point spark-submit at the keytab directly:

```
./bin/spark-submit --keytab /etc/security/ieevee.zelda1.keytab --principal ieevee/zelda1@ZELDA.COM --master yarn --class org.apache.spark.examples.SparkLR --name SparkLR lib/spark-examples-1.6.1-hadoop2.6.0.jar
```

Spark SQL's Thrift server is itself a Spark job submitted to YARN via spark-submit, so before starting it you must either kinit, or pass --keytab/--principal so that spark-submit logs in from the keytab itself (via UserGroupInformation.loginUserFromKeytab).
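For reference, this is roughly what the keytab login amounts to at the Hadoop API level. A minimal sketch, reusing the principal and keytab path from the examples above; not taken from the original post:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.security.UserGroupInformation

object KeytabLoginSketch {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration()
    // Switch Hadoop from simple auth to Kerberos.
    conf.set("hadoop.security.authentication", "kerberos")
    UserGroupInformation.setConfiguration(conf)
    // Log in directly from the keytab; no ticket cache is needed.
    UserGroupInformation.loginUserFromKeytab(
      "ieevee/zelda1@ZELDA.COM",
      "/etc/security/ieevee.zelda1.keytab")
    println(s"logged in as ${UserGroupInformation.getLoginUser}")
  }
}
```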

spark-submit also accepts a --proxy-user parameter, which lets the authenticated user impersonate another user when submitting the job.
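Under the hood, proxying rests on Hadoop's impersonation API. A minimal sketch of the idea ("alice" is a made-up user here, and the cluster's core-site.xml must contain hadoop.proxyuser rules permitting the impersonation):

```scala
import java.security.PrivilegedExceptionAction
import org.apache.hadoop.security.UserGroupInformation

object ProxyUserSketch {
  def main(args: Array[String]): Unit = {
    // The real, Kerberos-authenticated user (e.g. logged in from a keytab).
    val realUser = UserGroupInformation.getLoginUser
    // Wrap it in a proxy UGI for the hypothetical user "alice".
    val proxy = UserGroupInformation.createProxyUser("alice", realUser)
    proxy.doAs(new PrivilegedExceptionAction[Unit] {
      // Everything inside run() (HDFS access, job submission) acts as alice.
      override def run(): Unit =
        println(s"running as ${UserGroupInformation.getCurrentUser}")
    })
  }
}
```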

Log Comparison

With a valid ticket in the cache (after kinit), the Krb5LoginModule debug output shows the TGT being read from the cache and the login committing:

```
2019-06-05 18:42:43 INFO  HadoopStepsOrchestrator:54 - Hadoop Conf directory: /opt/spark/hconf
2019-06-05 18:42:43 INFO  HadoopConfBootstrapImpl:54 - HADOOP_CONF_DIR defined. Mounting Hadoop specific files
2019-06-05 18:42:44 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
		[UnixLoginModule]: succeeded importing info:
			uid = 0
			gid = 0
			supp gid = 0
			supp gid = 1
			supp gid = 2
			supp gid = 3
			supp gid = 4
			supp gid = 6
			supp gid = 10
Debug is  true storeKey false useTicketCache true useKeyTab false doNotPrompt true ticketCache is null isInitiator true KeyTab is null refreshKrb5Config is false principal is null tryFirstPass is false useFirstPass is false storePass is false clearPass is false
Acquire TGT from Cache
Principal is hdfs/cdh1@IEGBACKUP.COM
		[UnixLoginModule]: added UnixPrincipal,
				UnixNumericUserPrincipal,
				UnixNumericGroupPrincipal(s),
			 to Subject
Commit Succeeded
```

After kdestroy, the same job is submitted with a script that carries nothing Kerberos-related, so no principal can be read from the cache:

```
/opt/spark/bin/spark-submit \
      --deploy-mode cluster \
      --class org.apache.spark.examples.HdfsTest \
      --master=k8s://https://kubernetes.default.svc      \
      --conf spark.executor.instances=1 \
      --conf spark.app.name=spark-hdfs \
      --conf spark.kubernetes.namespace=runzhliu \
      --conf spark.driver.extraClassPath=/opt/spark/hconf/core-site.xml:/opt/spark/hconf/hdfs-site.xml:/opt/spark/hconf/yarn-site.xml:/etc/krb5.conf \
      --conf spark.kubernetes.driver.docker.image=hub.oa.com/runzhliu/spark-driver-runzhliu:0.0.2 \
      --conf spark.kubernetes.executor.docker.image=hub.oa.com/runzhliu/spark-executor-runzhliu:0.0.2 \
      --conf spark.kubernetes.initcontainer.docker.image=hub.oa.com/runzhliu/spark-init-runzhliu:0.0.1 \
      --conf=spark.driver.cores=2 \
      --conf=spark.driver.memory=4096M \
      --conf=spark.executor.cores=2 \
      --conf=spark.executor.memory=4096M \
      --conf=spark.eventLog.dir=hdfs://sh-spark.hdfs.cr.hdfs.db:9000/yongyu/history \
      local:///opt/spark/examples/jars/spark-examples_2.11-2.2.0-k8s-0.5.0.jar \
      hdfs://sh.hdfs.cr.ied.com:9000/tdw-transfer-data/runzhliu/20180411102548571/20190212/part-00326
```

With the ticket cache destroyed, klist finds no credentials, and the same login debug output now fails, because the JAAS options in effect (useTicketCache=true, useKeyTab=false) leave nothing to fall back on:

```
[root@kerberos-test-5c4458f757-rsjrl /opt/spark]#kdestroy
[root@kerberos-test-5c4458f757-rsjrl /opt/spark]#klist
klist: No credentials cache found (ticket cache FILE:/tmp/krb5cc_0)
...
...
2019-06-05 18:49:17 INFO  HadoopStepsOrchestrator:54 - Hadoop Conf directory: /opt/spark/hconf
2019-06-05 18:49:17 INFO  HadoopConfBootstrapImpl:54 - HADOOP_CONF_DIR defined. Mounting Hadoop specific files
2019-06-05 18:49:17 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
		[UnixLoginModule]: succeeded importing info:
			uid = 0
			gid = 0
			supp gid = 0
			supp gid = 1
			supp gid = 2
			supp gid = 3
			supp gid = 4
			supp gid = 6
			supp gid = 10
Debug is  true storeKey false useTicketCache true useKeyTab false doNotPrompt true ticketCache is null isInitiator true KeyTab is null refreshKrb5Config is false principal is null tryFirstPass is false useFirstPass is false storePass is false clearPass is false
Acquire TGT from Cache
Principal is null
null credentials from Ticket Cache
		[Krb5LoginModule] authentication failed
Unable to obtain Principal Name for authentication
		[UnixLoginModule]: added UnixPrincipal,
				UnixNumericUserPrincipal,
				UnixNumericGroupPrincipal(s),
			 to Subject
```
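The decisive step is "Acquire TGT from Cache": with a ticket present, the module resolves Principal is hdfs/cdh1@IEGBACKUP.COM and commits; after kdestroy it reports Principal is null and aborts with "Unable to obtain Principal Name for authentication". To verify this state from code before submitting, here is a minimal sketch using Hadoop's UserGroupInformation (our own illustration, not from the original logs):

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.security.UserGroupInformation

object CredentialCheck {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration()
    conf.set("hadoop.security.authentication", "kerberos")
    UserGroupInformation.setConfiguration(conf)
    // With a ticket in the cache this prints the kinit'ed principal;
    // after kdestroy the underlying JAAS login fails with an IOException,
    // mirroring the "Unable to obtain Principal Name" log above.
    val ugi = UserGroupInformation.getCurrentUser
    println(s"user=$ugi kerberos=${ugi.hasKerberosCredentials}")
  }
}
```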
Warning
This article was last updated on February 1, 2017; its content may be out of date, so use it with caution.