
runzhliu

How Spark Executes Code in a Distributed Fashion

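Overview: This post is adapted from https://zhuanlan.zhihu.com/p/25772054. Key points: Getting code to run in a distributed fashion is the most basic problem that every distributed computing framework has to solve. Spark is a very popular computing framework in the big data space, and in big data analytics it has …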

Spark and Kerberos

6 Hadoop Security Guide https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.0/bk_Security_Guide/content/kerberos-overview.html To create secure communication among its various components, HDP uses Kerberos. Kerberos is a third-party authentication mechanism, in which users and services that users wish to access rely on the Kerberos server to authenticate each to the other. This mechanism also supports encrypting all traffic between the user and the service. The Kerberos server itself is known as the Key Distribution Center, or KDC. At

Spark Monitoring Issues

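Overview: Spark job monitoring is split into two dimensions and three modules. The two dimensions are: Spark Operator; the Spark application itself. The three modules are: Spark Operator; Spark3 applications deployed through Spark Operator; Sp…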

Spark Interview Questions

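How to guarantee message ordering in a distributed Kafka setup https://www.cnblogs.com/haoxinyue/p/5743775.html Kafka's unit of distribution is the Partition. How message ordering is guaranteed has to be discussed case by case. A single Partition uses one write ahead log …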