Spark 3.0.1 High-Availability Cluster Setup (Step by Step)


Prerequisites:
1. A ZooKeeper cluster is already set up
2. A Hadoop high-availability cluster is already set up


Node planning

The Worker service is deployed on all three hosts. For high availability, in addition to the primary Master on hadoop-01, standby Master services are deployed on hadoop-02 and hadoop-03. The Master services are coordinated by the ZooKeeper cluster: if the primary Master becomes unavailable, a standby Master takes over as the new primary.
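The planned layout, using the host names that appear throughout the rest of this guide:

Host        Spark Master        Spark Worker    ZooKeeper
hadoop-01   Master (primary)    Worker          QuorumPeerMain
hadoop-02   Master (standby)    Worker          QuorumPeerMain
hadoop-03   Master (standby)    Worker          QuorumPeerMain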


I. Download the matching Spark tarball

On the official download page, pick a Spark build that matches your Hadoop version (here: spark-3.0.1-bin-hadoop2.7 for Hadoop 2.7.7).

The official site often fails to load, so the tarballs are also mirrored on Baidu Netdisk:
spark-3.0.1-bin-hadoop2.7.tgz
Link: https://pan.baidu.com/s/1VCwOuYO34OE6Q4H-G03Tmw
Extraction code: 6alf
spark-2.4.0-bin-hadoop2.7.tgz
Link: https://pan.baidu.com/s/1IM0lbhHL9cQKUA4W3bmFfQ
Extraction code: 69z2
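If you download from a mirror like this, it is worth checking the tarball against the SHA-512 checksum Apache publishes alongside each release (a sketch; the checksum file name assumes the Apache archive layout at archive.apache.org/dist/spark/spark-3.0.1/):

# sha512sum spark-3.0.1-bin-hadoop2.7.tgz
# compare the result with the contents of spark-3.0.1-bin-hadoop2.7.tgz.sha512 from the archive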

II. Upload, extract, and configure environment variables

Upload the tarball to the server (xftp is a convenient choice).
Extract (in /root/software): # tar -zxvf spark-3.0.1-bin-hadoop2.7.tgz
Open the environment file: # vi /etc/profile

Add:
export SPARK_HOME=/root/software/spark-3.0.1-bin-hadoop2.7
export PATH=$PATH:$SPARK_HOME/bin

Apply the change: # source /etc/profile
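A quick check that the variables took effect:

# echo $SPARK_HOME
/root/software/spark-3.0.1-bin-hadoop2.7
# spark-submit --version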

III. Edit the Spark configuration files

# Enter the ${SPARK_HOME}/conf directory and copy the template before editing:
# cp spark-env.sh.template spark-env.sh
# Add the following three lines to spark-env.sh:
JAVA_HOME=/root/software/jdk1.8.0_251
HADOOP_CONF_DIR=/root/software/hadoop-2.7.7/etc/hadoop
SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=hadoop-01:2181,hadoop-02:2181,hadoop-03:2181 -Dspark.deploy.zookeeper.dir=/spark"
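What the three daemon options do: spark.deploy.recoveryMode=ZOOKEEPER makes the standalone Master persist its recovery state in ZooKeeper instead of only in memory; spark.deploy.zookeeper.url lists the ZooKeeper ensemble the Masters connect to; spark.deploy.zookeeper.dir is the znode path under which that state (registered workers and applications) is stored, and the Master creates it on first start.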

# Still in the ${SPARK_HOME}/conf directory, copy the slaves template:
# cp slaves.template slaves
# List your cluster host names in slaves, one per line:
hadoop-01
hadoop-02
hadoop-03
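Note that start-all.sh uses SSH to reach every host listed here, so the passwordless SSH already set up for Hadoop is assumed. Also, in Spark 3.1 and later this file was renamed to conf/workers; in 3.0.1 it is still slaves.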

IV. Distribute the Spark directory and /etc/profile

# scp -r /root/software/spark-3.0.1-bin-hadoop2.7/ hadoop-02:/root/software/
# scp -r /root/software/spark-3.0.1-bin-hadoop2.7/ hadoop-03:/root/software/
# scp /etc/profile hadoop-02:/etc
# scp /etc/profile hadoop-03:/etc

**Note:** the target directories hadoop-02:/root/software and hadoop-03:/root/software must already exist.
**Note:** after distributing /etc/profile, be sure to run # source /etc/profile on each child node so the environment variables take effect there.
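With more nodes, a small loop saves typing (a sketch; host names as assumed throughout this guide):

for h in hadoop-02 hadoop-03; do
  scp -r /root/software/spark-3.0.1-bin-hadoop2.7/ "$h":/root/software/
  scp /etc/profile "$h":/etc/
done

The manual source /etc/profile is still needed in any shell already open on those nodes; new logins pick the file up automatically.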

V. Test the cluster

1. Start the ZooKeeper cluster

Here I use the ZooKeeper group-start script written in an earlier post; a minimal sketch of such a script follows the output below.
[root@hadoop-01 ~]# zkStart-all.sh 
===========start zk cluster :hadoop-01===============
ZooKeeper JMX enabled by default
Using config: /root/software/zookeeper-3.4.8/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
===========start zk cluster :hadoop-02===============
ZooKeeper JMX enabled by default
Using config: /root/software/zookeeper-3.4.8/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
===========start zk cluster :hadoop-03===============
ZooKeeper JMX enabled by default
Using config: /root/software/zookeeper-3.4.8/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
===========checking zk node status :hadoop-01===============
ZooKeeper JMX enabled by default
Using config: /root/software/zookeeper-3.4.8/bin/../conf/zoo.cfg
Mode: follower
===========checking zk node status :hadoop-02===============
ZooKeeper JMX enabled by default
Using config: /root/software/zookeeper-3.4.8/bin/../conf/zoo.cfg
Mode: leader
===========checking zk node status :hadoop-03===============
ZooKeeper JMX enabled by default
Using config: /root/software/zookeeper-3.4.8/bin/../conf/zoo.cfg
Mode: follower
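The zkStart-all.sh script itself was covered in the earlier post; the idea, as a minimal sketch assuming passwordless SSH and the ZooKeeper path shown in the output above:

#!/bin/bash
# Start ZooKeeper on every node, then report each node's role.
ZK_HOME=/root/software/zookeeper-3.4.8
HOSTS="hadoop-01 hadoop-02 hadoop-03"
for h in $HOSTS; do
  echo "===========start zk cluster :$h==============="
  ssh "$h" "$ZK_HOME/bin/zkServer.sh start"
done
for h in $HOSTS; do
  echo "===========checking zk node status :$h==============="
  ssh "$h" "$ZK_HOME/bin/zkServer.sh status"
done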

2. Start the Hadoop cluster

[root@hadoop-01 ~]# start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [hadoop-01 hadoop-02]
hadoop-01: starting namenode, logging to /root/software/hadoop-2.7.7/logs/hadoop-root-namenode-hadoop-01.out
hadoop-02: starting namenode, logging to /root/software/hadoop-2.7.7/logs/hadoop-root-namenode-hadoop-02.out
hadoop-02: starting datanode, logging to /root/software/hadoop-2.7.7/logs/hadoop-root-datanode-hadoop-02.out
hadoop-01: starting datanode, logging to /root/software/hadoop-2.7.7/logs/hadoop-root-datanode-hadoop-01.out
hadoop-03: starting datanode, logging to /root/software/hadoop-2.7.7/logs/hadoop-root-datanode-hadoop-03.out
Starting journal nodes [hadoop-01 hadoop-02 hadoop-03]
hadoop-03: starting journalnode, logging to /root/software/hadoop-2.7.7/logs/hadoop-root-journalnode-hadoop-03.out
hadoop-01: starting journalnode, logging to /root/software/hadoop-2.7.7/logs/hadoop-root-journalnode-hadoop-01.out
hadoop-02: starting journalnode, logging to /root/software/hadoop-2.7.7/logs/hadoop-root-journalnode-hadoop-02.out
Starting ZK Failover Controllers on NN hosts [hadoop-01 hadoop-02]
hadoop-01: starting zkfc, logging to /root/software/hadoop-2.7.7/logs/hadoop-root-zkfc-hadoop-01.out
hadoop-02: starting zkfc, logging to /root/software/hadoop-2.7.7/logs/hadoop-root-zkfc-hadoop-02.out
starting yarn daemons
starting resourcemanager, logging to /root/software/hadoop-2.7.7/logs/yarn-root-resourcemanager-hadoop-01.out
hadoop-01: starting nodemanager, logging to /root/software/hadoop-2.7.7/logs/yarn-root-nodemanager-hadoop-01.out
hadoop-02: starting nodemanager, logging to /root/software/hadoop-2.7.7/logs/yarn-root-nodemanager-hadoop-02.out
hadoop-03: starting nodemanager, logging to /root/software/hadoop-2.7.7/logs/yarn-root-nodemanager-hadoop-03.out
Check the processes with jps:
[root@hadoop-01 ~]# jps
3361 ResourceManager
3473 NodeManager
3058 JournalNode
2840 DataNode
2730 NameNode
3820 Jps
3246 DFSZKFailoverController
2511 QuorumPeerMain

[root@hadoop-02 ~]# jps
3154 Jps
2870 DFSZKFailoverController
2471 QuorumPeerMain
2664 DataNode
2971 NodeManager
2589 NameNode
2767 JournalNode

[root@hadoop-03 ~]# jps
2612 JournalNode
2733 NodeManager
2893 Jps
2415 QuorumPeerMain
2527 DataNode
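For reference: QuorumPeerMain is the ZooKeeper server, DFSZKFailoverController is HDFS's automatic-failover controller (ZKFC, present only on the two NameNode hosts), and JournalNode stores the shared NameNode edit log; the remaining entries are the usual HDFS and YARN daemons.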

3. Start the Spark cluster

• Start the Master and Worker daemons:
[root@hadoop-01 ~]# /root/software/spark-3.0.1-bin-hadoop2.7/sbin/start-all.sh 
starting org.apache.spark.deploy.master.Master, logging to /root/software/spark-3.0.1-bin-hadoop2.7/logs/spark-root-org.apache.spark.deploy.master.Master-1-hadoop-01.out
hadoop-02: starting org.apache.spark.deploy.worker.Worker, logging to /root/software/spark-3.0.1-bin-hadoop2.7/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-hadoop-02.out
hadoop-01: starting org.apache.spark.deploy.worker.Worker, logging to /root/software/spark-3.0.1-bin-hadoop2.7/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-hadoop-01.out
hadoop-03: starting org.apache.spark.deploy.worker.Worker, logging to /root/software/spark-3.0.1-bin-hadoop2.7/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-hadoop-03.out
At this point, jps on every node should show a Worker process, and the primary node should additionally show a Master process.
• Start the standby Master:
[root@hadoop-02 ~]# /root/software/spark-3.0.1-bin-hadoop2.7/sbin/start-master.sh 
starting org.apache.spark.deploy.master.Master, logging to /root/software/spark-3.0.1-bin-hadoop2.7/logs/spark-root-org.apache.spark.deploy.master.Master-1-hadoop-02.out
Running jps on hadoop-02 should now also show a Master process.
• Check the web UIs:

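The standalone Master web UI listens on port 8080 by default: open http://hadoop-01:8080 and http://hadoop-02:8080 in a browser. hadoop-01 should report Status: ALIVE and hadoop-02 Status: STANDBY.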

4. Simulate a primary-node failure

• Stop the primary Master process:
Here we simply **power off** hadoop-01.
• Check whether the standby Master has taken over:

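Failover is not instantaneous: the standby Master has to notice the expired ZooKeeper session and reload the recovery state, which the Spark documentation says should take roughly one to two minutes. Refresh http://hadoop-02:8080 until its status changes from STANDBY to ALIVE; applications that were already running are unaffected and simply re-register with the new Master.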
