In the previous section we deployed Hadoop in pseudo-distributed mode. A fully distributed environment is essentially the same setup, with the different Hadoop roles split across separate machines.
Environment preparation
1. Configure the JDK on every node; a sketch of the /etc/profile setup follows.
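A minimal sketch of the JDK setup, repeated on each node (the install path below is an assumed example, not taken from this article; point JAVA_HOME at your actual JDK directory):
java -version                              # confirm a JDK is installed
vi /etc/profile                            # append the two exports below
export JAVA_HOME=/usr/java/jdk1.7.0_67     # assumed path -- adjust to your install
export PATH=$PATH:$JAVA_HOME/bin
. /etc/profile                             # reload so the variables take effect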
2. Passwordless SSH: distribute node01's public key to the other machines (generate a key pair on node01 first if one does not already exist, as sketched below).
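If node01 does not already have a key pair, generate one first; a minimal sketch (DSA is used so the filename matches the id_dsa.pub referenced below):
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys    # also lets node01 ssh to itself without a password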
On node01, copy the public key to the other nodes:
scp ~/.ssh/id_dsa.pub root@node02:~/.ssh/node01.pub
scp ~/.ssh/id_dsa.pub root@node03:~/.ssh/node01.pub
scp ~/.ssh/id_dsa.pub root@node04:~/.ssh/node01.pub
On each of the other machines, append node01's public key to the authorized keys file:
cat ~/.ssh/node01.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
From node01, log in to each node to verify that passwordless login works:
[root@node01 ~]# ssh node02
Last login: Sun Aug 18 17:11:46 2019 from localhost
[root@node02 ~]# exit
logout
Connection to node02 closed.
[root@node01 ~]# ssh node03
Last login: Sun Aug 18 17:11:59 2019 from localhost
[root@node03 ~]# exit
logout
Connection to node03 closed.
[root@node01 ~]# ssh node04
Last login: Sun Aug 18 17:12:16 2019 from localhost
[root@node04 ~]# exit
logout
Connection to node04 closed.
[root@node01 ~]#
3. Back up $HADOOP_PREFIX/etc/hadoop (run inside $HADOOP_PREFIX/etc) so the pseudo-distributed configuration is preserved:
cp -r hadoop hadoop-local
4. If you are configuring the distributed environment directly, without the pseudo-distributed setup, refer to the previous section for the second round of PATH and JAVA_HOME configuration.
5. Configure core-site.xml:
<configuration>
    <!-- Specify the NameNode address -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://node01:9000</value>
    </property>
    <!-- Override the default hadoop.tmp.dir; otherwise it falls back under /tmp, where data is
         easily lost. Other settings reference this path. -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/var/hadoop/full</value>
    </property>
</configuration>
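For reference, the reason hadoop.tmp.dir matters is that the HDFS data directories are derived from it by default; roughly, the relevant defaults in hdfs-default.xml (Hadoop 2.x) are:
dfs.namenode.name.dir = file://${hadoop.tmp.dir}/dfs/name
dfs.datanode.data.dir = file://${hadoop.tmp.dir}/dfs/data
so with the value above, NameNode metadata and DataNode blocks end up under /var/hadoop/full instead of /tmp.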
6. Configure hdfs-site.xml:
<configuration>
    <!-- Number of block replicas -->
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <!-- Location of the SecondaryNameNode -->
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>node02:50090</value>
    </property>
</configuration>
7. Configure slaves, the list of DataNode hosts (one way to write the file is sketched after the list):
node02
node03
node04
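One possible way to write the file on node01 (assuming $HADOOP_PREFIX points at /opt/hadoop-2.6.5, as elsewhere in this section):
cat > $HADOOP_PREFIX/etc/hadoop/slaves <<EOF
node02
node03
node04
EOF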
8. Copy the /opt/hadoop-2.6.5 directory to the other nodes:
scp -r /opt/hadoop-2.6.5/ root@node02:/opt/
scp -r /opt/hadoop-2.6.5/ root@node03:/opt/
scp -r /opt/hadoop-2.6.5/ root@node04:/opt/
9. Copy /etc/profile to the other nodes:
scp /etc/profile root@node02:/etc/
scp /etc/profile root@node03:/etc/
scp /etc/profile root@node04:/etc/
10. Source /etc/profile on each of the other machines so the new environment variables take effect:
. /etc/profile
11. Format the HDFS NameNode file system (run on node01; a sketch of the command follows).
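As with the pseudo-distributed setup, the format command runs once, on node01 only; a minimal sketch:
hdfs namenode -format    # with the config above this initializes /var/hadoop/full/dfs/name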
12. Start HDFS:
[root@node01 ~]# start-dfs.sh
Starting namenodes on [node01]
node01: starting namenode, logging to /opt/hadoop-2.6.5/logs/hadoop-root-namenode-node01.out
node04: starting datanode, logging to /opt/hadoop-2.6.5/logs/hadoop-root-datanode-node04.out
node03: starting datanode, logging to /opt/hadoop-2.6.5/logs/hadoop-root-datanode-node03.out
node02: starting datanode, logging to /opt/hadoop-2.6.5/logs/hadoop-root-datanode-node02.out
Starting secondary namenodes [node02]
node02: starting secondarynamenode, logging to /opt/hadoop-2.6.5/logs/hadoop-root-secondarynamenode-node02.out
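One way to confirm that every daemon landed on the intended host is to run jps on each node (jps ships with the JDK; for the ssh one-liners below it has to be on the remote PATH):
jps                  # on node01: expect NameNode
ssh node02 jps       # expect DataNode and SecondaryNameNode
ssh node03 jps       # expect DataNode
ssh node04 jps       # expect DataNode
The NameNode web UI should also be reachable at http://node01:50070 (the Hadoop 2.x default port).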
Everything else follows the previous section.
--Create a text file
for i in `seq 100000`;do echo "hello sxt $i" >> test.txt;done
--Upload the file with an explicit block size
hdfs dfs -D dfs.blocksize=1048576 -put ./test.txt /user/root
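To check how the file was actually split into 1 MB blocks and where the replicas live, hdfs fsck can be used; a sketch:
hdfs fsck /user/root/test.txt -files -blocks -locations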