The components have to work together, so their versions must be mutually compatible; Cloudera ships matched versions of the big-data components. Here I use the CDH 5.3.6 suite with JDK 1.8:
Preparation
- Download the tarballs: get the CDH 5.3.6 components from http://archive.cloudera.com/cdh5/cdh/5/
- Start the system: CentOS 6.7 in a virtual machine
- Network settings:
- Create a working user and working directory:
[root@localhost ~]# useradd hadoop
[root@localhost ~]# passwd hadoop
[root@localhost ~]# su hadoop
[hadoop@localhost root]$ cd
[hadoop@localhost ~]$ mkdir cdh
- Upload the tarballs into the cdh directory
- Extract them, then move the tarballs to Downloads:
[hadoop@localhost ~]$ cd cdh
[hadoop@localhost cdh]$ ls *.tar.gz | xargs -n1 tar xzvf
[hadoop@localhost cdh]$ find ./ -name "*.tar.gz" | xargs -i mv {} ../Downloads/
- Environment variables:
[hadoop@localhost cdh]$ vi ~/.bashrc
export JAVA_HOME=/home/hadoop/cdh/jdk1.8.0_111
export PATH=$JAVA_HOME/bin:$PATH
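Reload the file and confirm the JDK is picked up (a quick sanity check):
[hadoop@localhost cdh]$ source ~/.bashrc
[hadoop@localhost cdh]$ java -version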
Hadoop
- Testing
Local (Standalone) Mode
[hadoop@localhost cdh]$ cd hadoop-2.5.0-cdh5.3.6/
[hadoop@localhost hadoop-2.5.0-cdh5.3.6]$ mkdir input
[hadoop@localhost hadoop-2.5.0-cdh5.3.6]$ touch input/wc.input
[hadoop@localhost hadoop-2.5.0-cdh5.3.6]$ vi input/wc.input
hello world !
hello hadoop ~
[hadoop@localhost hadoop-2.5.0-cdh5.3.6]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0-cdh5.3.6.jar wordcount input/wc.input output
[hadoop@localhost hadoop-2.5.0-cdh5.3.6]$ cat output/part-r-00000
! 1
hadoop 1
hello 2
world 1
~ 1
Pseudo-Distributed Mode
[hadoop@localhost hadoop-2.5.0-cdh5.3.6]$ vi etc/hadoop/core-site.xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hadoop/cdh/data/tmp</value>
    </property>
</configuration>
[hadoop@localhost hadoop-2.5.0-cdh5.3.6]$ vim etc/hadoop/hdfs-site.xml
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
[hadoop@localhost hadoop-2.5.0-cdh5.3.6]$ bin/hdfs namenode -format
[hadoop@localhost hadoop-2.5.0-cdh5.3.6]$ sbin/start-dfs.sh
# You will be prompted for passwords; just follow the prompts. With passwordless SSH configured, no prompts appear.
[hadoop@localhost hadoop-2.5.0-cdh5.3.6]$ bin/hdfs dfs -mkdir /user
[hadoop@localhost hadoop-2.5.0-cdh5.3.6]$ bin/hdfs dfs -ls /
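To verify the daemons and rerun the wordcount example against HDFS, a minimal sketch (the /user/hadoop paths are my assumption, following the usual HDFS home-directory convention):
[hadoop@localhost hadoop-2.5.0-cdh5.3.6]$ jps
# should list NameNode, DataNode and SecondaryNameNode
[hadoop@localhost hadoop-2.5.0-cdh5.3.6]$ bin/hdfs dfs -mkdir -p /user/hadoop/input
[hadoop@localhost hadoop-2.5.0-cdh5.3.6]$ bin/hdfs dfs -put input/wc.input /user/hadoop/input
[hadoop@localhost hadoop-2.5.0-cdh5.3.6]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0-cdh5.3.6.jar wordcount input output
[hadoop@localhost hadoop-2.5.0-cdh5.3.6]$ bin/hdfs dfs -cat output/part-r-00000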
Fully-Distributed Mode
# Configure hostnames and IP addresses
[hadoop@localhost ~]$ su
Password:
[root@localhost ~]# vi /etc/hosts
192.168.2.131 master
192.168.2.132 slave1
192.168.2.133 slave2
# Back as the hadoop user, set up passwordless SSH login first.
# -P sets the passphrase; -P '' means an empty passphrase. You can also omit -P, but then you press Enter three times instead of once.
[root@localhost ~]# exit
[hadoop@localhost ~]$ ssh-keygen -t rsa -P ''
[hadoop@localhost ~]$ cd ~/.ssh
[hadoop@localhost .ssh]$ cat id_rsa.pub >> ./authorized_keys
[hadoop@localhost .ssh]$ chmod 600 authorized_keys
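For an actual multi-node cluster the public key also has to reach the slaves; a sketch, assuming the hadoop user already exists on slave1 and slave2:
[hadoop@localhost .ssh]$ ssh-copy-id hadoop@slave1
[hadoop@localhost .ssh]$ ssh-copy-id hadoop@slave2
[hadoop@localhost .ssh]$ ssh slave1
# should log in without asking for a password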
# Edit the configuration files
[hadoop@localhost hadoop-2.5.0-cdh5.3.6]$ vim etc/hadoop/mapred-site.xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
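Note: upstream Apache Hadoop tarballs ship only mapred-site.xml.template (I have not verified the CDH tarball layout), so if mapred-site.xml does not exist yet, create it from the template before editing:
[hadoop@localhost hadoop-2.5.0-cdh5.3.6]$ cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml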
[hadoop@localhost hadoop-2.5.0-cdh5.3.6]$ vim etc/hadoop/yarn-site.xml
<configuration>
    <property>
        <!-- in a fully distributed cluster this must be the master's hostname, not 127.0.0.1 -->
        <name>yarn.resourcemanager.hostname</name>
        <value>master</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
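Two more pieces a fully distributed setup needs (a sketch; the hostnames follow the /etc/hosts entries above): point fs.defaultFS in core-site.xml at hdfs://master:9000 instead of localhost, and list the worker nodes in the slaves file:
[hadoop@localhost hadoop-2.5.0-cdh5.3.6]$ vim etc/hadoop/slaves
slave1
slave2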
Start the NameNode first:
[hadoop@localhost hadoop-2.5.0-cdh5.3.6]$ sbin/hadoop-daemon.sh start namenode
then bring up the remaining daemons with sbin/start-dfs.sh and sbin/start-yarn.sh.
Sqoop
Apache Sqoop(TM) is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases.
In other words, Sqoop packages the common import/export MapReduce jobs: you pass parameters on the command line, and it runs the corresponding MapReduce program for you.
Install MySQL
bin/mysqld --initialize --user=hadoop --basedir=/home/hadoop/cdh/mysql --datadir=/home/hadoop/cdh/mysql/data
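With MySQL running and the MySQL JDBC driver jar copied into Sqoop's lib directory, a minimal import looks roughly like this (the test database and user table are hypothetical placeholders):
$ bin/sqoop import \
> --connect jdbc:mysql://localhost:3306/test \
> --username root \
> --password 123456 \
> --table user \
> --target-dir /user/hadoop/sqoop/user \
> -m 1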
Flume
# Edit the configuration file
$ cd /home/hadoop/cdh/apache-flume-1.5.0-cdh5.3.6-bin
$ cd conf
$ cp flume-conf.properties.template a1.conf
$ vi a1.conf
# define agent
a1.sources = r1
a1.channels = c1
a1.sinks = k1
# define sources
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
# define channels
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# define sink
a1.sinks.k1.type = logger
# a1.sinks.k1.maxBytesToLog = 1024
# bind the sources and sinks to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Go to https://pkgs.org/ first, search for and download the three required packages, then upload them to the Downloads directory.
# Install the three packages above as the root user, then restart the service
$ rpm -ivh ./*.rpm
$ /etc/rc.d/init.d/xinetd restart
$ exit
# Start the Flume agent
$ bin/flume-ng agent \
> -c conf \
> -n a1 \
> -f conf/a1.conf \
> -Dflume.root.logger=DEBUG,console
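Once the agent is running, feed the netcat source from a second terminal and the events appear in the agent's console log (assuming a telnet client is installed):
$ telnet localhost 44444
hello flume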
Oozie
System requirements:
- Linux
- Java 1.6
- Hadoop
- ExtJS library
Hue