We use MySQL to store Hive's metadata. The notes below on metadata and how it is stored are excerpted from the following article:
Hive Installation and Configuration Guide (with a detailed explanation of the Hive Metastore)
1.1 The roles of Metadata and the Metastore
Metadata here means the meta-information about the databases, tables, and other objects created with Hive.
The metadata is stored in a relational database, such as Derby or MySQL.
The Metastore works as follows: clients connect to the metastore service, and the metastore service in turn connects to the MySQL database to read and write the metadata. With a metastore service in place, multiple clients can connect at the same time, and none of them needs to know the MySQL username and password; they only need to be able to reach the metastore service.
1.2 Differences between the three configuration modes
Embedded mode uses the built-in Derby database to store the metadata and does not require a separate metastore service. This is the default: it is simple to configure, but only one client can connect at a time, which makes it suitable for experimentation but not for production.
The local metastore and remote metastore modes both use an external database to store the metadata; the currently supported databases are MySQL, PostgreSQL, Oracle, and MS SQL Server. Here we use MySQL.
The difference between the local and remote metastore is: in local mode no separate metastore service is started — the metastore runs in the same process as Hive itself. In remote mode a standalone metastore service is started, and each client's configuration file points it at that service; the metastore service and Hive run in different processes.
In production environments, the remote metastore is the recommended way to configure the Hive Metastore.
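For reference, the remote metastore mode described above is wired up roughly like this: each client's hive-site.xml points at the metastore service via hive.metastore.uris (the hostname "metastore-host" below is a placeholder; 9083 is the default metastore port), and the service itself is started separately on that host with `hive --service metastore`. A minimal sketch:

```xml
<!-- In each client's hive-site.xml; "metastore-host" is a placeholder -->
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://metastore-host:9083</value>
</property>
```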
The installation steps follow:
Copy, extract, and rename
sudo cp apache-hive-1.1.0-bin.tar.gz /usr/local
cd /usr/local
sudo tar zxvf ./apache-hive-1.1.0-bin.tar.gz
sudo mv apache-hive-1.1.0-bin hive
Edit the environment variables
sudo nano /etc/profile
Add the following two lines at the end:
export HIVE_HOME=/usr/local/hive
export PATH=$HIVE_HOME/bin:$HIVE_HOME/conf:$PATH
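Changes to /etc/profile only take effect in shells started afterwards; to apply them to the current shell, reload the file and verify:

```shell
# Apply the updated profile to the current shell
source /etc/profile
# Both should now reflect the Hive install under /usr/local/hive
echo "$HIVE_HOME"
which hive
```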
Log in to MySQL
mysql -u root -p
Create a user named hive with the password hive
GRANT USAGE ON *.* TO 'hive'@'%' IDENTIFIED BY 'hive' WITH GRANT OPTION;
create database hive;
grant all on hive.* to hive@'%' identified by 'hive';
grant all on hive.* to hive@'localhost' identified by 'hive';
flush privileges;
exit;
Verify the hive user
mysql -uhive -phive
show databases;
If you see output like the following, the user and database were created successfully
mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| hive               |
| test               |
+--------------------+
3 rows in set (0.00 sec)
Exit MySQL
exit
Edit hive-site.xml
sudo cp hive/conf/hive-default.xml.template hive/conf/hive-site.xml
sudo nano hive/conf/hive-site.xml
Set the following properties (the template already defines most of them, so edit the existing entries rather than appending duplicates, and make sure there is no stray whitespace inside the <name> and <value> tags, or Hive will not match the property names):
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://master:3306/hive</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hive</value>
</property>
<property>
  <name>hive.hwi.listen.port</name>
  <value>9999</value>
  <description>This is the port the Hive Web Interface will listen on</description>
</property>
<property>
  <name>datanucleus.autoCreateSchema</name>
  <value>true</value>
</property>
<property>
  <name>datanucleus.fixedDatastore</name>
  <value>false</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
  <description>Username to use against metastore database</description>
</property>
<property>
  <name>hive.exec.local.scratchdir</name>
  <value>/usr/local/hive/iotmp</value>
  <description>Local scratch space for Hive jobs</description>
</property>
<property>
  <name>hive.downloaded.resources.dir</name>
  <value>/usr/local/hive/iotmp</value>
  <description>Temporary local directory for added resources in the remote file system.</description>
</property>
<property>
  <name>hive.querylog.location</name>
  <value>/usr/local/hive/iotmp</value>
  <description>Location of Hive run time structured log file</description>
</property>
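With datanucleus.autoCreateSchema set to true, Hive creates the metastore tables on first use. An alternative worth knowing (available since Hive 0.12) is to initialize the schema explicitly with the schematool utility that ships with Hive, which fails fast with a clear error if the database connection is misconfigured:

```shell
# Creates the metastore tables in the MySQL database named in ConnectionURL
schematool -dbType mysql -initSchema
```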
Put mysql-connector-java-5.1.43-bin.jar under Hive's lib directory
mv /home/hadoop-sna/Downloads/mysql-connector-java-5.1.43-bin.jar /usr/local/hive/lib/
Copy jline-2.12.jar into the corresponding Hadoop directory to replace jline-0.9.94.jar; otherwise Hive will fail to start with an error.
cp /usr/local/hive/lib/jline-2.12.jar /usr/local/hadoop-2.6.5/share/hadoop/yarn/lib/
mv /usr/local/hadoop-2.6.5/share/hadoop/yarn/lib/jline-0.9.94.jar /usr/local/hadoop-2.6.5/share/hadoop/yarn/lib/jline-0.9.94.jar.bak
Create Hive's temporary directory
mkdir /usr/local/hive/iotmp
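Since /usr/local/hive was created with sudo, it is owned by root, so a plain mkdir may fail and Hive itself may be unable to write there (this is the same root cause as the permission error described in the problems section below). A common fix, assuming Hive runs as the hadoop-sna user used later in this article:

```shell
sudo mkdir -p /usr/local/hive/iotmp
# Let the user that actually runs Hive own the scratch directory
sudo chown -R hadoop-sna:hadoop-sna /usr/local/hive/iotmp
```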
Problems encountered
- Running hive failed with the error: Unable to instantiate org.apache.hadoop.hive.
I tried changing MySQL's binlog format:
The first time I installed Hive, the following change fixed it; however, when I later installed Sqoop, setting it to 'ROW' did not work and I ran into the second problem below.
mysql -u root -p
mysql> set global binlog_format='ROW';
- When starting Hive, it reported:
Caused by: javax.jdo.JDOException: Couldnt obtain a new sequence (unique id) : Cannot execute statement: impossible to write to binary log since BINLOG_FORMAT = STATEMENT and at least one table uses a storage engine limited to row-based logging. InnoDB is limited to row-logging when transaction isolation level is READ COMMITTED or READ UNCOMMITTED.
NestedThrowables: java.sql.SQLException: Cannot execute statement: impossible to write to binary log since BINLOG_FORMAT = STATEMENT and at least one table uses a storage engine limited to row-based logging. InnoDB is limited to row-logging when transaction isolation level is READ COMMITTED or READ UNCOMMITTED.
This problem is caused by a misconfiguration of the MySQL database backing the Hive metastore, and can be fixed like this:
mysql> set global binlog_format='MIXED';
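Note that SET GLOBAL only lasts until mysqld restarts. To make the setting permanent, it can also be placed in the MySQL server configuration file (the path varies by distribution, e.g. /etc/my.cnf or /etc/mysql/my.cnf):

```ini
[mysqld]
binlog_format = MIXED
```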
- Starting Hive again then reported: java.lang.RuntimeException: java.io.IOException: 权限不够 (permission denied)
After changing the ownership of the Hive directory (run from /usr/local), Hive started successfully:
sudo chown -R hadoop-sna hive
sudo chgrp -R hadoop-sna hive