<chapter id="index"><?dbhtml dir="apache-hive" ?>
<title>Apache Hive</title>
<para>Hive is a data warehouse analysis system built on top of Hadoop. It provides a rich SQL-style query interface for analyzing data stored in the Hadoop distributed file system. Within the Hadoop architecture it plays the role of a SQL parser: it accepts user statements through its external interfaces, analyzes them, compiles each statement into an executable plan composed of MapReduce programs, generates the corresponding MapReduce jobs according to that plan, submits them to the Hadoop cluster for processing, and returns the final result.</para>
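<para>For instance, a simple aggregate query like the one below (an illustrative sketch; the logs table and its ip column are hypothetical) is compiled by Hive into one or more MapReduce jobs that run on the cluster:</para>
<screen>
<![CDATA[
hive> SELECT ip, COUNT(*) AS requests FROM logs GROUP BY ip;
]]>
</screen>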
<section id="setup">
<title>Installing Apache Hive</title>
<para>Installing Apache Hive requires Hadoop and MySQL. This chapter assumes you already know how to install both, so the steps below use the <ulink url="https://github.com/oscm/shell">Netkiller OSCM</ulink> one-click installation scripts to initialize Hadoop and MySQL. For detailed installation steps, please refer to the author's related articles.</para>
<section>
<title>MySQL</title>
<para>By default, Hive stores its metadata in an embedded Derby database; production environments usually use MySQL to store the Hive metadata instead.</para>
<para>Use the script below to install MySQL 5.7 in one step. After installation it prints the initial root password; log in with this initial password and then change it to a password of your choice.</para>
<screen>
<![CDATA[
curl -s https://raw.githubusercontent.com/oscm/shell/master/database/mysql/5.7/mysql.server.sh | bash
2016-02-16T08:22:58.253030Z 1 [Note] A temporary password is generated for root@localhost: sd%%my.Ak7Ma
]]>
</screen>
<para>Install the MySQL JDBC connector library.</para>
<screen>
<![CDATA[
curl -s https://raw.githubusercontent.com/oscm/shell/master/database/mysql/5.7/mysql-connector-java.sh | bash
]]>
</screen>
<para>Create a hive database to store the Hive metadata; both the username and the password used to access it are hive.</para>
<screen>
<![CDATA[
mysql> CREATE DATABASE hive;
Query OK, 1 row affected (0.03 sec)
]]>
</screen>
<para>Create the user hive and grant it access to the hive database.</para>
<screen>
<![CDATA[
mysql> CREATE USER 'hive'@'localhost' IDENTIFIED BY 'hive';
Query OK, 0 rows affected (0.04 sec)
mysql> GRANT ALL ON hive.* TO 'hive'@'localhost' IDENTIFIED BY 'hive';
Query OK, 0 rows affected (0.01 sec)
mysql> GRANT ALL ON hive.* TO 'hive'@'%' IDENTIFIED BY 'hive';
Query OK, 0 rows affected (0.00 sec)
mysql> FLUSH PRIVILEGES;
Query OK, 0 rows affected (0.00 sec)
mysql> quit;
Bye
]]>
</screen>
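<para>Optionally, verify that the new account works before moving on (a quick sanity check, not part of the original scripts):</para>
<screen>
<![CDATA[
mysql -uhive -phive -e "SHOW DATABASES;"
]]>
</screen>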
</section>
<section>
<title>Hadoop</title>
<para>Install Hadoop in single-node mode.</para>
<screen>
<![CDATA[
curl -s https://raw.githubusercontent.com/oscm/shell/master/distributed/hadoop/hadoop-2.8.0.sh | bash
curl -s https://raw.githubusercontent.com/oscm/shell/master/distributed/hadoop/single.sh | bash
curl -s https://raw.githubusercontent.com/oscm/shell/master/distributed/hadoop/startup.sh | bash
]]>
</screen>
</section>
<section>
<title>Hive</title>
<para>Download the latest stable release, apache-hive-2.1.1-bin.tar.gz, from an Apache mirror site.</para>
<screen>
<![CDATA[
cd /usr/local/src
wget http://mirrors.hust.edu.cn/apache/hive/stable-2/apache-hive-2.1.1-bin.tar.gz
tar zxf apache-hive-2.1.1-bin.tar.gz
mv apache-hive-2.1.1-bin /srv/apache-hive-2.1.1
ln -s /srv/apache-hive-2.1.1/ /srv/apache-hive
chown hadoop:hadoop -R /srv/apache-hive-2.1.1
]]>
</screen>
<para>Set up the environment variables in hive-env.sh and in the hadoop user's ~/.bash_profile.</para>
<screen>
<![CDATA[
cat > /srv/apache-hive/conf/hive-env.sh <<'EOF'
export JAVA_HOME=/srv/java
export HADOOP_HOME=/srv/apache-hadoop
export HBASE_HOME=/srv/apache-hbase
export HIVE_HOME=/srv/apache-hive
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin
EOF
cat >> ~/.bash_profile <<'EOF'
export JAVA_HOME=/srv/java
export HADOOP_HOME=/srv/apache-hadoop
export HBASE_HOME=/srv/apache-hbase
export HIVE_HOME=/srv/apache-hive
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin
EOF
source ~/.bash_profile
]]>
</screen>
<para>Install the JDBC driver by linking the MySQL connector into Hive's lib directory.</para>
<screen>
<![CDATA[
[root@localhost apache-hive]# ln -s /usr/share/java/mysql-connector-java.jar /srv/apache-hive/lib/
[root@localhost apache-hive]# ll /srv/apache-hive/lib/mysql-connector-java.jar
lrwxrwxrwx 1 root root 40 Jun 29 01:59 /srv/apache-hive/lib/mysql-connector-java.jar -> /usr/share/java/mysql-connector-java.jar
]]>
</screen>
<para>Edit the hive-site.xml configuration file to set the working directories.</para>
<screen>
<![CDATA[
<property>
<name>system:java.io.tmpdir</name>
<value>/tmp/hive</value>
</property>
<property>
<name>system:user.name</name>
<value>hadoop</value>
</property>
<property>
<name>hive.exec.local.scratchdir</name>
<value>/tmp/hive</value>
<description>Local scratch space for Hive jobs</description>
</property>
<property>
<name>hive.downloaded.resources.dir</name>
<value>/tmp/hive/${hive.session.id}_resources</value>
<description>Temporary local directory for added resources in the remote file system.</description>
</property>
<property>
<name>hive.querylog.location</name>
<value>/user/hive/log</value>
<description>Location of Hive run time structured log file</description>
</property>
]]>
</screen>
<para>To switch the metastore from the default Derby to MySQL, configure the MySQL connection information in this file. The default Derby settings look like this:</para>
<screen>
<![CDATA[
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:derby:;databaseName=metastore_db;create=true</value>
<description>
JDBC connect string for a JDBC metastore.
To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL.
For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>org.apache.derby.jdbc.EmbeddedDriver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>APP</value>
<description>Username to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>mine</value>
<description>password to use against metastore database</description>
</property>
]]>
</screen>
<para>Change the value of each of the above properties to the following. Note that in hive-site.xml the ampersands in the JDBC URL must be escaped as &amp;amp;:</para>
<screen>
<![CDATA[
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true&amp;characterEncoding=UTF-8&amp;useSSL=false</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>hive</value>
</property>
]]>
</screen>
</section>
<section>
<title>Starting Hive</title>
<para>Before starting Hive you must do two things: create the HDFS directories and initialize the MySQL database.</para>
<para>Create the HDFS working directories for Hive and grant them the appropriate permissions.</para>
<screen>
<![CDATA[
[root@localhost ~]$ su - hadoop
[hadoop@localhost ~]$ /srv/apache-hadoop/bin/hdfs dfs -mkdir -p /user/hive/warehouse
[hadoop@localhost ~]$ /srv/apache-hadoop/bin/hdfs dfs -mkdir -p /tmp/hive
[hadoop@localhost ~]$ /srv/apache-hadoop/bin/hdfs dfs -chmod g+w /user/hive/warehouse
[hadoop@localhost ~]$ /srv/apache-hadoop/bin/hdfs dfs -chmod 777 /tmp/hive
]]>
</screen>
<para>Initialize the MySQL database with the schematool utility.</para>
<screen>
<![CDATA[
[hadoop@localhost ~]$ /srv/apache-hive/bin/schematool -dbType mysql -initSchema
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/srv/apache-hive-2.1.1/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/srv/apache-hadoop-2.8.0/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Metastore connection URL: jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true&characterEncoding=UTF-8&useSSL=false
Metastore Connection Driver : com.mysql.jdbc.Driver
Metastore connection User: hive
Starting metastore schema initialization to 2.1.0
Initialization script hive-schema-2.1.0.mysql.sql
Initialization script completed
schemaTool completed
]]>
</screen>
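<para>You can check the result afterwards with schematool's -info option, which prints the metastore connection details and schema version:</para>
<screen>
<![CDATA[
/srv/apache-hive/bin/schematool -dbType mysql -info
]]>
</screen>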
<para>If you are using the embedded Derby database instead, initialize it with the following command:</para>
<screen>
schematool -initSchema -dbType derby
</screen>
</section>
<section>
<title>Accessing Hive</title>
<para>Start Hadoop.</para>
<screen>
<![CDATA[
[hadoop@localhost ~]$ /srv/apache-hadoop/sbin/start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [localhost]
localhost: starting namenode, logging to /srv/apache-hadoop-2.8.0/logs/hadoop-hadoop-namenode-localhost.localdomain.out
localhost: starting datanode, logging to /srv/apache-hadoop-2.8.0/logs/hadoop-hadoop-datanode-localhost.localdomain.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /srv/apache-hadoop-2.8.0/logs/hadoop-hadoop-secondarynamenode-localhost.localdomain.out
starting yarn daemons
starting resourcemanager, logging to /srv/apache-hadoop-2.8.0/logs/yarn-hadoop-resourcemanager-localhost.localdomain.out
localhost: starting nodemanager, logging to /srv/apache-hadoop-2.8.0/logs/yarn-hadoop-nodemanager-localhost.localdomain.out
]]>
</screen>
<para>Enter the Hive shell and run show databases; to check that the installation works.</para>
<screen>
<![CDATA[
[hadoop@localhost conf]$ /srv/apache-hive/bin/hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/srv/apache-hive-2.1.1/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/srv/apache-hadoop-2.8.0/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Logging initialized using configuration in file:/srv/apache-hive-2.1.1/conf/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive> show databases;
OK
default
Time taken: 0.264 seconds, Fetched: 1 row(s)
hive>
]]>
</screen>
<para>At this point Apache Hive is fully installed and configured.</para>
<para>Try running the following SQL statements to verify that Hive behaves correctly.</para>
<screen>
create database test;
use test;
create table test_table (id int, name string) row format delimited fields terminated by ',' stored as textfile;
insert into test_table values (1, 'Neo');
select * from test_table;
</screen>
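<para>Afterwards you can drop the test objects again (an optional cleanup step; CASCADE removes the database together with its tables):</para>
<screen>
drop database test cascade;
</screen>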
</section>
<section>
<title>Configuring HiveServer2</title>
<para>HiveServer2 exposes Hive as a remote service: clients connect to it by IP address and port, much like a mysql client.</para>
<screen>
<![CDATA[
[hadoop@localhost ~]$ /srv/apache-hive/bin/hiveserver2
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/srv/apache-hive-2.1.1/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/srv/apache-hadoop-2.8.0/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
]]>
</screen>
<para>Check that the port is listening.</para>
<screen>
[hadoop@localhost bin]$ ss -lnt | grep 10000
LISTEN 0 50 *:10000 *:*
</screen>
<para>Test that beeline can connect.</para>
<screen>
<![CDATA[
[hadoop@localhost ~]$ /srv/apache-hive/bin/beeline -u jdbc:hive2://
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/srv/apache-hive-2.1.1/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/srv/apache-hadoop-2.8.0/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Connecting to jdbc:hive2://
17/06/29 22:01:16 [main]: WARN session.SessionState: METASTORE_FILTER_HOOK will be ignored, since hive.security.authorization.manager is set to instance of HiveAuthorizerFactory.
Connected to: Apache Hive (version 2.1.1)
Driver: Hive JDBC (version 2.1.1)
17/06/29 22:01:16 [main]: WARN jdbc.HiveConnection: Request to set autoCommit to false; Hive does not support autoCommit=false.
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 2.1.1 by Apache Hive
0: jdbc:hive2://> show databases;
OK
+----------------+--+
| database_name |
+----------------+--+
| default |
+----------------+--+
1 row selected (1.318 seconds)
]]>
</screen>
<para>In a production environment, start it in the background as follows.</para>
<screen>
<![CDATA[
[hadoop@localhost ~]$ /srv/apache-hive/bin/hive --service hiveserver2 &
[1] 20375
[hadoop@localhost ~]$ SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/srv/apache-hive-2.1.1/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/srv/apache-hadoop-2.8.0/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
]]>
</screen>
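<para>To keep the service running after the shell exits and to capture its output, a common variant (a suggestion, not from the original text; the log path is arbitrary) is to combine nohup with a log file:</para>
<screen>
<![CDATA[
nohup /srv/apache-hive/bin/hive --service hiveserver2 > /tmp/hiveserver2.log 2>&1 &
]]>
</screen>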
</section>
</section>
<section id="beeline">
<title>beeline</title>
<para>beeline is a newer command-line tool shipped with Hive: a JDBC client based on the SQLLine CLI. It works together with HiveServer2 and supports both embedded and remote modes, so it can access the local Hive service just like the hive client, or reach a remote Hive service by specifying an IP address and port.</para>
<para>Hive officially recommends beeline because it also offers a friendlier interactive experience (similar to the mysql client).</para>
<para>Connect to the local Hive instance.</para>
<screen>
<![CDATA[
[hadoop@localhost ~]$ /srv/apache-hive/bin/beeline -u jdbc:hive2://
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/srv/apache-hive-2.1.1/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/srv/apache-hadoop-2.8.0/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Connecting to jdbc:hive2://
17/06/29 22:01:16 [main]: WARN session.SessionState: METASTORE_FILTER_HOOK will be ignored, since hive.security.authorization.manager is set to instance of HiveAuthorizerFactory.
Connected to: Apache Hive (version 2.1.1)
Driver: Hive JDBC (version 2.1.1)
17/06/29 22:01:16 [main]: WARN jdbc.HiveConnection: Request to set autoCommit to false; Hive does not support autoCommit=false.
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 2.1.1 by Apache Hive
0: jdbc:hive2://> show databases;
OK
+----------------+--+
| database_name |
+----------------+--+
| default |
+----------------+--+
1 row selected (1.318 seconds)
0: jdbc:hive2://> use default;
OK
No rows affected (0.03 seconds)
0: jdbc:hive2://> show tables;
OK
+-----------+--+
| tab_name |
+-----------+--+
| invites |
| member |
| passwd |
| v_test |
| vipuser |
+-----------+--+
5 rows selected (0.068 seconds)
0: jdbc:hive2://>
]]>
</screen>
<para>Connect to a remote host.</para>
<screen>
<![CDATA[
[hadoop@localhost ~]$ /srv/apache-hive/bin/beeline -u jdbc:hive2://hadoop@localhost:10000
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/srv/apache-hive-2.1.1/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/srv/apache-hadoop-2.8.0/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Connecting to jdbc:hive2://hadoop@localhost:10000
Connected to: Apache Hive (version 2.1.1)
Driver: Hive JDBC (version 2.1.1)
17/06/29 23:05:35 [main]: WARN jdbc.HiveConnection: Request to set autoCommit to false; Hive does not support autoCommit=false.
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 2.1.1 by Apache Hive
0: jdbc:hive2://hadoop@localhost:10000> show databases;
+----------------+--+
| database_name |
+----------------+--+
| default |
+----------------+--+
1 row selected (1.332 seconds)
0: jdbc:hive2://hadoop@localhost:10000> use default;
No rows affected (0.038 seconds)
0: jdbc:hive2://hadoop@localhost:10000> show tables;
+-----------+--+
| tab_name |
+-----------+--+
| invites |
| member |
| passwd |
| t_hive |
| v_test |
| vipuser |
+-----------+--+
6 rows selected (0.049 seconds)
0: jdbc:hive2://hadoop@localhost:10000> select * from member;
+--------------+-------------+-------------+---------------+--+
| member.name | member.age | member.sex | member.phone |
+--------------+-------------+-------------+---------------+--+
| Neo | 30 | 1 | 13113668890 |
+--------------+-------------+-------------+---------------+--+
1 row selected (1.137 seconds)
1: jdbc:hive2://hadoop@localhost:10000>
]]>
</screen>
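<para>beeline can also take the username and a one-off statement on the command line (standard beeline options, shown here with the values used in this chapter):</para>
<screen>
<![CDATA[
/srv/apache-hive/bin/beeline -u jdbc:hive2://localhost:10000 -n hadoop -e "show databases;"
]]>
</screen>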
</section>
<section id="hive.shell">
<title>Managing Hive</title>
<para>If you have experience with any relational database, the operations in this section will look very familiar.</para>
<section>
<title>Table Management</title>
<section>
<title>Creating a Table</title>
<literallayout>
CREATE TABLE member (name string, age int, sex int);
</literallayout>
<screen>
<![CDATA[
hive> CREATE TABLE member (name string, age int, sex int);
OK
Time taken: 0.687 seconds
hive>
]]>
</screen>
<para>Create a table from existing data.</para>
<screen>
<![CDATA[
hive> create table newtable as select * from oldtable;
]]>
</screen>
</section>
<section>
<title>Listing Tables</title>
<screen>
<![CDATA[
hive> SHOW TABLES;
OK
test
Time taken: 0.041 seconds, Fetched: 1 row(s)
hive>
]]>
</screen>
<para>Match table names with a wildcard.</para>
<screen>
show tables '*t*';
</screen>
</section>
<section>
<title>Dropping a Table</title>
<screen>
<![CDATA[
hive> DROP TABLE test;
OK
Time taken: 1.337 seconds
hive>
]]>
</screen>
</section>
<section>
<title>Describing a Table</title>
<screen>
<![CDATA[
hive> CREATE TABLE member (name string, age int, sex int);
OK
Time taken: 0.273 seconds
hive> desc member;
OK
name string
age int
sex int
Time taken: 0.035 seconds, Fetched: 3 row(s)
hive>
]]>
</screen>
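<para>For fuller detail (storage location, SerDe, table parameters), Hive also supports the formatted variant:</para>
<screen>
<![CDATA[
hive> desc formatted member;
]]>
</screen>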
</section>
<section>
<title>Adding a Column</title>
<para>Add a phone column of type string.</para>
<screen>
<![CDATA[
hive> ALTER TABLE member ADD COLUMNS (phone String);
OK
Time taken: 0.188 seconds
hive> desc member;
OK
name string
age int
sex int
phone string
Time taken: 0.033 seconds, Fetched: 4 row(s)
]]>
</screen>
</section>
<section>
<title>Renaming a Table</title>
<para>Rename the test table to vipuser.</para>
<screen>
<![CDATA[
hive> CREATE TABLE test (name string, age int, sex int);
OK
Time taken: 0.311 seconds
hive> ALTER TABLE test RENAME TO vipuser;
OK
Time taken: 0.115 seconds
hive> desc vipuser;
OK
name string
age int
sex int
Time taken: 0.032 seconds, Fetched: 3 row(s)
hive>
]]>
</screen>
</section>
<section>
<title>Creating a New Table from an Existing Table's Structure</title>
<para>This creates only the table structure; no data is copied.</para>
<screen>
<![CDATA[
hive> CREATE TABLE news_2017 LIKE news;
OK
Time taken: 0.311 seconds
]]>
</screen>
</section>
</section>
<section>
<title>Partitioned Tables</title>
<section>
<title>Creating a Partitioned Table</title>
<screen>
<![CDATA[
hive> CREATE TABLE invites (foo INT, bar STRING) PARTITIONED BY (ds STRING);
]]>
</screen>
</section>
<section>
<title>Showing Partitions</title>
<screen>
<![CDATA[
hive> SHOW PARTITIONS passwd;
OK
computer=hadoop
computer=hbase
computer=hive
Time taken: 0.056 seconds, Fetched: 3 row(s)
]]>
</screen>
</section>
<section>
<title>Adding a Partition</title>
<screen>
<![CDATA[
hive> alter table member add partition (province='shenzhen');
]]>
</screen>
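<para>The counterpart for removing a partition (standard HiveQL, not shown in the original session):</para>
<screen>
<![CDATA[
hive> alter table member drop partition (province='shenzhen');
]]>
</screen>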
</section>
<section>
<title>Loading Data into a Partitioned Table</title>
<screen>
<![CDATA[
hive> CREATE TABLE passwd (a string, b string, c string, d string, e string, f string) PARTITIONED BY (computer string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ':';
OK
Time taken: 0.323 seconds
hive> load data local inpath '/etc/passwd' overwrite into table passwd partition(computer="hive");
Loading data to table default.passwd partition (computer=hive)
OK
Time taken: 0.499 seconds
hive> select * from passwd;
OK
root x 0 0 root /root hive
bin x 1 1 bin /bin hive
daemon x 2 2 daemon /sbin hive
adm x 3 4 adm /var/adm hive
lp x 4 7 lp /var/spool/lpd hive
sync x 5 0 sync /sbin hive
shutdown x 6 0 shutdown /sbin hive
halt x 7 0 halt /sbin hive
mail x 8 12 mail /var/spool/mail hive
operator x 11 0 operator /root hive
games x 12 100 games /usr/games hive
ftp x 14 50 FTP User /var/ftp hive
nobody x 99 99 Nobody / hive
dbus x 81 81 System message bus / hive
polkitd x 999 998 User for polkitd / hive
avahi x 70 70 Avahi mDNS/DNS-SD Stack /var/run/avahi-daemon hive
avahi-autoipd x 170 170 Avahi IPv4LL Stack /var/lib/avahi-autoipd hive
postfix x 89 89 /var/spool/postfix hive
sshd x 74 74 Privilege-separated SSH /var/empty/sshd hive
ntp x 38 38 /etc/ntp hive
rpc x 32 32 Rpcbind Daemon /var/lib/rpcbind hive
qemu x 107 107 qemu user / hive
unbound x 998 996 Unbound DNS resolver /etc/unbound hive
rpcuser x 29 29 RPC Service User /var/lib/nfs hive
nfsnobody x 65534 65534 Anonymous NFS User /var/lib/nfs hive
saslauth x 997 76 "Saslauthd user" /run/saslauthd hive
radvd x 75 75 radvd user / hive
nagios x 1000 1000 /home/nagios hive
apache x 48 48 Apache /usr/share/httpd hive
exim x 93 93 /var/spool/exim hive
tss x 59 59 Account used by the trousers package to sandbox the tcsd daemon /dev/null hive
git x 996 994 /var/opt/gitlab hive
gitlab-www x 995 993 /var/opt/gitlab/nginx hive
gitlab-redis x 994 992 /var/opt/gitlab/redis hive
gitlab-psql x 993 991 /var/opt/gitlab/postgresql hive
nginx x 992 990 nginx user /var/cache/nginx hive
www x 80 80 Web Application /www hive
mysql x 27 27 MySQL Server /var/lib/mysql hive
redis x 991 989 Redis Database Server /var/lib/redis hive
epmd x 990 988 Erlang Port Mapper Daemon /tmp hive
rabbitmq x 989 987 RabbitMQ messaging server /var/lib/rabbitmq hive
solr x 1001 1001 Apache Solr /srv/solr hive
mongodb x 184 986 MongoDB Database Server /var/lib/mongodb hive
test x 1002 1002 /home/test hive
sysaccount x 988 985 /home/sysaccount hive
systemd-bus-proxy x 987 983 systemd Bus Proxy / hive
systemd-network x 986 982 systemd Network Management / hive
elasticsearch x 985 980 elasticsearch user /home/elasticsearch hive
zabbix x 984 979 Zabbix Monitoring System /var/lib/zabbix hive
mysqlrouter x 983 978 MySQL Router /var/lib/mysqlrouter hive
hadoop x 1003 1003 /home/hadoop hive
Time taken: 0.118 seconds, Fetched: 51 row(s)
hive> SHOW PARTITIONS passwd;
OK
computer=hive
Time taken: 0.058 seconds, Fetched: 1 row(s)
]]>
</screen>
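<para>Because computer is a partition column, filtering on it lets Hive read only that partition's files (an illustrative query against the table just loaded):</para>
<screen>
<![CDATA[
hive> select a, computer from passwd where computer='hive' limit 5;
]]>
</screen>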
</section>
</section>
<section>
<title>View Management</title>
<section>
<title>Creating a View</title>
<screen>
<![CDATA[
hive> CREATE VIEW v_test AS SELECT name,age FROM member where age>20;
hive> select * from v_test;
]]>
</screen>
</section>
<section>
<title>Dropping a View</title>
<screen>
<![CDATA[
hive> drop view test;
OK
Time taken: 0.276 seconds
]]>
</screen>
<para>Drop the view only if it exists.</para>
<screen>
<![CDATA[
hive> DROP VIEW IF EXISTS v_test;
OK
Time taken: 0.495 seconds
]]>
</screen>
</section>
</section>
<section>
<title>Data Management</title>
<section>
<title>Importing Data from a Text File</title>
<para>First create a text file as follows.</para>
<screen>
<![CDATA[
[root@localhost ~]# cat /tmp/hive.txt
1 2 3
2 3 4
3 4 5
6 7 8
]]>
</screen>
<screen>
<![CDATA[
hive> CREATE TABLE test (a int, b int, c int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
OK
Time taken: 0.294 seconds
hive> LOAD DATA LOCAL INPATH '/tmp/hive.txt' OVERWRITE INTO TABLE test;
Loading data to table default.test
OK
Time taken: 0.541 seconds
hive> select * from test;
OK
1 2 3
2 3 4
3 4 5
6 7 8
Time taken: 0.952 seconds, Fetched: 4 row(s)
]]>
</screen>
</section>
<section>
<title>Creating a New Table from a Query</title>
<screen>
<![CDATA[
hive> CREATE TABLE mytable AS SELECT * FROM anytable;
]]>
</screen>
</section>
<section>
<title>Inserting Query Results into an Existing Table</title>
<screen>
<![CDATA[
INSERT OVERWRITE TABLE mytable SELECT * FROM other ;
]]>
</screen>
</section>
<section>
<title>Inserting Query Results into a Partitioned Table</title>
<screen>
<![CDATA[
hive> insert into table table2 partition(created_date) select * from table1;
hive> insert into table newtable partition(type='1') select * from oldtable;
]]>
</screen>
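<para>The first statement above is a dynamic-partition insert, which by default requires dynamic partitioning to be enabled first (standard Hive settings):</para>
<screen>
<![CDATA[
hive> set hive.exec.dynamic.partition=true;
hive> set hive.exec.dynamic.partition.mode=nonstrict;
]]>
</screen>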
</section>
</section>
<section>
<title>HDFS and Local Filesystem Management</title>
<section>
<title>Migrating HDFS Directories</title>
<screen>
<![CDATA[
[hadoop@localhost ~]$ hdfs dfs -ls /user/hive/warehouse
Found 3 items
drwxrwxr-x - hadoop supergroup 0 2017-06-29 03:36 /user/hive/warehouse/member
drwxrwxr-x - hadoop supergroup 0 2017-06-29 03:32 /user/hive/warehouse/test
drwxrwxr-x - hadoop supergroup 0 2017-06-29 03:41 /user/hive/warehouse/vipuser
[hadoop@localhost ~]$ hdfs dfs -cp /user/hive/warehouse/vipuser /user/hive/warehouse/vipuser2
]]>
</screen>
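<para>-cp copies the directory and leaves the original in place; to move it instead, use the standard HDFS -mv command (the target path here is illustrative):</para>
<screen>
<![CDATA[
hdfs dfs -mv /user/hive/warehouse/vipuser2 /user/hive/warehouse/vipuser3
]]>
</screen>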
</section>
<section>
<title>Exporting Table Data to a Local File</title>
<screen>
<![CDATA[
hive> INSERT OVERWRITE LOCAL DIRECTORY '/tmp/test' SELECT * FROM test;
hive> INSERT OVERWRITE LOCAL DIRECTORY '/tmp/test' SELECT * FROM member;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hadoop_20170629040540_ddeda146-efed-44c4-bb20-a6453c21cc8e
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1498716998098_0001, Tracking URL = http://localhost:8088/proxy/application_1498716998098_0001/
Kill Command = /srv/apache-hadoop/bin/hadoop job -kill job_1498716998098_0001
Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 0
2017-06-29 04:05:49,221 Stage-1 map = 0%, reduce = 0%
Ended Job = job_1498716998098_0001
Moving data to local directory /tmp/test
MapReduce Jobs Launched:
Stage-Stage-1: HDFS Read: 0 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
Time taken: 10.54 seconds
]]>
</screen>
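<para>By default the exported files use Hive's internal field delimiter (Ctrl-A). Since Hive 0.11 you can specify a delimiter in the export statement itself:</para>
<screen>
<![CDATA[
hive> INSERT OVERWRITE LOCAL DIRECTORY '/tmp/test'
    > ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    > SELECT * FROM member;
]]>
</screen>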
</section>
<section>
<title>Exporting to HDFS</title>
<screen>
<![CDATA[
hive> insert overwrite directory '/usr/tmp/test' select * from test;
]]>
</screen>
</section>
</section>
</section>
<section id="HiveQL">
<title>HiveQL: the Hive Query Language</title>
<para>HiveQL is very similar to SQL; feel free to try your existing SQL syntax directly.</para>
<section>
<title>JOIN Queries</title>
<screen>
<![CDATA[
hive> SELECT t1.a,t1.b,t2.a,t2.b
> FROM table1 t1 JOIN table2 t2 on t1.a=t2.a
> WHERE t1.a>10;
]]>
</screen>
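<para>Outer joins follow the same pattern (illustrative, using the same hypothetical tables):</para>
<screen>
<![CDATA[
hive> SELECT t1.a, t1.b, t2.b
    > FROM table1 t1 LEFT OUTER JOIN table2 t2 ON t1.a = t2.a;
]]>
</screen>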
</section>
<section>
<title>Subqueries</title>
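<para>A minimal sketch, reusing the member table from earlier; note that in Hive a FROM-clause subquery must be given an alias:</para>
<screen>
<![CDATA[
hive> SELECT t.name, t.age
    > FROM (SELECT name, age FROM member WHERE age > 20) t;
]]>
</screen>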
</section>
</section>
<section id="hive.faq">
<title>FAQ</title>
<section>
<title>hadoop.security.authorize.AuthorizationException: User: hadoop is not allowed to impersonate anonymous</title>
<para>Edit the /srv/apache-hadoop/etc/hadoop/core-site.xml file and add the following configuration. The * wildcard is used here to match everything; adjust it to your actual needs.</para>
<screen>
<![CDATA[
<property>
<name>hadoop.proxyuser.hadoop.groups</name>
<value>*</value>
<description>Allow the superuser oozie to impersonate any members of the group group1 and group2</description>
</property>
<property>
<name>hadoop.proxyuser.hadoop.hosts</name>
<value>*</value>
<description>The superuser can connect only from host1 and host2 to impersonate a user</description>
</property>
]]>
</screen>
<para>My configuration file is shown below for reference only.</para>
<screen>
<![CDATA[
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000/</value>
</property>
<property>
<name>hadoop.proxyuser.hadoop.groups</name>
<value>hadoop</value>
<description>Allow the superuser oozie to impersonate any members of the group group1 and group2</description>
</property>
<property>
<name>hadoop.proxyuser.hadoop.hosts</name>
<value>192.168.0.1,127.0.0.1,localhost</value>
<description>The superuser can connect only from host1 and host2 to impersonate a user</description>
</property>
</configuration>
]]>
</screen>
<para>Restart Hadoop.</para>
<screen>
<![CDATA[
[hadoop@localhost hadoop]$ /srv/apache-hadoop/sbin/stop-all.sh
This script is Deprecated. Instead use stop-dfs.sh and stop-yarn.sh
Stopping namenodes on [localhost]
localhost: stopping namenode
localhost: stopping datanode
Stopping secondary namenodes [0.0.0.0]
0.0.0.0: stopping secondarynamenode
stopping yarn daemons
stopping resourcemanager
localhost: stopping nodemanager
localhost: nodemanager did not stop gracefully after 5 seconds: killing with kill -9
no proxyserver to stop
[hadoop@localhost hadoop]$ /srv/apache-hadoop/sbin/start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [localhost]
localhost: starting namenode, logging to /srv/apache-hadoop-2.8.0/logs/hadoop-hadoop-namenode-localhost.localdomain.out
localhost: starting datanode, logging to /srv/apache-hadoop-2.8.0/logs/hadoop-hadoop-datanode-localhost.localdomain.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /srv/apache-hadoop-2.8.0/logs/hadoop-hadoop-secondarynamenode-localhost.localdomain.out
starting yarn daemons
starting resourcemanager, logging to /srv/apache-hadoop-2.8.0/logs/yarn-hadoop-resourcemanager-localhost.localdomain.out
localhost: starting nodemanager, logging to /srv/apache-hadoop-2.8.0/logs/yarn-hadoop-nodemanager-localhost.localdomain.out
]]>
</screen>
<para>Alternatively, refresh the proxy-user configuration without a full restart, as shown below.</para>
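<para>A sketch of the refresh commands (standard Hadoop administration commands; they reload the proxyuser settings on the NameNode and ResourceManager):</para>
<screen>
<![CDATA[
[hadoop@localhost hadoop]$ hdfs dfsadmin -refreshSuperUserGroupsConfiguration
[hadoop@localhost hadoop]$ yarn rmadmin -refreshSuperUserGroupsConfiguration
]]>
</screen>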
</section>
</section>
</chapter>