Hive实战之Youtube数据集

1 数据来源

本次实战的数据来自于"YouTube视频统计与社交网络"的数据集,是西蒙弗雷泽大学计算机学院在2008年所爬取的数据
数据集地址

1. 1 Youtube视频表格式如下:

列名 注释
视频ID 一个11位字符串,是唯一的
上传 一个字符串的视频上传者的用户名
年龄 视频上传日期和2007年2月15日之间的整数天(YouTube的设立)
类别 由上传者选择的视频类别的字符串
长度 视频长度的整数v
观看数 一整数的视图
一个浮点数的视频速率
评分 整数的评分
评论数 一整数的评论
相关视频ID 最多20个字符串的相关视频ID

数据之间采用"\t"作为分隔符

具体数据如下:

video ID uploader age category length views rate ratings comments related IDs
ifnlnji-Y4s Hooran 1162 Travel & Events 239 189 4.8 10 3 tpAL3I0urI4 ... ifnlnji-Y4s

数据量大小为1G,条数为500万+

1.2 用户表

列名 uploader videos friends
类型 string int int
解释 上传者 上传视频数 朋友数

2 实战演练准备

2.1 环境搭建

使用环境为
hive-1.1.0-cdh5.4.5
hadoop-2.6.0-cdh5.4.5

演示形式为使用hive shell

2.2 数据清洗

我们一起来看看数据

video ID uploader age category length views rate ratings comments related IDs
ifnlnji-Y4s Hooran 1162 Travel & Events 239 189 4.8 10 3 tpAL3I0urI4 ... ifnlnji-Y4s

主要的问题在于category和relatedIDs处理,由于Hive是支持array格式的,所以我们想到的是使用array来存储category和relatedIDs,但是我们发现category的分割符是"&"而realatedIDs的分隔符是"\t",我们在创建表格的时候能够指定array的分隔符,但是只能指定一个,所以再将数据导入到Hive表格之前我们需要对数据进行一定转换和清洗

并且数据中肯定会存在一些不完整数据和一些奇怪的格式,所以数据的清洗是必要的,我在这里所使用的数据清洗方式是使用Spark进行清洗,也可以使用自定义UDF函数来进行清洗

数据清洗注意点
1)我们可以看到每行数据以"\t"作为分隔符,每行有十列数据,最后一列关联ID可以为空,那么我们对数据进行split之后数组的大小要大于8
2)数据中存在"uNiKXDA8eyQ KRQE 1035 News & Politics 107"这样格式的数据,所以在处理category时需要注意 News & Politics中间的&

处理后的数据如下:

video ID uploader age category length views rate ratings comments related IDs
PkGUU_ggO3k theresident 704 Entertainment 262 11235 3.85 247 280 PkGUU_ggO3k&EYC5bWF0ss8&...shU2hfHKmU0&p0lq5-8IDqY
RX24KLBhwMI lemonette 697 People&Blogs 512 24149 4.22 315 474 t60tW0WevkE&WZgoejVDZlo&...s8xf4QX1UvA&2cKd9ERh5-8

下面的实战都是基于数据清洗后的数据进行的

2.3 创建表格和数据导入

2.3.1 youtube表格的创建和导入

1)youtube1的创建,文件格式为textfile
create table youtube1(videoId string, uploader string, age int, category array<string>, length int, views int, rate float, ratings int, comments int,relatedId array<string>)
row format delimited
fields terminated by "\t"
collection items terminated by "&"
stored as textfile;

2)youtube2的创建,文件格式为orc
create table youtube2(videoId string, uploader string, age int, category array<string>, length int, views int, rate float, ratings int, comments int,relatedId array<string>)
row format delimited
fields terminated by "\t"
collection items terminated by "&"
stored as orc;

3)youtube3的创建,文件格式为orc,进行桶分区
create table youtube3(videoId string, uploader string, age int, category array<string>, length int, views int, rate float, ratings int, comments int,relatedId array<string>)
clustered by (uploader) into 8 buckets
row format delimited
fields terminated by "\t"
collection items terminated by "&"
stored as orc;

数据导入:
1)load data inpath "path" into table youtube1;
2)由于无法将textfile格式的数据导入到orc格式的表格,所以数据需要从youtube1导入到youtube2和youtube3:
insert into table youtube2 select * from youtube1;
insert into table youtube3 select * from youtube1;

2.3.2 user表格的创建和导入

1)user_tmp的创建,文件格式textfile,24buckets
create table user_tmp(uploader string,videos int,friends int)
clustered by (uploader) into 24 buckets
row format delimited
fields terminated by "\t"
stored as textfile;

2)user的创建,文件格式orc,24buckets
create table user(uploader string,videos int,friends int)
clustered by (uploader) into 24 buckets
row format delimited
fields terminated by "\t"
stored as orc;

user表的数据导入也是同理
数据导入:
1)load data inpath "path" into table user_tmp;
2)由于无法将textfile格式的数据导入到orc格式的表格,所以数据需要从user_tmp导入到user:
insert into table user select * from user_tmp;

3 实战需求

1)统计出观看数最多的10个视频
2)统计出视频类别热度的前10个类型
3)统计出视频观看数最高的50个视频的所属类别
4)统计出观看数最多的前N个视频所关联的视频的所属类别排行
5)筛选出每个类别中热度最高的前10个视频
6)筛选出每个类别中评分最高的前10个视频
7)找出用户中上传视频最多的10个用户的所有视频
8)筛选出每个类别中观看数Top10

4 实战演练

4.1 统计出观看数最多的10个视频

select * from youtube3 order by views desc limit 10;

结果如下:

hive> select * from youtube3 order by views desc limit 10;
Query ID = hadoop_20170710155353_4bc057f0-bbd5-4bfe-a66c-ec0e17cb3ca9
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
Starting Job = job_1499153664137_0101, Tracking URL = http://master:8088/proxy/application_1499153664137_0101/
Kill Command = /home/hadoop/app/hadoop/bin/hadoop job  -kill job_1499153664137_0101
Hadoop job information for Stage-1: number of mappers: 4; number of reducers: 1
2017-07-10 15:53:55,297 Stage-1 map = 0%,  reduce = 0%
...
2017-07-10 15:56:06,210 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 106.93 sec
MapReduce Total cumulative CPU time: 1 minutes 46 seconds 930 msec
Ended Job = job_1499153664137_0101
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 4  Reduce: 1   Cumulative CPU: 106.93 sec   HDFS Read: 574632526 HDFS Write: 2916 SUCCESS
Total MapReduce CPU Time Spent: 1 minutes 46 seconds 930 msec
OK
dMH0bHeiRNg judsonlaipply   415 ["Comedy"]  360 79897120    4.65    287260  131356  ["HSoVKUVOnfQ","pv5zWaTEVkI","v1U8f7TmYkA","ZCT3MIUTc3o","mDiBOF8XI44","P9LmHXXWiJs","fo_QVq2lGMs","EtGQgSY9Nn4","5P6UU6m3cqk","61heClTFc5w","h3gdSHGcUU4","OPmYbP0F4Zw","gsOaQGF7kiQ","H2gw9VE16mo","rGkIUlYEQT8","innfyQZHPpo","cu8tUy14zQo","cQ25-glGRzI","WqDbn2iLwwY","jvz0bvYmnto"]
cQ25-glGRzI RCARecords  742 ["Music"]   227 77674728    4.45    188154  186858  ["otMB3WVQNVg","CAoo71VEYhs","6gqSk1UBFyo","pULa7F1gWZM","C2eUCfREAbw","neZw2CH1h9s","2fjW33W7424","aVnl9QCfNyE","DKhnmUdmz74","tPW70sgrHiY","W4kR8OQCrlQ","-WtcXpnMIT8","YhRZx2QIJKg","5iaYFN5eERA","KChJyERIclc","xsRWpK4pf90","lOq-Q60zWQs","ywNZPURFJNU","dMH0bHeiRNg","tvkh29RKFRY"]
12Z3J1uzd0Q kaejane 404 ["Film","Animation"]    615 65341925    3.03    9189    5508    ["innfyQZHPpo","-_CSo1gOd48","1Al4crLW9EM","nhSZs-aAZbo","Af9aPlG2WFw","5P6UU6m3cqk","uBHsOTYO6AI","lW19DnWz6vg","vr3x_RRJdd4","5GE82tqcYYQ","D2kJZOfq7zk","lsO6D1rwrKc","dMH0bHeiRNg","kNLXjXxj3J8","vQbYvjmfbr4","G-HonZGBWus","MuOvqeABHvQ","-2caf6KlSyw","cQ25-glGRzI","pv5zWaTEVkI"]
LpAI8TzQDes IMVUinc 848 ["Entertainment"]   68  65078772    1.38    5678    2499    ["c2eP-UADAi8","5P6UU6m3cqk","cQ25-glGRzI","FYbqcgd97NQ","12Z3J1uzd0Q","VS4wFOWFUt8","RB-wUgnyGv0","sBQLq2VmZcA","w2xUzv6iZWo","oOF3T9Zvizc","244qR7SvvX0","pv5zWaTEVkI","dMH0bHeiRNg","b3u65f4CRLk"]
7AVHXe-ol-s internmarket    603 ["Music"]   264 60349673    3.16    1033    594 ["trrRo3kGRv0","CCsGkN1PEec","l5mQKJDO-nY","DP_4rQcU0G8","7xz5aOvP2YA","kCjKIus3iBA","nD0MILEXZP0","IQcyLMa716k","d05KUL8aW4E","szeeHnu2DM8","7b_wObF6vgs","UazqVaOg9uc","LRBD9l3Nv6Y","108YigkYFgM","_qCau44yfX0","NkkGj4_1m9A","kuyRSrklGU8","uLr7IJYyNnM","IEuyk_277xY","XmAaYo-CQNw"]
244qR7SvvX0 donotasyoudo    960 ["Entertainment"]   6   57790943    1.45    66412   14913   ["v3ARyAb_1Bs","nhSZs-aAZbo","2pNTrYd-4FQ","dMH0bHeiRNg","kHmvkRoEowc","yh0xYO3dYx8","5GE82tqcYYQ","ktUSIJEiOug","6PrDw6T3-rM","-xEzGIuY7kw","innfyQZHPpo","pv5zWaTEVkI","E7QOUJfRs3Y","9oRQJK61MWs"]
ePyRrb2-fzs 1988basti   826 ["Music"]   204 45984219    4.87    51039   27065   ["fm0T7_SGee4","h8oBykb_Pqs","iWg3IMN_rhU","MdOAr_4FJvc","xNp7OxgFJJM","nzaw_Gk9BAU","u9rl4apDFfY","fMzMm4sJYGo","b5eEa5a2Rio","aNR0tW_afdk","Kj_MceyjS6g","GojTUmjxVHU","CGPUuPHdHQg","SIM4DCn7AlE","ktUSIJEiOug","Dn6kGzRSjhU","gxxU68z8quM","HRSrFm_8YK8","2__Qdd11rfA","XewBty95JVg"]
xsRWpK4pf90 universalmusicgroup 902 ["Music"]   240 44614530    4.73    99373   53441   ["a4X7eFbP3u4","4WAapKx2TvM","PgZgubuT2DM","FIUv3dOBbCk","Dk8NQB_T1ws","7NAbTHGIPP8","mekQxrnhQ3I","OouU4PFUw3Y","rwgXvL9o0QQ","4eeyhtlJp5A","ktUSIJEiOug","UZStqFeYk3Y","sF84pIhP5UM","FKXm3Qg7sBo","a1ti0KccvOg","ERV-wh4VwZI","iWg3IMN_rhU","i1PKmEaUqes","gsAN_Zfc1nc","CmBCLWkrysY"]
ktUSIJEiOug aliciakeys  953 ["Music"]   251 43583367    4.85    103074  59672   ["_VD-PDzmW4M","vmjl5ARtQWM","X_SmJpAmirw","glBHQSasVMQ","IGj3T7C-RdE","6U_4ANczMPY","oh3Pkw6cokk","j97WOHviXbE","FqsGSvrcfDE","2PWfB4lurT4","Yci8zVNMEdg","sF84pIhP5UM","dDfEjBSWj54","Zdm7FJktDOM","a4X7eFbP3u4","jhPAK8HjcPI","ePyRrb2-fzs","8_8KJfSNgC4","kN9vm95SocU","4DC4Rb9quKk"]
b3u65f4CRLk srcrecords  729 ["Music"]   256 43511791    4.76    55148   27818   ["4fZF9UsS8LY","5f7kfhHHH_A","H1vaszd6NnA","6Di-dAIR9RA","HiBcQeIax4g","JVciSFc5Wrw","Md6rURKhZmA","IG5ReXP0SSg","D9g2szHsoz0","EflOarvoeVs","7PxBGHjABnU","Gjq1g3j7WFc","mDvCcrU-Ob8","V9YE8dHMxvw","iWg3IMN_rhU","Un2dbprtZFE","WpgMuHwAdC4","RtQ5JENdx5I","ZEYgAgKVuO4","D_2jb8D8AOI"]
Time taken: 172.566 seconds, Fetched: 10 row(s)

4.2 统计出视频类别热度的前10个类型

select tagId, count(a.videoid) as sum from (select videoid,tagId from youtube3 lateral view explode(category) catetory as tagId) a group by a.tagId order by sum desc limit 10;

结果:

hive> select tagId, count(a.videoid) as sum from (select videoid,tagId from youtube3 lateral view explode(category) catetory as tagId) a group by a.tagId order by sum desc limit 10;
Query ID = hadoop_20170710155757_614ec9dd-ffa0-465d-8d62-c47b5ad585f0
Total jobs = 2
Launching Job 1 out of 2
Number of reduce tasks not specified. Defaulting to jobconf value of: 2
Starting Job = job_1499153664137_0102, Tracking URL = http://master:8088/proxy/application_1499153664137_0102/
Kill Command = /home/hadoop/app/hadoop/bin/hadoop job  -kill job_1499153664137_0102
Hadoop job information for Stage-1: number of mappers: 4; number of reducers: 2
2017-07-10 15:58:23,797 Stage-1 map = 0%,  reduce = 0%
...
2017-07-10 15:59:16,341 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 19.51 sec
MapReduce Total cumulative CPU time: 19 seconds 510 msec
Ended Job = job_1499153664137_0102
Launching Job 2 out of 2
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1499153664137_0103, Tracking URL = http://master:8088/proxy/application_1499153664137_0103/
Kill Command = /home/hadoop/app/hadoop/bin/hadoop job  -kill job_1499153664137_0103
Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 1
2017-07-10 15:59:51,645 Stage-2 map = 0%,  reduce = 0%
2017-07-10 16:00:18,373 Stage-2 map = 100%,  reduce = 0%, Cumulative CPU 0.97 sec
2017-07-10 16:00:48,410 Stage-2 map = 100%,  reduce = 100%, Cumulative CPU 2.57 sec
MapReduce Total cumulative CPU time: 2 seconds 570 msec
Ended Job = job_1499153664137_0103
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 4  Reduce: 2   Cumulative CPU: 19.51 sec   HDFS Read: 47327614 HDFS Write: 900 SUCCESS
Stage-Stage-2: Map: 1  Reduce: 1   Cumulative CPU: 2.57 sec   HDFS Read: 5596 HDFS Write: 148 SUCCESS
Total MapReduce CPU Time Spent: 22 seconds 80 msec
OK
Entertainment   1304724
Music   1274825
Comedy  449652
Blogs   447581
People  447581
Film    442109
Animation   442109
Sports  390619
Politics    186753
News    186753
Time taken: 185.399 seconds, Fetched: 10 row(s)

4.3 统计出视频观看数最高的50个视频的所属类别

select tagId, count(a.videoid) as sum from (select videoid,tagId from (select * from youtube3 order by views desc limit 20) e lateral view explode(category) catetory as tagId) a group by a.tagId order by sum desc;

结果:

hive> select tagId, count(a.videoid) as sum from (select videoid,tagId from (select * from youtube3 order by views desc limit 20) e lateral view explode(category) catetory as tagId) a group by a.tagId order by sum desc;
Query ID = hadoop_20170710160909_c6cbbe29-4df3-4c0b-ad70-bd34857acc80
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks determined at compile time: 1
Starting Job = job_1499153664137_0104, Tracking URL = http://master:8088/proxy/application_1499153664137_0104/
Kill Command = /home/hadoop/app/hadoop/bin/hadoop job  -kill job_1499153664137_0104
Hadoop job information for Stage-1: number of mappers: 4; number of reducers: 1
2017-07-10 16:09:48,197 Stage-1 map = 0%,  reduce = 0%
2017-07-10 16:10:17,734 Stage-1 map = 25%,  reduce = 0%, Cumulative CPU 3.09 sec
...
2017-07-10 16:10:59,048 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 47.74 sec
MapReduce Total cumulative CPU time: 47 seconds 740 msec
Ended Job = job_1499153664137_0104
Launching Job 2 out of 3
Number of reduce tasks not specified. Defaulting to jobconf value of: 2
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1499153664137_0105, Tracking URL = http://master:8088/proxy/application_1499153664137_0105/
Kill Command = /home/hadoop/app/hadoop/bin/hadoop job  -kill job_1499153664137_0105
Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 2
2017-07-10 16:11:35,860 Stage-2 map = 0%,  reduce = 0%
2017-07-10 16:12:04,633 Stage-2 map = 100%,  reduce = 0%, Cumulative CPU 1.07 sec
2017-07-10 16:12:23,187 Stage-2 map = 100%,  reduce = 50%, Cumulative CPU 2.61 sec
2017-07-10 16:12:32,480 Stage-2 map = 100%,  reduce = 100%, Cumulative CPU 3.93 sec
MapReduce Total cumulative CPU time: 3 seconds 930 msec
Ended Job = job_1499153664137_0105
Launching Job 3 out of 3
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1499153664137_0106, Tracking URL = http://master:8088/proxy/application_1499153664137_0106/
Kill Command = /home/hadoop/app/hadoop/bin/hadoop job  -kill job_1499153664137_0106
Hadoop job information for Stage-3: number of mappers: 2; number of reducers: 1
2017-07-10 16:13:09,244 Stage-3 map = 0%,  reduce = 0%
2017-07-10 16:13:37,263 Stage-3 map = 100%,  reduce = 0%, Cumulative CPU 2.18 sec
2017-07-10 16:14:05,048 Stage-3 map = 100%,  reduce = 100%, Cumulative CPU 3.5 sec
MapReduce Total cumulative CPU time: 3 seconds 500 msec
Ended Job = job_1499153664137_0106
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 4  Reduce: 1   Cumulative CPU: 47.74 sec   HDFS Read: 57839263 HDFS Write: 228 SUCCESS
Stage-Stage-2: Map: 1  Reduce: 2   Cumulative CPU: 3.93 sec   HDFS Read: 6065 HDFS Write: 324 SUCCESS
Stage-Stage-3: Map: 2  Reduce: 1   Cumulative CPU: 3.5 sec   HDFS Read: 6600 HDFS Write: 53 SUCCESS
Total MapReduce CPU Time Spent: 55 seconds 170 msec
OK
Music   12
Comedy  3
Entertainment   3
Film    2
Animation   2
Time taken: 297.466 seconds, Fetched: 5 row(s)

4.4 统计出观看数最多的前50个视频所关联的视频的所属类别排行

思路:

  1. 首先筛选出前50个视频所关联的视频,
    select * from youtube3 order by views desc limit 50
  2. 再将结果和youtube3进行join
    select explode(relatedId) as videoId from (select * from youtube3 order by views desc limit 50) a) b join youtube3 c on b.videoId = c.videoId
  3. 然后去重
    select distinct(b.videoId),c.category from (select explode(relatedId) as videoId from (select * from youtube3 order by views desc limit 50) a) b join youtube3 c on b.videoId = c.videoId
  4. 最后得出所有视频的类别排行
    select tagId, count(e.videoid) as sum from (select videoid,tagId from (select distinct(b.videoId),c.category from (select explode(relatedId) as videoId from (select * from youtube3 order by views desc limit 50) a) b join youtube3 c on b.videoId = c.videoId) d lateral view explode(category) catetory as tagId) e group by tagId order by sum desc;

结果:

hive> select tagId, count(e.videoid) as sum from (select videoid,tagId from (select distinct(b.videoId),c.category from (select explode(relatedId) as videoId from (select * from youtube3 order by views desc limit 50) a) b join youtube3 c on b.videoId = c.videoId) d lateral view explode(category) catetory as tagId) e group by tagId order by sum desc;
Query ID = hadoop_20170710170808_cfbfde68-c016-4c5d-860b-fe840a2d50cb
Total jobs = 7
Launching Job 1 out of 7
Number of reduce tasks determined at compile time: 1

Starting Job = job_1499153664137_0111, Tracking URL = http://master:8088/proxy/application_1499153664137_0111/
Kill Command = /home/hadoop/app/hadoop/bin/hadoop job  -kill job_1499153664137_0111
Hadoop job information for Stage-1: number of mappers: 4; number of reducers: 1
2017-07-10 17:08:48,662 Stage-1 map = 0%,  reduce = 0%
2017-07-10 17:09:25,175 Stage-1 map = 17%,  reduce = 0%, Cumulative CPU 20.83 sec
...
2017-07-10 17:10:15,276 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 81.01 sec
MapReduce Total cumulative CPU time: 1 minutes 21 seconds 10 msec
Ended Job = job_1499153664137_0111
Stage-10 is filtered out by condition resolver.
Stage-11 is selected by condition resolver.
Stage-2 is filtered out by condition resolver.
17/07/10 17:10:20 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Execution log at: /tmp/hadoop/hadoop_20170710170808_cfbfde68-c016-4c5d-860b-fe840a2d50cb.log
2017-07-10 05:10:23 Starting to launch local task to process map join;  maximum memory = 518979584
2017-07-10 05:10:34 Dump the side-table for tag: 0 with group count: 743 into file: file:/tmp/hadoop/146c18a2-9a94-440d-9302-72e50ede944e/hive_2017-07-10_17-08-08_628_1950388697756865753-1/-local-10009/HashTable-Stage-8/MapJoin-mapfile30--.hashtable
2017-07-10 05:10:34 Uploaded 1 File to: file:/tmp/hadoop/146c18a2-9a94-440d-9302-72e50ede944e/hive_2017-07-10_17-08-08_628_1950388697756865753-1/-local-10009/HashTable-Stage-8/MapJoin-mapfile30--.hashtable (22611 bytes)
2017-07-10 05:10:34 End of local task; Time Taken: 11.392 sec.
Execution completed successfully
MapredLocal task succeeded
Launching Job 3 out of 7
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1499153664137_0112, Tracking URL = http://master:8088/proxy/application_1499153664137_0112/
Kill Command = /home/hadoop/app/hadoop/bin/hadoop job  -kill job_1499153664137_0112
Hadoop job information for Stage-8: number of mappers: 4; number of reducers: 0
2017-07-10 17:11:13,493 Stage-8 map = 0%,  reduce = 0%
2017-07-10 17:11:57,229 Stage-8 map = 100%,  reduce = 0%, Cumulative CPU 15.79 sec
MapReduce Total cumulative CPU time: 15 seconds 790 msec
Ended Job = job_1499153664137_0112
Launching Job 4 out of 7
Number of reduce tasks not specified. Defaulting to jobconf value of: 2
Starting Job = job_1499153664137_0113, Tracking URL = http://master:8088/proxy/application_1499153664137_0113/
Kill Command = /home/hadoop/app/hadoop/bin/hadoop job  -kill job_1499153664137_0113
Hadoop job information for Stage-3: number of mappers: 3; number of reducers: 2
2017-07-10 17:12:31,982 Stage-3 map = 0%,  reduce = 0%
2017-07-10 17:13:17,467 Stage-3 map = 100%,  reduce = 100%, Cumulative CPU 6.53 sec
MapReduce Total cumulative CPU time: 6 seconds 530 msec
Ended Job = job_1499153664137_0113
Launching Job 5 out of 7
Number of reduce tasks not specified. Defaulting to jobconf value of: 2
Starting Job = job_1499153664137_0114, Tracking URL = http://master:8088/proxy/application_1499153664137_0114/
Kill Command = /home/hadoop/app/hadoop/bin/hadoop job  -kill job_1499153664137_0114
Hadoop job information for Stage-4: number of mappers: 2; number of reducers: 2
2017-07-10 17:13:55,123 Stage-4 map = 0%,  reduce = 0%
2017-07-10 17:14:22,876 Stage-4 map = 50%,  reduce = 0%, Cumulative CPU 0.91 sec
2017-07-10 17:14:23,905 Stage-4 map = 100%,  reduce = 0%, Cumulative CPU 1.99 sec
2017-07-10 17:14:40,601 Stage-4 map = 100%,  reduce = 50%, Cumulative CPU 3.35 sec
2017-07-10 17:14:52,033 Stage-4 map = 100%,  reduce = 100%, Cumulative CPU 4.96 sec
MapReduce Total cumulative CPU time: 4 seconds 960 msec
Ended Job = job_1499153664137_0114
Launching Job 6 out of 7
Number of reduce tasks determined at compile time: 1
Starting Job = job_1499153664137_0115, Tracking URL = http://master:8088/proxy/application_1499153664137_0115/
Kill Command = /home/hadoop/app/hadoop/bin/hadoop job  -kill job_1499153664137_0115
Hadoop job information for Stage-5: number of mappers: 2; number of reducers: 1
2017-07-10 17:15:27,647 Stage-5 map = 0%,  reduce = 0%
2017-07-10 17:15:54,560 Stage-5 map = 50%,  reduce = 0%, Cumulative CPU 0.91 sec
2017-07-10 17:15:55,587 Stage-5 map = 100%,  reduce = 0%, Cumulative CPU 2.03 sec
2017-07-10 17:16:23,357 Stage-5 map = 100%,  reduce = 100%, Cumulative CPU 3.59 sec
MapReduce Total cumulative CPU time: 3 seconds 590 msec
Ended Job = job_1499153664137_0115
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 4  Reduce: 1   Cumulative CPU: 81.01 sec   HDFS Read: 463074822 HDFS Write: 26949 SUCCESS
Stage-Stage-8: Map: 4   Cumulative CPU: 15.79 sec   HDFS Read: 47322670 HDFS Write: 30829 SUCCESS
Stage-Stage-3: Map: 3  Reduce: 2   Cumulative CPU: 6.53 sec   HDFS Read: 43694 HDFS Write: 1126 SUCCESS
Stage-Stage-4: Map: 2  Reduce: 2   Cumulative CPU: 4.96 sec   HDFS Read: 8888 HDFS Write: 725 SUCCESS
Stage-Stage-5: Map: 2  Reduce: 1   Cumulative CPU: 3.59 sec   HDFS Read: 7001 HDFS Write: 207 SUCCESS
Total MapReduce CPU Time Spent: 1 minutes 51 seconds 880 msec
OK
Music   399
Entertainment   103
Comedy  94
People  36
Blogs   36
Film    19
Animation   19
Pets    12
Animals 12
UNA 11
Style   6
Politics    6
News    6
Howto   6
Sports  4
Vehicles    3
Autos   3
Travel  2
Events  2
Technology  1
Science 1
Time taken: 496.822 seconds, Fetched: 21 row(s)

4.5 筛选出某个类别(如music)中热度最高的前10个视频

思路:

  • 创建一个表格,存储每个视频对应一个标签的信息
    create table youtube_category(videoId string, uploader string, age int, categoryId string, length int, views int, rate float, ratings int, comments int,relatedId array<string>)
    row format delimited
    fields terminated by "\t"
    collection items terminated by "&"
    stored as orc;
  • 将转换后的数据进行插入
    insert into table youtube_category select videoid,uploader,age,categoryId,length,views,rate,ratings,comments,relatedId from youtube3 lateral view explode(category) catetory as categoryId;
  • 根据观看数和类别进行查询
    select * from youtube_category where categoryId="Music" order by views desc limit 10;

结果如下:

hive> create table youtube_category(videoId string, uploader string, age int, categoryId string, length int, views int, rate float, ratings int, comments int,relatedId array<string>)
    > row format delimited
    > fields terminated by "\t"
    > collection items terminated by "&"
    > stored as orc;
OK
Time taken: 0.256 seconds
hive> insert into youtube_category select videoid,uploader,age,categoryId,length,views,rate,ratings,comments,relatedId from youtube3 lateral view explode(category) catetory as categoryId;
Query ID = hadoop_20170711091010_7c76d7bd-5af0-43f9-812f-e30c612ee60b
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1499153664137_0116, Tracking URL = http://master:8088/proxy/application_1499153664137_0116/
Kill Command = /home/hadoop/app/hadoop/bin/hadoop job  -kill job_1499153664137_0116
Hadoop job information for Stage-1: number of mappers: 4; number of reducers: 0
2017-07-11 09:11:18,741 Stage-1 map = 0%,  reduce = 0%
...
2017-07-11 09:19:05,239 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 402.71 sec
MapReduce Total cumulative CPU time: 6 minutes 42 seconds 710 msec
Ended Job = job_1499153664137_0116
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://bydcluster1/user/hive/warehouse/youtube_category/.hive-staging_hive_2017-07-11_09-10-38_970_8191962088714912782-1/-ext-10000
Loading data to table default.youtube_category
Table default.youtube_category stats: [numFiles=4, numRows=6742209, totalSize=608748677, rawDataSize=10187373874]
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 4   Cumulative CPU: 404.25 sec   HDFS Read: 574634998 HDFS Write: 608749047 SUCCESS
Total MapReduce CPU Time Spent: 6 minutes 44 seconds 250 msec
OK
Time taken: 511.914 seconds
hive> select * from youtube_category where categoryId="music" order by views desc limit 10;
Query ID = hadoop_20170711091919_a48aa98b-9cfa-4f58-a106-9da85484f0dd
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1499153664137_0117, Tracking URL = http://master:8088/proxy/application_1499153664137_0117/
Kill Command = /home/hadoop/app/hadoop/bin/hadoop job  -kill job_1499153664137_0117
Hadoop job information for Stage-1: number of mappers: 3; number of reducers: 1
2017-07-11 09:20:37,607 Stage-1 map = 0%,  reduce = 0%
...
2017-07-11 09:21:32,382 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 36.15 sec
MapReduce Total cumulative CPU time: 36 seconds 150 msec
Ended Job = job_1499153664137_0117
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 3  Reduce: 1   Cumulative CPU: 36.15 sec   HDFS Read: 607706126 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 36 seconds 150 msec
OK
Time taken: 96.836 seconds
hive> select * from youtube_category where catagoryId="Music" order by views desc limit 10;
FAILED: SemanticException [Error 10004]: Line 1:37 Invalid table alias or column reference 'catagoryId': (possible column names are: videoid, uploader, age, categoryid, length, views, rate, ratings, comments, relatedid)
hive> select * from youtube_category where catagoryid="Music" order by views desc limit 10;
FAILED: SemanticException [Error 10004]: Line 1:37 Invalid table alias or column reference 'catagoryid': (possible column names are: videoid, uploader, age, categoryid, length, views, rate, ratings, comments, relatedid)
hive> select * from youtube_category where categoryid="Music" order by views desc limit 10;
Query ID = hadoop_20170711092525_53f32ccf-7d04-4615-b446-9009cf16dc7f
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1499153664137_0118, Tracking URL = http://master:8088/proxy/application_1499153664137_0118/
Kill Command = /home/hadoop/app/hadoop/bin/hadoop job  -kill job_1499153664137_0118
Hadoop job information for Stage-1: number of mappers: 3; number of reducers: 1
2017-07-11 09:26:08,727 Stage-1 map = 0%,  reduce = 0%
...
2017-07-11 09:27:17,297 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 57.89 sec
MapReduce Total cumulative CPU time: 57 seconds 890 msec
Ended Job = job_1499153664137_0118
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 3  Reduce: 1   Cumulative CPU: 58.14 sec   HDFS Read: 607706243 HDFS Write: 3043 SUCCESS
Total MapReduce CPU Time Spent: 58 seconds 140 msec
OK
cQ25-glGRzI RCARecords  742 Music   227 77674728    4.45    188154  186858  ["otMB3WVQNVg","CAoo71VEYhs","6gqSk1UBFyo","pULa7F1gWZM","C2eUCfREAbw","neZw2CH1h9s","2fjW33W7424","aVnl9QCfNyE","DKhnmUdmz74","tPW70sgrHiY","W4kR8OQCrlQ","-WtcXpnMIT8","YhRZx2QIJKg","5iaYFN5eERA","KChJyERIclc","xsRWpK4pf90","lOq-Q60zWQs","ywNZPURFJNU","dMH0bHeiRNg","tvkh29RKFRY"]
7AVHXe-ol-s internmarket    603 Music   264 60349673    3.16    1033    594 ["trrRo3kGRv0","CCsGkN1PEec","l5mQKJDO-nY","DP_4rQcU0G8","7xz5aOvP2YA","kCjKIus3iBA","nD0MILEXZP0","IQcyLMa716k","d05KUL8aW4E","szeeHnu2DM8","7b_wObF6vgs","UazqVaOg9uc","LRBD9l3Nv6Y","108YigkYFgM","_qCau44yfX0","NkkGj4_1m9A","kuyRSrklGU8","uLr7IJYyNnM","IEuyk_277xY","XmAaYo-CQNw"]
ePyRrb2-fzs 1988basti   826 Music   204 45984219    4.87    51039   27065   ["fm0T7_SGee4","h8oBykb_Pqs","iWg3IMN_rhU","MdOAr_4FJvc","xNp7OxgFJJM","nzaw_Gk9BAU","u9rl4apDFfY","fMzMm4sJYGo","b5eEa5a2Rio","aNR0tW_afdk","Kj_MceyjS6g","GojTUmjxVHU","CGPUuPHdHQg","SIM4DCn7AlE","ktUSIJEiOug","Dn6kGzRSjhU","gxxU68z8quM","HRSrFm_8YK8","2__Qdd11rfA","XewBty95JVg"]
xsRWpK4pf90 universalmusicgroup 902 Music   240 44614530    4.73    99373   53441   ["a4X7eFbP3u4","4WAapKx2TvM","PgZgubuT2DM","FIUv3dOBbCk","Dk8NQB_T1ws","7NAbTHGIPP8","mekQxrnhQ3I","OouU4PFUw3Y","rwgXvL9o0QQ","4eeyhtlJp5A","ktUSIJEiOug","UZStqFeYk3Y","sF84pIhP5UM","FKXm3Qg7sBo","a1ti0KccvOg","ERV-wh4VwZI","iWg3IMN_rhU","i1PKmEaUqes","gsAN_Zfc1nc","CmBCLWkrysY"]
ktUSIJEiOug aliciakeys  953 Music   251 43583367    4.85    103074  59672   ["_VD-PDzmW4M","vmjl5ARtQWM","X_SmJpAmirw","glBHQSasVMQ","IGj3T7C-RdE","6U_4ANczMPY","oh3Pkw6cokk","j97WOHviXbE","FqsGSvrcfDE","2PWfB4lurT4","Yci8zVNMEdg","sF84pIhP5UM","dDfEjBSWj54","Zdm7FJktDOM","a4X7eFbP3u4","jhPAK8HjcPI","ePyRrb2-fzs","8_8KJfSNgC4","kN9vm95SocU","4DC4Rb9quKk"]
b3u65f4CRLk srcrecords  729 Music   256 43511791    4.76    55148   27818   ["4fZF9UsS8LY","5f7kfhHHH_A","H1vaszd6NnA","6Di-dAIR9RA","HiBcQeIax4g","JVciSFc5Wrw","Md6rURKhZmA","IG5ReXP0SSg","D9g2szHsoz0","EflOarvoeVs","7PxBGHjABnU","Gjq1g3j7WFc","mDvCcrU-Ob8","V9YE8dHMxvw","iWg3IMN_rhU","Un2dbprtZFE","WpgMuHwAdC4","RtQ5JENdx5I","ZEYgAgKVuO4","D_2jb8D8AOI"]
iWg3IMN_rhU TimbalandMusic  853 Music   216 43323757    4.8 58761   36142   ["SIM4DCn7AlE","GojTUmjxVHU","ePyRrb2-fzs","2__Qdd11rfA","P_TKpULdDvo","Dn6kGzRSjhU","1iLDIj0pDHk","xsRWpK4pf90","XewBty95JVg","y4jiyjDQGLY","h8oBykb_Pqs","xNp7OxgFJJM","b3u65f4CRLk","S-783rzHIS4","cZd1Js0QaOI","kZGEgVxyPHU","43o0vwAmFM8","BMMUWvavORI","v_-1peCW6Ok","9gkjyM7ZOUc"]
innfyQZHPpo chai0322    468 Music   210 41564032    2.61    13564   6126    ["nhSZs-aAZbo","hh0nVc0NYTE","12Z3J1uzd0Q","V1s9queYhF8","1Al4crLW9EM","61e1h4vALS0","8h98jb9Lk74","Z-HjmP7BCVc","o_pIQIV_NuU","82BDV1gYjdM","Af9aPlG2WFw","cC27PTFP4V8","-_CSo1gOd48","00c08ijIlHo","cIKxlWuTviE","dMH0bHeiRNg","iWg3IMN_rhU","v3ARyAb_1Bs","5P6UU6m3cqk","QjA5faZF1A8"]
Lt6o8NlrbHg seankingston    867 Music   257 41171303    4.74    101352  67579   ["qwflMOAaOf0","aVB5ViG_0yE","_QmajsQI9SM","Skz6h2gb-t8","t2dkCGDgVzk","HJgsbsMSe5w","BXhN2DbI0hc","sFJQAfW_jgw","RZJ32D-lrTE","U1bf7XLGGRE","ktUSIJEiOug","TWEez0U-9ag","BhkIjh-nuRo","a4X7eFbP3u4","b3u65f4CRLk","sm2fTDpuyyM","9pzro0T8rgY","TBzfqR4mrgE","c2cyose42jU","iWg3IMN_rhU"]
QjA5faZF1A8 guitar90    308 Music   320 40294882    4.83    329108  169563  ["r2BOApUvFpw","ATub40Npxik","m7Jh1BV1EOc","owAj5LiXG5w","Ddn4MGaS3N4","GxplDa3M5Io","by8oyJztzwo","aZpD0btOZx8","pZ9jrBg4Lwc","heISA256CRo","6wpPk8qk3uQ","i4BYMvVvMg0","2xjJXT0C0X4","p-VR-cXghko","-9ao_vOsZkg","wfqu4YEufYc","WetVXbYRfWk","JdxkVQy7QLM","fdjamy75C4A","dVUgd8ot6BE"]
Time taken: 111.631 seconds, Fetched: 10 row(s)

4.6 筛选出每个类别中评分最高的前10个视频

select * from youtube_category where categoryId="Music" order by ratings desc limit 10;

结果如下:

hive> select * from youtube_category where categoryId="Music" order by ratings desc limit 10;
Query ID = hadoop_20170711093838_6ab5573a-964f-486e-b0dc-77e5853fcfb5
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1499153664137_0119, Tracking URL = http://master:8088/proxy/application_1499153664137_0119/
Kill Command = /home/hadoop/app/hadoop/bin/hadoop job  -kill job_1499153664137_0119
Hadoop job information for Stage-1: number of mappers: 3; number of reducers: 1
2017-07-11 09:39:27,903 Stage-1 map = 0%,  reduce = 0%
...
2017-07-11 09:40:36,298 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 57.41 sec
MapReduce Total cumulative CPU time: 57 seconds 410 msec
Ended Job = job_1499153664137_0119
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 3  Reduce: 1   Cumulative CPU: 57.41 sec   HDFS Read: 607706243 HDFS Write: 2957 SUCCESS
Total MapReduce CPU Time Spent: 57 seconds 410 msec
OK
QjA5faZF1A8 guitar90    308 Music   320 40294882    4.83    329108  169563  ["r2BOApUvFpw","ATub40Npxik","m7Jh1BV1EOc","owAj5LiXG5w","Ddn4MGaS3N4","GxplDa3M5Io","by8oyJztzwo","aZpD0btOZx8","pZ9jrBg4Lwc","heISA256CRo","6wpPk8qk3uQ","i4BYMvVvMg0","2xjJXT0C0X4","p-VR-cXghko","-9ao_vOsZkg","wfqu4YEufYc","WetVXbYRfWk","JdxkVQy7QLM","fdjamy75C4A","dVUgd8ot6BE"]
cQ25-glGRzI RCARecords  742 Music   227 77674728    4.45    188154  186858  ["otMB3WVQNVg","CAoo71VEYhs","6gqSk1UBFyo","pULa7F1gWZM","C2eUCfREAbw","neZw2CH1h9s","2fjW33W7424","aVnl9QCfNyE","DKhnmUdmz74","tPW70sgrHiY","W4kR8OQCrlQ","-WtcXpnMIT8","YhRZx2QIJKg","5iaYFN5eERA","KChJyERIclc","xsRWpK4pf90","lOq-Q60zWQs","ywNZPURFJNU","dMH0bHeiRNg","tvkh29RKFRY"]
pv5zWaTEVkI OkGo    531 Music   184 32022043    4.83    121516  39974   ["dMH0bHeiRNg","x49WZRyXGe0","RbdbVhBGETQ","yRmqZRPgK1w","DjCL0_0Il7w","gq7r3F1SoX0","k66epna2Sss","1LNbzqoOPu4","vr3x_RRJdd4","ERV-wh4VwZI","xsRWpK4pf90","iAQZ_uui1SY","5P6UU6m3cqk","bav63MWNUKg"]
ktUSIJEiOug aliciakeys  953 Music   251 43583367    4.85    103074  59672   ["_VD-PDzmW4M","vmjl5ARtQWM","X_SmJpAmirw","glBHQSasVMQ","IGj3T7C-RdE","6U_4ANczMPY","oh3Pkw6cokk","j97WOHviXbE","FqsGSvrcfDE","2PWfB4lurT4","Yci8zVNMEdg","sF84pIhP5UM","dDfEjBSWj54","Zdm7FJktDOM","a4X7eFbP3u4","jhPAK8HjcPI","ePyRrb2-fzs","8_8KJfSNgC4","kN9vm95SocU","4DC4Rb9quKk"]
Lt6o8NlrbHg seankingston    867 Music   257 41171303    4.74    101352  67579   ["qwflMOAaOf0","aVB5ViG_0yE","_QmajsQI9SM","Skz6h2gb-t8","t2dkCGDgVzk","HJgsbsMSe5w","BXhN2DbI0hc","sFJQAfW_jgw","RZJ32D-lrTE","U1bf7XLGGRE","ktUSIJEiOug","TWEez0U-9ag","BhkIjh-nuRo","a4X7eFbP3u4","b3u65f4CRLk","sm2fTDpuyyM","9pzro0T8rgY","TBzfqR4mrgE","c2cyose42jU","iWg3IMN_rhU"]
xsRWpK4pf90 universalmusicgroup 902 Music   240 44614530    4.73    99373   53441   ["a4X7eFbP3u4","4WAapKx2TvM","PgZgubuT2DM","FIUv3dOBbCk","Dk8NQB_T1ws","7NAbTHGIPP8","mekQxrnhQ3I","OouU4PFUw3Y","rwgXvL9o0QQ","4eeyhtlJp5A","ktUSIJEiOug","UZStqFeYk3Y","sF84pIhP5UM","FKXm3Qg7sBo","a1ti0KccvOg","ERV-wh4VwZI","iWg3IMN_rhU","i1PKmEaUqes","gsAN_Zfc1nc","CmBCLWkrysY"]
K2cYWfq--Nw FrEckleStudios  841 Music   224 17005186    4.82    90991   50788   ["SyIC3Munnyw","cZd1Js0QaOI","lLYD_-A_X5E","alqM0IYeH54","wQVEPFzkhaM","oGECJP3phyY","nPLOiBM8hLk","VpBDqtUEWcM","MJPdVVOmbz4","bPZJYQXQsm8","nPBmXEO3yUU","3jzSh_MLNcY","_EXeBLvmllg","Cva_sGN_0VA","Sr2JneittqQ","SYpYmkcadRA","71eBjNbdXDg","DgBgnoEY4iM","bl6RJyZdBSU","FAK_jtOf70g"]
-xEzGIuY7kw alyankovic  580 Music   171 24095019    4.84    90091   45517   ["HYokLWfqbaU","N26KWq7MmSc","JCAt9WcCFbM","Rt1_6uz_sVU","Nh9mVsBKwYs","p9Zt8mn14hY","8n7ncJEFuSw","FT060JGp9sQ","E6Zc9NyYH-k","ixyTNd-Ln38","zIllRdSzSug","GsfVw9xxoNY","XkDeJgGrtdU","fqz1ojIQTBk","5GE82tqcYYQ","ODdGhOOUOpI","v3ARyAb_1Bs","XbVtbc_XzrI","U1ULxKM75rY","Jw00EUh0GT4"]
EwTZ2xpQwpA TayZonday   796 Music   292 16841569    4.23    83514   129200  ["2x2W12A8Qow","P6dUCOS1bM0","NattlyH0IeM","nTQOpibv_OA","9mSKBgvHdoE","hjD6iigdB-g","caIBKOztlAo","xUz2YMmiq0k","aWY3eYOX3U0","mD5_GUovjiM","1oFS-q8BIps","deXAEN70CDY","0pElTyjfxe0","eyDuGwlrFRs","m6SjPfc_xNA","qYGvGWY1FDs","N0amCfgnwY8","JPu4uErBFks","pgSA-ErKd8c","ZZgGGlOGyUg"]
xWHf_vYZzQ8 universalmusicgroup 771 Music   262 25607299    4.79    83419   70529   ["jvz0bvYmnto","zElEs8yw7fw","9FcBnaLjxY4","95wgKdSJGDo","ueOgZPBXY4A","k3O5uy-MBBk","O3sGTaQ9s9c","aT434G38OBg","4PdDPrwIwhI","Y-VifE8EK8w","9wpxno6qUd0","B17SYROT3GI","wJtUuxmm-B0","LBDnkJ5h1ho","CfzpBjKpSPE","QF2AmC2xyXM","eoa6Gx4HxTc","mvPvcV44rCc"]
Time taken: 109.053 seconds, Fetched: 10 row(s)

4.7 找出用户中上传视频最多的10个用户的所有视频

思路:

  • 筛选出用户表中上传视频数最多的前十位
    select * from user order by videos desc limit 10;
  • 将筛选结果跟youtube3进行join得出结果
    select b.*,a.videos,a.friends from (select * from user order by videos desc limit 10) a join youtube3 b on a.uploader = b.uploader order by views desc limit 20;

结果如下:

hive> select b.*,a.videos,a.friends from (select * from user order by videos desc limit 10) a join youtube3 b on a.uploader = b.uploader order by views desc limit 20;
Query ID = hadoop_20170711101616_d9ebb69e-7fbe-4f09-ad55-54f4dfd11ebe
Total jobs = 5
Launching Job 1 out of 5
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1499153664137_0125, Tracking URL = http://master:8088/proxy/application_1499153664137_0125/
Kill Command = /home/hadoop/app/hadoop/bin/hadoop job  -kill job_1499153664137_0125
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2017-07-11 10:17:30,594 Stage-1 map = 0%,  reduce = 0%
2017-07-11 10:18:03,439 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 6.03 sec
2017-07-11 10:18:25,025 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 10.28 sec
MapReduce Total cumulative CPU time: 10 seconds 280 msec
Ended Job = job_1499153664137_0125
Stage-8 is filtered out by condition resolver.
Stage-9 is selected by condition resolver.
Stage-2 is filtered out by condition resolver.
17/07/11 10:18:30 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Execution log at: /tmp/hadoop/hadoop_20170711101616_d9ebb69e-7fbe-4f09-ad55-54f4dfd11ebe.log
2017-07-11 10:18:33 Starting to launch local task to process map join;  maximum memory = 518979584
2017-07-11 10:18:44 Dump the side-table for tag: 0 with group count: 10 into file: file:/tmp/hadoop/146c18a2-9a94-440d-9302-72e50ede944e/hive_2017-07-11_10-16-50_653_2968482866464413222-1/-local-10007/HashTable-Stage-6/MapJoin-mapfile90--.hashtable
2017-07-11 10:18:44 Uploaded 1 File to: file:/tmp/hadoop/146c18a2-9a94-440d-9302-72e50ede944e/hive_2017-07-11_10-16-50_653_2968482866464413222-1/-local-10007/HashTable-Stage-6/MapJoin-mapfile90--.hashtable (611 bytes)
2017-07-11 10:18:44 End of local task; Time Taken: 11.244 sec.
Execution completed successfully
MapredLocal task succeeded
Launching Job 3 out of 5
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1499153664137_0126, Tracking URL = http://master:8088/proxy/application_1499153664137_0126/
Kill Command = /home/hadoop/app/hadoop/bin/hadoop job  -kill job_1499153664137_0126
Hadoop job information for Stage-6: number of mappers: 4; number of reducers: 0
2017-07-11 10:19:25,285 Stage-6 map = 0%,  reduce = 0%
...
2017-07-11 10:20:26,162 Stage-6 map = 100%,  reduce = 0%, Cumulative CPU 44.09 sec
MapReduce Total cumulative CPU time: 44 seconds 90 msec
Ended Job = job_1499153664137_0126
Launching Job 4 out of 5
Number of reduce tasks determined at compile time: 1
Starting Job = job_1499153664137_0127, Tracking URL = http://master:8088/proxy/application_1499153664137_0127/
Kill Command = /home/hadoop/app/hadoop/bin/hadoop job  -kill job_1499153664137_0127
Hadoop job information for Stage-3: number of mappers: 2; number of reducers: 1
2017-07-11 10:21:01,746 Stage-3 map = 0%,  reduce = 0%
2017-07-11 10:21:30,462 Stage-3 map = 100%,  reduce = 0%, Cumulative CPU 4.27 sec
2017-07-11 10:21:48,968 Stage-3 map = 100%,  reduce = 100%, Cumulative CPU 6.36 sec
MapReduce Total cumulative CPU time: 6 seconds 360 msec
Ended Job = job_1499153664137_0127
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 10.28 sec   HDFS Read: 9796823 HDFS Write: 434 SUCCESS
Stage-Stage-6: Map: 4   Cumulative CPU: 44.09 sec   HDFS Read: 574641054 HDFS Write: 2133846 SUCCESS
Stage-Stage-3: Map: 2  Reduce: 1   Cumulative CPU: 6.36 sec   HDFS Read: 2144607 HDFS Write: 6335 SUCCESS
Total MapReduce CPU Time Spent: 1 minutes 0 seconds 730 msec
OK
lUOe76YPY7M expertvillage   801 ["Howto","Style"]   119 1014983 3.36    596 689 ["5cEsa-rswrg","HOoQPbcF8Rk","ht3DMDx1spo","VFnf17cp3Eg","IP35rm_31RQ","hXuiuwZPX0M","s8pH5jgSuP8","KIEkwm_6DD4","ovFSahIPaVQ","rQtwcSDh3Hk","COeT3WR7SFc","ZNLBKRpWt7w","qncPBZ9DRRk","tOBqg3yDlyE","YgfWUXsbr7w","qBB3AsgjN1M","XX8ouWx6xm8","BtF9dYRuiGU","NRanI5qR81I","evfl0NGLjVE"]   86228   5659
2Aj2wx9PYxI expertvillage   815 ["Howto","Style"]   170 918553  3.82    1596    1533    ["hb5QaCfm7bg","YVDPSZe21KI","C1elpMjZ4wE","8gD1RG-tnNk","d-IDrRSVkCQ","IAOr6SYGAyw","3mbx03mP5eg","o-pN8qAiZhQ","K_qw03-3gFg","JgsO3A_vbtk","ffKr98Ium-M","WKYcBKa8_7Q","dgeX8CZmqvo","jgUCs72RDHs","PaNhROW-wEo","WhRCVm-1r2k","5I5O8P-r5Rk","reJSIZ3ugsE","-51iHvqP0Rs","9wItsn3r_kc"]   86228   5659
VBGHer3yFyc libertaddigitaltv   998 ["News","Politics"] 29  828616  4.68    1337    10763   ["HL9p8I3lOFA","Fb3HZ7K1gTk","AARX6XOA5h0","t3DPDKbRxio","VAmz8MNZdlk","WrC6_uBe4eA","d9X8DYOA5DE","pZuZv8UK7YA","NwswzwoA4pA","g1J2gvjp4fs","3SMhrQZdABc","h6M42Il4kN8","izM_JwEkWPs","_FM_IoPIFq4","uMZCCPtl_Do","5cZFinVFubQ","om0iHwO4d6o","lUZxlXkbaxM","iSJ3qKpC-3o","tRdq9_Si0Ws"]   6874    1
07jnqD8wvyE expertvillage   776 ["Howto","Style"]   217 574365  4.55    1073    1526    ["3wNpOp50uGI","-taU9d26wT4","_kly-fVUi1I","kZtRmnly_sU","LKe50AJvhz8","0GAUnuuBkW4","eM6Bpz6-dio","i9xf62PKC5M","mwFfk7igYC0","ilLTA4p4MMw","qyoLuTjguJA","aC-KOYQsIvU","0V6LWQZjYRk","_uZkvzYEXp0","auQbi_fkdGE","qgiUSEpg8Xc","4I79yc3uKfE","7Lz-XGjynN8","taHOwsdQXWI","2Aj2wx9PYxI"]   86228   5659
NDwZcLGekE8 expertvillage   801 ["Howto","Style"]   94  491582  4.35    553 588 ["CEmYX76UJy0","0sF-PoXR9ZU","qDfoNYpozxk","T4P3yp6mFfY","rzcj_Wq6BIE","A2LTVhqHAdo","4gyH0mJPqY8","m3PsQCMz3sY","wckYj_PAhgM","YkvwdM9Xstc","Ph2UAjGfeB4","vT_uXPfSAVE","89mAouUu8YM","0t5pPZ4AXX0","sLNFN75CYbw","QQGHXEPfMcQ","p6gcz4hkOmw","qk6R-X-YmUw","NR_9VlHFOu8","0gA_3BAxtVM"]   86228   5659
CEmYX76UJy0 expertvillage   801 ["Howto","Style"]   90  422284  4.48    448 459 ["NDwZcLGekE8","jRAKgd50Jh0","0sF-PoXR9ZU","IjjJWXgqWL4","m3PsQCMz3sY","8Di58l1dfPA","DpEVqy954OY","dmYMeImNy1s","qk6R-X-YmUw","wckYj_PAhgM","qZSD3JLPURk","YkvwdM9Xstc","fdSFh_z0HJY","n5LDyWyHJTo","Dmm5O1DPQYc","A2LTVhqHAdo","I2mM0r76b7o","iE16Z9Gh3TQ","rzcj_Wq6BIE","8szPfWiK-ag"]   86228   5659
E6VWi7IaroA expertvillage   946 ["Howto","Style"]   228 338889  4.46    300 251 ["fLy3M2jfe4w","cC1FBmbYgMk","xEJhInSxsQg","RxMLoqzKZ8s","Z9a--4RQ9O8","7neOXKLrgzk","JS-lS9ngMtY","aKIFfgPLY8I","EsqPNhtkWIo","NJUPNtZN2rc","vV7pajY0zOk","GTtZapaRDbg","-0UVDD6ZwYE","B9jNaUP59Vo","2QDGPe50E4Y","lJDZO_pX4aw","wuD2CemlvP0","ZTx2ZFI1spg","CzZghYdupiQ","5ueEKSJ8J-U"]   86228   5659
0sF-PoXR9ZU expertvillage   801 ["Sports"]  77  326365  4.21    512 663 ["24FJ58Q22D4","m3PsQCMz3sY","qk6R-X-YmUw","CEmYX76UJy0","NDwZcLGekE8","QiIunH47qew","I2mM0r76b7o","c_qSPdLPReM","5eYBaQ5Cg-c","fdSFh_z0HJY","8szPfWiK-ag","Dmm5O1DPQYc","A2LTVhqHAdo","n-Cvzk84X8o","T4P3yp6mFfY","1vo3CCBIcaY","fnGFR1AFCJc","n5LDyWyHJTo","YkvwdM9Xstc","emNfLmLTQb0"]    86228  5659
ohdiRHLSw5Y expertvillage   447 ["Howto","Style"]   78  321521  3.79    139 102 ["12_nJamoyTk","dXLhjYgMZ68","ukzFwRoFj4k","sOUgrJo2kIg","HhO39nCDfMg","3iVP0tzwhVc","MFNA0PqLynY","ZlcMzLhiBjg","7F8ajh_DDYs","6vaPIz6S6sk","-L_uhGXOtF0","-libzR5AV58","CSdSH7XKTtQ","KZX5jXPAWIk","Vkj5fbrQpNs","YCgnFIk5Acg","hjPDmVf6KJw","BZnhMl85dq4","iuqVpMdb1NM","GO2_3q6euug"]   86228   5659
0sF-PoXR9ZU expertvillage   801 ["Howto","Style"]   77  294737  4.21    438 599 ["24FJ58Q22D4","m3PsQCMz3sY","CEmYX76UJy0","NDwZcLGekE8","A2LTVhqHAdo","qk6R-X-YmUw","5eYBaQ5Cg-c","fdSFh_z0HJY","jGdXcitOUzY","n-Cvzk84X8o","wckYj_PAhgM","I2mM0r76b7o","QMZaxtjhZ5k","Dmm5O1DPQYc","vT_uXPfSAVE","8szPfWiK-ag","QiIunH47qew","YkvwdM9Xstc","DpEVqy954OY","c_qSPdLPReM"]   86228   5659
3Xc-kxSznZ0 expertvillage   430 ["Howto","Style"]   107 292329  3.33    565 964 ["9YAKfkVvvEc","wEkcPjBaHjs","zMl3ixv1kHw","Q4ZUPEgbzu0","-jIEGZwLPvo","YxdxGLBCCRA","Jp0LaJ_ftT0","PgCRWvwdFHM","yZL2MOFk6I0","qDJk3-ofbk0","e4nHAAuDiPE","cuRk9bTbU6A","z53cPMciVio","MREw0dIWaQ0","237AQAvXtw8","a3dnOupmt7Y","3-va5g_QVss","mFdaiY8YhEY","D6li4tYmKf0","bEVHBHKqBfQ"]   86228   5659
VxYkdycN3WE expertvillage   801 ["Howto","Style"]   77  257702  4.19    153 137 ["ZpCE7mnYJK0","NYDdCGtbr8E","904S9W9AZg8","Rl5KEODnCfI","7-i2gl9288I","TC1XlhRwhsU","rD_4nCKf5Ns","pFKZZ120fyU","oN4Esih54V8","N1Ov6cCUXkc","rDM97S4jPiw","IK2uLNND3i0","OZXbM6kRuuA","5pvyXkFTa18","bExUJAnCLZw","VoREkZvkyI8","BWhfpzAItx4","EEybT3IeyzA","QteH0KOJx0g","dbXHvd_SGkk"]   86228   5659
-libzR5AV58 expertvillage   447 ["Howto","Style"]   391 249682  3.34    73  62  ["-L_uhGXOtF0","4LvTYBbgMjE","-wO07skEauo","CSdSH7XKTtQ","kzQDoXKDgTM","e1kzLj-FZ6k","Guhgb3pWRAQ","KIbyySDSGEc","DGUBzs-GNxE","KZX5jXPAWIk","ohdiRHLSw5Y","6vaPIz6S6sk","5MrAuq_F2dU","gOv-yPqZ3LE","BZnhMl85dq4","oCDdDCqygOg","12_nJamoyTk","7F8ajh_DDYs","HhO39nCDfMg","X3-6IT-7S80"]   86228   5659
HL9p8I3lOFA libertaddigitaltv   998 ["News","Politics"] 17  235313  4.61    114 581 ["VBGHer3yFyc","Fb3HZ7K1gTk","Z3Pe8ff37gk","t3DPDKbRxio","pZuZv8UK7YA","_FM_IoPIFq4","1P_EQjd7lq4","yEzNx2g7ors","_neKPvLdG8Y","7i_WDprxACU","zCM0nsym0cE","om0iHwO4d6o","h6M42Il4kN8","Bjk1McjrWtg","VAmz8MNZdlk","-t2hwljZmA8","gH6PHFrA6g8","ysAwzxqbWD8","U5lX_EIFOOA","4zu_X_3qHLA"]   6874    1
oE5ZhBHy6Rs expertvillage   776 ["Howto","Style"]   143 234551  4.64    358 280 ["rCWRUtqJgzI","476vNb6thyM","eg3eo8x_9rw","grK43Poye1U","HGncDLMNPRI","iylaKSDfLSc","JFqiDcvRW2Y","XrbXGRLIDTc","NmxoR9lsu-c","JW8FJefw82k","n0Mi550FDng","QXsW44Wh6tY","6oJDYT5MlEA","IJy05ssELhg","TvT-M7tTQHE","GYNVEWCO_X0","5pJ_o19GVRA","q_Nn_CnvD9s","uqpQNkFxVAU","KKP8vWeEc30"]   86228   5659
wOnP_oAXUMA expertvillage   429 ["Howto","Style"]   120 228842  3.56    236 601 ["BlDWdfTAx8o","WF1kRLJ_4B8","6bu9csQC45c","xTAYAl4g7HE","871VTEF7qIY","UheCchftswc","TKg_cdwq9l4","ADQ4Bjq42wE","cSJCDcAKShA","SYklbxHP2tI","fktuPPNkQpQ","L6267-JZu3E","_IwJ3Aj2XrQ","hf-4ppNmH-U","aO8evzfTR8E","E-kNUEv0YgA","Hsr1-xcXFL8","VCiLZMjo_PQ","rMEgKgZOJi8","YeYU_OE5oIU"]   86228   5659
MVPCc4eODys expertvillage   448 ["Howto","Style"]   69  225879  4.29    171 315 ["iPPgmvcsJNw","xc8sysT31mQ","svt_fnKE_80","kBjUDCyDCuI","ysa50-plo48","yutDlQL9Ct0","gCra4qOrjFw","CN1QJDGinwE","dFJjaj7pXsA","LvJzFdcYSag","4I79yc3uKfE","AcryZ1o4RQE","WyyB0M6wAV8","A0sOVKbYR20","8XCyd0XEejc","5BmVKKpu_CU","8_QNzZ9WPRo","pNiX_l-HEGM","CQJSZs-euZU","ZwHHYzK8UCg"]   86228   5659
7neOXKLrgzk expertvillage   946 ["Howto","Style"]   111 215943  4.07    215 314 ["xEJhInSxsQg","PSR5kinMX3Y","E6VWi7IaroA","5ovGW0CzQp4","DeMcNhz0Vz8","CzZghYdupiQ","72EYsQEw35k","tScm-eZInBE","Hz4lnt7d88s","dgLUbXSqwSs","G3P4VFrB_zM","pU0O2-o67C0","pDGP8Q3Zzb8","hQhdK8l6CuY","VK9VflDsunI","LWgzG8I9bWY","csL7WNTWK9U","LE93Q5k5-fI","Uirspi_t3xM","wGLNJK4zcdE"]   86228   5659
AaO1aLwk53Y expertvillage   443 ["Howto","Style"]   156 204450  4.14    210 290 ["Q8YYZGvKM-8","Bn59zha-uAQ","BjnkKuI-YAY","aQeXHXkL6ow","Qtp2ibwd2Ms","nstbrkjk3OM","Odj4ATUPh9w","v2mm2-9JQ-I","328YVJVtMKU","B-jzkyKxDLo","yB-yGNxNEZg","b0msCCnzmNA","sfh5AJhRto8","SEILiSkGzrY","jm4uxgVDU6o","svOBjuvvy-k","y_1nj8fBQOE","zb8ArkvxOxo","VgWayeJdV0w","Yor1dHH6orE"]   86228   5659
m3PsQCMz3sY expertvillage   801 ["Howto","Style"]   99  203912  4.38    177 207 ["CEmYX76UJy0","0sF-PoXR9ZU","NDwZcLGekE8","qk6R-X-YmUw","DpEVqy954OY","Ph2UAjGfeB4","fdSFh_z0HJY","YkvwdM9Xstc","8szPfWiK-ag","Dmm5O1DPQYc","T4P3yp6mFfY","I2mM0r76b7o","7GtVwKZBf9U","mugebyUqK6o","qDfoNYpozxk","XA7dRqkp8YA","a28QcMXjHyI","n5LDyWyHJTo","ovoNU_CbmfA","Q2j6MlACIoc"]   86228   5659
Time taken: 300.414 seconds, Fetched: 20 row(s)

4.8 筛选出每个类别中观看数Top10

** select a.* from (select videoId,categoryId,views,row_number() over(partition by categoryId order by views desc) rank from youtube_category) a where rank<=10;**

hive> select a.* from (select videoId,categoryId,views,row_number() over(partition by categoryId order by views desc) rank from youtube_category) a where rank<=10;
Query ID = hadoop_20170713170101_76ffdd80-e72c-4c7f-b9ad-9b8c3f7e17ab
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 3
Starting Job = job_1499153664137_0143, Tracking URL = http://master:8088/proxy/application_1499153664137_0143/
Kill Command = /home/hadoop/app/hadoop/bin/hadoop job  -kill job_1499153664137_0143
Hadoop job information for Stage-1: number of mappers: 3; number of reducers: 3
2017-07-13 17:01:52,690 Stage-1 map = 0%,  reduce = 0%
2017-07-13 17:02:24,374 Stage-1 map = 44%,  reduce = 0%, Cumulative CPU 10.49 sec
...
2017-07-13 17:03:08,803 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 89.07 sec
MapReduce Total cumulative CPU time: 1 minutes 29 seconds 70 msec
Ended Job = job_1499153664137_0143
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 3  Reduce: 3   Cumulative CPU: 89.07 sec   HDFS Read: 73070026 HDFS Write: 7487 SUCCESS
Total MapReduce CPU Time Spent: 1 minutes 29 seconds 70 msec
OK
qruSOZq-wJg Activism    2673823 1
tFxk7glmMbo Activism    1063928 2
dYN278GB2Kg Activism    831836  3
n-Akqik3hME Activism    778987  4
LtqRAYWjz2Q Activism    768422  5
2v-whrgAbf4 Activism    742172  6
ja4d02s0WKQ Activism    733014  7
IpAOYskH1s8 Activism    714580  8
ieuVTOJyLBs Activism    640994  9
6CXBgbP_ZKs Activism    562163  10
12Z3J1uzd0Q Animation   65341925    1
bFytHZHFXhA Animation   38937813    2
sdUUx5FdySs Animation   16151661    3
vQbYvjmfbr4 Animation   12114231    4
rNKefRRvV7g Animation   11884311    5
6HIavxnUHls Animation   11735507    6
gm5DHKI8o5o Animation   9381674 7
316BF17k5d8 Animation   9372064 8
ZuQMn6Z0k_w Animation   9195588 9
6B26asyGKDo Animation   8915176 10
dMH0bHeiRNg Comedy  79897120    1
5P6UU6m3cqk Comedy  42525795    2
Tx1XIm6q4r4 Comedy  38378910    3
Q5im0Ssyyus Comedy  21170471    4
k66epna2Sss Comedy  17944367    5
AYxu_MQSTTY Comedy  15665340    6
pYak2F1hUYA Comedy  15634357    7
_OBlgSz8sSM Comedy  14567743    8
wCF3ywukQYA Comedy  14227802    9
sHzdsFiBbFc Comedy  13606292    10
Mz9BlgiV45I Education   2807232 1
VnGb6UQvwJs Education   2441432 2
j6lADdhmAec Education   2306689 3
pYfTQpsErsM Education   2066429 4
7GNE6AhL0qI Education   2037257 5
NiGZf15W9xc Education   1867126 6
-FeAK-q5Cok Education   1741504 7
N5P6o2YaqVI Education   1656512 8
Dl-agXA63nw Education   1529566 9
JX3VmDgiFnY Education   1373008 10
cQ25-glGRzI Music   77674728    1
7AVHXe-ol-s Music   60349673    2
ePyRrb2-fzs Music   45984219    3
xsRWpK4pf90 Music   44614530    4
ktUSIJEiOug Music   43583367    5
b3u65f4CRLk Music   43511791    6
iWg3IMN_rhU Music   43323757    7
innfyQZHPpo Music   41564032    8
Lt6o8NlrbHg Music   41171303    9
QjA5faZF1A8 Music   40294882    10
qruSOZq-wJg Nonprofits  2673823 1
tFxk7glmMbo Nonprofits  1063928 2
dYN278GB2Kg Nonprofits  831836  3
n-Akqik3hME Nonprofits  778987  4
LtqRAYWjz2Q Nonprofits  768422  5
2v-whrgAbf4 Nonprofits  742172  6
ja4d02s0WKQ Nonprofits  733014  7
IpAOYskH1s8 Nonprofits  714580  8
ieuVTOJyLBs Nonprofits  640994  9
6CXBgbP_ZKs Nonprofits  562163  10
SkELRp4wKPs Politics    12730681    1
AFFQrUyi8-s Politics    11953598    2
YgW7or1TuFk Politics    8516433 3
JgiGrXpOhYg Politics    8211043 4
up5jmbSjWkw Politics    7869023 5
hr23tpWX8lM Politics    7209885 6
Kje7NUNebL8 Politics    6707868 7
I4u3449L5VI Politics    6675798 8
jjXyqcx-mYY Politics    6462051 9
a9Vde3FHMmc Politics    6028642 10
8h98jb9Lk74 UNA 33880568    1
AR5yq0aMxI8 UNA 18409445    2
DH7FVB15EPU UNA 15487752    3
Qvfx6UiAAG0 UNA 13977592    4
n-FdoZdfpmE UNA 12241555    5
MddPeH1DAvY UNA 12211192    6
sI8Sus_KRpY UNA 10460047    7
14oqm9ywLIw UNA 10344366    8
LytRWuhn4BM UNA 9899270 9
iSByZARPJSU UNA 9316897 10
Oro28yg7W74 Gaming  727015  1
0to1HDxtkM4 Gaming  529346  2
M-w-gLVfi30 Gaming  396471  3
i6aH2F1WjsY Gaming  382350  4
EgbUSsblCSQ Gaming  266718  5
7SMnWr9MqwQ Gaming  210147  6
NQMBIRipp5A Gaming  207783  7
vi1lVqJSbsM Gaming  186709  8
aId2hK4pI2M Gaming  186054  9
tWPWcyLdrko Gaming  181979  10
sLGLum5SyKQ Howto   31121122    1
eMhGpzyFdhE Howto   16320501    2
KPOOWvP_dd8 Howto   11131527    3
91wuBqlny50 Howto   8579613 4
mr5ghuaTK14 Howto   8361812 5
GfPJeDssBOM Howto   6101232 6
6gmP4nk0EOE Howto   4970382 7
mM-30cmM33s Howto   4936417 8
STQ3nhXuuEM Howto   4655542 9
XZGgeGHU1Bs Howto   4424698 10
LU8DDYz68kM Pets    27721690    1
epUk3T2Kfno Pets    10352882    2
z3U0udLH974 Pets    9461084 3
kkT7A3jegBc Pets    9269896 4
TZ860P4iTaM Pets    9009434 5
7tRWRSfcDuQ Pets    8538635 6
Qit3ALTelOo Pets    7939352 7
Zi9GOvR3Ynw Pets    7351184 8
Kxa0mnDj0bs Pets    7289545 9
PadauuWF94w Pets    6271287 10
W1czBcnX1Ww Science 3234852 1
D99NHb6B03s Science 3176792 2
tk_F2Y-F2kE Science 3121903 3
nhyH7lQ6D2k Science 2879861 4
JCbKv9yiLiQ Science 2672391 5
U5vs2ly_grk Science 2611389 6
M0ODskdEPnQ Science 2555284 7
p4ebtj1jR7c Science 2536109 8
8wTlureUMP8 Science 2477804 9
NZNTgglPbUA Science 2230729 10
vt4X7zFfv4k Sports  12598542    1
P-bWsOK-h98 Sports  12101588    2
OS5tQvQOB-Y Sports  9047732 3
P9LmHXXWiJs Sports  7415813 4
NIKdK-T-jZM Sports  7175403 5
euMu1SKi-ak Sports  7040023 6
zKQgTiqhPbw Sports  7017699 7
yeXoxNP8_xY Sports  6422039 8
q8t7iSGAKik Sports  6312667 9
COcczatkNP4 Sports  6258611 10
sLGLum5SyKQ Style   31121122    1
eMhGpzyFdhE Style   16320501    2
KPOOWvP_dd8 Style   11131527    3
91wuBqlny50 Style   8579613 4
mr5ghuaTK14 Style   8361812 5
GfPJeDssBOM Style   6101232 6
6gmP4nk0EOE Style   4970382 7
mM-30cmM33s Style   4936417 8
STQ3nhXuuEM Style   4655542 9
XZGgeGHU1Bs Style   4424698 10
ZeBd_F2Bz5Y Vehicles    8623041 1
ju6t-yyoU8s Vehicles    7325889 2
uUurALr_Ckk Vehicles    7258142 3
WShY1ObPvhQ Vehicles    7047156 4
q3idQKi5EqM Vehicles    6470816 5
9JWywFpZkg4 Vehicles    6465830 6
tth9krDtxII Vehicles    6394563 7
npTRXr4Sgxg Vehicles    5450317 8
aCamHfJwSGU Vehicles    5354163 9
S-ppGiwc0wQ Vehicles    5039415 10
LU8DDYz68kM Animals 27721690    1
epUk3T2Kfno Animals 10352882    2
z3U0udLH974 Animals 9461084 3
kkT7A3jegBc Animals 9269896 4
TZ860P4iTaM Animals 9009434 5
7tRWRSfcDuQ Animals 8538635 6
Qit3ALTelOo Animals 7939352 7
Zi9GOvR3Ynw Animals 7351184 8
Kxa0mnDj0bs Animals 7289545 9
PadauuWF94w Animals 6271287 10
ZeBd_F2Bz5Y Autos   8623041 1
ju6t-yyoU8s Autos   7325889 2
uUurALr_Ckk Autos   7258142 3
WShY1ObPvhQ Autos   7047156 4
q3idQKi5EqM Autos   6470816 5
9JWywFpZkg4 Autos   6465830 6
tth9krDtxII Autos   6394563 7
npTRXr4Sgxg Autos   5450317 8
aCamHfJwSGU Autos   5354163 9
S-ppGiwc0wQ Autos   5039415 10
v3ARyAb_1Bs Blogs   31812447    1
5GE82tqcYYQ Blogs   30209692    2
ervaMPt4Ha0 Blogs   23859297    3
uWow42TCwzg Blogs   22389389    4
-_CSo1gOd48 Blogs   21176701    5
D2kJZOfq7zk Blogs   20458159    6
nhSZs-aAZbo Blogs   20423158    7
GuMMfgWhm3g Blogs   18634621    8
4jbkRGPxvaM Blogs   15980887    9
hMnk7lh9M3o Blogs   13553069    10
LpAI8TzQDes Entertainment   65078772    1
244qR7SvvX0 Entertainment   57790943    2
1uwOL4rB-go Entertainment   39883413    3
w2xUzv6iZWo Entertainment   25222946    4
vr3x_RRJdd4 Entertainment   25093671    5
lj3iNxZ8Dww Entertainment   23737579    6
RB-wUgnyGv0 Entertainment   23067889    7
lsO6D1rwrKc Entertainment   21758300    8
7iYWxfNSjYk Entertainment   19029677    9
5pGJCkCDK5A Entertainment   18858695    10
p0aQvKDA1K0 Events  12239023    1
bNF_P281Uu4 Events  9125026 2
AlPqL7IUT6M Events  6441558 3
J833f9fqWBA Events  4776832 4
xIvIWJbzimo Events  4284770 5
QuTj9a04o-s Events  4053317 6
eejQPUyeNiY Events  3573812 7
3QL97xldoXc Events  3010296 8
r43yCiKlbCo Events  2806995 9
z42fchrzhHY Events  2565257 10
12Z3J1uzd0Q Film    65341925    1
bFytHZHFXhA Film    38937813    2
sdUUx5FdySs Film    16151661    3
vQbYvjmfbr4 Film    12114231    4
rNKefRRvV7g Film    11884311    5
6HIavxnUHls Film    11735507    6
gm5DHKI8o5o Film    9381674 7
316BF17k5d8 Film    9372064 8
ZuQMn6Z0k_w Film    9195588 9
6B26asyGKDo Film    8915176 10
SkELRp4wKPs News    12730681    1
AFFQrUyi8-s News    11953598    2
YgW7or1TuFk News    8516433 3
JgiGrXpOhYg News    8211043 4
up5jmbSjWkw News    7869023 5
hr23tpWX8lM News    7209885 6
Kje7NUNebL8 News    6707868 7
I4u3449L5VI News    6675798 8
jjXyqcx-mYY News    6462051 9
a9Vde3FHMmc News    6028642 10
v3ARyAb_1Bs People  31812447    1
5GE82tqcYYQ People  30209692    2
ervaMPt4Ha0 People  23859297    3
uWow42TCwzg People  22389389    4
-_CSo1gOd48 People  21176701    5
D2kJZOfq7zk People  20458159    6
nhSZs-aAZbo People  20423158    7
GuMMfgWhm3g People  18634621    8
4jbkRGPxvaM People  15980887    9
hMnk7lh9M3o People  13553069    10
W1czBcnX1Ww Technology  3234852 1
D99NHb6B03s Technology  3176792 2
tk_F2Y-F2kE Technology  3121903 3
nhyH7lQ6D2k Technology  2879861 4
JCbKv9yiLiQ Technology  2672391 5
U5vs2ly_grk Technology  2611389 6
M0ODskdEPnQ Technology  2555284 7
p4ebtj1jR7c Technology  2536109 8
8wTlureUMP8 Technology  2477804 9
NZNTgglPbUA Technology  2230729 10
p0aQvKDA1K0 Travel  12239023    1
bNF_P281Uu4 Travel  9125026 2
AlPqL7IUT6M Travel  6441558 3
J833f9fqWBA Travel  4776832 4
xIvIWJbzimo Travel  4284770 5
QuTj9a04o-s Travel  4053317 6
eejQPUyeNiY Travel  3573812 7
3QL97xldoXc Travel  3010296 8
r43yCiKlbCo Travel  2806995 9
z42fchrzhHY Travel  2565257 10
Time taken: 121.381 seconds, Fetched: 250 row(s)

5 总结

上面的8个例子向大家展示Hive中简单和稍微复杂的操作,有的启动一个Job就可以完成,有的则需要启动多个Job才能完成,我们也可以看到,启动的Job越多运行时间就会越长,但是实际工作中的操作只会远比我们所演示要更加复杂,越是复杂的操作就更加需要去优化,来达到减少运行时间的目的,所以下一篇我们来看看Hive的优化实践

最后编辑于
©著作权归作者所有,转载或内容合作请联系作者
  • 序言:七十年代末,一起剥皮案震惊了整个滨河市,随后出现的几起案子,更是在滨河造成了极大的恐慌,老刑警刘岩,带你破解...
    沈念sama阅读 199,393评论 5 467
  • 序言:滨河连续发生了三起死亡事件,死亡现场离奇诡异,居然都是意外死亡,警方通过查阅死者的电脑和手机,发现死者居然都...
    沈念sama阅读 83,790评论 2 376
  • 文/潘晓璐 我一进店门,熙熙楼的掌柜王于贵愁眉苦脸地迎上来,“玉大人,你说我怎么就摊上这事。” “怎么了?”我有些...
    开封第一讲书人阅读 146,391评论 0 330
  • 文/不坏的土叔 我叫张陵,是天一观的道长。 经常有香客问我,道长,这世上最难降的妖魔是什么? 我笑而不...
    开封第一讲书人阅读 53,703评论 1 270
  • 正文 为了忘掉前任,我火速办了婚礼,结果婚礼上,老公的妹妹穿的比我还像新娘。我一直安慰自己,他们只是感情好,可当我...
    茶点故事阅读 62,613评论 5 359
  • 文/花漫 我一把揭开白布。 她就那样静静地躺着,像睡着了一般。 火红的嫁衣衬着肌肤如雪。 梳的纹丝不乱的头发上,一...
    开封第一讲书人阅读 48,003评论 1 275
  • 那天,我揣着相机与录音,去河边找鬼。 笑死,一个胖子当着我的面吹牛,可吹牛的内容都是我干的。 我是一名探鬼主播,决...
    沈念sama阅读 37,507评论 3 390
  • 文/苍兰香墨 我猛地睁开眼,长吁一口气:“原来是场噩梦啊……” “哼!你这毒妇竟也来了?” 一声冷哼从身侧响起,我...
    开封第一讲书人阅读 36,158评论 0 254
  • 序言:老挝万荣一对情侣失踪,失踪者是张志新(化名)和其女友刘颖,没想到半个月后,有当地人在树林里发现了一具尸体,经...
    沈念sama阅读 40,300评论 1 294
  • 正文 独居荒郊野岭守林人离奇死亡,尸身上长有42处带血的脓包…… 初始之章·张勋 以下内容为张勋视角 年9月15日...
    茶点故事阅读 35,256评论 2 317
  • 正文 我和宋清朗相恋三年,在试婚纱的时候发现自己被绿了。 大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
    茶点故事阅读 37,274评论 1 328
  • 序言:一个原本活蹦乱跳的男人离奇死亡,死状恐怖,灵堂内的尸体忽然破棺而出,到底是诈尸还是另有隐情,我是刑警宁泽,带...
    沈念sama阅读 32,984评论 3 316
  • 正文 年R本政府宣布,位于F岛的核电站,受9级特大地震影响,放射性物质发生泄漏。R本人自食恶果不足惜,却给世界环境...
    茶点故事阅读 38,569评论 3 303
  • 文/蒙蒙 一、第九天 我趴在偏房一处隐蔽的房顶上张望。 院中可真热闹,春花似锦、人声如沸。这庄子的主人今日做“春日...
    开封第一讲书人阅读 29,662评论 0 19
  • 文/苍兰香墨 我抬头看了看天上的太阳。三九已至,却和暖如春,着一层夹袄步出监牢的瞬间,已是汗流浃背。 一阵脚步声响...
    开封第一讲书人阅读 30,899评论 1 255
  • 我被黑心中介骗来泰国打工, 没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留,地道东北人。 一个月前我还...
    沈念sama阅读 42,268评论 2 345
  • 正文 我出身青楼,却偏偏与公主长得像,于是被迫代替她去往敌国和亲。 传闻我的和亲对象是个残疾皇子,可洞房花烛夜当晚...
    茶点故事阅读 41,840评论 2 339

推荐阅读更多精彩内容