/*
以下这段文字出自 https://blog.csdn.net/guodong2k/article/details/79459282
explode与lateral view在关系型数据库中本身是不该出现的,因为他的出现本身就是在操作不满足第一范式的数据(每个属性都不可再分),本身已经违背了数据库的设计原理(不论是业务系统还是数据仓库系统),不过大数据技术普及后,很多类似pv,uv的数据,在业务系统中是存贮在非关系型数据库中,用json存储的概率比较大,直接导入hive为基础的数仓系统中,就需要经过ETL过程解析这类数据,explode与lateral view在这种场景下大显身手。
*/
1. explode
explode是用于处理array、map、struct类型的UDTF,能够将一行记录按要求拆分为多行,达到行转列的效果。
用法是select explode(expression(col)) from table_name; 要求explode的参数必须是map或者array类型。
为了演示用法与效果,建立一张hive外部表laternal_explode_test
CREATE EXTERNAL TABLE laternal_explode_test
(
name STRING,
skill ARRAY<STRING>,
friend MAP<STRING,STRING>,
appearance struct<face:STRING,body:STRING>,
sigil STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
COLLECTION ITEMS TERMINATED BY ','
MAP KEYS TERMINATED BY ':'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '***';
通过文本向表中灌入数据,内容如下
AegonTargaryen|sword,dragon,wife|before:Velaryon|beautiful,strong|blood and fire
TyrionLannister|wisedom,name,bravery|always:Jamie|ugly,short|hear me roar
TonyStark|money,technology,genuis|now:SteveRogers|handsome,powerful|winter is coming
导入数据后,查询该表,发现可以显示全部三条记录
select * from laternal_explode_test;
| laternal_explode_test.name | laternal_explode_test.skill | laternal_explode_test.friend | laternal_explode_test.appearance | laternal_explode_test.sigil |
| AegonTargaryen | ["sword","dragon","wife"] | {"before":"Velaryon"} | {"face":"beautiful","body":"strong"} | blood and fire |
| TyrionLannister | ["wisedom","name","bravery"] | {"always":"Jamie"} | {"face":"ugly","body":"short"} | hear me roar |
| TonyStark | ["money","technology","genuis"] | {"now":"SteveRogers"} | {"face":"handsome","body":"powerful"} | winter is coming |
使用explode拆分部分字段
select explode(skill) as skills from laternal_explode_test;
| skills |
| sword |
| dragon |
| wife |
| wisedom |
| name |
| bravery |
| money |
| technology |
| genuis
select explode(friend) from laternal_explode_test;
| key | value |
| before | Velaryon |
| always | Jamie |
| now | SteveRogers |
select explode(split(sigil,' ')) from laternal_explode_test;
| blood |
| and |
| fire |
| hear |
| me |
| roar |
| winter |
| is |
| coming |
可以看到explode确实有爆炸效果,能够准确快速拆分map和array。
但是继续使用却会发现explode有明显的缺陷:
1.不支持同时select其他字段。如select name,explode(split(sigil,' ')) from laternal_explode_test;是不允许的,会报错。
2.不支持嵌套。select explode(explode(friend)) from laternal_explode_test; 也是不行的。
3.不能与group by、sort by 和cluster by一起使用。
2. lateral view
lateral view专用于与split、explode、stack等UDTF函数结合使用,主要解决在select使用UDTF做查询的过程中查询不能包含其它字段以及多个UDTF的问题。典型用法是:
table a LATERAL VIEW udtf(expression) tableAlias AS columnAlias (',' columnAlias)
对于上面的select name,explode(split(sigil,' ')) from laternal_explode_test;如果我们一定要同时select其他字段,可以改写为
select name,cc
from laternal_explode_test t1
lateral view explode(split(sigil,' ')) t2 as cc;
上面语句的执行过程可以简述为,首先通过explode函数将sigil列拆分成多行,再将拆分后的sigil视为一个虚拟表,即t2,这个表只有一列,别名叫cc。最后将多行cc与其对应的name组合起来,select出结果。故其结果是
| name | cc |
+------------------+---------+
| AegonTargaryen | blood |
| AegonTargaryen | and |
| AegonTargaryen | fire |
| TyrionLannister | hear |
| TyrionLannister | me |
| TyrionLannister | roar |
| TonyStark | winter |
| TonyStark | is |
| TonyStark | coming |
lateral view支持连续使用,如以下用法
select name,cc,bb
from laternal_explode_test t1
lateral view explode(split(sigil,' ')) t2 as cc
lateral view explode(skill) t3 as bb;
执行过程是从左到右,t1先和t2组合成一个虚拟表,然后整个虚拟表和t3再组成一个新的虚拟表。结果如下
| name | cc | bb |
+------------------+---------+-------------+
| AegonTargaryen | blood | sword |
| AegonTargaryen | blood | dragon |
| AegonTargaryen | blood | wife |
| AegonTargaryen | and | sword |
| AegonTargaryen | and | dragon |
| AegonTargaryen | and | wife |
| AegonTargaryen | fire | sword |
| AegonTargaryen | fire | dragon |
| AegonTargaryen | fire | wife |
| TyrionLannister | hear | wisedom |
| TyrionLannister | hear | name |
| TyrionLannister | hear | bravery |
| TyrionLannister | me | wisedom |
| TyrionLannister | me | name |
| TyrionLannister | me | bravery |
| TyrionLannister | roar | wisedom |
| TyrionLannister | roar | name |
| TyrionLannister | roar | bravery |
| TonyStark | winter | money |
| TonyStark | winter | technology |
| TonyStark | winter | genuis |
| TonyStark | is | money |
| TonyStark | is | technology |
| TonyStark | is | genuis |
| TonyStark | coming | money |
| TonyStark | coming | technology |
| TonyStark | coming | genuis |
Hive中的函数和用法真是的丰富多彩,每次摸索到了一个新用法,就仿佛是在广阔的宝山中寻得了一件新宝物。