HiveSQL核心技能之常用函数

目标：
1、掌握hive基础语法、常用函数及其组合使用
2、掌握一些基本业务指标的分析思路与实现技巧

一、基础语法

1、SELECT ... A... FORM ...B... WHERE ...C...

1）某次经营活动中，商家发起了“异性拼团购”，试着针对某个地区的用户进行推广，找出匹配用户。

"选出城市在北京，性别位女的10个用户名"
select user_name
from user_info
where city='beijing' and sex='female'
limit 10;

注意：如果该表是一个分区表，则where条件中必须对分区字段进行限制

2）选出在2018年12月31日，购买的商品品类是food的用户名、购买数量、支付金额

select user_name
    ,piece
    ,pay_amount
from user_trade
where dt='2018-12-31' and goods_category='food';

2、GROUP BY（分类汇总）

3）试着对本公司2019年第一季度商品的热度与价值度进行分析。
"2019年一月到三月，每个品类有多少人购买，累计金额是多少"

    SELECT goods_category
        ,count(user_name) as user_num
        ,sum(pay_amount) as total_amount
    from user_trade
    WHERE dt between '2019-01-01' and '2019-03-31'
    group by goods_category

常用的聚会函数：
1、count():计数  count(distinct...)去重计数
2、sum():求和
3、avg():平均值
4、max():最大值
5、min():最小值

GROUP BY ... HAVING（分类汇总过滤）

4）找出在2019年4月支付金额超过5万元的用户，给这些VIP用户赠送优惠券

    SELECT user_name,sum(pay_amount) as total_amount
    from user_trade
    WHERE dt between '2019-04-01' and '2019-04-30'
    group by user_name
    HAVING sum(pay_amount)>50000;

3、ORDER BY(排序)

5）2019年4月，支付金额最多的TOP5用户

    SELECT user_name,sum(pay_amount) as total_amount
    from user_trade
    WHERE dt between '2019-04-01' and '2019-04-30'
    group by user_name
    order by total_amount desc
    limit 5;
    （由于执行顺序的原因，ORDER BY 后面要使用重新定义的列名进行排序，具体执行顺序参考MySQL）

    ASC：升序（默认） DESC：降序
    对多个字段进行排序：ORDER BY A ASC, B DESC，每个字段都要指定升序或者降序

二、常用函数

查看Hive中的函数：show functions；
查看具体函数的用法：1）desc function 函数名；2）desc function extended函数名；

1、如何把时间戳转化为日期（from_unixtime）？

6）

SELECT pay_time,
            from_unixtime(pay_time,'yyyy-MM-dd hh:mm:ss')
    from user_trade
    WHERE dt='2019-04-09'

from_unixtime(bigint unixtime,string format):将时间戳转化为指定格式的日期
format：
1.yyyy-MM-dd hh:mm:ss
2.yyyy-MM-dd hh
3.yyyy-MM-dd hh:mm
4.yyyyMMdd
拓展：把日期转化为时间戳：unix——timestamp(string date)

2、如何计算日期间隔？（datediff 函数）

7）用户的首次激活时间，与2019年5月1日的日期间隔。

SELECT user_name
         ,datediff('2019-05-01',to_date(firstactivetime))
 from user_info
 limit 10;

datediff(string enddate,string startdate):结束日期减去开始日期的天数

拓展:日期增加函数（增加天数）、减少函数（减少天数） —— date_add、date_sub（类型要是string类型的）
date_add（string startdate,int days)、date_sub（string startdate,int days)

3、条件函数（case when 和 if 函数）

case when 函数

8）统计以下四个年龄段20岁以下、20-30岁、30-40岁、40岁以上的用户数

SELECT
        case 
        when age<20 then '20岁以下'
        when  age<30 then '20-30岁'
        when  age<40 then '30-40岁'
        else '40岁以上' end as age_type,
        count(distinct  user_id) as user_num
    from user_info
    group by ( case 
        when age<20 then '20岁以下'
        when  age<30 then '20-30岁'
        when  age<40 then '30-40岁'
        else '40岁以上' end )

case when 的时候不用加group by，只有在聚合函数的时候才需要group by

if 函数

9）统计每个性别的用户等级高低分布情况（假设level大于5为高级，注意列名不能使用中文的）

SELECT sex,
       if(level>5,'高级','低级') as level_type,
       count(distinct user_id) as user_num
from user_info
group by sex,
if(level>5,'高级','低级');

4、字符串函数（substr 和 substring 函数）

10）分析每个月都拉新情况

    select substring(firstactivetime,1,7) as month
            ,count(distinct user_id) as user_num
    from user_info
    group by substring(firstactivetime,1,7);

substring(stringA,INT start,int len),substr(stringA,INT start,int len)，截取起始位置和截取长度

json文本格式解析函数

extra1需要解析json字段，然后用$.key取出想要获取的value;
extra2使用的是中括号加引号的方式进行字段提取和分组；
两种写法要看字段类型的不同采取不同的方式

11）求不同手机品牌的用户数

    extra1(string):
    {"systemtype":"ios","education":"master","marriage_status":"1","phonebrand":"iphoneX"}
    
    extra2(map<string,string>):
    {"systemtype":"ios","education":"master","marriage_status":"1","phonebrand":"iphone X"}



第一种情况：

 select get_json_object(extra1,'$.phonebrand') as phone_brand
     ,count(distinct user_id) as user_num
 from user_info
 group by  get_json_object(extra1,'$.phonebrand');


第二种情况：

 select extra2["phonebrand"] as phone_brand
     ,count(distinct user_id) as user_num
 from user_info
 group by  extra2["phonebrand"];


 extra1需要解析json字段，然后用$.key取出想要获取的value;
 extra2使用的是中括号加引号的方式进行字段提取和分组；
 两种写法要看字段类型的不同采取不同的方式

5、聚合统计函数

12）ELLA用户的2018年的平均每次支付金额，以及2018年最大的支付日期和最小的支付日期的间隔

SELECT  AVG(pay_amount),
        max(from_unixtime(pay_time,'yyyy-MM-dd')),
        min(from_unixtime(pay_time,'yyyy-MM-dd')),
        datediff(max(from_unixtime(pay_time,'yyyy-MM-dd')), min(from_unixtime(pay_time,'yyyy-MM-dd')))
from user_trade
WHERE substr(dt,1,4)='2018' and user_name='ELLA'


#如果要使用user_name，则要多加group by进行分组，having后用不了本题的限制条件

SELECT  user_name,
        AVG(pay_amount),
        max(from_unixtime(pay_time,'yyyy-MM-dd')),
        min(from_unixtime(pay_time,'yyyy-MM-dd'))
from user_trade
WHERE substr(dt,1,4)='2018'
GROUP BY user_name

三、重点练习

13）2018年购买的商品品类在两个以上的用户数

   步骤一：

    SELECT user_name,count(distinct goods_category)
    from user_trade
    where substr(dt,1,4)='2018'
    group by user_name
    having count(distinct goods_category)>=2;


   步骤二： 

    select count(a.user_name)
    from
        (SELECT user_name,count(distinct goods_category)
        from user_trade
        where substr(dt,1,4)='2018'
        group by user_name
        having count(distinct goods_category)>=2) as a

步骤总结：
1、先求出每个人购买的商品品类书
2、筛选出购买商品品类书大于2的用户
3、统计符合条件的用户有多少个

14）用户激活时间在2018年，年龄段在20-30岁和30-40岁的婚姻情况分布

错误方法（也能得出正确的结果，只是计算方式会比较慢）： 
    select b.age_type,count(distinct b.user_id),if(b.marriage_status=1,"已婚","未婚")   
    from
    (select user_id
            ,user_name
            ,(case 
                when age<20 then "20岁以下"
                when age>=20 and age<30 then "20-30岁"
                when age>=30 and age<40 then "30-40岁"
                else '40岁以上'
                end) as age_type
            ,extra2["marriage_status"] as marriage_status
    from user_info
    where year(firstactivetime)='2018' 
    group by user_id
            ,user_name
            ,(case 
            when age<20 then "20岁以下"
            when age>=20 and age<30 then "20-30岁"
            when age>=30 and age<40 then "30-40岁"
            else '40岁以上'
            end)
            ,extra2["marriage_status"])as b
    where age_type in ("20-30岁", "30-40岁")
    group by b.age_type,if(b.marriage_status=1,"已婚","未婚")


正确方式：
    select b.age_type,count(distinct b.user_id),if(b.marriage_status=1,"已婚","未婚")     
    from
         (select user_id
            ,user_name
            ,(case 
                when age<20 then "20岁以下"
                when age>=20 and age<30 then "20-30岁"
                when age>=30 and age<40 then "30-40岁"
                else '40岁以上'
                end) as age_type
            ,extra2["marriage_status"] as marriage_status
        from user_info
        where year(firstactivetime)='2018' )as b
    where age_type in ("20-30岁", "30-40岁")
    group by b.age_type,if(b.marriage_status=1,"已婚","未婚")

步骤总结：
1、先选出激活时间在2018年的用户，并把他们所在的年龄段计算好，并提取出婚姻状况；
如何select后面没有进行聚合，则可以不用使用group by，直接使用where进行过滤就可以；
2、取出年龄段在20-30岁和30-40岁的用户，把他们的婚姻状况转义成可理解的说明；
3、聚合计算，针对年龄段，婚姻状况的聚合

最后编辑于：2021.01.28 17:46:40

人面猴
序言：七十年代末，一起剥皮案震惊了整个滨河市，随后出现的几起案子，更是在滨河造成了极大的恐慌，老刑警刘岩，带你破解...
沈念sama阅读 202,529评论 5赞 475
死咒
序言：滨河连续发生了三起死亡事件，死亡现场离奇诡异，居然都是意外死亡，警方通过查阅死者的电脑和手机，发现死者居然都...
沈念sama阅读 85,015评论 2赞 379
救了他两次的神仙让他今天三更去死
文/潘晓璐我一进店门，熙熙楼的掌柜王于贵愁眉苦脸地迎上来，“玉大人，你说我怎么就摊上这事。” “怎么了？”我有些...
开封第一讲书人阅读 149,409评论 0赞 335
道士缉凶录：失踪的卖姜人
文/不坏的土叔我叫张陵，是天一观的道长。经常有香客问我，道长，这世上最难降的妖魔是什么？我笑而不...
开封第一讲书人阅读 54,385评论 1赞 273
港岛之恋（遗憾婚礼）
正文为了忘掉前任，我火速办了婚礼，结果婚礼上，老公的妹妹穿的比我还像新娘。我一直安慰自己，他们只是感情好，可当我...
茶点故事阅读 63,387评论 5赞 364
恶毒庶女顶嫁案：这布局不是一般人想出来的
文/花漫我一把揭开白布。她就那样静静地躺着，像睡着了一般。火红的嫁衣衬着肌肤如雪。梳的纹丝不乱的头发上，一...
开封第一讲书人阅读 48,466评论 1赞 281
城市分裂传说
那天，我揣着相机与录音，去河边找鬼。笑死，一个胖子当着我的面吹牛，可吹牛的内容都是我干的。我是一名探鬼主播，决...
沈念sama阅读 37,880评论 3赞 395
双鸳鸯连环套：你想象不到人心有多黑
文/苍兰香墨我猛地睁开眼，长吁一口气：“原来是场噩梦啊……” “哼！你这毒妇竟也来了？” 一声冷哼从身侧响起，我...
开封第一讲书人阅读 36,528评论 0赞 256
万荣杀人案实录
序言：老挝万荣一对情侣失踪，失踪者是张志新（化名）和其女友刘颖，没想到半个月后，有当地人在树林里发现了一具尸体，经...
沈念sama阅读 40,727评论 1赞 295
护林员之死
正文独居荒郊野岭守林人离奇死亡，尸身上长有42处带血的脓包…… 初始之章·张勋以下内容为张勋视角年9月15日...
茶点故事阅读 35,528评论 2赞 319
白月光启示录
正文我和宋清朗相恋三年，在试婚纱的时候发现自己被绿了。大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
茶点故事阅读 37,602评论 1赞 329
活死人
序言：一个原本活蹦乱跳的男人离奇死亡，死状恐怖，灵堂内的尸体忽然破棺而出，到底是诈尸还是另有隐情，我是刑警宁泽，带...
沈念sama阅读 33,302评论 4赞 318
日本核电站爆炸内幕
正文年R本政府宣布，位于F岛的核电站，受9级特大地震影响，放射性物质发生泄漏。R本人自食恶果不足惜，却给世界环境...
茶点故事阅读 38,873评论 3赞 306
男人毒药：我在死后第九天来索命
文/蒙蒙一、第九天我趴在偏房一处隐蔽的房顶上张望。院中可真热闹，春花似锦、人声如沸。这庄子的主人今日做“春日...
开封第一讲书人阅读 29,890评论 0赞 19
一桩弑父案，背后竟有这般阴谋
文/苍兰香墨我抬头看了看天上的太阳。三九已至，却和暖如春，着一层夹袄步出监牢的瞬间，已是汗流浃背。一阵脚步声响...
开封第一讲书人阅读 31,132评论 1赞 259
情欲美人皮
我被黑心中介骗来泰国打工，没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留，地道东北人。一个月前我还...
沈念sama阅读 42,777评论 2赞 349
代替公主和亲
正文我出身青楼，却偏偏与公主长得像，于是被迫代替她去往敌国和亲。传闻我的和亲对象是个残疾皇子，可洞房花烛夜当晚...
茶点故事阅读 42,310评论 2赞 342