案例: 使用mysql 进行数据分析
老师指路->https://www.jianshu.com/u/1f32f227da5f
使用工具:MySQL、Navicat
一、准备好订单数据和用户数据:
a、打开命令窗口或者 navicat 、workbench 等数据库软件
b、任意选择一个数据库,或者创建一个新的数据库并进入数据库
create database data charset utf8;
use data;
c、执行以下source 命令,注意改成自己的路径并且不能包含中文
#将.sql文件拖到命令行source后 即可显示文件路径
source C:\Users\data\Desktop\Mysql_case\orderinfo.sql
source C:\Users\data\Desktop\Mysql_case\userinfo.sql
注意:后面不需要加分号,上面的命令会把 **.sql 文件的语句都执行一遍
desc orderinfo;--显示表结构
select * from orderinfo limit 10; #显示10条记录
select count(*) from orderinfo;#显示记录条数
二、题目要求:
1、统计不同月份的下单人数
2、统计用户三月份的回购率和复购率
3、统计男女用户消费频次是否有差异
4、统计多次消费的用户,第一次和最后一次消费间隔是多少天
5、统计不同年龄段,用户的消费金额是否有差异
6、统计消费的二八法则,消费的top20%用户,贡献了多少消费额
涉及表:
orderinfo 订单详情表
| orderid 订单id
| userid 用户id
| isPaid 是否支付
| price 付款价格
| paidTime 付款时间
userinfo 用户信息表
| userid 用户id
| sex 用户性别
| birth 用户出生日期
1、统计不同月份的下单人数
观察数据,过滤未支付数据,去重
不同月份下单人数
select year(paidTime)as year_ ,
month(paidTime) as month_ ,
count(distinct userid) as cons
from orderinfo
where isPaid='已支付'
group by year(paidTime),month(paidTime);
2、统计用户三月份的回购率和复购率
三月份交易量
select count(orderid)as all_order from orderinfo
where month(paidTime)=3;
三月份下单总人数
select
count(distinct userid) as cons
from orderinfo
where month(paidTime)=3;
复购率:当月购买了多次用户占当月用户比例
回购率:上月购买用户中有多少用户本月又再次购买
(回购率:本月购买用户中有多少用户下个月又再次购买)
复购率:
a、先筛选出3月份消费情况
select
*
from orderinfo
where isPaid='已支付' and month(paidTime)=3;
b、统计每个用户在3月份消费了多少次
select
userid, count(1) as cons
from orderinfo
where isPaid='已支付'
and month(paidTime)=3
group by userid;
c、对购买次数进行判断,统计出消费多次的用户数(大于1次)
对用户购买次数进行判断,并sum求和,如果购买次数大于1,用户数加1
复购率:当月购买了多次用户占当月用户比例
select
count(1) as userid_cons,
sum(if (cons>1,1,0)) as fugou_cons,
sum(if (cons>1,1,0))/count(1) as fugou_rate
from(select
userid,
count(1) as cons
from orderinfo
where isPaid='已支付'
and month(paidTime)=3
group by userid
) a;
回购率:本月购买用户中有多少用户下个月又再次购买
计算所有月份回购率
举例:三月份的回购率=3月用户中4月又再次购买的人数/3月用户总数
a、统计每年每月的用户消费情况
#分组:每个月 有哪些用户ID
select
userid,
date_format(paidTime,'%Y-%m-01') as month_dt,
count(1) as cons
from orderinfo
where isPaid='已支付'
group by userid,date_format(paidTime,'%Y-%m-01');
b、相邻月份 进行关联,能关联上的用户为回购
自己与自己关联,用户对应,月份对应
此处使用left join ,保留用户总数;
inner join 会删掉关联不上的用户,三月用户总数会减少
select
*
from(select
userid,
date_format(paidTime,'%Y-%m-01') as month_dt,
count(1) as cons
from orderinfo
where isPaid='已支付'
group by userid,date_format(paidTime,'%Y-%m-01')
) a
left join (select
userid,
date_format(paidTime,'%Y-%m-01') as month_dt,
count(1) as cons
from orderinfo
where isPaid='已支付'
group by userid,date_format(paidTime,'%Y-%m-01')
) b on a.userid=b.userid
and date_sub(b.month_dt,interval 1 month)=a.month_dt;
c、用统计出四月份有关联的 用户数据量 除以 三月份用户数据量
count(a.userid)>count(b.userid) 4月份有null情况
select
a.month_dt,
count(a.userid),
count(b.userid),
count(b.userid)/ count(a.userid) as '回购率'
from(select
userid,
date_format(paidTime,'%Y-%m-01') as month_dt,
count(1) as cons
from orderinfo
where isPaid='已支付'
group by userid,date_format(paidTime,'%Y-%m-01')
) a
left join (select
userid,
date_format(paidTime,'%Y-%m-01') as month_dt,
count(1) as cons
from orderinfo
where isPaid='已支付'
group by userid,date_format(paidTime,'%Y-%m-01')
) b on a.userid=b.userid
and date_sub(b.month_dt,interval 1 month)=a.month_dt
group by a.month_dt;
3、统计男女用户消费频次是否有差异
获取用户 性别 消费次数 男生用户消费总次数/男生人数
a、统计每个用户消费次数、带性别,筛选出性别不为空的用户
select
a.userid,
sex,
count(1) as cons
from orderinfo a
inner join (select * from userinfo where sex!='') b
on a.userid=b.userid
group by a.userid,sex ;
b、对性别做一个消费次数平均计算
select
sex,
avg(cons) as avg_cons
from(select
a.userid,
sex,
count(1) as cons
from orderinfo a
inner join (select * from userinfo where sex!='') b
on a.userid=b.userid
group by a.userid,sex
) c
group by sex;
4、统计多次消费的用户,第一次和最后一次消费间隔是多少天
a、取出多次消费用户
select
userid
from orderinfo
where isPaid='已支付'
group by userid
having count(1)>1;
b、取出第一次与第二次购物时间
计算时间差
select
userid,
min(paidTime),
max(paidTime),
datediff(max(paidTime),min(paidTime)) as jiange
from orderinfo
where isPaid='已支付'
group by userid
having count(1)>1;
5、统计不同年龄段,用户的消费金额是否有差异
a、计算每个用户年龄,并对用户年龄进行分层
0-10:1 11-20:2 21-30:3 31-40 41-50 10岁为一层
ceil 向上取整
select
userid,
birth,
now(),
ceil(timestampdiff(year,birth,now())/10) as age
from userinfo
where birth> '1900-01-01';
b、关联订单信息,获取不同年龄段的消费频次和消费数据
select
a.userid,
age,
count(1) as cons ,
sum(price) as prices
from orderinfo a
inner join (
select
userid,
birth,
now(),
ceil(timestampdiff(year,birth,now())/10) as age
from userinfo
where birth> '1900-01-01'
) b
on a.userid=b.userid
group by b.userid,b.age;
c、对不同年龄层 进行聚合 最终得到不同年龄层消费情况
select
age,
avg(cons),
avg(prices)
from (
select
a.userid,
age,
count(1) as cons ,
sum(price) as prices
from orderinfo a
inner join (
select
userid,
birth,
now(),
ceil(timestampdiff(year,birth,now())/10) as age
from userinfo
where birth> '1900-01-01'
) b
on a.userid=b.userid
group by b.userid,b.age
) c
group by age ;
6、统计消费的二八法则,消费的top20%用户,贡献了多少消费额
用户消费占比,20%用户 消费贡献占80%
a、统计每个用户消费金额,并进行降序排序
select
userid,
sum(price) as total_prices
from orderinfo a
where isPaid='已支付'
group by userid
order by total_prices desc ;
b、统计一共有多少用户,以及用户总消费金额
select
count(1) as cons,
sum(total_prices) as all_prices
from ( select
userid,
sum(price) as total_prices
from orderinfo a
where isPaid='已支付'
group by userid ) b;
c、取出消费前20%用户,进行金额统计
8万用户去20%-17000名
select
count(1) as cons,
sum(total_prices) as all_prices
from (
select
userid,
sum(price) as total_prices
from orderinfo a
where isPaid='已支付'
group by userid
order by total_prices desc
limit 17000) b;
select (2.7/3.1);