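Background: Sometimes we want to see whether users in different score ranges behave differently across various features. To form the groups, besides defining the ranges by hand with PROC FORMAT, we can also use PROC RANK and PROC UNIVARIATE to group and rank automatically, based on quantiles or other statistics of the score (or of any other data).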
PROC RANK
proc rank data=test out=r_test;   /* out= names the output data set */
    var spend;                    /* rank the values of spend */
    ranks r_spend;                /* the rank variable is named r_spend */
run;
PROC UNIVARIATE
proc univariate data=events noprint;
    var neg_score;
    output out=p pctlpre=P_           /* prefix for the percentile variable names: P_ */
                 pctlpts=10 to 100 by 10;
    weight SamplingWeight;
run;
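/* turn the one-row percentile data set into one row per percentile; the values land in COL1 */
proc transpose data=p out=pt;
run;
/* sort the cut points and drop repeated values */
proc sort data=pt nodupkey force noequals;
    by COL1;
run;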
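One possible way to use the sorted cut points in pt is to feed them to PROC FORMAT through CNTLIN=. The following is only a sketch under assumptions: the format name scorebin, the data set names cuts and scored, the variable score_bin, and the labels G01, G02, ... are all hypothetical, and the intervals are assumed to be open on the left and closed on the right.

```sas
/* Sketch: build a numeric format from the cut points in pt (names are hypothetical). */
data cuts;
    set pt;
    length fmtname $8 type $1 hlo $1 sexcl $1 eexcl $1 label $8;
    fmtname = 'scorebin';              /* assumed format name */
    type    = 'N';                     /* numeric format */
    start   = lag(COL1);               /* previous cut point is the lower bound */
    end     = COL1;                    /* current cut point is the upper bound */
    sexcl   = 'Y';                     /* exclude the lower bound */
    eexcl   = 'N';                     /* include the upper bound */
    if _n_ = 1 then do;
        hlo   = 'L';                   /* the first bin starts at LOW */
        sexcl = 'N';
    end;
    label = cats('G', put(_n_, z2.));  /* G01, G02, ... */
    keep fmtname type start end sexcl eexcl hlo label;
run;

proc format cntlin=cuts;
run;

/* Apply the format to assign each record to a score bin. */
data scored;
    set events;
    score_bin = put(neg_score, scorebin.);
run;
```

Because the cut points come from the data, rerunning these steps on a new data set regenerates the bins without editing any ranges by hand.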
PROC RANK can also generate deciles, quartiles, percentiles, or other groups from numeric variables. The GROUPS= option specifies the binning: GROUPS=10 creates deciles, GROUPS=4 creates quartiles, and GROUPS=100 creates percentiles.
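As a minimal sketch of decile binning with GROUPS=, the example below reuses the test data set and the spend variable from the PROC RANK step above. The PROC MEANS summary is an added illustration, not part of the original workflow.

```sas
/* Assign each observation to a spend decile. */
proc rank data=test out=r_test groups=10;
    var spend;
    ranks r_spend;   /* group index: 0 (lowest spend) through 9 (highest) */
run;

/* Check the spend range covered by each decile. */
proc means data=r_test n min max mean;
    class r_spend;
    var spend;
run;
```

With GROUPS=, r_spend holds a group index starting at 0 rather than a raw rank. Tied values fall into the same group, so the groups are not guaranteed to be exactly equal in size.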