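Moderators are commonly either categorical or continuous, and each type can be recoded into the other for analysis.
A categorical variable such as gender or grade can be dummy-coded, that is, assigned numeric values so that it can enter the regression like a numeric predictor (for example, male = 1 and female = 2);
a continuous variable can be split into three levels using its mean and standard deviation (sd): Low = mean - sd, Medium = mean, High = mean + sd.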
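The three-level split of a continuous moderator can be sketched in R as follows. The `age` vector is made up purely for illustration and is not the data analysed in this post; `levels3` is likewise just an illustrative name.

```r
# Hypothetical ages, for illustration only (not the data used in this post)
age <- c(17, 18, 18, 19, 19, 20, 20, 21, 22, 23)

m <- mean(age)  # sample mean
s <- sd(age)    # sample standard deviation

# Low / Medium / High reference values of the moderator
levels3 <- c(low = m - s, medium = m, high = m + s)
print(round(levels3, 2))
```

These three values are reference points at which the regression line is evaluated; the observations themselves are not sorted into three groups.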
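We start with a categorical moderator. The regression equation is Y=β1*X+β2*Z+β3*X*Z+β0+ε.
Y is the dependent variable (continuous), X is the independent variable (continuous), Z is the moderator, β0 is the intercept, β1 to β3 are the regression coefficients, and ε is the random error term (it can be ignored here). The moderation effect is carried by β3, the coefficient of the interaction term X*Z.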
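Grouping the terms that contain X shows why the interaction coefficient matters. This is only a rearrangement of the equation above, using the same symbols:

```latex
\begin{aligned}
Y &= \beta_0 + \beta_1 X + \beta_2 Z + \beta_3 X Z + \varepsilon \\
  &= (\beta_0 + \beta_2 Z) + (\beta_1 + \beta_3 Z)\,X + \varepsilon
\end{aligned}
```

For a fixed value of Z, the relation between X and Y is a straight line with intercept β0 + β2*Z and slope β1 + β3*Z. If β3 differs from zero, the slope changes with Z, which is exactly what a moderation effect means.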
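# Import the data in any convenient way (here, from the clipboard)
mydata <- read.table(file = "clipboard", header = TRUE)
# Fit the regression; x*z expands to x + z + x:z
regression <- lm(y ~ x + z + x*z, data = mydata)
summary(regression)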
Call:
lm(formula = y ~ x + z + x * z, data = mydata)
Residuals:
     Min       1Q   Median       3Q      Max
-16.7361  -4.4863  -0.1334   5.5795  17.6777
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -1.1186    10.4104  -0.107 0.914761
x             0.8632     0.2427   3.557 0.000707 ***
z            14.2038     6.3678   2.231 0.029167 *
x:z          -0.3906     0.1471  -2.655 0.009950 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 7.553 on 65 degrees of freedom
Multiple R-squared: 0.2325, Adjusted R-squared: 0.197
F-statistic: 6.562 on 3 and 65 DF, p-value: 0.0006055
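After fitting the model, substitute the estimates into the regression equation. Here z is gender, coded 1 for male and 2 for female (note: 0 and 1 would work as well).
Y=0.8632*X+14.2038*Z+(-0.3906)*X*Z+(-1.1186)
Substituting the two gender codes gives two equations:
Y=0.8632*X+14.2038*1+(-0.3906)*X*1+(-1.1186)
Y=0.8632*X+14.2038*2+(-0.3906)*X*2+(-1.1186)
Simplify, then plot the resulting lines:
Y=0.4726*X+13.0852
Y=0.082*X+27.289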
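The same simplification can be done in R instead of by hand. The sketch below types in the four estimates from the summary() output above; the names `b0` to `b3`, `z_codes` and `lines_by_group` are illustrative only. With the fitted model in memory, the estimates could presumably be taken from coef(regression) instead.

```r
# Coefficient estimates copied from the summary() output above
b0 <- -1.1186   # (Intercept)
b1 <-  0.8632   # x
b2 <- 14.2038   # z
b3 <- -0.3906   # x:z

# Gender codes used in this post
z_codes <- c(male = 1, female = 2)

# For each code: intercept = b0 + b2*z, slope = b1 + b3*z
lines_by_group <- data.frame(
  intercept = b0 + b2 * z_codes,
  slope     = b1 + b3 * z_codes,
  row.names = names(z_codes)
)
print(round(lines_by_group, 4))
```

The slope for x drops from about 0.47 at z = 1 to about 0.08 at z = 2, which is the moderation effect expressed as two simple slopes.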
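# Empty plotting frame spanning the observed data
plot(mydata$x, mydata$y, type = 'n', xlab = 'x', ylab = 'y')
# abline(a, b) draws y = a + b*x: male (z = 1) solid black, female (z = 2) dashed red
abline(13.0852, 0.4726)
abline(27.289, 0.082, lty = 2, col = 'red')
legend('topright', c('male', 'female'), lty = c(1, 2), col = c('black', 'red'))
# Overlay the observations of each group
points(mydata$x[mydata$z == 1], mydata$y[mydata$z == 1], pch = 19)
points(mydata$x[mydata$z == 2], mydata$y[mydata$z == 2], col = 'red')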
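The remark that gender could just as well be coded 0 and 1 can be checked with a small simulation. Everything in this sketch is made up for illustration (the data frame `sim`, the columns `z12` and `z01`, and the true coefficients); it is not the data set used in this post.

```r
set.seed(1)
n <- 60

# Simulated data: x continuous, z12 coded 1/2, with a true interaction
sim <- data.frame(x = rnorm(n, mean = 40, sd = 5),
                  z12 = rep(1:2, each = n / 2))
sim$y <- 5 + 0.5 * sim$x + 3 * sim$z12 - 0.2 * sim$x * sim$z12 + rnorm(n)

# Same grouping, recoded as 0/1
sim$z01 <- sim$z12 - 1

fit12 <- lm(y ~ x * z12, data = sim)
fit01 <- lm(y ~ x * z01, data = sim)

# The fitted values are identical under both codings
print(all.equal(unname(fitted(fit12)), unname(fitted(fit01))))

# The interaction estimate is also the same; only the intercept and the
# coefficient of x change, because they now refer to the group coded 0
print(c(coef(fit12)["x:z12"], coef(fit01)["x:z01"]))
```

With 0/1 coding, the coefficient of x is directly the slope in the group coded 0, which can make the output easier to read.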
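A continuous moderator is handled in much the same way. Taking age as an example:
# Import the data in any convenient way (here, from the clipboard)
mydata2 <- read.table(file = "clipboard", header = TRUE)
# Fit the regression (here z is age); with() lets lm() find y, x and z inside mydata2
regression2 <- with(mydata2, lm(y ~ x + z + x*z))
summary(regression2)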
Call:
lm(formula = y ~ x + z + x * z)
Residuals:
     Min       1Q   Median       3Q      Max
-15.8951  -4.8414   0.9287   5.0560  17.5495
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 89.35148   32.28101   2.768  0.00734 **
x           -1.38904    0.78072  -1.779  0.07989 .
z           -3.49763    1.65353  -2.115  0.03825 *
x:z          0.08439    0.04016   2.101  0.03950 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 7.751 on 65 degrees of freedom
Multiple R-squared: 0.1917, Adjusted R-squared: 0.1543
F-statistic: 5.137 on 3 and 65 DF, p-value: 0.002997
# Mean and standard deviation of age
mean(mydata2$z)
[1] 19.31884
sd(mydata2$z)
[1] 1.851015
After fitting the model, substitute the estimates into the regression equation.
Y=(-1.389)*X+(-3.49763)*Z+0.08439*X*Z+89.35148
Then substitute age at three levels:
Low = mean - sd = 17.47;
Medium = mean = 19.32;
High = mean + sd = 21.17.
Y=(-1.389)*X+(-3.49763)*17.47+0.08439*X*17.47+89.35148
Y=(-1.389)*X+(-3.49763)*19.32+0.08439*X*19.32+89.35148
Y=(-1.389)*X+(-3.49763)*21.17+0.08439*X*21.17+89.35148
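Simplify, then plot the resulting lines (the scatter points are left out because three groups of points would clutter the figure):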
Y=0.0853*X+28.2479
Y=0.2414*X+21.7772
Y=0.3975*X+15.3067
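The three simplified equations can be reproduced in R from the estimates reported above. The names `b0` to `b3`, `age_levels` and `simple_lines` are illustrative only, and the age levels are entered as rounded in the text, so the last digit may differ slightly from a hand calculation.

```r
# Coefficient estimates copied from the summary() output above
b0 <- 89.35148  # (Intercept)
b1 <- -1.38904  # x
b2 <- -3.49763  # z (age)
b3 <-  0.08439  # x:z

# Age levels as rounded in the text: mean - sd, mean, mean + sd
age_levels <- c(low = 17.47, medium = 19.32, high = 21.17)

# For each level: intercept = b0 + b2*z, slope = b1 + b3*z
simple_lines <- data.frame(
  age       = age_levels,
  intercept = b0 + b2 * age_levels,
  slope     = b1 + b3 * age_levels,
  row.names = names(age_levels)
)
print(round(simple_lines, 4))
```

The slope for x grows as age increases, so in this model the effect of x on y is stronger for older respondents.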
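# Empty plotting frame spanning the observed data
plot(mydata2$x, mydata2$y, type = 'n', xlab = 'x', ylab = 'y')
# One line per age level: abline(intercept, slope)
abline(28.2479, 0.0853)
abline(21.7772, 0.2414, lty = 2, col = 'blue')
abline(15.3067, 0.3975, lty = 3, col = 'red')
legend('topright', c('low', 'medium', 'high'), lty = c(1, 2, 3), col = c('black', 'blue', 'red'))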
Reference:
Zhang, Z., & Wang, L. (2017). Advanced statistics using R. Granger, IN: ISDSA Press. ISBN: 978-1-946728-01-2. https://advstats.psychstat.org