语法
给个链接到之前的文档:如何定义两物种之间基因表达量的保守性(内有lme4包使用说明)
基本的表达式为:resp ~ FEexpr + (REexpr1 | factor1) + (REexpr2 | factor2) + ...
固定效应的截距和斜率不会由于随机效应项的变化而变化
这里介绍以下几种常用的方法:
fm1 <- lmer(Reaction ~ Days + ( Days | Subject), dat)
fm1 <- lmer(Reaction ~ Days + (1 | Subject), dat)
fm1 <- lmer(Reaction ~ Days + (Subject | Days), dat)
fm1 <- lmer(Reaction ~ Days + (1 | Days), dat)
fm1 <- lmer(Reaction ~ Subject + (Days | Subject), dat)
fm1 <- lmer(Reaction ~ Subject + (Subject | Days), dat)
fm1 <- lmer(Reaction ~ Subject + (1 | Subject), dat)
fm1 <- lmer(Reaction ~ Subject + (1 | Days), dat)
要点:
- 固定效应部分写 1 代表只考虑固定效应截距项而不考虑固定效应的斜率;若赋予变量则代表同时考虑固定效应截距项和固定效应的斜率
- 随机效应部分 (随机效应项 | 随机分组因子),随机效应项写 1 代表只考虑固定效应截距项而不考虑固定效应的斜率;若赋予变量则代表同时考虑随机效应截距项和随机效应的斜率;随机分组因子则是对随机效应项进行进一步分组,比方随机分组因子有A,B,C三个水平分组,那么线性模型将会分A,B,C作拟合
fm1 <- lmer(Reaction ~ Subject + (Subject | Subject), dat)
等价于fm1 <- lmer(Reaction ~ Subject + (1 | Subject), dat)
;fm1 <- lmer(Reaction ~ Days+ (Days | Days), dat)
等价于fm1 <- lmer(Reaction ~ Days+ (1 | Days), dat)
例子
其中 Days为连续变量,Subject为因子变量
对应第一个表达式
library(lme4)
data(sleepstudy)
dat = sleepstudy
dat$Subject = as.factor(c(rep('A',60),rep('B',60),rep('C',60)))
fm1 <- lmer(Reaction ~ Days + (Days | Subject), dat)
summary(fm1)
coef(fm1)
> coef(fm1)
$Subject
(Intercept) Days
A 249.2140 8.502512
B 252.3907 11.351088
C 252.6106 11.548257
attr(,"class")
[1] "coef.mer"
表达式fm1 <- lmer(Reaction ~ Days + ( Days | Subject), dat)
的意义是以Days为固定效应(考虑固定效应的截距和斜率),还是以Days为随机效应,即考虑随机效应的截距和斜率,分组的随机因子为Subject;因此这种情况相当于没有随机效应项,只有分组,即分组的固定效应回归模型,没有随机效应的斜率和截距,全是固定效应的斜率和截距
- 当Days = 0,Subject = A 时,Reaction 值为 8.502512 × 0 + 249.2140
- 当Days = 1,Subject = A 时,Reaction 值为 8.502512 × 1 + 249.2140
- 当Days = 2,Subject = A 时,Reaction 值为 8.502512 × 2 + 249.2140
- 当Days = 0,Subject = B 时,Reaction 值为 11.351088 × 0 + 252.3907
- 当Days = 1,Subject = B 时,Reaction 值为 11.351088 × 1 + 252.3907
- 当Days = 2,Subject = B 时,Reaction 值为 11.351088 × 2 + 252.3907
- 当Days = 0,Subject = C 时,Reaction 值为 11.548257 × 0 + 252.6106
- 当Days = 1,Subject = C 时,Reaction 值为 11.548257 × 1 + 252.6106
- 当Days = 2,Subject = C 时,Reaction 值为 11.548257 × 2 + 252.6106
因此它的线性关系需要分组来看:
对应第二个表达式
library(lme4)
data(sleepstudy)
dat = sleepstudy
dat$Subject = as.factor(c(rep('A',60),rep('B',60),rep('C',60)))
fm1 <- lmer(Reaction ~ Days + (1 | Subject), dat)
summary(fm1)
coef(fm1)
> coef(fm1)
$Subject
(Intercept) Days
A 241.1497 10.46729
B 255.8238 10.46729
C 257.2418 10.46729
attr(,"class")
[1] "coef.mer"
表达式fm1 <- lmer(Reaction ~ Days + ( 1 | Subject), dat)
的意义是以Days为固定效应(考虑固定效应的截距和斜率),随机效应项写 1 代表不考虑随机效应的斜率项,考虑其截距项(即将随机因子的影响转换为截距加入到方程中),分组的随机因子为Subject,例如 A 中的截距 241.1497,B中的截距为255.8238 受不同Subject分组而不同,并且它们的截距为固定效应截距与随机效应截距之和,而斜率为固定效应斜率
因此它的线性关系需要分组来看:
对应第三个表达式
library(lme4)
data(sleepstudy)
dat = sleepstudy
dat$Subject = as.factor(c(rep('A',60),rep('B',60),rep('C',60)))
fm1 <- lmer(Reaction ~ Days + (Subject | Days), dat)
summary(fm1)
coef(fm1)
> coef(fm1)
$Days
SubjectB SubjectC (Intercept) Days
0 2.399840e-08 2.920663e-08 251.4051 10.46729
1 4.067253e-08 4.970416e-08 251.4051 10.46729
2 -8.451885e-09 -1.019267e-08 251.4051 10.46729
3 9.618661e-09 1.181750e-08 251.4051 10.46729
4 -9.571652e-09 -1.162255e-08 251.4051 10.46729
5 4.944233e-08 6.047095e-08 251.4051 10.46729
6 -6.091995e-09 -7.603941e-09 251.4051 10.46729
7 2.267801e-08 2.766016e-08 251.4051 10.46729
8 5.934868e-08 7.253005e-08 251.4051 10.46729
9 7.305345e-08 8.927133e-08 251.4051 10.46729
attr(,"class")
[1] "coef.mer"
表达式fm1 <- lmer(Reaction ~ Days + ( Subject | Days), dat)
的意义是以Days为固定效应,随机效应项为因子变量Subject(即将随机效应项的斜率作为影响距加入到方程中,忽略随机效应截距),分组的随机因子为Days,按照Days进行分组计算;由于此时模型将Subject当作因子变量,所以默认SubjectA为参考物,SubjectB和SubjectC分别和SubjectA对比(此时因子变量忽略截距,即忽略随机效应截距);由于Subject为因子变量(此时因子变量忽略截距),截距=251.4051为固定效应的截距,斜率=10.46729为固定效应的斜率,此时随机效应的影响用斜率体现
因此它的线性关系需要分组来看:
- 当Days = 0,Subject = A 时,Reaction 值为 10.46729 × 0 + 251.4051
- 当Days = 0,Subject = B 时,Reaction 值为 10.46729 × 0 + 251.4051 + 2.399840e-08
- 当Days = 0,Subject = C 时,Reaction 值为 10.46729 × 0 + 251.4051 + 2.920663e-08
- 当Days = 1,Subject = A 时,Reaction 值为 10.46729 × 1 + 251.4051
- 当Days = 1,Subject = B 时,Reaction 值为 10.46729 × 1 + 251.4051 + 2.399840e-08
- 当Days = 1,Subject = C 时,Reaction 值为 10.46729 × 1 + 251.4051 + 2.920663e-08
- 当Days = 2,Subject = A 时,Reaction 值为 10.46729 × 2 + 251.4051
- 当Days = 2,Subject = B 时,Reaction 值为 10.46729 × 2 + 251.4051 + 2.399840e-08
- 当Days = 2,Subject = C 时,Reaction 值为 10.46729 × 2 + 251.4051 + 2.920663e-08
对应第四个表达式
library(lme4)
data(sleepstudy)
dat = sleepstudy
dat$Subject = as.factor(c(rep('A',60),rep('B',60),rep('C',60)))
fm1 <- lmer(Reaction ~ Days + (1 | Days), dat)
summary(fm1)
coef(fm1)
> coef(fm1)
$Days
(Intercept) Days
0 251.4051 10.46729
1 251.4051 10.46729
2 251.4051 10.46729
3 251.4051 10.46729
4 251.4051 10.46729
5 251.4051 10.46729
6 251.4051 10.46729
7 251.4051 10.46729
8 251.4051 10.46729
9 251.4051 10.46729
attr(,"class")
[1] "coef.mer"
表达式fm1 <- lmer(Reaction ~ Days + ( 1 | Days), dat)
的意义是以Days为固定效应,随机效应项写 1 代表不考虑随机效应的斜率,而考虑随机效应的截距;此时随机效应的截距等同固定效应的截距,因此此时只有固定效应的斜率和截距,并且随机分组因子为Days,相当于完全不考虑随机效应项,因此结果完全为固定效应的系数(斜率和截距)
- 当Days = 0,Reaction 值为 251.4051 + 10.46729 × 0
- 当Days = 1,Reaction 值为 251.4051 + 10.46729 × 1
- 当Days = 2,Reaction 值为 251.4051 + 10.46729 × 2
- 当Days = 3,Reaction 值为 251.4051 + 10.46729 × 3
- 当Days = 4,Reaction 值为 251.4051 + 10.46729 × 4
- 当Days = 5,Reaction 值为 251.4051 + 10.46729 × 5
- 当Days = 6,Reaction 值为 251.4051 + 10.46729 × 6
- 当Days = 7,Reaction 值为 251.4051 + 10.46729 × 7
- 当Days = 8,Reaction 值为 251.4051 + 10.46729 × 8
- 当Days = 9,Reaction 值为 251.4051 + 10.46729 × 9
对应第五个表达式
library(lme4)
data(sleepstudy)
dat = sleepstudy
dat$Subject = as.factor(c(rep('A',60),rep('B',60),rep('C',60)))
fm1 <- lmer(Reaction ~ Subject + (Days | Subject), dat)
summary(fm1)
coef(fm1)
> coef(fm1)
$Subject
Days (Intercept) SubjectB SubjectC
A 7.680847 250.1575 1.854289 4.273369
B 11.291513 251.7820 1.854289 4.273369
C 11.187903 251.7354 1.854289 4.273369
attr(,"class")
[1] "coef.mer"
表达式fm1 <- lmer(Reaction ~ Subject + (Days | Subject), dat)
的意义是以Subject 为固定效应(由于Subject为因子变量,所以不考虑固定效应的截距),随机效应项为Days(即考虑随机效应的斜率和截距),分组的随机因子为Days,按照Days进行分组计算;由于此时模型将Subject当作因子变量,所以默认SubjectA为参考物,SubjectB和SubjectC分别和SubjectA对比(此时因子变量忽略截距,忽略随机效应截距);因此固定效应只有统一的斜率项:斜率B = 1.854289,斜率C = 4.273369
- 当Days = 0,Subject = A 时,Reaction 值为 7.680847 × 0 + 250.1575
- 当Days = 0,Subject = B 时,Reaction 值为 11.291513 × 0 + 251.7820 + 1.854289
- 当Days = 0,Subject = C 时,Reaction 值为 11.187903 × 0 + 251.7354 + 4.273369
- 当Days = 1,Subject = A 时,Reaction 值为 7.680847 × 1 + 250.1575
- 当Days = 1,Subject = B 时,Reaction 值为 11.291513 × 1 + 251.7820 + 1.854289
- 当Days = 1,Subject = C 时,Reaction 值为 11.187903 × 1 + 251.7354 + 4.273369
- 当Days = 2,Subject = A 时,Reaction 值为 7.680847 × 2 + 250.1575
- 当Days = 2,Subject = B 时,Reaction 值为 1.291513 × 2 + 251.7820 + 1.854289
- 当Days = 2,Subject = C 时,Reaction 值为 11.187903 × 2 + 251.7354 + 4.273369
对应第六个表达式
library(lme4)
data(sleepstudy)
dat = sleepstudy
dat$Subject = as.factor(c(rep('A',60),rep('B',60),rep('C',60)))
fm1 <- lmer(Reaction ~ Subject + (Subject | Days), dat)
summary(fm1)
coef(fm1)
> coef(fm1)
$Days
(Intercept) SubjectB SubjectC
0 257.1725 5.629783 8.06793
1 263.1822 8.705003 11.02711
2 262.9781 8.600570 10.92662
3 273.9903 14.235679 16.34910
4 277.5391 16.051596 18.09649
5 291.3354 23.111346 24.88986
6 292.6354 23.776574 25.52998
7 298.5781 26.817544 28.45621
8 310.3491 32.840869 34.25225
9 319.4526 37.499269 38.73487
attr(,"class")
[1] "coef.mer"
表达式fm1 <- lmer(Reaction ~ Subject + (Subject | Days), dat)
的意义是以Subject 为固定效应,还是以Subject 为随机效应,即考虑随机效应的截距和斜率,分组的随机因子为Days,按照Days分组计算;因此这种情况相当于没有随机效应项,只有分组,即分组的固定效应回归模型,没有随机效应的斜率和截距,全是分组后的固定效应的斜率和截距
- 当 Subject = A 时,Days = 0,Reaction = 257.1725
- 当 Subject = B 时,Days = 0,Reaction = 257.1725 + 5.629783
- 当 Subject = C 时,Days = 0,Reaction = 257.1725 + 8.06793
- 当 Subject = A 时,Days = 1,Reaction = 263.1822
- 当 Subject = B 时,Days = 1,Reaction = 263.1822 + 8.705003
- 当 Subject = C 时,Days = 1,Reaction = 263.1822 + 11.02711
- 当 Subject = A 时,Days = 2,Reaction = 262.9781
- 当 Subject = B 时,Days = 2,Reaction = 262.9781 + 8.600570
- 当 Subject = C 时,Days = 2,Reaction = 262.9781 + 10.92662
对应第七个表达式
当然如果将只考虑Subject的固定效应项
library(lme4)
data(sleepstudy)
dat = sleepstudy
dat$Subject = as.factor(c(rep('A',60),rep('B',60),rep('C',60)))
fm1 <- lmer(Reaction ~ Subject + (1 | Subject), dat)
summary(fm1)
coef(fm1)
> coef(fm1)
$Subject
(Intercept) SubjectB SubjectC
A 284.7213 19.72682 21.63304
B 284.7213 19.72682 21.63304
C 284.7213 19.72682 21.63304
attr(,"class")
[1] "coef.mer"
表达式fm1 <- lmer(Reaction ~ Subject + ( 1 | Subject), dat)
的意义是以Subject 为固定效应,随机效应项写 1 代表不考虑随机效应的斜率,而考虑随机效应的截距;此时随机效应的截距等同固定效应的截距,因此此时只有固定效应的斜率和截距,并且随机分组因子为 Subject,相当于完全不考虑随机效应项,因此结果完全为固定效应的系数(斜率和截距);由于Subject为因子变量,所以默认 Subject A 为参考物,Subject B和Subject C分别与Subject A作比较
- 当 Subject = A 时,Reaction = 284.7213
- 当 Subject = B 时,Reaction = 284.7213 + 19.72682
- 当 Subject = C 时,Reaction = 284.7213 + 21.63304
对应第八个表达式
library(lme4)
data(sleepstudy)
dat = sleepstudy
dat$Subject = as.factor(c(rep('A',60),rep('B',60),rep('C',60)))
fm1 <- lmer(Reaction ~ Subject + (1 | Days), dat)
summary(fm1)
coef(fm1)
> coef(fm1)
$Days
(Intercept) SubjectB SubjectC
0 248.0516 19.72682 21.63304
1 254.9236 19.72682 21.63304
2 255.6824 19.72682 21.63304
3 271.1280 19.72682 21.63304
4 276.0844 19.72682 21.63304
5 293.4914 19.72682 21.63304
6 296.6977 19.72682 21.63304
7 302.4557 19.72682 21.63304
8 318.1192 19.72682 21.63304
9 330.5787 19.72682 21.63304
attr(,"class")
[1] "coef.mer"
表达式fm1 <- lmer(Reaction ~ Subject + ( 1 | Days), dat)
的意义是以Subject 为固定效应,随机效应项写 1 代表不考虑随机效应的斜率,而考虑随机效应的截距(即将随机因子的影响转换为截距加入到方程中),并且随机分组因子为 Days,按Days进行分组计算;由于Subject为因子变量,所以默认 Subject A 为参考物,Subject B和Subject C分别与Subject A作比较;而Days作为连续变量带入对应数值运算;因此结果的截距为固定效应的截距和随机效应的截距之和,斜率为固定效应的斜率:斜率B=19.72682,斜率C=21.63304
- 当 Subject = A 时,Days =0,Reaction = 248.0516
- 当 Subject = A 时,Days =1,Reaction = 254.9236
- 当 Subject = A 时,Days =2,Reaction = 255.6824
- 当 Subject = B 时,Days =0,Reaction = 248.0516 + 19.72682
- 当 Subject = B 时,Days =1,Reaction = 254.9236 + 19.72682
- 当 Subject = B 时,Days =2,Reaction = 255.6824 + 19.72682
- 当 Subject = C 时,Days =0,Reaction = 248.0516 + 21.63304
- 当 Subject = C 时,Days =1,Reaction = 254.9236 + 21.63304
- 当 Subject = C 时,Days =2,Reaction = 255.6824 + 21.63304
复合情形
library(lme4)
data(sleepstudy)
dat = sleepstudy
dat$Subject = as.factor(c(rep('A',60),rep('B',60),rep('C',60)))
fm1 <- lmer(Reaction ~ Subject + (1 | Days) + (1 | Subject), dat)
summary(fm1)
coef(fm1)
> coef(fm1)
$Days
(Intercept) SubjectB SubjectC
0 248.0518 19.72682 21.63304
1 254.9238 19.72682 21.63304
2 255.6826 19.72682 21.63304
3 271.1280 19.72682 21.63304
4 276.0844 19.72682 21.63304
5 293.4914 19.72682 21.63304
6 296.6977 19.72682 21.63304
7 302.4556 19.72682 21.63304
8 318.1190 19.72682 21.63304
9 330.5784 19.72682 21.63304
$Subject
(Intercept) SubjectB SubjectC
A 284.7213 19.72682 21.63304
B 284.7213 19.72682 21.63304
C 284.7213 19.72682 21.63304
attr(,"class")
[1] "coef.mer"
这种随机效应相加的情形相当于将这两项随机效应分开来看,详见第七个表达式和第八个表达式