Why do R and SPSS give different chi-square statistics and p-values?

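Here is the data.
The subjects are split into two groups, A and B.
Each of the four rows below gives the number of cases of a different disease.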

image.png

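# First, build a data frame (one row per disease)
dat <- data.frame(low=c(13,7,21,6),
                  high=c(77,22,21,71))
# Group A has 66 samples in total and group B has 128
total_no <- c(66,128)

# Build a 2 x 2 table from the first row of dat
# (row 1: cases, row 2: total minus cases)
#  low high
#  13   77
#  53   51
tmp <- chisq.test(rbind(dat[1,], total_no-dat[1,]))
# Extract the chi-square statistic and the p-value
tmp$statistic
tmp$p.value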

image.png

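# The other three rows could be computed by hand, but let's try a loop
# First create empty vectors for the results
k <- rep(NA, 4)
p <- rep(NA, 4)
# Then loop over the rows of dat
for (i in seq_len(nrow(dat))) {
  a <- chisq.test(rbind(dat[i,], total_no-dat[i,]))
  k[i] <- a$statistic
  p[i] <- a$p.value
}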

results <- rbind(k,p)
results

This gives the final result:

image.png

But the story doesn't end there...
The results from SPSS differ from the ones from R.

The chi-square value that R produces is:

image.png

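Why? Why?

Looking for the cause

Was the data entered into R incorrectly?

So I entered the data again, imitating the SPSS layout,
using the t() function to transpose the table.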

image.png

dat <- data.frame(low=c(13,7,21,6),
                 high=c(77,22,21,71))
total_no <- c(66,128)

# Add the t() transpose at this step
tmp <- chisq.test(t(rbind(dat[1,], total_no-dat[1,])))
tmp$statistic
tmp$p.value

But the result is still the same:

image.png
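
That the transpose changes nothing is expected: the chi-square statistic depends only on the observed and expected counts, and swapping rows with columns leaves both unchanged. A minimal sketch to confirm this on the first table (the names `tab`, `res_orig` and `res_t` are introduced here for illustration only):

```r
# Same data as above
dat <- data.frame(low  = c(13, 7, 21, 6),
                  high = c(77, 22, 21, 71))
total_no <- c(66, 128)

# Build the first 2 x 2 table as a plain numeric matrix
cases <- unlist(dat[1, ])
tab <- rbind(cases, total_no - cases)

res_orig <- chisq.test(tab)     # original orientation
res_t    <- chisq.test(t(tab))  # transposed, as in the SPSS-style layout

# Both comparisons should report TRUE
all.equal(unname(res_orig$statistic), unname(res_t$statistic))
all.equal(res_orig$p.value, res_t$p.value)
```

So the data layout can be ruled out as the source of the discrepancy.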

Do R and SPSS use different settings?

Checking the R help page turned up a clue:

image.png

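It turns out that something called the Yates correction was behind this (mostly, my statistics knowledge is just too weak).
Running R again, this time with the correction turned off: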

image.png

Bingo! The chi-square value now matches SPSS.
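
Since the rerun above survives only as a screenshot, here is a hedged sketch of what it might look like, using the `correct` argument of `chisq.test()` to switch the continuity correction off. The helper `compare_row` and the column names of `results` are made up for this example and are not from the original post.

```r
dat <- data.frame(low  = c(13, 7, 21, 6),
                  high = c(77, 22, 21, 71))
total_no <- c(66, 128)

# For one row of dat: build the 2 x 2 table, then test with and without Yates' correction
compare_row <- function(i) {
  cases <- unlist(dat[i, ])
  tab <- rbind(cases, total_no - cases)
  with_yates    <- chisq.test(tab)                   # correct = TRUE is the default
  without_yates <- chisq.test(tab, correct = FALSE)  # plain Pearson chi-square
  c(chisq_yates   = unname(with_yates$statistic),
    p_yates       = with_yates$p.value,
    chisq_pearson = unname(without_yates$statistic),
    p_pearson     = without_yates$p.value)
}

# One row per disease, one column per quantity
results <- t(sapply(seq_len(nrow(dat)), compare_row))
results
```

The `chisq_pearson` column is the one to compare against the Pearson chi-square value reported by SPSS; the `chisq_yates` column reproduces R's default output.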

What is the Yates correction?

Reference:
https://www.statisticshowto.datasciencecentral.com/what-is-the-yates-correction/

Why use the Yates correction?

The Yates correction is a correction made to account for the fact that both Pearson’s chi-square test and McNemar’s chi-square test are biased upwards for a 2 x 2 contingency table. An upwards bias tends to make results larger than they should be. If you are creating a 2 x 2 contingency table that uses either of these two tests, the Yates correction is usually recommended, especially if the expected cell frequencies are below 10 (some authors put that figure at 5).

Chi2 tests are biased upwards when used on 2 x 2 contingency tables. The reason is that the statistical Chi2 distribution is continuous and the 2 x 2 contingency table is dichotomous (in other words, it isn’t continuous, there are two variables). All you really need to know is that if your expected cell frequencies are below 10, you probably should be using the Yates correction.
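
For reference, the two statistics can be written side by side. This is the standard textbook form, not something taken from the quoted page; $O_i$ and $E_i$ denote the observed and expected counts in cell $i$ of the 2 x 2 table.

```latex
% Pearson chi-square statistic (no correction)
\chi^2_{\text{Pearson}} = \sum_{i=1}^{4} \frac{(O_i - E_i)^2}{E_i}

% Yates-corrected statistic: each absolute deviation is reduced by 0.5
\chi^2_{\text{Yates}} = \sum_{i=1}^{4} \frac{\left(\lvert O_i - E_i \rvert - 0.5\right)^2}{E_i}
```

Because every deviation is shrunk before squaring, the corrected statistic is never larger than the uncorrected one, and the corresponding p-value is never smaller.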

R's chisq.test() applies the Yates correction by default for 2 x 2 tables, which is what caused the discrepancy described above.
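
To see what the correction does numerically, both statistics can be computed by hand for the first table and checked against `chisq.test()`. This is a sketch based on my reading of the `chisq.test()` help page, which says one half is subtracted from each |O - E| but never more than the difference itself; the variable names are illustrative.

```r
# First 2 x 2 table from above: cases on row 1, non-cases on row 2
tab <- rbind(c(13, 77), c(53, 51))

# Expected counts under independence: (row total * column total) / grand total
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# Pearson statistic, no correction
pearson <- sum((tab - expected)^2 / expected)

# Yates-corrected statistic: shrink each |O - E| by 0.5, but never below zero
deviation <- abs(tab - expected)
yates <- sum((deviation - pmin(0.5, deviation))^2 / expected)

# p-values from the chi-square distribution with 1 degree of freedom
p_pearson <- pchisq(pearson, df = 1, lower.tail = FALSE)
p_yates   <- pchisq(yates,   df = 1, lower.tail = FALSE)

# Both comparisons should report TRUE
all.equal(pearson, unname(chisq.test(tab, correct = FALSE)$statistic))
all.equal(yates,   unname(chisq.test(tab)$statistic))
```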

Last edited: 2019.07.15 23:49:26