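Correspondence analysis (CA) has long been one of the most popular tools for analysing species presence-absence or abundance data. The raw data are first transformed into a matrix of contributions to the Pearson chi-square statistic, and this matrix is then submitted to singular value decomposition (SVD) to extract the eigenvalues and eigenvectors. The CA ordination therefore represents the chi-square distance among sites, not the Euclidean distance. Because the chi-square distance is not influenced by double zeros, CA is well suited to raw species abundance data; the data only need to be non-negative and dimensionally homogeneous.
As in PCA, the orthogonal CA axes carry successively decreasing amounts of variation. Unlike in PCA, however, the total variation is not expressed as the total variance but as a quantity called the total inertia, which is the sum of the squared values of the chi-square contribution matrix and equals the sum of all CA eigenvalues.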
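The computation described above can be sketched in base R. This is only an illustration under stated assumptions: the small table `Y` is hypothetical toy data (not the Doubs data), and the variable names (`Qbar`, `eig`, `total.inertia`) are chosen here for readability rather than taken from any package.

```r
# Hypothetical toy abundance table: 4 sites x 3 species (NOT the Doubs data)
Y <- matrix(c(5, 2, 0,
              3, 4, 1,
              0, 3, 4,
              1, 0, 6), nrow = 4, byrow = TRUE,
            dimnames = list(paste0("site", 1:4), c("sp1", "sp2", "sp3")))

P  <- Y / sum(Y)             # relative frequencies
ri <- rowSums(P)             # site (row) weights
cj <- colSums(P)             # species (column) weights
E  <- outer(ri, cj)          # frequencies expected under independence

# Matrix of contributions to the Pearson chi-square statistic
Qbar <- (P - E) / sqrt(E)

# SVD of the contribution matrix; squared singular values are the eigenvalues
sv  <- svd(Qbar)
k   <- min(dim(Y)) - 1       # number of non-trivial CA axes
eig <- sv$d[1:k]^2

# Total inertia: sum of squared contributions = Pearson chi-square / grand total
total.inertia <- sum(Qbar^2)
chi2 <- suppressWarnings(chisq.test(Y)$statistic)

print(eig)
print(c(total.inertia = total.inertia,
        sum.eigenvalues = sum(eig),
        chi2.over.n = as.numeric(chi2) / sum(Y)))
```

The three quantities in the last line should agree up to rounding, which is what "inertia is scaled chi-square" means in the `cca()` output.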
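As in PCA, two scalings are available in CA.
- Scaling 1: rows (sites) are at the centroids of columns (species). Use it when the focus is on the ordination of the objects (sites): distances among sites in the plot approximate their chi-square distances. A site point lying close to a species point indicates that the species contributes strongly to that site.
- Scaling 2: columns (species) are at the centroids of rows (sites). Use it when the focus is on the species: distances among species in the plot approximate their chi-square distances. A species point lying close to a site point indicates that the species is likely to be found at that site, or to be more abundant there.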
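The centroid property behind the two scalings can be checked numerically. The following base R sketch again uses a hypothetical toy table, and all object names are illustrative assumptions; the four score matrices appear to play the roles of the `F`, `V`, `Vhat` and `Fhat` elements listed later in the `CA()` output, but that mapping is an assumption here.

```r
# Hypothetical toy abundance table: 4 sites x 3 species (NOT the Doubs data)
Y <- matrix(c(5, 2, 0,
              3, 4, 1,
              0, 3, 4,
              1, 0, 6), nrow = 4, byrow = TRUE)

P  <- Y / sum(Y)
ri <- rowSums(P)                         # site weights
cj <- colSums(P)                         # species weights
Qbar <- (P - outer(ri, cj)) / sqrt(outer(ri, cj))

k  <- min(dim(Y)) - 1                    # non-trivial axes
sv <- svd(Qbar)
sqrt.lambda <- sv$d[1:k]                 # square roots of the eigenvalues

# Unscaled scores (weighted dispersion equal on all axes)
spec.unscaled <- sv$v[, 1:k] / sqrt(cj)  # species
site.unscaled <- sv$u[, 1:k] / sqrt(ri)  # sites

# Scores scaled proportionally to the eigenvalues
site.scaled <- sweep(site.unscaled, 2, sqrt.lambda, "*")
spec.scaled <- sweep(spec.unscaled, 2, sqrt.lambda, "*")

# Scaling 1 plots site.scaled with spec.unscaled:
# each site is the abundance-weighted centroid of the species points
site.centroids <- (P / ri) %*% spec.unscaled

# Scaling 2 plots site.unscaled with spec.scaled:
# each species is the abundance-weighted centroid of the site points
spec.centroids <- (t(P) / cj) %*% site.unscaled

# Both differences should be zero up to rounding error
print(max(abs(site.scaled - site.centroids)))
print(max(abs(spec.scaled - spec.centroids)))
```

Both scalings share the same singular values, so switching between them only rescales the scores.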
The Kaiser-Guttman criterion and the broken-stick model can also be used to decide how many CA axes to retain.
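As a rough illustration of the two criteria, the sketch below applies them in base R to the 26 CA eigenvalues printed further down in this section (the `spe.ca$CA$eig` output). The variable names and the rule of counting only the leading axes that exceed the broken-stick expectation are assumptions made for this example; the `evplot()` helper used later presents the same comparison graphically.

```r
# CA eigenvalues copied from the spe.ca$CA$eig output shown in this section
ev <- c(6.009926e-01, 1.443709e-01, 1.072938e-01, 8.337321e-02, 5.157826e-02,
        4.184649e-02, 3.388638e-02, 2.882547e-02, 1.684112e-02, 1.082639e-02,
        1.014213e-02, 7.885549e-03, 6.123133e-03, 4.867260e-03, 4.606481e-03,
        3.843808e-03, 3.067492e-03, 1.823032e-03, 1.641868e-03, 1.295163e-03,
        8.775034e-04, 4.217149e-04, 2.148505e-04, 1.527935e-04, 8.948679e-05,
        2.695049e-05)
n <- length(ev)

# Kaiser-Guttman: keep axes whose eigenvalue exceeds the mean eigenvalue
kg.keep <- which(ev > mean(ev))

# Broken stick: expected share of the k-th longest piece of a stick
# broken at random into n pieces, b_k = (1/n) * sum_{i=k}^{n} 1/i
bs.expected <- rev(cumsum(1 / (n:1))) / n

# Observed share of total inertia carried by each axis
prop <- ev / sum(ev)

# Keep the leading axes whose observed share exceeds the expectation
bs.keep <- which(cumprod(prop > bs.expected) == 1)

print(kg.keep)
print(bs.keep)
print(round(cbind(observed = prop, broken.stick = bs.expected)[1:6, ], 4))
```

With these eigenvalues the broken-stick rule retains fewer axes than Kaiser-Guttman, which matches the remark below that it is the more conservative of the two.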
# ======================================
# Load the packages required for this chapter
library(ade4)
library(vegan)
library(gclus)
library(ape)
rm(list = ls())
setwd("D:\\Users\\Administrator\\Desktop\\RStudio\\数量生态学\\DATA")
# Import the data from CSV files
spe <- read.csv("DoubsSpe.csv", row.names=1)
env <- read.csv("DoubsEnv.csv", row.names=1)
spa <- read.csv("DoubsSpa.csv", row.names=1)
# Remove site 8, which has no data (CA cannot handle empty rows)
spe <- spe[-8,]
env <- env[-8,]
spa <- spa[-8,]
# Correspondence analysis (CA) of the raw species abundance data
# *******************************
# Compute the CA
spe.ca <- cca(spe)
spe.ca
Call: cca(X = spe)
Inertia Rank
Total 1.167
Unconstrained 1.167 26
Inertia is scaled Chi-square
Eigenvalues for unconstrained axes:
CA1 CA2 CA3 CA4 CA5 CA6 CA7 CA8
0.6010 0.1444 0.1073 0.0834 0.0516 0.0418 0.0339 0.0288
(Showed only 8 of all 26 unconstrained eigenvalues)
summary(spe.ca)  # default: scaling = 2
Call:
cca(X = spe)
Partitioning of scaled Chi-square:
Inertia Proportion
Total 1.167 1
Unconstrained 1.167 1
Eigenvalues, and their contribution to the scaled Chi-square
Importance of components:
CA1 CA2 CA3 CA4 CA5 CA6 CA7 CA8 CA9 CA10 CA11 CA12
Eigenvalue 0.601 0.1444 0.10729 0.08337 0.05158 0.04185 0.03389 0.02883 0.01684 0.010826 0.010142 0.007886
Proportion Explained 0.515 0.1237 0.09195 0.07145 0.04420 0.03586 0.02904 0.02470 0.01443 0.009278 0.008691 0.006758
Cumulative Proportion 0.515 0.6387 0.73069 0.80214 0.84634 0.88220 0.91124 0.93594 0.95038 0.959655 0.968346 0.975104
CA13 CA14 CA15 CA16 CA17 CA18 CA19 CA20 CA21 CA22 CA23
Eigenvalue 0.006123 0.004867 0.004606 0.003844 0.003067 0.001823 0.001642 0.001295 0.0008775 0.0004217 0.0002149
Proportion Explained 0.005247 0.004171 0.003948 0.003294 0.002629 0.001562 0.001407 0.001110 0.0007520 0.0003614 0.0001841
Cumulative Proportion 0.980351 0.984522 0.988470 0.991764 0.994393 0.995955 0.997362 0.998472 0.9992238 0.9995852 0.9997693
CA24 CA25 CA26
Eigenvalue 0.0001528 8.949e-05 2.695e-05
Proportion Explained 0.0001309 7.669e-05 2.310e-05
Cumulative Proportion 0.9999002 1.000e+00 1.000e+00
Scaling 2 for species and site scores
* Species are scaled proportional to eigenvalues
* Sites are unscaled: weighted dispersion equal on all dimensions
Species scores
CA1 CA2 CA3 CA4 CA5 CA6
CHA 1.50075 -1.410293 0.26049 -0.307203 0.271777 -0.003465
TRU 1.66167 0.444129 0.57571 0.166073 -0.261870 -0.326590
VAI 1.28545 0.285328 -0.04768 0.018126 0.043847 0.200732
LOC 0.98662 0.360900 -0.35265 -0.009021 -0.012231 0.253429
OMB 1.55554 -1.389752 0.80505 -0.468471 0.471301 0.225409
(......)
Site scores (weighted averages of species scores)
CA1 CA2 CA3 CA4 CA5 CA6
1 2.76488 3.076306 5.3657489 1.99192 -5.07714 -7.80447
2 2.27540 2.565531 1.2659130 0.87538 -1.89139 -0.13887
3 2.01823 2.441224 0.5144079 0.79436 -1.03741 0.56015
4 1.28485 1.935664 -0.2509482 0.76470 0.54752 0.10579
(......)
summary(spe.ca, scaling=1)
Call:
cca(X = spe)
Partitioning of scaled Chi-square:
Inertia Proportion
Total 1.167 1
Unconstrained 1.167 1
Eigenvalues, and their contribution to the scaled Chi-square
Importance of components:
CA1 CA2 CA3 CA4 CA5 CA6 CA7 CA8 CA9 CA10 CA11 CA12
Eigenvalue 0.601 0.1444 0.10729 0.08337 0.05158 0.04185 0.03389 0.02883 0.01684 0.010826 0.010142 0.007886
Proportion Explained 0.515 0.1237 0.09195 0.07145 0.04420 0.03586 0.02904 0.02470 0.01443 0.009278 0.008691 0.006758
Cumulative Proportion 0.515 0.6387 0.73069 0.80214 0.84634 0.88220 0.91124 0.93594 0.95038 0.959655 0.968346 0.975104
CA13 CA14 CA15 CA16 CA17 CA18 CA19 CA20 CA21 CA22 CA23
Eigenvalue 0.006123 0.004867 0.004606 0.003844 0.003067 0.001823 0.001642 0.001295 0.0008775 0.0004217 0.0002149
Proportion Explained 0.005247 0.004171 0.003948 0.003294 0.002629 0.001562 0.001407 0.001110 0.0007520 0.0003614 0.0001841
Cumulative Proportion 0.980351 0.984522 0.988470 0.991764 0.994393 0.995955 0.997362 0.998472 0.9992238 0.9995852 0.9997693
CA24 CA25 CA26
Eigenvalue 0.0001528 8.949e-05 2.695e-05
Proportion Explained 0.0001309 7.669e-05 2.310e-05
Cumulative Proportion 0.9999002 1.000e+00 1.000e+00
Scaling 1 for species and site scores
* Sites are scaled proportional to eigenvalues
* Species are unscaled: weighted dispersion equal on all dimensions
Species scores
CA1 CA2 CA3 CA4 CA5 CA6
CHA 1.93586 -3.71167 0.79524 -1.06393 1.19669 -0.01694
TRU 2.14343 1.16888 1.75759 0.57516 -1.15306 -1.59651
VAI 1.65814 0.75094 -0.14555 0.06277 0.19306 0.98127
LOC 1.27267 0.94983 -1.07661 -0.03124 -0.05385 1.23887
OMB 2.00654 -3.65761 2.45774 -1.62244 2.07523 1.10190
BLA 1.28617 -3.89487 -1.46646 0.27497 -0.46548 -1.62514
HOT -0.70838 -0.13563 0.03428 -0.33249 -1.68537 0.65900
TOX -0.23836 -1.15198 -1.75354 1.46935 -2.58533 0.44908
VAN 0.01724 -0.25092 -1.76067 0.73427 0.55774 -1.90211
CHE 0.01391 0.36998 -1.06276 -1.86417 0.81585 0.81679
BAR -0.43036 -0.79135 -0.15048 0.59208 -0.69219 0.50384
(......)
Site scores (weighted averages of species scores)
CA1 CA2 CA3 CA4 CA5 CA6
1 2.14343 1.168878 1.7575907 0.575155 -1.153061 -1.59651
2 1.76398 0.974804 0.4146591 0.252762 -0.429551 -0.02841
3 1.56461 0.927572 0.1684981 0.229368 -0.235605 0.11459
(......)
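The first axis has a very large eigenvalue. In CA, an eigenvalue above 0.6 indicates a very strong gradient in the data. Here the first axis accounts for 51.5% of the total inertia (0.601 / 1.167; see the "Proportion Explained" row above). Note that the eigenvalues are identical under both scalings: the choice of scaling affects only the eigenvectors (the site and species scores), not the eigenvalues.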
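# Plot the eigenvalues and the percentage of variation (inertia) for each axis
# evplot() is not part of vegan; it is assumed to be defined in evplot.R,
# a helper script that must be in the working directory (like CA.R below)
source("evplot.R")
(ev2 <- spe.ca$CA$eig)
evplot(ev2)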
CA1 CA2 CA3 CA4 CA5 CA6 CA7 CA8 CA9
6.009926e-01 1.443709e-01 1.072938e-01 8.337321e-02 5.157826e-02 4.184649e-02 3.388638e-02 2.882547e-02 1.684112e-02
CA10 CA11 CA12 CA13 CA14 CA15 CA16 CA17 CA18
1.082639e-02 1.014213e-02 7.885549e-03 6.123133e-03 4.867260e-03 4.606481e-03 3.843808e-03 3.067492e-03 1.823032e-03
CA19 CA20 CA21 CA22 CA23 CA24 CA25 CA26
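1.641868e-03 1.295163e-03 8.775034e-04 4.217149e-04 2.148505e-04 1.527935e-04 8.948679e-05 2.695049e-05
# Here the broken-stick model is more conservative than the Kaiser-Guttman
# criterion. Both the numerical results and the bar plots show that the
# first axis is strongly dominant.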
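# CA biplots
# *********
par(mfrow=c(1,2))
# Scaling 1: sites are at the centroids of species
plot(spe.ca, scaling=1, main="CA of fish abundances - biplot (scaling 1)")
# Scaling 2 (default): species are at the centroids of sites
plot(spe.ca, main="CA of fish abundances - biplot (scaling 2)")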
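Scaling 1 is better suited to interpreting the relationships among sites and their arrangement along gradients; scaling 2 is better suited to interpreting the relationships among species and their distribution along gradients.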
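Passive (a posteriori) projection of environmental variables onto the CA ordination
plot(spe.ca, main="CA of fish abundances - biplot (scaling 2)")
# Passively add environmental variables to the CA ordination
# The CA biplot drawn last (scaling 2) must still be the active plot
spe.ca.env <- envfit(spe.ca, env)
plot(spe.ca.env)
# plot() on the envfit object adds the environmental variables to that biplot
# Does the added environmental information help in interpreting the biplot?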
Reordering the data table based on the CA ordination result
vegemite(spe, spe.ca)
2322222222211 11 1 11 11 1
40867235190985976604547312231
PCH .53122..14...................
BBO 155244..3421.................
BCO .53234..3411.................
GRE 255454.135211................
ANG .54232..24211..1.............
ABL 5555552355532..2.............
ROT .52112.12221.2...............
BOU .54233..35322..1.............
PSO 134233..25211..11............
CAR .54123..23111..11............
HOT 111113..22221..1.............
GAR 255455115555254211...........
SPI .51111..23323..41............
TAN .54354..4342131112.11........
BAR .33245..45423..32...21.......
GOU 154345.2554422.1211121.......
PER .52114..342134.211.2.........
BRO .43243.1352114.111.2.1.1.....
TOX .21.12..22233..44............
CHE 23423411243132522221311.11...
VAN .32123.1232225.3512.3.1......
LOC ..1111..11253234554554551432.
BLA .........1.11..25...43.....2.
VAI ........11133314344545454445.
CHA ............1..12...33..12.2.
OMB .........1..1..1....24..12.3.
TRU .........1..12.23314455535553
sites species
29 27
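# The table is printed the opposite way round from a conventional community
# data table: species are rows and sites are columns. The order of species
# and sites depends on the orientation of the ordination axis (which is
# arbitrary). Reordering the table on the first axis alone is not optimal:
# the gradient from upstream (sites 1-10) to midstream (sites 11-18) revealed
# by the second axis, and the characteristic species of those sites, are not
# grouped together in this table but scattered.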
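Correspondence analysis using the CA() function
# ************************
source("CA.R")  # load the CA.R script; it must be in the working directory, or give its path
spe.CA.PL <- CA(spe)
biplot(spe.CA.PL, cex=1)
# Reorder the data table using the scores on the first CA axis
# The reordered table is the same as the vegemite() output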
summary(spe.CA.PL)
Length Class Mode
total.inertia 1 -none- numeric
eigenvalues 26 -none- numeric
rel.eigen 26 -none- numeric
rel.cum.eigen 26 -none- numeric
U 702 -none- numeric
Uhat 754 -none- numeric
F 754 -none- numeric
Fhat 702 -none- numeric
V 702 -none- numeric
Vhat 754 -none- numeric
site.names 29 -none- character
sp.names 27 -none- character
color.sites 1 -none- character
color.sp 1 -none- character
call 2 -none- call
t(spe[order(spe.CA.PL$F[,1]),order(spe.CA.PL$V[,1])])
24 30 28 26 27 22 23 25 21 29 20 19 18 5 9 17 16 6 10 4 15 14 7 3 11 12 2 13 1
PCH 0 5 3 1 2 2 0 0 1 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
BBO 1 5 5 2 4 4 0 0 3 4 2 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
BCO 0 5 3 2 3 4 0 0 3 4 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
GRE 2 5 5 4 5 4 0 1 3 5 2 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ANG 0 5 4 2 3 2 0 0 2 4 2 1 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
ABL 5 5 5 5 5 5 2 3 5 5 5 3 2 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0
ROT 0 5 2 1 1 2 0 1 2 2 2 1 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
BOU 0 5 4 2 3 3 0 0 3 5 3 2 2 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
PSO 1 3 4 2 3 3 0 0 2 5 2 1 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0
CAR 0 5 4 1 2 3 0 0 2 3 1 1 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0
HOT 1 1 1 1 1 3 0 0 2 2 2 2 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
GAR 2 5 5 4 5 5 1 1 5 5 5 5 2 5 4 2 1 1 0 0 0 0 0 0 0 0 0 0 0
SPI 0 5 1 1 1 1 0 0 2 3 3 2 3 0 0 4 1 0 0 0 0 0 0 0 0 0 0 0 0
TAN 0 5 4 3 5 4 0 0 4 3 4 2 1 3 1 1 1 2 0 1 1 0 0 0 0 0 0 0 0
BAR 0 3 3 2 4 5 0 0 4 5 4 2 3 0 0 3 2 0 0 0 2 1 0 0 0 0 0 0 0
GOU 1 5 4 3 4 5 0 2 5 5 4 4 2 2 0 1 2 1 1 1 2 1 0 0 0 0 0 0 0
PER 0 5 2 1 1 4 0 0 3 4 2 1 3 4 0 2 1 1 0 2 0 0 0 0 0 0 0 0 0
BRO 0 4 3 2 4 3 0 1 3 5 2 1 1 4 0 1 1 1 0 2 0 1 0 1 0 0 0 0 0
TOX 0 2 1 0 1 2 0 0 2 2 2 3 3 0 0 4 4 0 0 0 0 0 0 0 0 0 0 0 0
CHE 2 3 4 2 3 4 1 1 2 4 3 1 3 2 5 2 2 2 2 1 3 1 1 0 1 1 0 0 0
VAN 0 3 2 1 2 3 0 1 2 3 2 2 2 5 0 3 5 1 2 0 3 0 1 0 0 0 0 0 0
LOC 0 0 1 1 1 1 0 0 1 1 2 5 3 2 3 4 5 5 4 5 5 4 5 5 1 4 3 2 0
BLA 0 0 0 0 0 0 0 0 0 1 0 1 1 0 0 2 5 0 0 0 4 3 0 0 0 0 0 2 0
VAI 0 0 0 0 0 0 0 0 1 1 1 3 3 3 1 4 3 4 4 5 4 5 4 5 4 4 4 5 0
CHA 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 2 0 0 0 3 3 0 0 1 2 0 2 0
OMB 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 0 0 2 4 0 0 1 2 0 3 0
TRU 0 0 0 0 0 0 0 0 0 1 0 0 1 2 0 2 3 3 1 4 4 5 5 5 3 5 5 5 3