博文

数量生态学笔记||非约束排序||CA

已有 3332 次阅读 2018-8-11 22:33 |系统分类:科研笔记

长期以来，对应分析（Correspondence analysis ，CA）是分析物种有无或多度数据最受欢迎的工具之一。原始数据首先被转化成一个描述样方对对Pearson 卡方统计量的贡献率的矩阵，将获得的矩阵通过奇异值分解（SVD）技术进行特征根和特征向量的提取。因此，CA的排序结果展示的是样方之间的卡方距离，而不是欧式距离。卡方距离不受零值的影响，因此，CA非常适用于原始的物种多度分析，要求数据非负和同纲量就行。

和PCA一样，正交的CA排序轴所承载的变差（variation）也是按顺序逐步降低，但与PAC不同的是，这里的总变差不是用总方差来表示，而是通过一个叫总惯量（total inertia）的指标来表示。

CA 也有两种类型的标尺。

1型标尺：行（样方）是列（物种）的形心。关注的对象，对象之间的距离是卡方距离。一个样方的点靠近一个物种的点，表示物种对于该样方的贡献比较大。
2型标尺：列（物种）是行（样方）的形心（centroid）。物种之间的距离是卡方距离。一个物种的点靠近样方，表示该物种在该样方中存在的可能性很大。

Kaiser-cuttman和断棍模型同样适用于CA排序轴的取舍。

# ======================================
# 导入本章所需的程序包 
library(ade4)
library(vegan)
library(gclus)  
library(ape)
rm(list = ls())
setwd("D:\\Users\\Administrator\\Desktop\\RStudio\\数量生态学\\DATA")
# 导入CSV文件数据 
spe <- read.csv("DoubsSpe.csv", row.names=1)
env <- read.csv("DoubsEnv.csv", row.names=1)
spa <- read.csv("DoubsSpa.csv", row.names=1)
# 删除没有数据的样方8
spe <- spe[-8,]
env <- env[-8,]
spa <- spa[-8,]

# 原始物种多度数据的对应分析(CA)
# *******************************
# 计算CA
spe.ca <- cca(spe)
spe.ca
Call: cca(X = spe)

              Inertia Rank
Total           1.167     
Unconstrained   1.167   26
Inertia is scaled Chi-square 

Eigenvalues for unconstrained axes:
   CA1    CA2    CA3    CA4    CA5    CA6    CA7    CA8 
0.6010 0.1444 0.1073 0.0834 0.0516 0.0418 0.0339 0.0288 
(Showed only 8 of all 26 unconstrained eigenvalues)

summary(spe.ca)        #默认scaling= 2

Call:
cca(X = spe) 

Partitioning of scaled Chi-square:
              Inertia Proportion
Total           1.167          1
Unconstrained   1.167          1

Eigenvalues, and their contribution to the scaled Chi-square 

Importance of components:
                        CA1    CA2     CA3     CA4     CA5     CA6     CA7     CA8     CA9     CA10     CA11     CA12
Eigenvalue            0.601 0.1444 0.10729 0.08337 0.05158 0.04185 0.03389 0.02883 0.01684 0.010826 0.010142 0.007886
Proportion Explained  0.515 0.1237 0.09195 0.07145 0.04420 0.03586 0.02904 0.02470 0.01443 0.009278 0.008691 0.006758
Cumulative Proportion 0.515 0.6387 0.73069 0.80214 0.84634 0.88220 0.91124 0.93594 0.95038 0.959655 0.968346 0.975104
                          CA13     CA14     CA15     CA16     CA17     CA18     CA19     CA20      CA21      CA22      CA23
Eigenvalue            0.006123 0.004867 0.004606 0.003844 0.003067 0.001823 0.001642 0.001295 0.0008775 0.0004217 0.0002149
Proportion Explained  0.005247 0.004171 0.003948 0.003294 0.002629 0.001562 0.001407 0.001110 0.0007520 0.0003614 0.0001841
Cumulative Proportion 0.980351 0.984522 0.988470 0.991764 0.994393 0.995955 0.997362 0.998472 0.9992238 0.9995852 0.9997693
                           CA24      CA25      CA26
Eigenvalue            0.0001528 8.949e-05 2.695e-05
Proportion Explained  0.0001309 7.669e-05 2.310e-05
Cumulative Proportion 0.9999002 1.000e+00 1.000e+00

Scaling 2 for species and site scores
* Species are scaled proportional to eigenvalues
* Sites are unscaled: weighted dispersion equal on all dimensions

Species scores

         CA1       CA2      CA3       CA4       CA5       CA6
CHA  1.50075 -1.410293  0.26049 -0.307203  0.271777 -0.003465
TRU  1.66167  0.444129  0.57571  0.166073 -0.261870 -0.326590
VAI  1.28545  0.285328 -0.04768  0.018126  0.043847  0.200732
LOC  0.98662  0.360900 -0.35265 -0.009021 -0.012231  0.253429
OMB  1.55554 -1.389752  0.80505 -0.468471  0.471301  0.225409
(......)

Site scores (weighted averages of species scores)

        CA1       CA2        CA3      CA4      CA5      CA6
1   2.76488  3.076306  5.3657489  1.99192 -5.07714 -7.80447
2   2.27540  2.565531  1.2659130  0.87538 -1.89139 -0.13887
3   2.01823  2.441224  0.5144079  0.79436 -1.03741  0.56015
4   1.28485  1.935664 -0.2509482  0.76470  0.54752  0.10579
(......)

summary(spe.ca, scaling=1)
Call:
cca(X = spe) 

Partitioning of scaled Chi-square:
              Inertia Proportion
Total           1.167          1
Unconstrained   1.167          1

Eigenvalues, and their contribution to the scaled Chi-square 

Importance of components:
                        CA1    CA2     CA3     CA4     CA5     CA6     CA7     CA8     CA9     CA10     CA11     CA12
Eigenvalue            0.601 0.1444 0.10729 0.08337 0.05158 0.04185 0.03389 0.02883 0.01684 0.010826 0.010142 0.007886
Proportion Explained  0.515 0.1237 0.09195 0.07145 0.04420 0.03586 0.02904 0.02470 0.01443 0.009278 0.008691 0.006758
Cumulative Proportion 0.515 0.6387 0.73069 0.80214 0.84634 0.88220 0.91124 0.93594 0.95038 0.959655 0.968346 0.975104
                          CA13     CA14     CA15     CA16     CA17     CA18     CA19     CA20      CA21      CA22      CA23
Eigenvalue            0.006123 0.004867 0.004606 0.003844 0.003067 0.001823 0.001642 0.001295 0.0008775 0.0004217 0.0002149
Proportion Explained  0.005247 0.004171 0.003948 0.003294 0.002629 0.001562 0.001407 0.001110 0.0007520 0.0003614 0.0001841
Cumulative Proportion 0.980351 0.984522 0.988470 0.991764 0.994393 0.995955 0.997362 0.998472 0.9992238 0.9995852 0.9997693
                           CA24      CA25      CA26
Eigenvalue            0.0001528 8.949e-05 2.695e-05
Proportion Explained  0.0001309 7.669e-05 2.310e-05
Cumulative Proportion 0.9999002 1.000e+00 1.000e+00

Scaling 1 for species and site scores
* Sites are scaled proportional to eigenvalues
* Species are unscaled: weighted dispersion equal on all dimensions

Species scores

         CA1      CA2      CA3      CA4      CA5      CA6
CHA  1.93586 -3.71167  0.79524 -1.06393  1.19669 -0.01694
TRU  2.14343  1.16888  1.75759  0.57516 -1.15306 -1.59651
VAI  1.65814  0.75094 -0.14555  0.06277  0.19306  0.98127
LOC  1.27267  0.94983 -1.07661 -0.03124 -0.05385  1.23887
OMB  2.00654 -3.65761  2.45774 -1.62244  2.07523  1.10190
BLA  1.28617 -3.89487 -1.46646  0.27497 -0.46548 -1.62514
HOT -0.70838 -0.13563  0.03428 -0.33249 -1.68537  0.65900
TOX -0.23836 -1.15198 -1.75354  1.46935 -2.58533  0.44908
VAN  0.01724 -0.25092 -1.76067  0.73427  0.55774 -1.90211
CHE  0.01391  0.36998 -1.06276 -1.86417  0.81585  0.81679
BAR -0.43036 -0.79135 -0.15048  0.59208 -0.69219  0.50384
(......)

Site scores (weighted averages of species scores)

        CA1       CA2        CA3       CA4       CA5      CA6
1   2.14343  1.168878  1.7575907  0.575155 -1.153061 -1.59651
2   1.76398  0.974804  0.4146591  0.252762 -0.429551 -0.02841
3   1.56461  0.927572  0.1684981  0.229368 -0.235605  0.11459
(......)

第一轴有一个很大的特征根。在CA里面，如果特征根超过0.6，代表数据结构梯度明显。第一轴特征根占总惯量多少比例呢？需要注意的是，两类标尺下，特征根一样。标尺的选择，只影响特征向量，不影响特征根。

#尺下，特征根一样。标尺的选择，只影响特征向量，不影响特征根。
# 绘制每轴的特征根和方差百分比
(ev2 <- spe.ca$CA$eig)
evplot(ev2)

         CA1          CA2          CA3          CA4          CA5          CA6          CA7          CA8          CA9 
6.009926e-01 1.443709e-01 1.072938e-01 8.337321e-02 5.157826e-02 4.184649e-02 3.388638e-02 2.882547e-02 1.684112e-02 
        CA10         CA11         CA12         CA13         CA14         CA15         CA16         CA17         CA18 
1.082639e-02 1.014213e-02 7.885549e-03 6.123133e-03 4.867260e-03 4.606481e-03 3.843808e-03 3.067492e-03 1.823032e-03 
        CA19         CA20         CA21         CA22         CA23         CA24         CA25         CA26 
1.641868e-03 1.295163e-03 8.775034e-04 4.217149e-04 2.148505e-04 1.527935e-04 8.948679e-05 2.695049e-05 
> evplot(ev2)

#这里，断棍模型比Kaiser-Guttman准则更保守。无论是数量分析结果、还是
#条形图都显示第一轴占绝对优势。

# CA双序图
# *********
par(mfrow=c(1,2))
# 1型标尺：样方点是物种点的形心
plot(spe.ca, scaling=1, main="鱼类多度CA双序图（1型标尺）")
# 2型标尺（默认）：物种点是样方点的形心
plot(spe.ca, main="鱼类多度CA双序图（2型标尺）")

1 型标尺更适合解释样方之家的关系和样方的梯度排列；2型标尺更适合解释物种之间的关系和梯度分布。

CA排序中被动加入环境因子

plot(spe.ca, main="鱼类多度CA双序图（2型标尺）")
# CA排序中被动加入环境因子
# 调用最后生成CA结果对象（2型标尺）
spe.ca.env <- envfit(spe.ca, env)
plot(spe.ca.env)
# 这个命令的目的是在最后双序图加入环境变量
#新加入的环境变量信息对解读双序图是否有帮助？

基于CA排序结果的数据表格重排

vegemite(spe, spe.ca)

     2322222222211  11 1 11  11 1 
     40867235190985976604547312231
 PCH .53122..14...................
 BBO 155244..3421.................
 BCO .53234..3411.................
 GRE 255454.135211................
 ANG .54232..24211..1.............
 ABL 5555552355532..2.............
 ROT .52112.12221.2...............
 BOU .54233..35322..1.............
 PSO 134233..25211..11............
 CAR .54123..23111..11............
 HOT 111113..22221..1.............
 GAR 255455115555254211...........
 SPI .51111..23323..41............
 TAN .54354..4342131112.11........
 BAR .33245..45423..32...21.......
 GOU 154345.2554422.1211121.......
 PER .52114..342134.211.2.........
 BRO .43243.1352114.111.2.1.1.....
 TOX .21.12..22233..44............
 CHE 23423411243132522221311.11...
 VAN .32123.1232225.3512.3.1......
 LOC ..1111..11253234554554551432.
 BLA .........1.11..25...43.....2.
 VAI ........11133314344545454445.
 CHA ............1..12...33..12.2.
 OMB .........1..1..1....24..12.3.
 TRU .........1..12.23314455535553
  sites species 
     29      27 

#当前输出的表格与传统的群落数据表格排列方式相反，现在以行为物种，以#列为样方。物种排列顺序和样方排列顺序依赖于排序轴的方向（其实是任
#意的）。可以发现，单纯基于第一轴的结果重新排列数据表格，并没有达
#到最佳的效果。因为第二轴所反映的上游（样方1-10）到中游（样方11-18）#的梯度，以及这些样方的特征种，在这个表格里并没有聚集，而是分散的。

使用函数CA（）进行对应分析

# ************************
source("CA.R") #导入CA.R脚本，此脚本必须在当前工作目录下或给路径
spe.CA.PL <- CA(spe)
biplot(spe.CA.PL, cex=1)
# 用CA第一轴排序结果重新排列数据表格
# 重新排列数据表格与vegemite（）输出的结果一样
summary(spe.CA.PL)
              Length Class  Mode     
total.inertia   1    -none- numeric  
eigenvalues    26    -none- numeric  
rel.eigen      26    -none- numeric  
rel.cum.eigen  26    -none- numeric  
U             702    -none- numeric  
Uhat          754    -none- numeric  
F             754    -none- numeric  
Fhat          702    -none- numeric  
V             702    -none- numeric  
Vhat          754    -none- numeric  
site.names     29    -none- character
sp.names       27    -none- character
color.sites     1    -none- character
color.sp        1    -none- character
call            2    -none- call   

t(spe[order(spe.CA.PL$F[,1]),order(spe.CA.PL$V[,1])])

    24 30 28 26 27 22 23 25 21 29 20 19 18 5 9 17 16 6 10 4 15 14 7 3 11 12 2 13 1
PCH  0  5  3  1  2  2  0  0  1  4  0  0  0 0 0  0  0 0  0 0  0  0 0 0  0  0 0  0 0
BBO  1  5  5  2  4  4  0  0  3  4  2  1  0 0 0  0  0 0  0 0  0  0 0 0  0  0 0  0 0
BCO  0  5  3  2  3  4  0  0  3  4  1  1  0 0 0  0  0 0  0 0  0  0 0 0  0  0 0  0 0
GRE  2  5  5  4  5  4  0  1  3  5  2  1  1 0 0  0  0 0  0 0  0  0 0 0  0  0 0  0 0
ANG  0  5  4  2  3  2  0  0  2  4  2  1  1 0 0  1  0 0  0 0  0  0 0 0  0  0 0  0 0
ABL  5  5  5  5  5  5  2  3  5  5  5  3  2 0 0  2  0 0  0 0  0  0 0 0  0  0 0  0 0
ROT  0  5  2  1  1  2  0  1  2  2  2  1  0 2 0  0  0 0  0 0  0  0 0 0  0  0 0  0 0
BOU  0  5  4  2  3  3  0  0  3  5  3  2  2 0 0  1  0 0  0 0  0  0 0 0  0  0 0  0 0
PSO  1  3  4  2  3  3  0  0  2  5  2  1  1 0 0  1  1 0  0 0  0  0 0 0  0  0 0  0 0
CAR  0  5  4  1  2  3  0  0  2  3  1  1  1 0 0  1  1 0  0 0  0  0 0 0  0  0 0  0 0
HOT  1  1  1  1  1  3  0  0  2  2  2  2  1 0 0  1  0 0  0 0  0  0 0 0  0  0 0  0 0
GAR  2  5  5  4  5  5  1  1  5  5  5  5  2 5 4  2  1 1  0 0  0  0 0 0  0  0 0  0 0
SPI  0  5  1  1  1  1  0  0  2  3  3  2  3 0 0  4  1 0  0 0  0  0 0 0  0  0 0  0 0
TAN  0  5  4  3  5  4  0  0  4  3  4  2  1 3 1  1  1 2  0 1  1  0 0 0  0  0 0  0 0
BAR  0  3  3  2  4  5  0  0  4  5  4  2  3 0 0  3  2 0  0 0  2  1 0 0  0  0 0  0 0
GOU  1  5  4  3  4  5  0  2  5  5  4  4  2 2 0  1  2 1  1 1  2  1 0 0  0  0 0  0 0
PER  0  5  2  1  1  4  0  0  3  4  2  1  3 4 0  2  1 1  0 2  0  0 0 0  0  0 0  0 0
BRO  0  4  3  2  4  3  0  1  3  5  2  1  1 4 0  1  1 1  0 2  0  1 0 1  0  0 0  0 0
TOX  0  2  1  0  1  2  0  0  2  2  2  3  3 0 0  4  4 0  0 0  0  0 0 0  0  0 0  0 0
CHE  2  3  4  2  3  4  1  1  2  4  3  1  3 2 5  2  2 2  2 1  3  1 1 0  1  1 0  0 0
VAN  0  3  2  1  2  3  0  1  2  3  2  2  2 5 0  3  5 1  2 0  3  0 1 0  0  0 0  0 0
LOC  0  0  1  1  1  1  0  0  1  1  2  5  3 2 3  4  5 5  4 5  5  4 5 5  1  4 3  2 0
BLA  0  0  0  0  0  0  0  0  0  1  0  1  1 0 0  2  5 0  0 0  4  3 0 0  0  0 0  2 0
VAI  0  0  0  0  0  0  0  0  1  1  1  3  3 3 1  4  3 4  4 5  4  5 4 5  4  4 4  5 0
CHA  0  0  0  0  0  0  0  0  0  0  0  0  1 0 0  1  2 0  0 0  3  3 0 0  1  2 0  2 0
OMB  0  0  0  0  0  0  0  0  0  1  0  0  1 0 0  1  0 0  0 0  2  4 0 0  1  2 0  3 0
TRU  0  0  0  0  0  0  0  0  0  1  0  0  1 2 0  2  3 3  1 4  4  5 5 5  3  5 5  5 3