optiforziyan's personal blog http://blog.sciencenet.cn/u/optiforziyan


Handling multicollinearity of predictor variables in species distribution models (raster data)

1780 views | 2021-3-3 00:52 | Category: Research notes

Checking for Collinearity in Predictor Variables

Some species distribution modeling algorithms (e.g., generalized linear models) assume that predictor variables are not strongly correlated. For others, correlated predictors can lead to misinterpretation of variable importance: machine learning techniques, for example, will often identify one of a set of correlated variables as important but not the others. It is therefore important to at least be aware of the correlations among predictors.
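To make the effect concrete, here is a minimal base R sketch on simulated data. The variable names (`temp`, `elev`, `prec`) and all values are hypothetical and are not taken from the study. It computes the variance inflation factor (VIF) of each predictor by regressing it on the remaining predictors and taking 1 / (1 - R^2).

```r
# Simulated predictors (hypothetical data, for illustration only)
set.seed(42)
n <- 200
temp <- rnorm(n)                          # e.g. a temperature variable
elev <- -0.9 * temp + rnorm(n, sd = 0.3)  # strongly correlated with temp
prec <- rnorm(n)                          # independent of the other two

env <- data.frame(temp, elev, prec)

# VIF of each variable: regress it on all other variables,
# then compute 1 / (1 - R^2) from that regression
vif_manual <- sapply(names(env), function(v) {
  others <- setdiff(names(env), v)
  fit <- lm(reformulate(others, response = v), data = env)
  1 / (1 - summary(fit)$r.squared)
})

round(vif_manual, 2)
```

In this sketch the two correlated variables should get VIF values well above a threshold of 5, while the independent one should stay close to 1.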


Here I share the code used in a published study (https://doi.org/10.1007/s13595-020-01012-5) to deal with collinearity when modelling species distributions.


####################################################################################
# Multicollinearity analysis
# ZL - 2021-01-25
# Any question : optiforziyan@gmail.com
####################################################################################
# species occurrences input (the file needs "lon" and "lat" columns)
occ_unique <- read.csv("species.csv", header = TRUE)
# prepare a folder for outputs
if (!dir.exists("Multicollinearity")) dir.create("Multicollinearity")
# make occ spatial
library(sp)
coordinates(occ_unique) <- ~ lon + lat
# Define the coordinate reference system (CRS) to be used; here, WGS 84:
myCRS1 <- CRS("+init=epsg:4326") # WGS 84
# assign the CRS to the occurrence points
library(raster)
crs(occ_unique) <- myCRS1
plot(occ_unique)
# predictors
# set the working directory to the folder containing the environmental rasters (.tif)
bio <- list.files(pattern = "\\.tif$") # escape the dot so only the .tif extension matches
bio <- stack(bio)
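# extract environmental conditions at the occurrence points from the raster stack;
# a matrix is returned (one column per raster layer)
p <- raster::extract(bio, occ_unique)
# drop occurrences with missing values (e.g. points falling outside the rasters),
# which would otherwise propagate NA into cor()
p <- p[complete.cases(p), , drop = FALSE]
head(p)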
# check multicollinearity
library(usdm)
# [1] first option: stepwise exclusion based on the variance inflation factor (VIF);
# variables are dropped one at a time until all remaining VIFs are below th
v1 <- vifstep(p, th = 5)
v1
# [2] second option: exclusion based on pairwise correlation; for each pair with
# |r| above th, the variable with the higher VIF is dropped
v2 <- vifcor(p, th = 0.7)
v2
# exclude() drops the flagged variables and keeps the final predictors;
# it is shown here for the second option, but it accepts v1 in the same way
predictors <- exclude(p, v2)
# [3] third option: select the variables manually from the correlation matrix,
# ideally keeping variables that are ecologically meaningful for the study species
# and whose pairwise Pearson correlation satisfies |r| <= 0.7
Pcor <- cor(p, method = "pearson")
write.csv(Pcor, "Multicollinearity/corr_matrix.csv")
# EOF (end of file)
####################################################################################
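
For the manual option, it can help to list only the variable pairs that exceed the correlation threshold instead of reading the full matrix. The following is a minimal base R sketch on simulated data. The object names (`p_demo`, `Pcor_demo`, `high_pairs`) and the three layer names are hypothetical stand-ins for the extracted matrix `p` and its columns.

```r
# Simulated stand-in for the extracted predictor matrix (hypothetical values)
set.seed(1)
n <- 100
bio1 <- rnorm(n)
p_demo <- cbind(
  bio1  = bio1,
  bio5  = bio1 + rnorm(n, sd = 0.2),  # strongly correlated with bio1
  bio12 = rnorm(n)                    # independent of the other two
)

Pcor_demo <- cor(p_demo, method = "pearson")

# Keep each pair once (upper triangle only) where |r| exceeds 0.7
idx <- which(abs(Pcor_demo) > 0.7 & upper.tri(Pcor_demo), arr.ind = TRUE)

high_pairs <- data.frame(
  var1 = rownames(Pcor_demo)[idx[, "row"]],
  var2 = colnames(Pcor_demo)[idx[, "col"]],
  r    = Pcor_demo[idx]
)
high_pairs
```

From each listed pair, one variable would then be kept based on its ecological relevance for the study species.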


  




https://wap.sciencenet.cn/blog-3429889-1274712.html
