之前,想研究一下GWAS分析汇中PVE(表型方差解释百分比)的计算方法,写了两篇:
GWAS分析中SNP解释百分比PVE | 第一篇,SNP解释百分比之和为何大于1?
GWAS分析中SNP解释百分比PVE | 第二篇,GLM模型中如何计算PVE?
这里介绍第三篇:混合线性模型的GWAS,如何计算PVE。
1. R语言计算的PVE能否用于MLM模型?
昨天介绍了使用R语言计算显著SNP的表型方差解释百分比(PVE),它的步骤有三步:
第一步:将SNP和协变量(PCA和其它协变量)放到模型中,计算回归模型的R方(R-squared)「这一步加上显著SNP」 第二步:将协变量(PCA和其它协变量)放到模型中,计算回归模型的R方(R-squared)「这一步去掉显著SNP」 第三步:将第一步的R方减去第二步的R方,得到的值就是该SNP的表型变异解释百分比(PVE)
这种计算方法,也是可以用于MLM模型和其它GWAS模型的,它会存在过高估计的风险,但也是一种可行的方法。
代码演示:「第一步:」
r$> mod_2 = lm(y ~ 1+pc1 + pc2 + pc3 ,data=dat1);summary(mod_2) Call: lm(formula = y ~ 1 + pc1 + pc2 + pc3, data = dat1) Residuals: Min 1Q Median 3Q Max -89.495 -19.318 -0.099 18.649 85.448 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 102.0467 0.8864 115.129 < 2e-16 *** pc1 -111.8205 28.0295 -3.989 7.11e-05 *** pc2 -21.6604 28.0295 -0.773 0.44 pc3 35.1350 28.0295 1.254 0.21 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: 28.03 on 996 degrees of freedom Multiple R-squared: 0.01783, Adjusted R-squared: 0.01487 F-statistic: 6.028 on 3 and 996 DF, p-value: 0.0004546
「第二步:」
r$> mod_2 = lm(y ~ 1+pc1 + pc2 + pc3 ,data=dat1);summary(mod_2) Call: lm(formula = y ~ 1 + pc1 + pc2 + pc3, data = dat1) Residuals: Min 1Q Median 3Q Max -89.495 -19.318 -0.099 18.649 85.448 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 102.0467 0.8864 115.129 < 2e-16 *** pc1 -111.8205 28.0295 -3.989 7.11e-05 *** pc2 -21.6604 28.0295 -0.773 0.44 pc3 35.1350 28.0295 1.254 0.21 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: 28.03 on 996 degrees of freedom Multiple R-squared: 0.01783, Adjusted R-squared: 0.01487 F-statistic: 6.028 on 3 and 996 DF, p-value: 0.0004546
「第三步:」
r$> x1 - x2 [1] 0.03100801
结果和GAPIT
中的结果是一致的。
a3 = fread("04_glm_gapit/GAPIT.GLM.V3.GWAS.Results.csv") %>% arrange(P.value) # GAPIT GLM a3$Rs = a3$Rsquare.of.Model.with.SNP - a3$Rsquare.of.Model.without.SNP a3 = a3 %>% select(SNP,P.value,Rsquare.of.Model.without.SNP,Rsquare.of.Model.with.SNP,`FDR_Adjusted_P-values`,effect,Rs) head(a3)
2. MLM的GWAS模型如何计算PVE?
GAPIT3.0增加了MLM模型计算PVE的方法:
下面是GAPIT论坛中,作者的回复,所以,我们可以在GAPIT软件中使用MLM模型,进行GWAS分析,并得到每个SNP的PVE值。
❝来源:https://groups.google.com/g/gapit-forum/c/acp8bCjZuJo Jiabo Wang 2021年9月13日 13:32:02
收件人 GAPIT Forum Hi Manjeet,
In the GAPIT3, we can use "Random.model=TRUE" to calculate PVE (phenotype variance explained) with the significant markers. Sincerely, Jiabo
❞
「GWAS分析代码:」
# GWAS 分析 library(data.table) source("http://zzlab.net/GAPIT/GAPIT.library.R") source("http://zzlab.net/GAPIT/gapit_functions.txt") myGd = fread("mdp_numeric.txt",header=T,data.table = F) myGM = fread("mdp_SNP_information.txt",header = T,data.table=F) myY = fread("dat_plink.txt",data.table = F) head(myY) covar = fread("cov_plink.txt",data.table = F)[,-1] names(covar)[1] = "Taxa" head(covar) myGAPIT = GAPIT(Y = myY[,c(1,3)],GD = myGd, GM = myGM, # PCA.total=3, CV = covar,Random.model=TRUE, model="MLM")
运行日志查看:
现在的GAPIT分析版本为:3.0 [1] "All packages are loaded already ! GAPIT.Version is 2020.10.24, GAPIT 3.0" [1] "MLM"
结果文件查看:
这里关于rsquare_without_snp
和rsquare_with_snp
的解释如下:
❝"rsquare_with_snp" is the R-sqaure_LR of the model WITH the tested SNP included in the model as an explanatory variable. In the context of the models fitted with GAPIT, this is the mixed model with the kinship matrix, PCs or any other covariates, and the SNP.
❞
❝"rsquare_without_snp" is the R-sqaure_LR of the model WITHOUTthe tested SNP included in the mdoel as an explanatory variable In the context of the models fitted with GAPIT, this is the mixed model with the kinship matrix, PCs or any other covariates.
❞
SNP的PVE的计算方法是:
PVE = rsquare_with_snp - rsquare_without_snp
这里,我们对结果进行处理,得到:
a2 = fread("05_lmm_gapit/GAPIT.MLM.V3.GWAS.Results.csv") %>% arrange(P.value) # GAPIT MLM a2$PVE = a2$Rsquare.of.Model.with.SNP - a2$Rsquare.of.Model.without.SNP head(a2)
结果如下:
> head(a2) SNP Chromosome Position P.value maf nobs Rsquare.of.Model.without.SNP Rsquare.of.Model.with.SNP FDR_Adjusted_P-values effect PVE 1: M98663 20 73 0.00001208 0.2220 1000 0.06567 0.08382 0.3635 -7.475 0.01815 2: M73233 15 64 0.00012951 0.2255 1000 0.06567 0.07953 0.9039 -6.450 0.01385 3: M82957 17 57 0.00017835 0.1685 1000 0.06567 0.07895 0.9039 7.436 0.01328 4: M82958 17 57 0.00017835 0.1685 1000 0.06567 0.07895 0.9039 7.436 0.01328 5: M37588 8 51 0.00021549 0.2705 1000 0.06567 0.07861 0.9039 -6.367 0.01294 6: M36594 8 31 0.00026062 0.3520 1000 0.06567 0.07827 0.9039 -5.610 0.01260
可以看到,SNP名为M98663的PVE为0.01815
3. GLM和MLM模型PVE的比较
这里,就有问题了,PVE的计算方法,有GLM的和MLM的,应该用哪一个呢?
我们之前说过,使用R语言计算的PVE,会过高估计,如果在MLM模型中,我们将显著性的SNP提取,然后用R语言进行PVE分析,其实和GLM模型得到的PVE是一致的。
这里,我们进行比较一下两种PVE的计算结果:
「GLM模型计算PVE结果」
> # GAPIT 中的glm模型 > a1 = fread("04_glm_gapit/GAPIT.GLM.V3.GWAS.Results.csv") %>% arrange(P.value) # GAPIT GLM > a1$PVE = a1$Rsquare.of.Model.with.SNP - a1$Rsquare.of.Model.without.SNP > head(a1) SNP Chromosome Position P.value maf nobs Rsquare.of.Model.without.SNP Rsquare.of.Model.with.SNP FDR_Adjusted_P-values effect PVE 1: M57157 12 44 0.00000002658 0.3750 1000 0.01783 0.04884 0.0007997 7.518 0.03101 2: M57155 12 44 0.00000014999 0.3275 1000 0.01783 0.04543 0.0011567 7.242 0.02760 3: M57156 12 44 0.00000014999 0.3275 1000 0.01783 0.04543 0.0011567 7.242 0.02760 4: M98663 20 73 0.00000015377 0.2220 1000 0.01783 0.04538 0.0011567 -7.812 0.02755 5: M73233 15 64 0.00000051228 0.2255 1000 0.01783 0.04303 0.0029001 -7.526 0.02520 6: M8452 2 69 0.00000057828 0.4350 1000 0.01783 0.04279 0.0029001 -6.520 0.02496
「MLM模型计算PVE:」
> a2 = fread("05_lmm_gapit/GAPIT.MLM.V3.GWAS.Results.csv") %>% arrange(P.value) # GAPIT MLM > a2$PVE = a2$Rsquare.of.Model.with.SNP - a2$Rsquare.of.Model.without.SNP > head(a2) SNP Chromosome Position P.value maf nobs Rsquare.of.Model.without.SNP Rsquare.of.Model.with.SNP FDR_Adjusted_P-values effect PVE 1: M98663 20 73 0.00001208 0.2220 1000 0.06567 0.08382 0.3635 -7.475 0.01815 2: M73233 15 64 0.00012951 0.2255 1000 0.06567 0.07953 0.9039 -6.450 0.01385 3: M82957 17 57 0.00017835 0.1685 1000 0.06567 0.07895 0.9039 7.436 0.01328 4: M82958 17 57 0.00017835 0.1685 1000 0.06567 0.07895 0.9039 7.436 0.01328 5: M37588 8 51 0.00021549 0.2705 1000 0.06567 0.07861 0.9039 -6.367 0.01294 6: M36594 8 31 0.00026062 0.3520 1000 0.06567 0.07827 0.9039 -5.610 0.01260
「两者进行比较:」
> re = merge(a1,a2,by="SNP") > head(re) SNP Chromosome.x Position.x P.value.x maf.x nobs.x Rsquare.of.Model.without.SNP.x Rsquare.of.Model.with.SNP.x FDR_Adjusted_P-values.x effect.x PVE.x Chromosome.y 1: M100000 20 99 0.005662 0.1350 1000 0.01783 0.02541 0.1190 5.036 0.00758157 20 2: M10001 3 0 0.823235 0.0425 1000 0.01783 0.01788 0.9411 -0.716 0.00004923 3 3: M10003 3 0 0.490322 0.1175 1000 0.01783 0.01830 0.7910 -1.486 0.00046956 3 4: M10004 3 0 0.230065 0.3970 1000 0.01783 0.01925 0.5883 1.611 0.00142220 3 5: M10005 3 0 0.012427 0.2805 1000 0.01783 0.02402 0.1724 3.700 0.00618466 3 6: M10006 3 0 0.680577 0.3915 1000 0.01783 0.01800 0.8864 -0.547 0.00016722 3 Position.y P.value.y maf.y nobs.y Rsquare.of.Model.without.SNP.y Rsquare.of.Model.with.SNP.y FDR_Adjusted_P-values.y effect.y PVE.y 1: 99 0.19093 0.1350 1000 0.06567 0.06728 0.9877 2.7332 0.001606674 2: 0 0.73907 0.0425 1000 0.06567 0.06578 0.9927 -1.1751 0.000104136 3: 0 0.34225 0.1175 1000 0.06567 0.06652 0.9927 -2.2101 0.000846925 4: 0 0.47611 0.3970 1000 0.06567 0.06615 0.9927 1.0501 0.000476678 5: 0 0.08458 0.2805 1000 0.06567 0.06847 0.9877 2.8215 0.002796028 6: 0 0.92109 0.3915 1000 0.06567 0.06568 0.9957 -0.1459 0.000009211 > re %>% select(PVE.x,PVE.y) %>% cor PVE.x PVE.y PVE.x 1.0000 0.7875 PVE.y 0.7875 1.0000
可以看到相关系数为0.78,相关性并不高。
两者PVE之和比较:
> sum(a1$PVE) [1] 57.61 > sum(a2$PVE) [1] 28.31
可以看到GLM计算的PVE之和为57.61,而MLM模型计算的PVE之和为28.31,很明显GLM的方法更虚高。
所以,在MLM模型的GWAS中,我们要选择MLM方法计算的PVE。
问题来了,如果不用GAPIT软件,该如何手动计算PVE值呢?
4. 其它GWAS分析软件如何计算PVE
我们知道,其它GWAS软件中是没有PVE的结果的,比如:
GEMMA GCTA中的fast-GWA
下一节介绍一下如何用R语言进行演示MLM的PVE计算方法。
「这应该是全网可以搜索的唯一的实现方法了。」
欢迎继续关注。
转载本文请联系原作者获取授权,同时请注明本文来自邓飞科学网博客。
链接地址:https://wap.sciencenet.cn/blog-2577109-1318011.html?mobile=1
收藏