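Wheat GWAS Approaches: A Summary (Part 2)
First, thanks to everyone who joined the discussion last time. The question about the Manhattan plot was actually put to me by an American professor at my thesis defense, and with his patient guidance I eventually worked out the answer. One reader also commented: “One of the biggest limitation of GWA approach is the spurious association, I think one can never get over”. The problem raised in the first half is a fair point, but I think the word “never” in the second half is best used sparingly in a scientific context. In fact, for some experimental goals spurious association is largely a non-issue: in a GWAS for major-effect genes, such as those for spot blotch or rust resistance, one can simply ignore the associations with high p values. For a trait like FHB resistance, a GWAS will inevitably yield many low-confidence associations, but if the goal is resistance breeding, genomic selection is one possible way around the problem. I return to this issue from a statistical angle in the statistical points further down.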
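General GWAS approaches
5. Run the association analysis on sub-panels.
For example, a GWAS of glutenin content in wheat has to account for the differences in protein content among wheat classes (hard/soft). If the population contains several classes, a separate analysis of each sub-class panel can form an important part of the paper. Barley is another example, where row type is an obvious distinction. Wang et al. 2017 studied spot blotch resistance in barley and ran the association analysis on three panels: Whole, Two-rowed, and Six-rowed. There were two reasons. Statistically, two influential papers (Zhao et al. 2007; Zhao et al. 2011) discuss the need for association mapping within sub-structure. Biologically, barley of different row types has indeed been observed to differ in spot blotch resistance.
6. Take the sub-panel analysis further to identify a small set of valuable validation lines, which can be used in later gene cloning projects, for example for target capture.
Take Wang et al. 2017 again (this is in fact my own paper, haha). By narrowing the sub-panels step by step, the authors found that the resistance associations with the lowest p values came from the six-rowed breeding lines (about 120 lines, further reduced to about 50 by cluster analysis), which indicates that the target gene and the corresponding phenotype segregate well in this small population. This small population is therefore a valuable germplasm resource for future gene cloning, and at the very least it can be used for haplotype analysis to validate the target gene.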
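To make the sub-panel idea in points 5 and 6 concrete, here is a minimal sketch in Python. It is not the pipeline used in Wang et al. 2017: it simply computes, for a single marker, the share of phenotypic variance explained (squared Pearson correlation) in the whole panel and in each sub-panel. The function names and the toy records are hypothetical, and a real analysis would use a mixed model that corrects for population structure and kinship.

```python
from collections import defaultdict


def marker_r2(genotypes, phenotypes):
    """Squared Pearson correlation between marker dosage and phenotype."""
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_y = sum(phenotypes) / n
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(genotypes, phenotypes))
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    syy = sum((y - mean_y) ** 2 for y in phenotypes)
    if sxx == 0 or syy == 0:
        return 0.0  # marker or trait does not vary in this panel
    return sxy ** 2 / (sxx * syy)


def scan_by_subpanel(records):
    """records: iterable of (subpanel, genotype, phenotype) tuples.

    Returns the marker R^2 for the whole panel and for each sub-panel.
    """
    panels = defaultdict(list)
    for subpanel, g, y in records:
        panels["whole"].append((g, y))
        panels[subpanel].append((g, y))
    return {
        name: marker_r2([g for g, _ in rows], [y for _, y in rows])
        for name, rows in panels.items()
    }


# Hypothetical toy data: (row type, allele dosage 0 or 2, disease score)
records = [
    ("six-rowed", 0, 7.0), ("six-rowed", 0, 8.0),
    ("six-rowed", 2, 2.0), ("six-rowed", 2, 3.0),
    ("two-rowed", 0, 4.0), ("two-rowed", 0, 5.0),
    ("two-rowed", 2, 5.0), ("two-rowed", 2, 4.0),
]
result = scan_by_subpanel(records)
for name, r2 in result.items():
    print(f"{name}: R^2 = {r2:.3f}")
```

In this toy data the marker explains most of the variance in the six-rowed lines, none in the two-rowed lines, and an intermediate share in the whole panel, which is the kind of pattern that motivates running the sub-panels separately.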
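Advanced GWAS approaches
The first two points below are ideas I picked up when I first started working with GWAS. I paid attention to them because of two questions. 1) The barley panel I used was the USDA NSGC core accessions, and many other labs use the same panel for different traits, so could we combine all of those traits in one analysis? 2) I worked on spot blotch and used three different pathotypes to obtain three sets of phenotyping data. Analyzed separately, they give resistance to each of the three pathotypes; if the three datasets were combined, could the result be regarded as identifying broad-spectrum resistance? Judging from the papers I have read, these ideas should be feasible. Unfortunately, I have not yet seen such papers in wheat or barley. My bold prediction is that as more researchers with a statistics background, and leading researchers from model plants, move into wheat, papers along the lines of the two points below will appear in wheat.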
1. Phenome-wide association study (PheWAS) or Pleiotropic association
“Phenomic approaches are complementary to the more prevalent paradigm of genome-wide association studies (GWAS), which have provided some information about the contribution of genetic variation to a wide range of diseases and phenotypes. While a typical GWAS evaluates the association between the variation of hundreds of thousands, to over a million, genotyped SNPs and one or a few phenotypes, a common limitation of GWAS is the focus on a pre-defined and limited phenotypic domain. An alternate approach is that of PheWAS, which utilizes all available phenotypic information and all genetic variants in the estimation of association between genotype and phenotype. By investigating the association between SNPs and a diverse range of phenotypes, a broader picture of the relationship between genetic variation and networks of phenotypes is possible.”(Sarah et al. 2013)
2. Meta-analysis of many genome-wide association studies
“The advent of genome-wide association studies has allowed considerable progress in the identification and robust replication of common gene variants that confer susceptibility to common diseases and other phenotypes of interest. These genetic effect sizes are almost invariably moderate to small in magnitude and single studies, even if large, are underpowered to detect them with confidence. Meta-analysis of many genome-wide association studies improves the power to detect more associations, and to investigate the consistency or heterogeneity of these associations across diverse datasets and study populations.” (Zeggini et al. 2009)
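As a rough illustration of why pooling studies increases power, here is a minimal sketch of a fixed-effect, inverse-variance-weighted meta-analysis for one marker. This is one common way to combine per-study estimates, not necessarily the procedure Zeggini et al. recommend for any given dataset. The function name and the effect sizes are hypothetical.

```python
import math


def fixed_effect_meta(estimates):
    """Combine per-study (beta, standard_error) pairs for one SNP.

    Each study is weighted by the inverse of its variance, so more
    precise studies contribute more to the pooled estimate.
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return beta, se, z, p


# Hypothetical effect sizes and standard errors from three studies
studies = [(0.30, 0.15), (0.25, 0.12), (0.35, 0.20)]
beta, se, z, p = fixed_effect_meta(studies)
print(f"pooled beta = {beta:.3f}, se = {se:.3f}, z = {z:.2f}, p = {p:.2e}")
```

The pooled standard error is smaller than that of any single study, so an effect that is only borderline in each dataset can become clearly significant once combined. A fixed-effect model assumes the true effect is the same in every study; when heterogeneity is expected, a random-effects model would be the more cautious choice.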
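I just did another Google search and found two papers in rice/maize that carried out Metabolite-pathway-based Phenome-Wide Association Scans (M-PheWAS): Lu et al. 2015 and Chen et al. 2016. I hope wheat catches up in this area too.
The next three points are ideas on statistical methodology, mostly excerpted from a Frontiers Research Topic, together with some of my own understanding and summary (https://www.frontiersin.org/research-topics/7228/the-applications-of-new-multi-locus-gwas-methodologies-in-the-genetic-dissection-of-complex-traits). If you are interested, submit a manuscript soon; I am thinking about submitting one myself.
3. Multi-locus GWAS method.
How to use some available multi-locus GWAS methods and how to select them. See Wen et al. 2017 and Tamba et al. 2017. The corresponding author of both papers is Zhang Yuanming, who taught me field experiment statistics at Nanjing Agricultural University. Professor Zhang is great!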
4. New technical clue and method to obtain high power and low false positive rate in GWAS.
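This brings us back to the spurious associations mentioned at the start. Colleagues with a statistics background have been working on this problem all along! See Yang et al. 2014 and Li et al. 2012, both good early GWAS papers, one in rice and one in human.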
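One simple piece of this puzzle is the significance threshold. The sketch below shows how a Bonferroni cutoff changes when the raw marker count is replaced by an effective number of independent tests, the quantity Li et al. 2012 evaluate. Both counts here are hypothetical placeholders, and the code does not estimate the effective number itself.

```python
import math


def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value cutoff that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def neg_log10(p):
    """Convert a p-value to the -log10 scale used on Manhattan plots."""
    return -math.log10(p)


n_markers = 90_000    # hypothetical number of SNPs tested
n_effective = 30_000  # hypothetical effective number of independent tests

naive = bonferroni_threshold(0.05, n_markers)
adjusted = bonferroni_threshold(0.05, n_effective)
print(f"naive:    p < {naive:.2e}  (-log10 = {neg_log10(naive):.2f})")
print(f"adjusted: p < {adjusted:.2e}  (-log10 = {neg_log10(adjusted):.2f})")
```

Because markers in linkage disequilibrium are correlated, dividing by the raw marker count is overly strict. Using the effective number relaxes the cutoff and recovers some power while still controlling false positives.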
5. Heritability missing in GWAS is a common phenomenon.
The paper quoted below (Zhou et al. 2013) gives three possible causes and possible remedies. In addition, multi-locus GWAS methods may recover some of the missing heritability.
“This missing heritability may be explained by (1) the existence of a large number of minor-effect alleles that remain unidentified; (2) incomplete LD between SNP markers and the causal gene, contributing to the underestimation of single QTL effects; and (3) the existence of gene-by-gene interactions (i.e. epistasis), since only single marker effects were tested in this study. One way to increase heritability is to increase the density of markers. In Arabidopsis, the amount of phenotypic variation accounted for in 44 different traits was moderately high (at least one significant SNP marker with MAF ≥15 % was found to explain at least 20 % of phenotypic variation) because 250,000 SNP markers were used in the analysis.”
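Other ideas
My ability is limited. Beyond the ideas above, there are of course many more things one can do with GWAS, for example Kaikai's earlier post on a wheat plant height association analysis paper seen in Plant Journal (“plant journal 上看到一篇小麦株高关联分析的文章”), and using GWAS to identify genes, as in the post on research progress at the pre-harvest sprouting resistance locus Phs-A1 (“抗穗发芽Phs-A1位点研究进展”). Finally, as more and more wheat sequence and marker data become available, everyone can borrow more from GWAS papers in rice and Arabidopsis; I will not list more here.
References (quite a few today; listed in order of first appearance in the text)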
Wang et al. 2017. Genome-wide association mapping of spot blotch resistance to three different pathotypes of Cochliobolus sativus in the USDA barley core collection. Mol. Breeding.
Zhao K, et al. 2007. An Arabidopsis example of association mapping in structured samples. PLoS Genet.
Zhao K, et al. 2011. Genome-wide association mapping reveals a rich genetic architecture of complex traits in Oryza sativa. Nat. Commun.
Sarah et al. 2013. Phenome-Wide Association Study (PheWAS) for Detection of Pleiotropy within the Population Architecture using Genomics and Epidemiology (PAGE) Network. PLoS Genet.
Zeggini et al. 2009. Meta-analysis in genome-wide association studies. Pharmacogenomics.
Lu et al. 2015. Systems Genetic Validation of the SNP-Metabolite Association in Rice Via Metabolite-Pathway-Based Phenome-Wide Association Scans. Frontiers in Plant Science.
Chen et al. 2016. Comparative and parallel genome-wide association studies for metabolic and agronomic traits in cereals. Nat. Commun.
Wen et al. 2017. Methodological implementation of mixed linear models in multi-locus genome-wide association studies. Brief Bioinform.
Tamba et al. 2017. Iterative sure independence screening EM-Bayesian LASSO algorithm for multi-locus genome-wide association studies. PLoS Comput Biol.
Yang et al. 2014. Combining high-throughput phenotyping and genome-wide association studies to reveal natural genetic variation in rice. Nat. Commun.
Li et al. 2012. Evaluating the effective numbers of independent tests and significant p-value thresholds in commercial genotyping arrays and public imputation reference datasets. Hum. Genet.
Zhou et al. 2013. Genome-wide association mapping reveals genetic architecture of durable spot blotch resistance in US barley breeding germplasm. Mol. Breeding.
Follow “小麦研究联盟” (Wheat Research Alliance) for the latest progress in wheat research.