Xiaoke Robot

New method enables computationally efficient whole-genome regression
2021-05-23 21:55

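Jonathan Marchini's team at the Regeneron Genetics Center in the United States has developed a computationally efficient method for whole-genome regression. The work was published online in the journal Nature Genetics on May 20, 2021.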

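The researchers present a new machine-learning method called REGENIE for fitting whole-genome regression models for quantitative and binary phenotypes. It is substantially faster than alternatives in multi-trait analyses while maintaining statistical efficiency. The method naturally accommodates parallel analysis of multiple phenotypes and needs only local segments of the genotype matrix in memory, whereas existing alternatives must load genome-wide matrices into memory. This yields substantial savings in compute time and memory usage.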
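To make the "local segments in memory" idea concrete, here is a minimal sketch in Python with NumPy. It is not the REGENIE implementation: it only assumes that the genotype matrix is split into contiguous SNP blocks and that a ridge regression is fitted within each block, so that one block at a time has to be read. The function name, block size, and penalty value are all hypothetical choices for illustration.

```python
import numpy as np

def blockwise_ridge_predictors(genotypes, phenotype, block_size, ridge_penalty):
    """Fit one ridge regression per block of SNP columns (illustrative only).

    genotypes may be any array-like with shape (n_samples, n_snps), for
    example a np.memmap, so that only one block is materialised at a time.
    Returns an (n_samples, n_blocks) matrix of per-block predictions.
    """
    n_samples, n_snps = genotypes.shape
    y = phenotype - phenotype.mean()
    predictors = []
    for start in range(0, n_snps, block_size):
        # Only this slice of columns is loaded into memory.
        block = np.asarray(genotypes[:, start:start + block_size], dtype=np.float64)
        block = block - block.mean(axis=0)
        # Ridge solution: beta = (X'X + lambda * I)^-1 X'y
        gram = block.T @ block + ridge_penalty * np.eye(block.shape[1])
        beta = np.linalg.solve(gram, block.T @ y)
        predictors.append(block @ beta)
    return np.column_stack(predictors)

# Simulated toy data (hypothetical sizes): 200 samples, 50 SNPs coded 0/1/2.
rng = np.random.default_rng(0)
genotypes_demo = rng.integers(0, 3, size=(200, 50)).astype(np.float64)
phenotype_demo = genotypes_demo[:, 3] * 0.5 + rng.normal(size=200)
predictors = blockwise_ridge_predictors(
    genotypes_demo, phenotype_demo, block_size=10, ridge_penalty=1.0
)
```

The output has one column per block rather than one per SNP, which is what keeps later steps small. How such block predictors are combined into a genome-wide predictor is described in the paper and is not reproduced here.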

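For unbalanced case-control phenotypes, the researchers introduce a fast, approximate Firth logistic regression test. The method is well suited to distributed computing frameworks. The researchers demonstrate its accuracy and computational benefits using the UK Biobank dataset with up to 407,746 individuals.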
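For readers unfamiliar with Firth logistic regression, the sketch below shows the standard (exact) Firth-penalized fit in Python with NumPy. It is not the fast approximation introduced in the paper, which is not reproduced here. The function name and the toy data are hypothetical. The fit uses Firth's modified score, X'(y - p + h(1/2 - p)), where h holds the leverages of the weighted hat matrix.

```python
import numpy as np

def firth_logistic(X, y, max_iter=50, tol=1e-8):
    """Firth-penalized logistic regression via Newton-type updates.

    X: (n_samples, n_params) design matrix, including an intercept column.
    y: (n_samples,) array of 0/1 outcomes.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        prob = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = prob * (1.0 - prob)
        info = X.T @ (X * w[:, None])          # Fisher information X'WX
        info_inv = np.linalg.inv(info)
        # Leverages h_i = w_i * x_i' (X'WX)^-1 x_i
        h = w * np.einsum("ij,jk,ik->i", X, info_inv, X)
        # Firth's modified score adds h * (1/2 - p) to the usual residual.
        score = X.T @ (y - prob + h * (0.5 - prob))
        step = info_inv @ score
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Extreme imbalance: 10 samples and no cases, intercept-only model.
# The ordinary maximum-likelihood intercept diverges here; Firth's stays finite.
X_demo = np.ones((10, 1))
y_demo = np.zeros(10)
beta_demo = firth_logistic(X_demo, y_demo)
```

In the intercept-only case the Firth estimate has a closed form: the fitted probability is (k + 0.5) / (n + 1) for k cases among n samples, which gives a convenient check on the iteration.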

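By way of background, genome-wide association analysis of cohorts with thousands of phenotypes is computationally expensive, particularly when accounting for sample relatedness or population structure.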

Appendix: original English text

Title: Computationally efficient whole-genome regression for quantitative and binary traits

Author: Joelle Mbatchou, Leland Barnard, Joshua Backman, Anthony Marcketta, Jack A. Kosmicki, Andrey Ziyatdinov, Christian Benner, Colm O'Dushlaine, Mathew Barber, Boris Boutkov, Lukas Habegger, Manuel Ferreira, Aris Baras, Jeffrey Reid, Goncalo Abecasis, Evan Maxwell, Jonathan Marchini

Issue&Volume: 2021-05-20

Abstract: Genome-wide association analysis of cohorts with thousands of phenotypes is computationally expensive, particularly when accounting for sample relatedness or population structure. Here we present a novel machine-learning method called REGENIE for fitting a whole-genome regression model for quantitative and binary phenotypes that is substantially faster than alternatives in multi-trait analyses while maintaining statistical efficiency. The method naturally accommodates parallel analysis of multiple phenotypes and requires only local segments of the genotype matrix to be loaded in memory, in contrast to existing alternatives, which must load genome-wide matrices into memory. This results in substantial savings in compute time and memory usage. We introduce a fast, approximate Firth logistic regression test for unbalanced case–control phenotypes. The method is ideally suited to take advantage of distributed computing frameworks. We demonstrate the accuracy and computational benefits of this approach using the UK Biobank dataset with up to 407,746 individuals.

DOI: 10.1038/s41588-021-00870-7

Source: https://www.nature.com/articles/s41588-021-00870-7

Nature Genetics: founded in 1992. Part of the Springer Nature publishing group. Latest IF: 41.307
Official website: https://www.nature.com/ng/
Submission link: https://mts-ng.nature.com/cgi-bin/main.plex


Article in this issue: Nature Genetics: published online
