Xiaoke Robot

Scientists develop a disease variant prediction system based on deep generative models of evolutionary data
2021-10-29 13:46

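The teams of Debora S. Marks at Harvard Medical School (USA) and Yarin Gal at the University of Oxford (UK) have jointly developed a disease variant prediction system based on deep generative models of evolutionary data. The study was published in the international journal Nature on 27 October 2021.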

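In this study, the researchers propose an approach that uses deep generative models to predict variant pathogenicity without relying on labels. By modelling the distribution of sequence variation across organisms, the model implicitly captures the constraints on protein sequences that maintain fitness. The model, EVE (evolutionary model of variant effect), not only outperforms computational approaches that rely on labelled data but also performs on par with, if not better than, predictions from high-throughput experiments, which are increasingly used as evidence for variant classification.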
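To make the idea of label-free scoring concrete, here is a minimal sketch. It is not EVE itself: EVE uses a deep generative model, whereas this example swaps in a much simpler site-independent frequency model. The toy alignment, the function names (`site_frequencies`, `variant_score`) and the pseudocount value are all assumptions made for illustration. The score is a log-likelihood ratio between the wild-type residue and the substituted residue, estimated only from the alignment, with no disease labels involved.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def site_frequencies(alignment, pseudocount=1.0):
    """Estimate per-column amino-acid probabilities with additive smoothing.

    Simplified stand-in for a generative model of sequence variation:
    each column is treated independently (an assumption; EVE does not do this).
    """
    length = len(alignment[0])
    freqs = []
    for pos in range(length):
        counts = Counter(seq[pos] for seq in alignment if seq[pos] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        freqs.append({aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS})
    return freqs


def variant_score(freqs, wild_type, pos, mutant):
    """Return log p(wild type) - log p(mutant) at a 0-based position.

    Higher values mean the substitution is less compatible with the
    variation observed in the alignment.
    """
    return math.log(freqs[pos][wild_type[pos]]) - math.log(freqs[pos][mutant])


# Hypothetical toy alignment of homologous sequences (made-up data).
alignment = ["MKVLA", "MKVLG", "MRVLA", "MKILA", "MKVLA"]
wild_type = "MKVLA"
freqs = site_frequencies(alignment)

# Substitution at a column that never varies in the alignment.
conserved = variant_score(freqs, wild_type, 0, "W")
# Substitution to a residue that other sequences already carry.
variable = variant_score(freqs, wild_type, 1, "R")

print(f"M1W score: {conserved:.3f}")
print(f"K2R score: {variable:.3f}")
```

In this toy case the substitution at the invariant column receives the higher score. A site-independent model ignores dependencies between positions, which is one reason a richer generative model would be preferred in practice.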

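The researchers predicted the pathogenicity of more than 36 million variants across 3,219 disease genes and provided evidence for the classification of more than 256,000 variants of unknown significance. The work suggests that models of evolutionary information can provide valuable independent evidence for variant interpretation and will be widely useful in research and clinical settings.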

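By way of background, quantifying the pathogenicity of protein variants in human disease-related genes would have a marked effect on clinical decisions, yet the overwhelming majority (over 98%) of these variants still have unknown consequences. In principle, computational methods could support the large-scale interpretation of genetic variants. However, state-of-the-art methods have relied on training machine learning models on known disease labels. Because these labels are sparse, biased and of variable quality, the resulting models have been considered insufficiently reliable.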

Appendix: original English abstract

Title: Disease variant prediction with deep generative models of evolutionary data

Author: Frazer, Jonathan, Notin, Pascal, Dias, Mafalda, Gomez, Aidan, Min, Joseph K., Brock, Kelly, Gal, Yarin, Marks, Debora S.

Issue&Volume: 2021-10-27

Abstract: Quantifying the pathogenicity of protein variants in human disease-related genes would have a marked effect on clinical decisions, yet the overwhelming majority (over 98%) of these variants still have unknown consequences [1-3]. In principle, computational methods could support the large-scale interpretation of genetic variants. However, state-of-the-art methods [4-10] have relied on training machine learning models on known disease labels. As these labels are sparse, biased and of variable quality, the resulting models have been considered insufficiently reliable [11]. Here we propose an approach that leverages deep generative models to predict variant pathogenicity without relying on labels. By modelling the distribution of sequence variation across organisms, we implicitly capture constraints on the protein sequences that maintain fitness. Our model EVE (evolutionary model of variant effect) not only outperforms computational approaches that rely on labelled data but also performs on par with, if not better than, predictions from high-throughput experiments, which are increasingly used as evidence for variant classification [12-16]. We predict the pathogenicity of more than 36 million variants across 3,219 disease genes and provide evidence for the classification of more than 256,000 variants of unknown significance. Our work suggests that models of evolutionary information can provide valuable independent evidence for variant interpretation that will be widely useful in research and clinical settings.

DOI: 10.1038/s41586-021-04043-8

Source: https://www.nature.com/articles/s41586-021-04043-8

Nature: founded in 1869. Part of the Springer Nature publishing group. Latest IF: 69.504
Official website: http://www.nature.com/
Submission link: http://www.nature.com/authors/submit_manuscript.html


Article in this issue: Nature: Online publication
