Xiaoke Robot

Scientists use deep learning to annotate the protein universe
2022-02-27 13:48

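Lucy J. Colwell, Maxwell L. Bileschi, and colleagues at Google Research in the United States have used deep learning to annotate the protein universe. The work was published online in Nature Biotechnology on February 21, 2022.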

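The researchers trained deep learning models to accurately predict functional annotations for unaligned amino acid sequences, and evaluated them on rigorous benchmarks built from the 17,929 families of the protein families database Pfam. The models infer known patterns of evolutionary substitutions and learn representations that accurately cluster sequences from families not seen during training. Combining the deep models with existing methods significantly improves remote homology detection, which suggests that the deep models learn complementary information. The approach extends the coverage of Pfam by more than 9.5%, exceeding the additions made over the last decade, and predicts function for 360 human reference proteome proteins that previously had no Pfam annotation. These results suggest that deep learning models will be a core component of future protein annotation tools.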

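By way of background, understanding the relationship between amino acid sequence and protein function is a long-standing challenge with far-reaching scientific and translational implications. State-of-the-art alignment-based techniques cannot predict function for one-third of microbial protein sequences, which hampers our ability to exploit data from diverse organisms.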

Appendix: original English text

Title: Using deep learning to annotate the protein universe

Author: Bileschi, Maxwell L., Belanger, David, Bryant, Drew H., Sanderson, Theo, Carter, Brandon, Sculley, D., Bateman, Alex, DePristo, Mark A., Colwell, Lucy J.

Issue&Volume: 2022-02-21

Abstract: Understanding the relationship between amino acid sequence and protein function is a long-standing challenge with far-reaching scientific and translational implications. State-of-the-art alignment-based techniques cannot predict function for one-third of microbial protein sequences, hampering our ability to exploit data from diverse organisms. Here, we train deep learning models to accurately predict functional annotations for unaligned amino acid sequences across rigorous benchmark assessments built from the 17,929 families of the protein families database Pfam. The models infer known patterns of evolutionary substitutions and learn representations that accurately cluster sequences from unseen families. Combining deep models with existing methods significantly improves remote homology detection, suggesting that the deep models learn complementary information. This approach extends the coverage of Pfam by >9.5%, exceeding additions made over the last decade, and predicts function for 360 human reference proteome proteins with no previous Pfam annotation. These results suggest that deep learning models will be a core component of future protein annotation tools. A deep learning model predicts protein functional annotations for unaligned amino acid sequences.

DOI: 10.1038/s41587-021-01179-w

Source: https://www.nature.com/articles/s41587-021-01179-w

Nature Biotechnology: founded in 1996 and published by Springer Nature. Latest impact factor: 68.164
Official website: https://www.nature.com/nbt/
Submission link: https://mts-nbt.nature.com/cgi-bin/main.plex


Article in this issue: Nature Biotechnology, published online
