Xiaoke Robot

Deep learning improves the accuracy of protein structure prediction
2020-01-19 18:55

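Andrew W. Senior and colleagues at DeepMind in the UK have used deep learning to improve protein structure prediction. The study was published online in the leading international journal Nature on January 15, 2020.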

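The researchers showed that a neural network can be trained to make accurate predictions of the distances between pairs of residues, which convey more information about the structure than contact predictions. Using this information, they constructed a potential of mean force that can accurately describe the shape of a protein.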
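To make the step from predicted distances to a potential concrete, here is a minimal Python sketch. It assumes the network outputs one probability per distance bin for a residue pair and uses the negative log of those probabilities as an energy. The function name `distance_potential`, the bin edges, and the probabilities are hypothetical illustrations, not code or values from the paper, and the potential described in the paper may contain terms this sketch leaves out.

```python
import numpy as np

def distance_potential(bin_probs, eps=1e-8):
    """Convert distance-bin probabilities into a negative log-likelihood energy.

    A likely distance bin gets a low energy, an unlikely one a high energy.
    """
    probs = np.asarray(bin_probs, dtype=float)
    probs = probs / probs.sum(axis=-1, keepdims=True)  # renormalize defensively
    return -np.log(probs + eps)                        # eps guards against log(0)

# Hypothetical prediction for one residue pair over four distance bins.
bin_edges = np.array([2.0, 6.0, 10.0, 14.0, 18.0])  # angstroms, illustrative only
bin_probs = np.array([0.05, 0.70, 0.20, 0.05])

potential = distance_potential(bin_probs)
best_bin = int(np.argmin(potential))  # index of the lowest-energy bin
print(f"lowest-energy bin: {bin_edges[best_bin]}-{bin_edges[best_bin + 1]} angstroms")
```

A contact prediction would collapse the same information into a single yes/no answer (for example, "closer than some cutoff"), which is why a full distance distribution says more about the structure.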

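The researchers found that the resulting potential can be optimized with a simple gradient descent algorithm to generate structures, without complex sampling procedures. They named the system AlphaFold; it achieves high accuracy even for sequences with fewer homologous sequences. In the recent Critical Assessment of Protein Structure Prediction (CASP13), a blind assessment of the state of the field, AlphaFold produced high-accuracy structures (template modelling (TM) scores of 0.7 or higher) for 24 of 43 free modelling domains. The next best method, which used sampling and contact information, reached that accuracy for only 14 of the 43 domains. AlphaFold represents a considerable advance in protein structure prediction.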
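The idea of generating a structure by descending a distance-based potential can be illustrated with a toy example. The sketch below is not AlphaFold's potential or parameterization: it assumes a simple squared-error energy between current and target pairwise distances, works directly on Cartesian coordinates of five made-up points, and halves the step size whenever a step fails to lower the energy. All names (`energy_and_grad`, `descend`, `true_coords`) are hypothetical.

```python
import numpy as np

def energy_and_grad(coords, target):
    """Energy sum_{i<j} (d_ij - t_ij)^2 and its gradient w.r.t. the coordinates."""
    diff = coords[:, None, :] - coords[None, :, :]    # (n, n, 3), x_i - x_j
    dist = np.sqrt((diff ** 2).sum(axis=-1) + 1e-12)  # (n, n) pairwise distances
    np.fill_diagonal(dist, 1.0)                       # avoid dividing by ~0 on the diagonal
    err = dist - target
    np.fill_diagonal(err, 0.0)                        # a point has no distance error to itself
    energy = 0.5 * (err ** 2).sum()                   # each pair appears twice in the matrix
    grad = 2.0 * ((err / dist)[:, :, None] * diff).sum(axis=1)
    return energy, grad

def descend(target, steps=500, lr=0.05, seed=0):
    """Plain gradient descent from a random start; shrink the step if energy rises."""
    rng = np.random.default_rng(seed)
    coords = rng.normal(size=(target.shape[0], 3))
    energy, grad = energy_and_grad(coords, target)
    history = [energy]
    for _ in range(steps):
        trial = coords - lr * grad
        trial_energy, trial_grad = energy_and_grad(trial, target)
        if trial_energy < energy:
            coords, energy, grad = trial, trial_energy, trial_grad
        else:
            lr *= 0.5                                 # step was too long
        history.append(energy)
    return coords, history

# Made-up "true" structure, used only to build the target distance matrix.
true_coords = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [1.5, 1.5, 0.0],
                        [0.0, 1.5, 1.0], [0.0, 0.0, 2.0]])
delta = true_coords[:, None, :] - true_coords[None, :, :]
target = np.sqrt((delta ** 2).sum(axis=-1))

coords, history = descend(target)
print(f"energy: {history[0]:.3f} -> {history[-1]:.3f}")
```

Because distances are unchanged by rotation, translation, and reflection, a run like this can at best recover the shape up to those symmetries, and it may also stop in a local minimum.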

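The researchers expect this increased accuracy to enable insights into the function and malfunction of proteins, especially in cases where no structures of homologous proteins have been determined experimentally.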

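By way of background, protein structure prediction can be used to determine the three-dimensional shape of a protein from its amino acid sequence. The problem is of fundamental importance because the structure of a protein largely determines its function. Protein structures, however, can be difficult to determine experimentally. Considerable progress has recently been made by leveraging genetic information: analysing covariation in homologous sequences makes it possible to infer which amino acid residues are in contact, which aids the prediction of protein structures.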
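One simple way to see what "covariation in homologous sequences" means is to measure the mutual information between two columns of a multiple sequence alignment. The sketch below is only an illustration of that idea; the tiny alignment is invented, the function name `column_mutual_information` is hypothetical, and the article does not say which covariation statistic the methods it mentions actually use.

```python
from collections import Counter
from math import log

def column_mutual_information(msa, i, j):
    """Mutual information (in nats) between alignment columns i and j."""
    n = len(msa)
    count_i = Counter(seq[i] for seq in msa)
    count_j = Counter(seq[j] for seq in msa)
    count_ij = Counter((seq[i], seq[j]) for seq in msa)
    mi = 0.0
    for (a, b), count in count_ij.items():
        p_ab = count / n
        # Compare the joint frequency with what independence would predict.
        mi += p_ab * log(p_ab / ((count_i[a] / n) * (count_j[b] / n)))
    return mi

# Invented alignment: columns 1 and 3 always change together (K with E, R with D).
msa = ["AKLE", "AKLE", "ARLD", "ARLD", "GKLE", "GRLD"]

coupled = column_mutual_information(msa, 1, 3)      # columns that covary
independent = column_mutual_information(msa, 0, 1)  # columns that do not
print(f"MI(1,3) = {coupled:.3f}, MI(0,1) = {independent:.3f}")
```

Column pairs with high scores are candidates for residues that are in contact, because a mutation at one position tends to be compensated by a mutation at the other.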

Appendix: original English text

Title: Improved protein structure prediction using potentials from deep learning

Author: Andrew W. Senior, Richard Evans, John Jumper, James Kirkpatrick, Laurent Sifre, Tim Green, Chongli Qin, Augustin Žídek, Alexander W. R. Nelson, Alex Bridgland, Hugo Penedones, Stig Petersen, Karen Simonyan, Steve Crossan, Pushmeet Kohli, David T. Jones, David Silver, Koray Kavukcuoglu, Demis Hassabis

Issue&Volume: 2020-01-15

Abstract: Protein structure prediction can be used to determine the three-dimensional shape of a protein from its amino acid sequence [1]. This problem is of fundamental importance as the structure of a protein largely determines its function [2]; however, protein structures can be difficult to determine experimentally. Considerable progress has recently been made by leveraging genetic information. It is possible to infer which amino acid residues are in contact by analysing covariation in homologous sequences, which aids in the prediction of protein structures [3]. Here we show that we can train a neural network to make accurate predictions of the distances between pairs of residues, which convey more information about the structure than contact predictions. Using this information, we construct a potential of mean force [4] that can accurately describe the shape of a protein. We find that the resulting potential can be optimized by a simple gradient descent algorithm to generate structures without complex sampling procedures. The resulting system, named AlphaFold, achieves high accuracy, even for sequences with fewer homologous sequences. In the recent Critical Assessment of Protein Structure Prediction [5] (CASP13), a blind assessment of the state of the field, AlphaFold created high-accuracy structures (with template modelling (TM) scores [6] of 0.7 or higher) for 24 out of 43 free modelling domains, whereas the next best method, which used sampling and contact information, achieved such accuracy for only 14 out of 43 domains. AlphaFold represents a considerable advance in protein-structure prediction. We expect this increased accuracy to enable insights into the function and malfunction of proteins, especially in cases for which no structures for homologous proteins have been experimentally determined [7].

DOI: 10.1038/s41586-019-1923-7

Source: https://www.nature.com/articles/s41586-019-1923-7

Nature: founded in 1869 and published by Springer Nature. Latest impact factor (IF): 69.504
Official website: http://www.nature.com/
Submission link: http://www.nature.com/authors/submit_manuscript.html


This article: Nature, published online
