Xiaoke Robot (小柯机器人)

New algorithm enables large-scale multiple sequence alignment
2019-12-03 12:33

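Researchers including Cedric Notredame and Evan Floden of the Barcelona Institute of Science and Technology, Spain, have developed an algorithm for large-scale multiple sequence alignment (MSA). The paper was published online in Nature Biotechnology on December 2, 2019.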

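The researchers introduce a regressive algorithm that enables MSA of up to 1.4 million sequences on a standard workstation and substantially improves accuracy on datasets larger than 10,000 sequences. The regressive algorithm works in the opposite direction from the progressive algorithm: it begins by aligning the most dissimilar sequences. It uses an efficient divide-and-conquer strategy to run third-party alignment methods in linear time, regardless of their original complexity.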

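The approach will enable analyses of extremely large genomic datasets, such as the recently announced Earth BioGenome Project, which comprises 1.5 million eukaryotic genomes.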

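By way of background, MSAs are used for structural and evolutionary predictions, but the complexity of aligning large datasets requires approximate solutions, including the progressive algorithm. Progressive MSA methods start by aligning the most similar sequences and then incorporate the remaining sequences, from leaf to root, based on a guide tree. Their accuracy declines substantially as the number of sequences grows.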
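The difference in direction between the two strategies described above can be illustrated with a small sketch. It only lists the order in which a guide tree's nodes would be visited: leaf to root for a progressive method, root to leaf for a regressive one. It does not perform any alignment, and it does not reproduce the paper's actual procedure. The tree, the sequence names, and the function names are all hypothetical.

```python
# Hypothetical guide tree as nested tuples; leaves are sequence names.
# Close relatives sit deep in the tree, distant groups split near the root.
tree = (("A", "B"), ("C", ("D", "E")))


def leaves(node):
    """Return the sequence names under a node, left to right."""
    if isinstance(node, str):
        return [node]
    left, right = node
    return leaves(left) + leaves(right)


def progressive_order(node, steps=None):
    """Post-order walk: children are handled before their parent,
    so the most similar sequences (deepest nodes) come first."""
    if steps is None:
        steps = []
    if isinstance(node, str):
        return steps
    left, right = node
    progressive_order(left, steps)
    progressive_order(right, steps)
    steps.append((leaves(left), leaves(right)))
    return steps


def regressive_order(node, steps=None):
    """Pre-order walk: the root split is handled first,
    so the most dissimilar groups come first."""
    if steps is None:
        steps = []
    if isinstance(node, str):
        return steps
    left, right = node
    steps.append((leaves(left), leaves(right)))
    regressive_order(left, steps)
    regressive_order(right, steps)
    return steps


if __name__ == "__main__":
    print("progressive:", progressive_order(tree))
    print("regressive: ", regressive_order(tree))
```

In this toy tree the progressive walk starts with the pair A and B and reaches the root split last, while the regressive walk starts at the root split between {A, B} and {C, D, E}.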

Appendix: original English text

Title: Large multiple sequence alignments with a root-to-leaf regressive method

Author: Edgar Garriga, Paolo Di Tommaso, Cedrik Magis, Ionas Erb, Leila Mansouri, Athanasios Baltzis, Hafid Laayouni, Fyodor Kondrashov, Evan Floden, Cedric Notredame

Issue&Volume: 2019-12-02

Abstract: Multiple sequence alignments (MSAs) are used for structural [1,2] and evolutionary predictions [1,2], but the complexity of aligning large datasets requires the use of approximate solutions [3], including the progressive algorithm [4]. Progressive MSA methods start by aligning the most similar sequences and subsequently incorporate the remaining sequences, from leaf to root, based on a guide tree. Their accuracy declines substantially as the number of sequences is scaled up [5]. We introduce a regressive algorithm that enables MSA of up to 1.4 million sequences on a standard workstation and substantially improves accuracy on datasets larger than 10,000 sequences. Our regressive algorithm works the other way around from the progressive algorithm and begins by aligning the most dissimilar sequences. It uses an efficient divide-and-conquer strategy to run third-party alignment methods in linear time, regardless of their original complexity. Our approach will enable analyses of extremely large genomic datasets such as the recently announced Earth BioGenome Project, which comprises 1.5 million eukaryotic genomes [6].

DOI: 10.1038/s41587-019-0333-6

Source: https://www.nature.com/articles/s41587-019-0333-6

Nature Biotechnology: founded in 1996 and part of the Springer Nature publishing group. Latest IF: 68.164
Official website: https://www.nature.com/nbt/
Submission link: https://mts-nbt.nature.com/cgi-bin/main.plex


This article: Nature Biotechnology, published online
