
2022-03-01 16:44

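The research group of Pavel A. Pevzner and Anton Bankevich at the University of California, USA, has shown that multiplex de Bruijn graphs make it possible to assemble genomes from long, high-fidelity reads. The study was published on 28 February 2022 in the international journal Nature Biotechnology.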

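To enable automated assembly of long, high-fidelity (HiFi) reads, the researchers developed the La Jolla Assembler (LJA). LJA is a fast algorithm that combines a Bloom filter, sparse de Bruijn graphs and disjointig generation. It reduces the error rate in HiFi reads by three orders of magnitude and constructs the de Bruijn graph for large genomes and large k-mer sizes. It then transforms this graph into a multiplex de Bruijn graph with varying k-mer sizes. Compared with state-of-the-art assemblers, LJA produces five-fold fewer misassemblies and also generates more contiguous assemblies. The researchers demonstrated its utility with an automated assembly of a human genome in which six chromosomes were completely assembled.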
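The paragraph above names a Bloom filter as one of LJA's ingredients. The Python sketch below shows, in rough terms, what that data structure does with k-mers. It is a generic Bloom filter, not LJA's implementation. The class name `KmerBloomFilter`, the helper `kmers`, the SHA-256-based hashing scheme and the parameter values are all hypothetical choices made for illustration.

```python
import hashlib
from typing import Iterator


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield every k-mer (length-k substring) of seq. Hypothetical helper."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


class KmerBloomFilter:
    """Generic Bloom filter over k-mer strings (illustrative sketch, not LJA's code)."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)  # all bits start at 0

    def _positions(self, kmer: str) -> Iterator[int]:
        # Derive num_hashes bit positions by salting SHA-256 with the hash index
        # (an assumed hashing scheme chosen for simplicity, not for speed).
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{kmer}".encode()).digest()
            yield int.from_bytes(digest[:8], "little") % self.num_bits

    def add(self, kmer: str) -> None:
        # Set the bit at each derived position.
        for pos in self._positions(kmer):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, kmer: str) -> bool:
        # Always True for inserted k-mers; may also be True for some k-mers that
        # were never inserted (false positives), but never False for inserted ones.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(kmer))


bf = KmerBloomFilter(num_bits=1 << 16, num_hashes=3)
for km in kmers("ACGTACGTGACC", k=5):
    bf.add(km)
print("ACGTA" in bf)  # True: inserted k-mers are always reported as present
```

The memory footprint is fixed by `num_bits`, however many k-mers are inserted. That is what makes Bloom filters attractive for the very large k-mer sets of big genomes. The price is a false-positive rate, which can be tuned through the number of bits and hash functions.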

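Although most existing genome assemblers are based on de Bruijn graphs, constructing these graphs for large genomes and large k-mer sizes has remained a challenge. Better algorithms became particularly urgent with the emergence of long HiFi reads, which were recently used to produce a semi-manual telomere-to-telomere assembly of the human genome.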
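The sketch below makes concrete why large k-mer sizes are worth the trouble. It builds a plain de Bruijn graph, neither sparse nor multiplex, using the textbook convention that each k-mer is an edge between its (k-1)-mer prefix and suffix. Edge multiplicities are ignored. The function names `de_bruijn_graph` and `branching_nodes` and the toy read are hypothetical.

The read contains the repeat `ACG` twice. With k = 3 the graph has an ambiguous branching node. With k = 5 the repeat is spanned and the graph becomes a single unbranched path.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def de_bruijn_graph(reads: Iterable[str], k: int) -> Dict[str, Set[str]]:
    """Textbook de Bruijn graph: each k-mer is an edge from its (k-1)-mer prefix
    to its (k-1)-mer suffix. Edge multiplicities are ignored (sets, not counts)."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def branching_nodes(graph: Dict[str, Set[str]]) -> List[str]:
    """Nodes with more than one distinct successor, i.e. where a walk is ambiguous."""
    return sorted(node for node, succ in graph.items() if len(succ) > 1)


reads = ["ACGTTACGAA"]  # toy read containing the 3-mer ACG twice
print(branching_nodes(de_bruijn_graph(reads, k=3)))  # ['CG']
print(branching_nodes(de_bruijn_graph(reads, k=5)))  # []
```

This toy uses one global k. The multiplex de Bruijn graph described in the paper instead uses varying k-mer sizes. Building such graphs efficiently for whole human genomes is the hard part that LJA addresses.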


Title: Multiplex de Bruijn graphs enable genome assembly from long, high-fidelity reads

Author: Bankevich, Anton, Bzikadze, Andrey V., Kolmogorov, Mikhail, Antipov, Dmitry, Pevzner, Pavel A.

Issue&Volume: 2022-02-28

Abstract: Although most existing genome assemblers are based on de Bruijn graphs, the construction of these graphs for large genomes and large k-mer sizes has remained elusive. This algorithmic challenge has become particularly pressing with the emergence of long, high-fidelity (HiFi) reads that have been recently used to generate a semi-manual telomere-to-telomere assembly of the human genome. To enable automated assemblies of long, HiFi reads, we present the La Jolla Assembler (LJA), a fast algorithm using the Bloom filter, sparse de Bruijn graphs and disjointig generation. LJA reduces the error rate in HiFi reads by three orders of magnitude, constructs the de Bruijn graph for large genomes and large k-mer sizes and transforms it into a multiplex de Bruijn graph with varying k-mer sizes. Compared to state-of-the-art assemblers, our algorithm not only achieves five-fold fewer misassemblies but also generates more contiguous assemblies. We demonstrate the utility of LJA via the automated assembly of a human genome that completely assembled six chromosomes. A multiplex de Bruijn graph algorithm allows high-accuracy genome assembly from long, high-fidelity reads.

DOI: 10.1038/s41587-022-01220-6

Source: https://www.nature.com/articles/s41587-022-01220-6

Nature Biotechnology: founded in 1996 and published by Springer Nature. Latest impact factor: 68.164

