薛良交
How to convert an SNP file from MUMmer into VCF format
2016-9-1 05:47
Tags: Python, MUMmer, VCF

By Liangjiao Xue


MUMmer is a long-established software package for comparing two genomes or assemblies. Its show-snps tool generates a table of SNPs that also includes indel information. VCF is now the more widely used format for variants, so a tool is needed to convert the show-snps output into VCF.


Several things need to be considered during the conversion to get a correct VCF file from the MUMmer show-snps output:


1) You need to check the reference sequence to rebuild insertions and deletions, because VCF writes an indel together with the reference base in front of it.
Instead of reading the original reference FASTA file, I used "show-snps -x 1", so that the surrounding nucleotides are also reported.
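
As a rough illustration of how the reported context can stand in for the reference FASTA, here is a minimal sketch. It assumes that the reference context string is 2 * x + 1 characters long with the variant position in the middle (so three characters for "-x 1"). That layout is my assumption here, so check it against your own show-snps output. The function name anchor_base and its parameters are hypothetical and are not taken from the script linked in this post.

```python
def anchor_base(ref_context: str, flank: int = 1) -> str:
    """Return the reference base just before the variant position.

    Assumption: ref_context holds `flank` bases, then the variant
    position, then `flank` bases (the value passed to show-snps -x).
    """
    expected_length = 2 * flank + 1
    if len(ref_context) != expected_length:
        raise ValueError(
            f"expected {expected_length} context characters, "
            f"got {len(ref_context)}"
        )
    # The character directly left of the centre is the preceding base.
    base = ref_context[flank - 1].upper()
    if base not in "ACGTN":
        raise ValueError(f"unexpected character before the variant: {base!r}")
    return base


if __name__ == "__main__":
    print(anchor_base("AGC"))  # context around a variant on the reference
```

For a block of several deleted bases, only the context of the first row of the block would be used, since that is the row whose preceding base lies outside the deletion.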


2) For insertions, if the query sequence is mapped to the reference in the reverse orientation, the inserted nucleotides of the query are reported in reverse order.
So, they need to be concatenated in reverse order.
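
The reordering in point 2 can be sketched as follows. This is only an illustration: the function name join_inserted_bases and the parameter query_direction are hypothetical, and I assume the mapping direction of the query is available as +1 (forward) or -1 (reverse). The sketch only changes the order of the bases, as described above, and does nothing else to them.

```python
from typing import Sequence


def join_inserted_bases(bases: Sequence[str], query_direction: int) -> str:
    """Concatenate the single-base rows of one insertion block.

    bases: inserted bases in the order the rows were reported.
    query_direction: +1 for a forward-mapped query, -1 for a reverse one
    (assumed encoding).
    """
    if query_direction not in (1, -1):
        raise ValueError("query_direction must be +1 or -1")
    # Rows of a reverse-mapped query arrive in reverse order, so flip them.
    ordered = reversed(bases) if query_direction == -1 else bases
    return "".join(ordered)


if __name__ == "__main__":
    print(join_inserted_bases(["A", "C", "G"], 1))
    print(join_inserted_bases(["A", "C", "G"], -1))
```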


3) The coordinates of insertions and deletions.
For insertions, the coordinate in the show-snps output is that of the nucleotide before the insertion. It needs to be kept unchanged in the VCF file.


For deletions, the coordinates in the show-snps output are those of the nucleotides that are deleted. The coordinate in the VCF file should be: first_position_of_deletion_block - 1.
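
The two coordinate rules in point 3 can be written down as a small sketch. The names VcfIndel, insertion_to_vcf and deletion_to_vcf are hypothetical and only serve this example. I assume the bases of each indel block have already been joined into one string and that the reference base in front of the block (the anchor) is known.

```python
from typing import NamedTuple


class VcfIndel(NamedTuple):
    pos: int  # 1-based VCF POS
    ref: str  # VCF REF allele
    alt: str  # VCF ALT allele


def insertion_to_vcf(snps_pos: int, anchor: str, inserted: str) -> VcfIndel:
    """The reported position is the base before the insertion: keep it."""
    return VcfIndel(snps_pos, anchor, anchor + inserted)


def deletion_to_vcf(first_deleted_pos: int, anchor: str, deleted: str) -> VcfIndel:
    """The reported positions are the deleted bases: shift left by one."""
    if first_deleted_pos < 2:
        # No preceding base exists; this sketch does not handle that case.
        raise ValueError("deletion at position 1 has no preceding base")
    return VcfIndel(first_deleted_pos - 1, anchor + deleted, anchor)


if __name__ == "__main__":
    # Insertion of GT after reference base A at position 100.
    print(insertion_to_vcf(100, "A", "GT"))
    # Deletion of CG at positions 101-102, preceded by A at position 100.
    print(deletion_to_vcf(101, "A", "CG"))
```

In both cases the anchor base appears at the start of REF and ALT, so the two records above end up at the same VCF position, 100, even though show-snps reports them at 100 and 101.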


Here is my Python code:

https://github.com/liangjiaoxue/PythonNGSTools/blob/master/MUMmerSNPs2VCF.py



Notes on this code are also listed here:

https://github.com/liangjiaoxue/PythonNGSTools


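To reprint this article, please contact the original author for authorization, and state that the article comes from Liangjiao Xue's ScienceNet blog.

Link: https://wap.sciencenet.cn/blog-285393-1000040.html?mobile=1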