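Genome Information Mining and Sharing http://blog.sciencenet.cn/u/hsm The genome is like a sacred text written without punctuation: it has to be understood slowly, and punctuated slowly.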

[Repost] SAM format summary

Read 3460 times | 2019-7-3 16:04 | Personal category: Bioinformatics | System category: Research notes | Source: Reposted

SAM format summary

The Sequence Alignment/Map (SAM) format is a generic format for storing read alignments against reference sequences. It is a text format that stores each alignment as a series of tab-delimited ASCII columns, and it is commonly used in next-generation sequencing data processing. It is the human-readable (non-binary) counterpart of the BAM format and records both the read itself and its aligned position in the genome. It was developed by Heng Li in Richard Durbin's group, together with others, and is described in their paper "The Sequence Alignment/Map format and SAMtools" (Li et al., Bioinformatics, 2009).

After an optional header section (lines starting with @), the alignment section holds one line per alignment. The format is best explained with an example line:

1:497:R:-272+13M17D24M  113  1  497  37  37M  15  100338662  0  CGGGTCTGACCTGAGGAGAACTGTGCTCCGCCTTCAG  0;==-==9;>>>>>=>>>>>>>>>>>=>>>>>>>>>>  XT:A:U  NM:i:0  SM:i:37  AM:i:0  X0:i:1  X1:i:0  XM:i:0  XO:i:0  XG:i:0  MD:Z:37

Field name	Description	Example data
QNAME	read (query) name	1:497:R:-272+13M17D24M
FLAG	bitwise alignment flag	113
RNAME	reference sequence (chromosome) of the alignment	1
POS	1-based leftmost alignment position	497
MAPQ	overall mapping quality (Phred-scaled)	37
CIGAR	alignment CIGAR string	37M
MRNM/RNEXT	reference name of the next alignment in the group (mate)	15
MPOS/PNEXT	position of the next alignment in the group (mate)	100338662
ISIZE/TLEN	observed template length	0
SEQ	read sequence	CGGGTCTGACCTGAGGAGAACTGTGCTCCGCCTTCAG
QUAL	quality per base (ASCII-encoded)	0;==-==9;>>>>>=>>>>>>>>>>>=>>>>>>>>>>
TAGs	optional tags with further alignment information	XT:A:U NM:i:0 SM:i:37 AM:i:0 X0:i:1 X1:i:0 XM:i:0 XO:i:0 XG:i:0 MD:Z:37
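
As a rough illustration of the column layout described above, the following Python sketch splits an alignment line into its 11 mandatory fields and decodes the FLAG value into its individual bits. The names SamRecord, parse_sam_line and decode_flag are hypothetical helpers written for this post, not part of any library; the flag bit meanings follow the SAM specification. For real work a dedicated library such as pysam is the safer choice.

```python
from typing import List, NamedTuple


class SamRecord(NamedTuple):
    """The 11 mandatory SAM columns plus the raw optional tag fields."""
    qname: str
    flag: int
    rname: str
    pos: int
    mapq: int
    cigar: str
    rnext: str
    pnext: int
    tlen: int
    seq: str
    qual: str
    tags: List[str]


# Bit values of the FLAG column as defined in the SAM specification.
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def parse_sam_line(line: str) -> SamRecord:
    """Split one tab-delimited alignment line (hypothetical helper)."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 11:
        raise ValueError("a SAM alignment line needs at least 11 columns")
    return SamRecord(
        cols[0], int(cols[1]), cols[2], int(cols[3]), int(cols[4]),
        cols[5], cols[6], int(cols[7]), int(cols[8]), cols[9], cols[10],
        cols[11:],
    )


def decode_flag(flag: int) -> List[str]:
    """Return the names of all bits that are set in FLAG."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


# The example line from this post, rebuilt with real tab separators.
example = "\t".join([
    "1:497:R:-272+13M17D24M", "113", "1", "497", "37", "37M", "15",
    "100338662", "0", "CGGGTCTGACCTGAGGAGAACTGTGCTCCGCCTTCAG",
    "0;==-==9;>>>>>=>>>>>>>>>>>=>>>>>>>>>>",
    "XT:A:U", "NM:i:0", "SM:i:37", "AM:i:0", "X0:i:1", "X1:i:0",
    "XM:i:0", "XO:i:0", "XG:i:0", "MD:Z:37",
])

record = parse_sam_line(example)
print(record.rname, record.pos, record.cigar)
print(decode_flag(record.flag))
```

For the example line, FLAG 113 is 1 + 16 + 32 + 64, i.e. a paired read that is first in its pair, with both the read and its mate aligned to the reverse strand.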

The tags are optional and vary between alignment programs; the ones shown here are from BWA. The tags most often used for filtering are X0:i (the number of best hits of the read in the genome) and XM:i (the number of mismatches in the alignment).
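
A minimal sketch of how such a filter could look in Python, assuming the tags have already been split off as TAG:TYPE:VALUE strings. The helper names parse_tags and passes_filter and the threshold max_mismatches=2 are hypothetical choices for illustration, not recommended settings.

```python
def parse_tags(tag_fields):
    """Turn ['NM:i:0', 'XT:A:U', ...] into a dict with typed values."""
    tags = {}
    for field in tag_fields:
        # Split on the first two colons only: the value may itself contain ':'.
        name, type_code, value = field.split(":", 2)
        if type_code == "i":
            value = int(value)
        elif type_code == "f":
            value = float(value)
        # Other types (A, Z, H, B) are kept as plain strings in this sketch.
        tags[name] = value
    return tags


def passes_filter(tags, max_mismatches=2):
    """Keep reads with exactly one best hit and few mismatches.

    Reads lacking X0 or XM (for example from an aligner other than BWA)
    are rejected here; that is an assumption of this sketch.
    """
    x0 = tags.get("X0")
    xm = tags.get("XM")
    if x0 is None or xm is None:
        return False
    return x0 == 1 and xm <= max_mismatches


example_tags = "XT:A:U NM:i:0 SM:i:37 AM:i:0 X0:i:1 X1:i:0 XM:i:0 XO:i:0 XG:i:0 MD:Z:37".split()
tags = parse_tags(example_tags)
print(tags["X0"], tags["XM"], passes_filter(tags))
```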

       Tag	Meaning
       NM	Edit distance
       MD	Mismatching positions/bases
       AS	Alignment score
       BC	Barcode sequence
       X0	Number of best hits
       X1	Number of suboptimal hits found by BWA
       XN	Number of ambiguous bases in the reference
       XM	Number of mismatches in the alignment
       XO	Number of gap opens
       XG	Number of gap extensions
       XT	Type: Unique/Repeat/N/Mate-sw
       XA	Alternative hits; format: (chr,pos,CIGAR,NM;)*
       XS	Suboptimal alignment score
       XF	Support from forward/reverse alignment
       XE	Number of supporting seeds
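
The XA format given in the table, (chr,pos,CIGAR,NM;)*, can be unpacked with a few lines of Python. This is only a sketch: parse_xa and AltHit are hypothetical names, the sample XA value is made up for illustration, and the handling of a leading + or - on the position assumes BWA's convention of encoding the strand in the sign.

```python
from typing import List, NamedTuple


class AltHit(NamedTuple):
    chrom: str
    strand: str
    pos: int
    cigar: str
    nm: int


def parse_xa(value: str) -> List[AltHit]:
    """Parse the value part of an XA:Z tag into a list of alternative hits."""
    hits = []
    for entry in value.split(";"):
        if not entry:
            continue  # the list ends with ';', which leaves an empty last item
        chrom, pos, cigar, nm = entry.split(",")
        # Assumption: the sign of the position encodes the strand.
        strand = "-" if pos.startswith("-") else "+"
        hits.append(AltHit(chrom, strand, abs(int(pos)), cigar, int(nm)))
    return hits


# Made-up XA value with two alternative hits.
alt_hits = parse_xa("2,+1500,37M,1;X,-20000,37M,2;")
for hit in alt_hits:
    print(hit)
```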

Read names (at least those from Illumina machines) are constructed as follows:

[instrument-name]:[run ID]:[flowcell ID]:[lane-number]:[tile-number]:
[x-pos]:[y-pos] [read number]:[is filtered]:[control number]:
[barcode sequence]

example:

@M01117:25:000000000-A37B9:1:1101:14984:1386 1:N:0:4
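
The naming scheme above can be taken apart in Python as follows. This is a sketch under a few assumptions: parse_illumina_name is a hypothetical helper, the leading @ is treated as the FASTQ record marker rather than part of the name, and the part after the space is treated as optional because aligners commonly keep only the text before the first space as QNAME.

```python
def parse_illumina_name(header):
    """Split an Illumina-style read header into named parts."""
    if header.startswith("@"):
        header = header[1:]  # drop the FASTQ record marker
    location, _, description = header.partition(" ")
    instrument, run_id, flowcell, lane, tile, x_pos, y_pos = location.split(":")
    parts = {
        "instrument": instrument,
        "run_id": int(run_id),
        "flowcell": flowcell,
        "lane": int(lane),
        "tile": int(tile),
        "x_pos": int(x_pos),
        "y_pos": int(y_pos),
    }
    if description:
        read_number, is_filtered, control, barcode = description.split(":")
        parts.update({
            "read_number": int(read_number),
            "is_filtered": is_filtered == "Y",  # Y = read was filtered out
            "control_number": int(control),
            "barcode": barcode,
        })
    return parts


name_parts = parse_illumina_name("@M01117:25:000000000-A37B9:1:1101:14984:1386 1:N:0:4")
for key, value in name_parts.items():
    print(key, value)
```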




https://wap.sciencenet.cn/blog-442719-1187930.html
