ljxue's personal blog http://blog.sciencenet.cn/u/ljxue Liangjiao Xue, Bioinformatics is my favorite.


PSL format for blat software

Posted 2014-11-10 02:56 | Category: Bioinformatics | Section: Research notes | Tags: alignment, PSL

https://www.biostars.org/p/44633/


In general, the coordinates in psl files are "zero-based, half-open." The first base in a sequence is numbered zero rather than one, and when a range is represented, the end coordinate is not included in the range. Thus the first 100 bases of a sequence are represented as 0-100, and the second 100 bases are represented as 100-200. There is another, slightly unusual feature in the .psl format, which has to do with how coordinates are handled on the negative strand. In the qStart/qEnd fields, the coordinates are where the query matches from the point of view of the forward strand (even when the match is on the reverse strand). However, in the qStarts[] list, the coordinates are reversed: they are counted from the start of the reverse-complemented query.
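Python slices happen to follow the same zero-based, half-open convention, so a short sketch can make the arithmetic concrete. The 200-base sequence is made up, and the helper name `to_one_based_closed` is hypothetical, introduced here only for illustration.

```python
# Zero-based, half-open ranges behave like Python slices.
seq = "ACGT" * 50  # made-up 200-base sequence, for illustration only

first_100 = seq[0:100]     # bases 0..99, written 0-100 in psl terms
second_100 = seq[100:200]  # bases 100..199, written 100-200 in psl terms

def to_one_based_closed(start, end):
    """Convert a zero-based, half-open range to one-based, inclusive coordinates."""
    return start + 1, end

# The length of a half-open range is simply end - start.
print(len(first_100), len(second_100))
print(to_one_based_closed(0, 100))
print(to_one_based_closed(100, 200))
```

Adjacent ranges share a boundary number (100 here) without overlapping, which is the main convenience of the half-open form.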


Here's an example of a 31-mer (positions 0-30; a qEnd of 31 requires at least 31 bases) that has 2 blocks that align on the minus strand and 2 blocks on the plus strand (this sort of thing sometimes happens in real life as a result of assembly errors).


0         1         2         3 tens position in query
0123456789012345678901234567890 ones position in query
            ++++          +++++ plus strand alignment on query
    --------    ----------      minus strand alignment on query

Plus strand:     qStart 12 qEnd 31 blockSizes 4,5 qStarts 12,26

Minus strand:     qStart 4 qEnd 26 blockSizes 10,8 qStarts 5,19


Essentially, the minus strand blockSizes and qStarts are what you would get if you reverse-complemented the query.
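To see what that means in practice, here is a minimal sketch that maps the minus strand blocks from the example back onto forward-strand intervals. It assumes a query size of 31, which is what the example's numbers imply (a qEnd of 31 needs at least 31 bases). The function name `minus_blocks_to_forward` and the test query sequence are hypothetical.

```python
def revcomp(seq):
    """Reverse-complement a DNA string."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))

def minus_blocks_to_forward(q_size, block_sizes, q_starts):
    """Map minus strand psl blocks to forward-strand (start, end) intervals."""
    return [(q_size - (start + size), q_size - start)
            for start, size in zip(q_starts, block_sizes)]

q_size = 31            # assumed from the example above
block_sizes = [10, 8]  # minus strand blockSizes from the example
q_starts = [5, 19]     # minus strand qStarts from the example

forward = minus_blocks_to_forward(q_size, block_sizes, q_starts)
print(forward)  # [(16, 26), (4, 12)]

# Sanity check on a made-up query: slicing the reverse-complemented query
# with the psl block gives the reverse complement of the forward interval.
query = ("ACGGTCA" * 5)[:q_size]
rc_query = revcomp(query)
for (start, size), (f_start, f_end) in zip(zip(q_starts, block_sizes), forward):
    assert rc_query[start:start + size] == revcomp(query[f_start:f_end])
```

The two forward intervals, 16-26 and 4-12, cover query positions 16 through 25 and 4 through 11, and the block order flips because the first block on the reverse-complemented query lies nearest the end of the forward query.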


However, qStart and qEnd are not reversed. To get from one to the other, where revQStart and revQEnd are the start and end of the alignment on the reverse-complemented query:

qStart = qSize - revQEnd
qEnd = qSize - revQStart
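The conversion can be checked against the example with a few lines of Python. This is only a sketch under stated assumptions: qSize is taken to be 31, revQStart is taken as the first entry of qStarts, revQEnd as the last qStarts entry plus the last block size, and the function name `forward_q_range` is hypothetical.

```python
def forward_q_range(q_size, strand, block_sizes, q_starts):
    """Return (qStart, qEnd) on the forward strand from the psl block lists."""
    start = q_starts[0]                    # revQStart on the minus strand
    end = q_starts[-1] + block_sizes[-1]   # revQEnd on the minus strand
    if strand[0] == "-":
        # Blocks are in reverse-complement coordinates, so flip both ends:
        # qStart = qSize - revQEnd, qEnd = qSize - revQStart
        start, end = q_size - end, q_size - start
    return start, end

print(forward_q_range(31, "+", [4, 5], [12, 26]))   # (12, 31)
print(forward_q_range(31, "-", [10, 8], [5, 19]))   # (4, 26)
```

Both results agree with the qStart and qEnd values listed for the plus and minus strand alignments in the example.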




https://wap.sciencenet.cn/blog-285393-842348.html
