Kevin2015的个人博客分享 http://blog.sciencenet.cn/u/Kevin2015

博文

Reads counting

已有 6056 次阅读 2015-7-6 10:54 |系统分类:科研笔记


关于计算ASE reads counting的方法

方法一:先将bam文件根据snp信息分成maternal or paternal bam,然后依据基因信息做计数,计数方法现有两种

1.     htseq-counthttp://www-huber.embl.de/users/anders/HTSeq/doc/count.html#count

htseq-count [options] <sam_file> <gff_file>

输入文件为sam文件,如果是paired-end数据需要 sort by name.

samtools sort -n file.bam

samtools view –h bamfile.bam > samfile.sam

参数说明:

-m  计数模型,统计reads的时候对一些比较特殊的reads定义是否计入。包括:默认的unionintersection-strict intersection-nonempty具体说明如图所示。(default: union)

 

-s reads是否匹配到同一条链上,默认:yes,可以设置no reversewhether the data is from astrand-specific assay (default: yes)

-t feature type 我理解为最小的计数单位,在gtf或者gff文件中,外显子为最小的定义单      位,对基因计数,只需要将包含的外显子计数相加即可。默认:exon (default,suitable for RNA-Seq analysis using an Ensembl GTF file: exon)

-i 最终的计数单位,一般为基因。默认为:gene_id  也可以设置转录本,但由于模型问题,计数效果不佳。(The default, suitable for RNA-Seqanalysis using an Ensembl GTF file, is gene_id.

-o 输出所有alignmentreads到一个sam文件中。可以不设置。

-q 退出程序

-h 帮助文件

几点说明htseq count Read pairsuse other source not ucsc known genesstick to union

2.      bedtoolsmulticov

reportsthe count of alignments from multiple position-sorted and indexed BAM filesthat overlap intervals in a BED file.

http://bedtools.readthedocs.org/en/latest/content/tools/multicov.html

bedtools multicov [OPTIONS] -bams BAM1 BAM2 BAM3 ... BAMn -bed  <BED/GFF/VCF>

$ bedtools multicov -bams aln1.bam -bedivls-of-interest.bed

chr1 0       10000   ivl1   100

chr1 10000   20000   ivl2   123

chr1 20000   30000   ivl3   213

chr1 30000   40000   ivl4   335

方法二:单纯分析双侧or单侧表达:利用ASEReadCounter 结合vcf计算ref allele alt allele数目。

java -jar GenomeAnalysisTK.jar

 -R reference.fasta

 -T ASEReadCounter

 -o file_name.csv

 -I input.bam

 -sites sites.vcf

 -U ALLOW_N_CIGAR_READS

 [-minDepth 10]

 [--minMappingQuality 10]

 [--minBaseQuality 2]

     [-drf DuplicateRead]

--countOverlapReadsType 参数有

COUNT_READS

Count allreads independently (even if from the same fragment).

COUNT_FRAGMENTS

Count allfragments (even if the reads that compose the fragment are not consistent atthat base).

COUNT_FRAGMENTS_REQUIRE_SAME_BASE

Count allfragments (but only if the reads that compose the fragment are consistent atthat base).




https://wap.sciencenet.cn/blog-2609994-903210.html

上一篇:trio samples analysis using AlleleSeq (未完,待补充)
下一篇:snpEff 对snp进行功能注释
收藏 IP: 37.130.229.*| 热度|

0

该博文允许注册用户评论 请点击登录 评论 (0 个评论)

数据加载中...
扫一扫,分享此博文

Archiver|手机版|科学网 ( 京ICP备07017567号-12 )

GMT+8, 2024-6-3 12:31

Powered by ScienceNet.cn

Copyright © 2007- 中国科学报社

返回顶部