|
关于计算ASE reads counting的方法
方法一:先将bam文件根据snp信息分成maternal or paternal bam,然后依据基因信息做计数,计数方法现有两种
1. htseq-count(http://www-huber.embl.de/users/anders/HTSeq/doc/count.html#count)
htseq-count [options] <sam_file> <gff_file>
输入文件为sam文件,如果是paired-end数据需要 sort by name.
samtools sort -n file.bam
samtools view –h bamfile.bam > samfile.sam
参数说明:
-m 计数模型,统计reads的时候对一些比较特殊的reads定义是否计入。包括:默认的union和intersection-strict、 intersection-nonempty具体说明如图所示。(default: union)
-s reads是否匹配到同一条链上,默认:yes,可以设置no 、 reverse。whether the data is from astrand-specific assay (default: yes)
-t feature type 我理解为最小的计数单位,在gtf或者gff文件中,外显子为最小的定义单 位,对基因计数,只需要将包含的外显子计数相加即可。默认:exon (default,suitable for RNA-Seq analysis using an Ensembl GTF file: exon)
-i 最终的计数单位,一般为基因。默认为:gene_id 也可以设置转录本,但由于模型问题,计数效果不佳。(The default, suitable for RNA-Seqanalysis using an Ensembl GTF file, is gene_id.)
-o 输出所有alignment的reads到一个sam文件中。可以不设置。
-q 退出程序
-h 帮助文件
(几点说明:htseq count Read pairs;use other source not ucsc known genes;stick to union)
2. bedtoolsmulticov:
reportsthe count of alignments from multiple position-sorted and indexed BAM filesthat overlap intervals in a BED file.
(http://bedtools.readthedocs.org/en/latest/content/tools/multicov.html)
bedtools multicov [OPTIONS] -bams BAM1 BAM2 BAM3 ... BAMn -bed <BED/GFF/VCF>
$ bedtools multicov -bams aln1.bam -bedivls-of-interest.bed
chr1 0 10000 ivl1 100
chr1 10000 20000 ivl2 123
chr1 20000 30000 ivl3 213
chr1 30000 40000 ivl4 335
方法二:单纯分析双侧or单侧表达:利用ASEReadCounter 结合vcf计算ref allele 和 alt allele数目。
java -jar GenomeAnalysisTK.jar
-R reference.fasta
-T ASEReadCounter
-o file_name.csv
-I input.bam
-sites sites.vcf
-U ALLOW_N_CIGAR_READS
[-minDepth 10]
[--minMappingQuality 10]
[--minBaseQuality 2]
[-drf DuplicateRead]
--countOverlapReadsType 参数有
COUNT_READS:
Count allreads independently (even if from the same fragment).
COUNT_FRAGMENTS:
Count allfragments (even if the reads that compose the fragment are not consistent atthat base).
COUNT_FRAGMENTS_REQUIRE_SAME_BASE:
Count allfragments (but only if the reads that compose the fragment are consistent atthat base).
Archiver|手机版|科学网 ( 京ICP备07017567号-12 )
GMT+8, 2024-6-3 12:31
Powered by ScienceNet.cn
Copyright © 2007- 中国科学报社