protectdream's personal blog http://blog.sciencenet.cn/u/protectdream

Human Reference Genome

2023-7-19 04:43 | Personal category: Technical | System category: Research notes

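I. NCBI36 / hg18

1. human_b36

Chromosome names in this reference carry no "chr" prefix. It was formerly used by the 1000 Genomes Project and includes the EBV type 1 sequence (NC_007605) but no ALT contigs (alternate loci). It is now retired and comes in a female and a male version.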


(1) human_b36_female


https://ftp.ncbi.nlm.nih.gov/1000genomes/ftp/technical/retired_reference/human_b36_female.fa.gz

https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/retired_reference/human_b36_female.fa.gz

(2) human_b36_male


https://ftp.ncbi.nlm.nih.gov/1000genomes/ftp/technical/retired_reference/human_b36_male.fa.gz

https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/retired_reference/human_b36_male.fa.gz

2. Ensembl release 54

Homo_sapiens.NCBI36.54.dna.toplevel. Chromosome names in this reference carry no "chr" prefix.

https://ftp.ensembl.org/pub/release-54/fasta/homo_sapiens/dna/Homo_sapiens.NCBI36.54.dna.toplevel.fa.gz

II. GRCh37 / hg19

1. human_g1k_v37 (alias: hs37-1kg)

human_g1k_v37 is the base reference of the GRCh37 family. Its chromosome names carry no "chr" prefix.

https://ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/reference/human_g1k_v37.fasta.gz

https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz

2. Homo_sapiens_assembly19 (alias: hs37)

The Broad Institute's GRCh37-like reference. It sits between human_g1k_v37 and hs37d5: compared with human_g1k_v37 it adds the EBV sequence NC_007605, but it lacks the concatenated decoy sequence of hs37d5.

https://s3.amazonaws.com/juicerawsmirror/opt/juicer/references/Homo_sapiens_assembly19.fasta

3. hs37d5

This reference builds on human_g1k_v37 by adding the Broad Institute's concatenated decoy sequence named hs37d5 (drawn from HuRef, BAC and fosmid clones, and NA12878; it can improve alignment accuracy) and the human herpesvirus 4 type 1 sequence (NC_007605). It is also the reference currently used by Dante Labs for whole-genome sequencing. Chromosome names carry no "chr" prefix.

https://ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz

https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz

https://www.yfish.org/static/hs37d5.7z

4. hg19

(1) The reference currently used by YSEQ for whole-genome sequencing. It uses the standard rCRS mitochondrial sequence, 16569 bp long:

https://genomes.yseq.net/WGS/ref/hg19/hg19.zip

(2) The original UCSC version. It uses the older Yoruba mitochondrial sequence, 16571 bp long, and is not recommended for general use:

https://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz
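The two hg19 variants above differ in mitochondrial length, so the length column of a FASTA index is enough to tell them apart. A minimal sketch, assuming a samtools-style .fai file (tab-separated, name first, length second) and that the mitochondrion carries one of a few common names; the function name and the name list are hypothetical:

```python
# Mitochondrial lengths quoted in the text above.
MITO_BY_LENGTH = {
    16569: "rCRS",
    16571: "older Yoruba sequence (original UCSC hg19)",
}

# Assumed spellings of the mitochondrial contig name.
MITO_NAMES = ("chrM", "MT", "chrMT", "M")


def classify_mito(fai_lines):
    """Label the mitochondrial contig found in the lines of a .fai index."""
    for line in fai_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # not an index record
        name, length = fields[0], int(fields[1])
        if name in MITO_NAMES:
            return MITO_BY_LENGTH.get(length, f"unknown ({length} bp)")
    return "no mitochondrial contig found"
```

Calling `classify_mito(open("hg19.fa.fai"))` on an index built from either download would report which mitochondrial sequence that file carries.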

5. Ensembl release 75

(1) Homo_sapiens.GRCh37.75.dna.primary_assembly. Chromosome names carry no "chr" prefix, and the sequence names (SN) largely match those of human_g1k_v37.

https://ftp.ensembl.org/pub/release-75/fasta/homo_sapiens/dna/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz

(2) Homo_sapiens.GRCh37.75.dna.toplevel. Chromosome names carry no "chr" prefix.

https://ftp.ensembl.org/pub/release-75/fasta/homo_sapiens/dna/Homo_sapiens.GRCh37.75.dna.toplevel.fa.gz

6. build37_used_by_cg

This reference is distributed by the 1000 Genomes Project and uses UCSC-style names ("chr" + number). It contains only the sequences placed on the main chromosomes and no unplaced sequences, so it is not recommended for general use.

https://ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/reference/cg_alignment_reference/build37_used_by_cg.fa.gz

https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/cg_alignment_reference/build37_used_by_cg.fa.gz

III. GRCh38 / hg38

1. GCA_000001405.15_GRCh38_no_alt_analysis_set (alias: hs38)

Chromosome names in this reference carry the "chr" prefix. Compared with GCA_000001405.15_GRCh38_full_analysis_set it omits the ALT contigs (alternate loci), which can affect read mappers; compared with the GRCh38 primary assembly it adds the EBV sequence as a decoy. This makes it the better choice for general use, and it is also the reference currently used by Nebula for whole-genome sequencing.

https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz

https://ftp.ncbi.nlm.nih.gov/genomes/archive/old_genbank/Eukaryotes/vertebrates_mammals/Homo_sapiens/GRCh38/seqs_for_alignment_pipelines/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz

2. GCA_000001405.15_GRCh38_no_alt_plus_hs38d1_analysis_set (alias: hs38d1)

Compared with GCA_000001405.15_GRCh38_no_alt_analysis_set, this reference adds the hs38d1 decoy sequences submitted to NCBI by Harvard Medical School (including scaffolds not incorporated into the human genome assembly and whole-genome shotgun sequences isolated from 254 public SGDP samples).

(1) The version from the NCBI site:

https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/GCA_000001405.15_GRCh38_no_alt_plus_hs38d1_analysis_set.fna.gz

(2) Another source, which adds several viral sequences on top of the NCBI reference:

https://www.yfish.org/static/hs38d1.7z

3. GCA_000001405.15_GRCh38_full_analysis_set (alias: hs38a)

Compared with UCSC hg38 this reference adds the EBV sequence (chrEBV), and compared with GCA_000001405.15_GRCh38_no_alt_analysis_set it adds the ALT contigs.

https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/GCA_000001405.15_GRCh38_full_analysis_set.fna.gz

https://ftp.ncbi.nlm.nih.gov/genomes/archive/old_genbank/Eukaryotes/vertebrates_mammals/Homo_sapiens/GRCh38/seqs_for_alignment_pipelines/GCA_000001405.15_GRCh38_full_analysis_set.fna.gz

4. GCA_000001405.15_GRCh38_full_plus_hs38d1_analysis_set

Compared with GCA_000001405.15_GRCh38_full_analysis_set this reference adds the hs38d1 decoy sequences, and compared with GCA_000001405.15_GRCh38_no_alt_plus_hs38d1_analysis_set it adds the ALT contigs.

https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/GCA_000001405.15_GRCh38_full_plus_hs38d1_analysis_set.fna.gz

https://ftp.ncbi.nlm.nih.gov/genomes/archive/old_genbank/Eukaryotes/vertebrates_mammals/Homo_sapiens/GRCh38/seqs_for_alignment_pipelines/GCA_000001405.15_GRCh38_full_plus_hs38d1_analysis_set.fna

5. GRCh38_full_analysis_set_plus_decoy_hla (aliases: hs38DH, GRCh38DH)

Compared with GCA_000001405.15_GRCh38_full_plus_hs38d1_analysis_set this reference adds a large set of HLA typing sequences. Compared with GCA_000001405.15_GRCh38_no_alt_analysis_set it adds the ALT contigs, the hs38d1 decoy sequences, and the HLA sequences. It is also used as the reference for CRAM data of ancient DNA (aDNA). At the Broad Institute it is known as Homo_sapiens_assembly38.

https://ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/reference/GRCh38_reference_genome/GRCh38_full_analysis_set_plus_decoy_hla.fa

https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/GRCh38_full_analysis_set_plus_decoy_hla.fa
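Because hs38DH mixes several kinds of sequence, it can help to sort sequence names into groups before analysis. A rough sketch that works from naming patterns only: the `_alt`, `_random`, and `decoy` patterns and `chrEBV` appear in this post, while the `HLA-` and `chrUn_` prefixes are assumptions about this file's headers that should be checked against the actual FASTA:

```python
from collections import Counter


def classify_contig(name: str) -> str:
    """Classify a sequence name by naming pattern alone (heuristic)."""
    if name.startswith("HLA-"):      # assumed naming of HLA sequences
        return "hla"
    if name.endswith("_decoy"):      # checked before chrUn_, decoys may share it
        return "decoy"
    if name.endswith("_alt"):
        return "alt"
    if name == "chrEBV":
        return "ebv"
    if name.endswith("_random") or name.startswith("chrUn_"):
        return "unplaced"
    if name.startswith("chr"):
        return "primary"             # chr1..chr22, chrX, chrY, chrM
    return "other"


def summarize(names):
    """Count how many sequence names fall into each group."""
    return Counter(classify_contig(n) for n in names)
```

Running `summarize` over the names in a FASTA index gives a quick picture of which of the references in this section a file most resembles, for example whether it has any ALT or HLA records at all.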

6. hg38

https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz

https://genomes.yseq.net/WGS/ref/hg38/hg38.fa

7. Homo_sapiens_assembly38_noALT_noHLA_noDecoy_ERCC

This reference builds on GCA_000001405.15_GRCh38_no_alt_analysis_set by adding the ERCC (External RNA Controls Consortium) spike-in control sequences.

https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/HGDP_transcriptome/working/HGDP_transcriptome_GRCh38/reference/Homo_sapiens_assembly38_noALT_noHLA_noDecoy_ERCC.fasta

8. Homo_sapiens_assembly38 (aliases: hs38DH, GRCh38DH)

The Broad Institute's hg38-like reference. Its base sequence is essentially the same as GRCh38_full_analysis_set_plus_decoy_hla.

https://storage.googleapis.com/genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.fasta

9. Ensembl release 106

(1) Homo_sapiens.GRCh38.dna.primary_assembly. The sequence names (SN) of this reference do not include the EBV sequence, and chromosome names carry no "chr" prefix; otherwise it is largely consistent with GCA_000001405.15_GRCh38_no_alt_analysis_set.

https://ftp.ensembl.org/pub/release-106/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz

(2) Homo_sapiens.GRCh38.dna.toplevel. Chromosome names carry no "chr" prefix.

https://ftp.ensembl.org/pub/release-106/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.toplevel.fa.gz

10. hs38s

This reference builds on GCA_000001405.15_GRCh38_no_alt_plus_hs38d1_analysis_set by adding the 22_KI270879v1_alt sequence, which contains the GSTT1 gene. Chromosome names drop the "chr" prefix used by the hs38d1 reference, and the mitochondrion is named MT. It is also the reference used by the sequencing provider Sequencing.com.

https://api.onedrive.com/v1.0/shares/s!AgorjTSMFYpjgR0QualUlHx53-0U/root/content

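IV. T2T-CHM13

Unlike the other references, the assembly from the Telomere-to-Telomere (T2T) consortium is sequenced completely from telomere to telomere, filling the gaps left by earlier assemblies. T2T should still be treated as experimental, and it may contain issues such as single-site errors. Readers should consult the primary sources for details. In April 2022, the papers on the complete T2T-CHM13 human reference were published in Science.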


1. CHM13_v1.1

The version of the CHM13 T2T v1.1 reference with chromosomes named "chr" + number plus the mitochondrion. It has no Y chromosome.

https://processing.open-genomes.org/reference/CP086569.1-CHM13/CHM13_v1.1.fa

https://processing.open-genomes.org/reference/CM034974.1-CHM13/CHM13_v1.1.fa

https://processing.open-genomes.org/reference/CP086569.2-CHM13/CHM13_v1.1.fa

2. CP086569.1-CHM13_v1.1

This reference adds to CHM13 T2T v1.1 the Y chromosome of the Ashkenazi Jewish sample NA24385, whose paternal haplogroup is J1-ZS2712. The autosomes, the X chromosome, and the mitochondrion carry the "chr" prefix; the Y chromosome is named CP086569.1.

https://processing.open-genomes.org/reference/CP086569.2-CHM13/CHM13_v1.1.fa

3. T2T-v2.0

The complete official CHM13 T2T v2.0 reference. Its Y chromosome is the second version of the NA24385 Y sequence (CP086569.2), and chromosome and mitochondrion names carry the "chr" prefix.

https://processing.open-genomes.org/reference/CP086569.2-CHM13/T2T-v2.0.fa

4. CM034974.1-CHM13_v1.1

This unofficial reference adds to CHM13 T2T v1.1 the Y chromosome of sample HG01243, whose paternal haplogroup is R1b-DF27. The autosomes, the X chromosome, and the mitochondrion carry the "chr" prefix; the Y chromosome is named CM034974.1 and is closer to the GRCh38 Y chromosome.

https://processing.open-genomes.org/reference/CM034974.1-CHM13/CM034974.1-CHM13_v1.1.fa

5. T2T-CHM13v2.0 (Genome Informatics Section version)

(1) CHM13v2.0

The T2T-CHM13v2.0 reference itself. The pseudoautosomal regions (PARs) are duplicated between chromosomes X and Y, and sequence names have been converted to UCSC style ("chr" + number).

https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/CHM13/assemblies/analysis_set/chm13v2.0.fa.gz

(2) CHM13v2.0_noY


This reference has no Y chromosome, i.e. it corresponds to T2T-CHM13v1.1.

https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/CHM13/assemblies/analysis_set/chm13v2.0_noY.fa.gz

(3) CHM13v2.0_maskedY


The pseudoautosomal regions (PARs) on the Y chromosome of this reference, i.e. the regions homologous to X, are hard-masked with long runs of the letter "N".

https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/CHM13/assemblies/analysis_set/chm13v2.0_maskedY.fa.gz

(4) CHM13v2.0_maskedY_rCRS


The pseudoautosomal regions (PARs) on the Y chromosome of this reference are hard-masked with long runs of the letter "N", and its mitochondrial sequence is replaced with the rCRS (NC_012920.1/J01415.2), which is also used by GRCh37, GRCh38, and hg38.

https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/CHM13/assemblies/analysis_set/chm13v2.0_maskedY_rCRS.fa.gz

6. hs1

The version of the T2T-CHM13v2.0 reference that names sequences as "chr" + chromosome or mitochondrion identifier, which makes it comparatively easy to read.

https://hgdownload.cse.ucsc.edu/goldenPath/hs1/bigZips/hs1.fa.gz

V. Decoy sequences

1. hs37d5cs

The concatenated decoy sequences of hs37d5, drawn from HuRef, BAC and fosmid clones, and NA12878. The file holds a single sequence named "hs37d5" (SN), in which the individual decoys are joined by runs of N. This decoy is included in the hs37d5 reference.

https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/

https://ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5cs.fa.gz
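Since hs37d5cs stores all decoys as one record joined by runs of N, the individual pieces can be recovered by splitting on those runs. A minimal sketch; the function name is hypothetical and the minimum run length `min_n` is an assumed parameter, because the post does not state how long the joining runs are:

```python
import re


def split_on_n_runs(sequence: str, min_n: int = 1):
    """Split a concatenated sequence at runs of N.

    Returns a list of (start_offset, piece) tuples, where start_offset is
    the 0-based position of the piece within the original sequence.
    """
    gap = re.compile(r"[Nn]{%d,}" % min_n)
    pieces = []
    pos = 0
    for match in gap.finditer(sequence):
        if match.start() > pos:
            pieces.append((pos, sequence[pos:match.start()]))
        pos = match.end()
    if pos < len(sequence):
        pieces.append((pos, sequence[pos:]))
    return pieces
```

Keeping the offsets makes it possible to translate a coordinate on the single "hs37d5" record back to the piece it falls in. Raising `min_n` avoids splitting at short stretches of N that may occur inside a decoy itself.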

2. hs37d5ss

The non-concatenated decoy sequences of hs37d5, with each sequence stored as a separate record, similar to the hs38d1 decoys.

https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5ss.fa.gz

https://ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5ss.fa.gz

3. GCA_000786075.2_hs38d1_genomic

The several thousand separate, non-concatenated decoy sequences of hs38d1, including scaffolds not incorporated into the human genome assembly and whole-genome shotgun sequences isolated from 254 public SGDP samples. Their names carry neither the "chr" prefix nor the "decoy" suffix.

https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/786/075/GCA_000786075.2_hs38d1/GCA_000786075.2_hs38d1_genomic.fna.gz

4. GRCh38_full_analysis_set_plus_decoy_hla-extra (alias: hs38DH-extra)

This file adds HLA typing sequences to the hs38d1 decoys; these sequences behave much like ALT contigs (alternate loci). Here the decoy names do carry the "chr" prefix and the "decoy" suffix. These sequences are included in the hs38DH reference.

https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/GRCh38_full_analysis_set_plus_decoy_hla-extra.fa

https://ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/reference/GRCh38_reference_genome/GRCh38_full_analysis_set_plus_decoy_hla-extra.fa

5. EBVt1 (aliases: NC_007605, HHV-4)

NC_007605.1, the human herpesvirus 4 (Epstein-Barr virus) type 1 sequence. It is not part of the human genome, but including it can improve the accuracy of whole-genome results, especially for saliva samples.

https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/EBVt1.fa.gz

https://ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/reference/phase2_reference_assembly_sequence/EBVt1.fa.gz

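VI. Others

The references below are less commonly used and are therefore grouped together. For example, GCA_000001405.28_GRCh38.p13_genomic names chromosome 2 by its GenBank accession "CM000664.1" rather than the usual "2" or "chr2". GCF_000001405.39_GRCh38.p13_genomic names chromosome 2 by its RefSeq accession "NC_000002.12" and uses "NT_187361.1" for "chr1_KI270706v1_random". This group also covers references outside the NCBI GRC and UCSC families, such as "CHM13" and "NA12878_prelim". Only a subset of links is listed in this section.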
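The naming styles mentioned above can be reconciled with a small lookup before comparing files. A sketch that maps Ensembl-style and accession-style names to UCSC style; the accession table holds only the three examples quoted in this post, and a usable table would have to be built from the assembly report of the specific release (the function and table names are hypothetical):

```python
# Accession examples quoted in the text; deliberately not a complete table.
ACCESSION_TO_UCSC = {
    "CM000664.1": "chr2",                      # GenBank accession
    "NC_000002.12": "chr2",                    # RefSeq accession
    "NT_187361.1": "chr1_KI270706v1_random",   # RefSeq accession
}

# Names used by references without the "chr" prefix.
PLAIN_CHROMOSOMES = {str(i) for i in range(1, 23)} | {"X", "Y"}


def to_ucsc_name(name: str) -> str:
    """Translate a sequence name to UCSC style, or raise KeyError."""
    if name in ACCESSION_TO_UCSC:
        return ACCESSION_TO_UCSC[name]
    if name.startswith("chr"):
        return name                 # already UCSC style
    if name == "MT":
        return "chrM"               # mitochondrion: MT vs chrM
    if name in PLAIN_CHROMOSOMES:
        return "chr" + name
    raise KeyError(f"no UCSC-style name known for {name!r}")
```

Raising on unknown names, rather than passing them through, keeps an unmapped scaffold from being silently treated as a match. Matching names alone does not make two references equivalent, so sequence lengths should be compared as well.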


1. hg38_CP086569

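In this hybrid reference the autosomes (1 to 22), the X chromosome, and the mitochondrion use the hg38 sequences, while the Y chromosome uses the T2T CP086569.1 sequence. It contains none of the hg38 sequences that are not placed on the main chromosomes.

https://ybrowse.org/gbrowse2/gff/CP086569.1/hg38_CP086569.fasta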

2. NeandertalizedReference

A Neandertalized Homo sapiens reference genome. Its non-mitochondrial sequences have the same lengths as in hs37d5, but the reference bases are changed to match the archaic human, the Neandertal. The mitochondrial length differs (17569), and the Enterobacteria phage phiX sequence is added.

https://cdna.eva.mpg.de/neandertal/Hohlenstein-Stadel/NeandertalizedReference.fa

3. HG01243_v3

https://api.onedrive.com/v1.0/shares/s!AgorjTSMFYpjgT_cFUVNNMz6QoTX/root/content

4. NCBI assembly category: GCF (RefSeq)

(1) GCF_000001405.25_GRCh37.p13_genomic

https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.25_GRCh37.p13/GCF_000001405.25_GRCh37.p13_genomic.fna.gz

(2) GCF_000001405.40_GRCh38.p14_genomic

https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.40_GRCh38.p14/GCF_000001405.40_GRCh38.p14_genomic.fna.gz

(3) GCF_009914755.1_T2T-CHM13v2.0_genomic

https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/009/914/755/GCF_009914755.1_T2T-CHM13v2.0/GCF_009914755.1_T2T-CHM13v2.0_genomic.fna.gz

(4) CHM1_1.1

https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/306/695/GCF_000306695.2_CHM1_1.1/GCF_000306695.2_CHM1_1.1_genomic.fna.gz

(5) HuRef

https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/002/125/GCF_000002125.1_HuRef/GCF_000002125.1_HuRef_genomic.fna.gz

5. NCBI assembly category: GCA (GenBank)

(1) GCA_009914755.4_T2T-CHM13v2.0_genomic

https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/009/914/755/GCA_009914755.4_T2T-CHM13v2.0/GCA_009914755.4_T2T-CHM13v2.0_genomic.fna.gz

(2) GCA_009914755.3_T2T-CHM13v1.1_genomic

https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/009/914/755/GCA_009914755.3_T2T-CHM13v1.1/GCA_009914755.3_T2T-CHM13v1.1_genomic.fna.gz

(3) GCA_009914755.2_T2T-CHM13v1.0_genomic

https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/009/914/755/GCA_009914755.2_T2T-CHM13v1.0/GCA_009914755.2_T2T-CHM13v1.0_genomic.fna.gz

(4) GCA_000001405.29_GRCh38.p14_genomic

https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.29_GRCh38.p14/GCA_000001405.29_GRCh38.p14_genomic.fna.gz

(5) GCA_000001405.15_GRCh38_genomic

https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/GCA_000001405.15_GRCh38_genomic.fna.gz

(6) CHM1_1.1

https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/306/695/GCA_000306695.2_CHM1_1.1/GCA_000306695.2_CHM1_1.1_genomic.fna.gz

(7) HuRef

https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/002/125/GCA_000002125.2_HuRef/GCA_000002125.2_HuRef_genomic.fna.gz

(8) NA12878_prelim_3.0

https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/002/077/035/GCA_002077035.3_NA12878_prelim_3.0/GCA_002077035.3_NA12878_prelim_3.0_genomic.fna.gz

(9) NA19240_prelim_3.0

https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/001/524/155/GCA_001524155.4_NA19240_prelim_3.0/GCA_001524155.4_NA19240_prelim_3.0_genomic.fna.gz

(10) HG00514_prelim_3.0

https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/002/180/035/GCA_002180035.3_HG00514_prelim_3.0/GCA_002180035.3_HG00514_prelim_3.0_genomic.fna.gz

(11) HG00733_prelim_1.0

https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/002/208/065/GCA_002208065.1_HG00733_prelim_1.0/GCA_002208065.1_HG00733_prelim_1.0_genomic.fna.gz

(12) YH_2.0 (YanHuang)

https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/004/845/GCA_000004845.2_YH_2.0/GCA_000004845.2_YH_2.0_genomic.fna.gz

(13) KOREF1.0

https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/001/712/695/GCA_001712695.1_KOREF1.0/GCA_001712695.1_KOREF1.0_genomic.fna.gz

(14) GCA_018873775.2_hg01243.v3.0_genomic

https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/018/873/775/GCA_018873775.2_hg01243.v3.0/GCA_018873775.2_hg01243.v3.0_genomic.fna.gz

…………


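[Notes]

1. The Ensembl references listed in this post are only the final releases for NCBI36 and GRCh37 and, at the time of writing, the latest release for GRCh38. To download other releases from Ensembl, browse the following directory: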

https://ftp.ensembl.org/pub/

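2. The following references have also served as main references for 1000 Genomes Project WGS data; the last three are recommended for general use:

human_b36 (retired)

human_g1k_v37

hs37d5

hs38 (GCA_000001405.15_GRCh38_no_alt_analysis_set; the official EBI link has been removed)

hs38DH (GRCh38_full_analysis_set_plus_decoy_hla)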

3. A FASTA reference file can be used directly as plain text or in bgzip-compressed gz form.
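Both forms can be read with the same code, because a bgzip file is also a valid gzip stream. A minimal sketch that detects gzip by its two magic bytes and lists sequence names with their lengths; the helper names are hypothetical, and tools that need random access still require their own index files:

```python
import gzip


def open_fasta(path):
    """Open a FASTA file as text, whether plain or gzip/bgzip-compressed."""
    with open(path, "rb") as fh:
        magic = fh.read(2)
    if magic == b"\x1f\x8b":          # gzip magic number
        return gzip.open(path, "rt")
    return open(path, "rt")


def sequence_lengths(path):
    """Return {sequence name: length} for every record in the file."""
    lengths = {}
    name = None
    with open_fasta(path) as fh:
        for line in fh:
            line = line.strip()
            if line.startswith(">"):
                parts = line[1:].split()
                name = parts[0] if parts else ""
                lengths[name] = 0
            elif name is not None:
                lengths[name] += len(line)
    return lengths
```

The first word after `>` is taken as the sequence name, which is how the names discussed in this post (for example `chr1` versus `1`) show up in alignment headers.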


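4. Less common FASTA references can also be downloaded by browsing the directory below level by level to the matching GCA or GCF accession (to find accessions, enter Homo sapiens under Source Organism in the NCBI Genome Remapping Service):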


https://ftp.ncbi.nlm.nih.gov/genomes/all/

https://www.ncbi.nlm.nih.gov/genome/tools/remap
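The directory levels follow the accession digits, as the GCA and GCF links in this post show: the nine digits are split into three groups of three. A sketch that builds the download URL from an accession and an assembly name; the function name is hypothetical, and the `_genomic.fna.gz` suffix is taken from the links listed earlier:

```python
import re

BASE = "https://ftp.ncbi.nlm.nih.gov/genomes/all"


def ncbi_genomic_fasta_url(accession: str, assembly_name: str) -> str:
    """Build the genomic FASTA URL for a GCA_/GCF_ accession.

    accession:     e.g. "GCF_000001405.40"
    assembly_name: e.g. "GRCh38.p14", as it appears in the directory name
    """
    m = re.fullmatch(r"(GC[AF])_(\d{3})(\d{3})(\d{3})\.(\d+)", accession)
    if m is None:
        raise ValueError(f"not a GCA/GCF accession: {accession!r}")
    prefix, a, b, c, _version = m.groups()
    stem = f"{accession}_{assembly_name}"
    return f"{BASE}/{prefix}/{a}/{b}/{c}/{stem}/{stem}_genomic.fna.gz"
```

The assembly name must match the directory name exactly, so it is still worth opening the accession's directory once to confirm the spelling before scripting downloads.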

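5. The complete hg18 reference has been removed from the official site, so only the per-chromosome files remain, linked below (they can be concatenated manually into a complete reference):

https://hgdownload.cse.ucsc.edu/goldenPath/hg18/chromosomes/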
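The manual concatenation can be scripted. A minimal sketch, assuming the per-chromosome files have already been downloaded into one directory and are named like `chr1.fa.gz` (an assumption about the file naming; check the directory listing first), and that a chr1 to chr22, X, Y, M order is wanted:

```python
import gzip
import os

# Assumed output order; adjust to match the pipeline that will use the file.
MAIN_ORDER = ["chr%d" % i for i in range(1, 23)] + ["chrX", "chrY", "chrM"]


def concatenate_chromosomes(src_dir, out_path, names=MAIN_ORDER):
    """Append each <name>.fa.gz found in src_dir to out_path, in order.

    Returns the list of names actually written, so that missing files
    can be noticed by comparing it with the requested names.
    """
    written = []
    with open(out_path, "wt") as out:
        for name in names:
            path = os.path.join(src_dir, name + ".fa.gz")
            if not os.path.exists(path):
                continue  # file not downloaded; reported via return value
            with gzip.open(path, "rt") as fh:
                for line in fh:
                    # Guarantee a newline so the next header starts cleanly.
                    out.write(line if line.endswith("\n") else line + "\n")
            written.append(name)
    return written
```

Comparing the returned list with `MAIN_ORDER` shows at a glance whether any chromosome was left out of the rebuilt reference.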

6. The Illumina site also hosts some reference genome files. Find the Homo sapiens entry, then download what you need:

iGenomes: https://support.illumina.com.cn/sequencing/sequencing_software/igenome.html





https://wap.sciencenet.cn/blog-2866696-1395854.html
