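The Road Ahead Is Long (路漫漫其修远兮) http://blog.sciencenet.cn/u/zhpd55 Pursue science and explore boldly; the sea of hardship is boundless, and I am willing to be a small boat.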

Search Results for Highly Cited Genome Literature (Cited More Than 7,000 Times)

3000 views | 2018-5-13 04:27 | Personal category: New Observations | System category: Research Notes | Tags: genome, highly cited literature, citation count

Zhu Ping (诸平)

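Counting from 1920, when Hans Winkler (1877-1945), professor of botany at the University of Hamburg in Germany, first used the term "genome", fewer than one hundred years have passed. Yet over that near-century, genome-related research has changed beyond recognition. Taking the number of records indexed in the PubMed database as an example, the literature has grown by more than a thousand papers per year in recent decades, and since the start of the 21st century the number of records indexed each year has risen almost linearly. The literature on genomics, the field built around the genome, shows a similar trend; see Figure 1.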

Figure 1. Change in the number of genome literature records in the PubMed database (image: GENOM-genomic.jpg)
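
Yearly counts like those behind Figure 1 can in principle be retrieved from NCBI's public E-utilities ESearch endpoint. The sketch below is only an illustration under stated assumptions: the search term `genome`, the choice of publication date as the filter, and the helper names `build_count_url`, `parse_count` and `yearly_count` are all hypothetical, and the query actually used for the figure is not given in this post. The JSON layout read by `parse_count` is also an assumption about the ESearch response.

```python
import json
import urllib.parse
import urllib.request

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_count_url(term: str, year: int) -> str:
    """Build an ESearch URL that asks only for the number of PubMed hits in one year."""
    params = {
        "db": "pubmed",
        "term": term,
        "datetype": "pdat",   # filter on publication date
        "mindate": str(year),
        "maxdate": str(year),
        "rettype": "count",   # ask for the hit count rather than the ID list
        "retmode": "json",
    }
    return ESEARCH + "?" + urllib.parse.urlencode(params)


def parse_count(payload: str) -> int:
    """Read the hit count from an ESearch JSON response (assumed layout)."""
    return int(json.loads(payload)["esearchresult"]["count"])


def yearly_count(term: str, year: int) -> int:
    """Fetch the count over the network; only runs when called explicitly."""
    with urllib.request.urlopen(build_count_url(term, year)) as resp:
        return parse_count(resp.read().decode("utf-8"))
```

Calling `yearly_count("genome", 2001)` for each year of interest and plotting the results would give a curve comparable to Figure 1, although the numbers will differ from those of 2018 because PubMed keeps growing.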

1. Highly cited papers (cited more than 7,000 times)

Initial sequencing and analysis of the human genome.

Feb 15 2001, Nature, volume 409, issue 6822, pp 860-921

The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence.

Citations (22,357) *  

 

Cluster analysis and display of genome-wide expression patterns

Dec 8 1998, Proceedings of the National Academy of Sciences of the United States of America, volume 95, issue 25, pp 14863-14868

A system of cluster analysis for genome-wide expression data from DNA microarray hybridization is described that uses standard statistical algorithms to arrange genes according to similarity in pattern of gene expression. The output is displayed graphically, conveying the clustering and the underlying expression data simultaneously in a form intuitive for biologists. We have found in the budding yeast Saccharomyces cerevisiae that clustering gene expression data groups together efficiently genes of known similar function, and we find a similar tendency in human data. Thus patterns seen in genome-wide expression experiments can be interpreted as indications of the status of cellular processes. Also, coexpression of genes of known function with poorly characterized or novel genes may provide a simple means of gaining leads to the functions of many genes for which information is not available currently.
 Citations (17,684) *  
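
To make "arrange genes according to similarity in pattern of gene expression" concrete, here is a minimal sketch of one standard approach: agglomerative clustering with average linkage over Pearson correlations. The abstract does not name the similarity measure or the linkage rule, so both are assumptions here, and the gene names and expression values are made up for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (undefined for constant profiles)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_linkage(profiles):
    """Agglomerative clustering; returns the merge tree as nested tuples of gene names."""
    # Each cluster is (tree, member_names); start with one cluster per gene.
    clusters = [(name, [name]) for name in profiles]

    def similarity(a, b):
        # Average pairwise correlation between the members of two clusters.
        pairs = [(g, h) for g in a[1] for h in b[1]]
        return sum(pearson(profiles[g], profiles[h]) for g, h in pairs) / len(pairs)

    while len(clusters) > 1:
        # Find the most similar pair of clusters and merge it.
        i, j = max(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: similarity(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]


# Hypothetical expression profiles: four conditions per gene, values invented.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [1.1, 2.1, 2.9, 4.2],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [3.9, 3.2, 1.8, 1.1],
}
tree = average_linkage(profiles)
print(tree)
```

In this toy data the two rising profiles end up in one branch and the two falling profiles in the other, which is the kind of grouping the abstract describes for genes of similar function. The naive pairwise search is fine for a handful of genes but far too slow for genome-wide data.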

 

The Sequence of the Human Genome 

Feb 16 2001, Science, volume 291, issue 5507, pp 1304-1351

Authors:

J. Craig Venter (Celera Corporation)

Mark D. Adams (Celera Corporation)

Eugene W. Myers (Celera Corporation)

Peter W. Li (Celera Corporation)

Richard J. Mural (Celera Corporation) 

+268 others  

A 2.91-billion base pair (bp) consensus sequence of the euchromatic portion of the human genome was generated by the whole-genome shotgun sequencing method. The 14.8-billion bp DNA sequence was generated over 9 months from 27,271,853 high-quality sequence reads (5.11-fold coverage of the genome) from both ends of plasmid clones made from the DNA of five individuals. Two assembly strategies—a whole-genome assembly and a regional chromosome assembly—were used, each combining sequence data from Celera and the publicly funded genome effort. The public data were shredded into 550-bp segments to create a 2.9-fold coverage of those genome regions that had been sequenced, without including biases inherent in the cloning and assembly procedure used by the publicly funded group. This brought the effective coverage in the assemblies to eightfold, reducing the number and size of gaps in the final assembly over what would be obtained with 5.11-fold coverage. The two assembly strategies yielded very similar results that largely agree with independent mapping data. The assemblies effectively cover the euchromatic regions of the human chromosomes. More than 90% of the genome is in scaffold assemblies of 100,000 bp or more, and 25% of the genome is in scaffolds of 10 million bp or larger. Analysis of the genome sequence revealed 26,588 protein-encoding transcripts for which there was strong corroborating evidence and an additional ∼12,000 computationally derived genes with mouse matches or other weak supporting evidence. Although gene-dense clusters are obvious, almost half the genes are dispersed in low G+C sequence separated by large tracts of apparently noncoding sequence. Only 1.1% of the genome is spanned by exons, whereas 24% is in introns, with 75% of the genome being intergenic DNA. Duplications of segmental blocks, ranging in size up to chromosomal lengths, are abundant throughout the genome and reveal a complex evolutionary history. 
Comparative genomic analysis indicates vertebrate expansions of genes associated with neuronal function, with tissue-specific developmental regulation, and with the hemostasis and immune systems. DNA sequence comparisons between the consensus sequence and publicly funded genome data provided locations of 2.1 million single-nucleotide polymorphisms (SNPs). A random pair of human haploid genomes differed at a rate of 1 bp per 1250 on average, but there was marked heterogeneity in the level of polymorphism across the genome. Less than 1% of all SNPs resulted in variation in proteins, but the task of determining which SNPs have functional consequences remains an open challenge.

Citations (15,813) *  
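
The headline numbers in the abstract above can be cross-checked with a few lines of arithmetic. This sketch uses only the rounded figures as quoted; the variable names are ours. With these rounded inputs the fold coverage comes out slightly below the quoted 5.11, which may simply reflect rounding of the published totals.

```python
# Figures quoted in the abstract (rounded as published).
total_read_bp = 14.8e9    # bp of sequence generated
n_reads = 27_271_853      # high-quality sequence reads
consensus_bp = 2.91e9     # euchromatic consensus sequence
snp_rate = 1 / 1250       # differences per bp between two haploid genomes

mean_read_length = total_read_bp / n_reads          # average bp per read
fold_coverage = total_read_bp / consensus_bp        # sequenced bp per consensus bp
expected_differences = consensus_bp * snp_rate      # over the whole consensus

print(f"mean read length: {mean_read_length:.0f} bp")
print(f"fold coverage: {fold_coverage:.2f}x")
print(f"expected differences between two haploid genomes: {expected_differences:,.0f}")
```

The last figure, a little over two million differences between a random pair of haploid genomes, is of the same order as the 2.1 million SNP locations reported in the abstract, although the two quantities are not the same thing.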

 

Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles 

Oct 25 2005, Proceedings of the National Academy of Sciences of the United States of America, volume 102, issue 43, pp 15545-15550.

Authors:

Aravind Subramanian (Broad Institute)

Pablo Tamayo (University of California, San Diego)

Vamsi K. Mootha (Harvard University)

Sayan Mukherjee (Duke University)

Benjamin L. Ebert (Brigham and Women's Hospital) 

+6 others  

Although genomewide RNA expression analysis has become a routine tool in biomedical research, extracting biological insight from such information remains a major challenge. Here, we describe a powerful analytical method called Gene Set Enrichment Analysis (GSEA) for interpreting gene expression data. The method derives its power by focusing on gene sets, that is, groups of genes that share common biological function, chromosomal location, or regulation. We demonstrate how GSEA yields insights into several cancer-related data sets, including leukemia and lung cancer. Notably, where single-gene analysis finds little similarity between two independent studies of patient survival in lung cancer, GSEA reveals many biological pathways in common. The GSEA method is embodied in a freely available software package, together with an initial database of 1,325 biologically defined gene sets.   

 

Citations (14,000) *
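
The abstract above does not spell out how enrichment is scored. As a rough illustration of the idea of judging a whole gene set against a ranked gene list, the sketch below computes a simplified, unweighted running-sum score: walk down the list, step up at set members, step down otherwise, and report the largest deviation from zero. This is a deliberate simplification and an assumption on our part; the published method also weights the steps and assesses significance by permutation, neither of which is reproduced here. The gene names are hypothetical.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum score: largest deviation from zero along the ranked list."""
    hits = [g in gene_set for g in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")
    running, best = 0.0, 0.0
    for is_hit in hits:
        # Step up on a set member, down otherwise; the steps sum to zero overall.
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical ranked list (most up-regulated first) and two made-up gene sets.
ranked = ["g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8"]
es_top = enrichment_score(ranked, {"g1", "g2", "g3"})      # members bunched at the top
es_spread = enrichment_score(ranked, {"g1", "g4", "g8"})   # members scattered
print(es_top, es_spread)
```

A set whose members cluster at one end of the ranking scores close to 1 in absolute value, while a set scattered through the list scores much lower, even though no single gene in it need stand out on its own.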

 

DnaSP v5

Jun 1 2009, Bioinformatics, volume 25, issue 11, pp 1451-1452

Authors:

Pablo Librado (University of Barcelona)

Julio Rozas (University of Barcelona) 

Motivation: DnaSP is a software package for a comprehensive analysis of DNA polymorphism data. Version 5 implements a number of new features and analytical methods allowing extensive DNA polymorphism analyses on large datasets. Among other features, the newly implemented methods allow for: (i) analyses on multiple data files; (ii) haplotype phasing; (iii) analyses on insertion/deletion polymorphism data; (iv) visualizing sliding window results integrated with available genome annotations in the UCSC browser. Availability: Freely available to academic users from: http://www.ub.edu/dnasp 

 

Citations (13,125) *

KEGG: Kyoto Encyclopedia of Genes and Genomes

Jan 1 2000, Nucleic Acids Research, volume 28, issue 1, pp 27-30

Authors:

Minoru Kanehisa (Kyoto University)

Susumu Goto (Kyoto University)

KEGG (Kyoto Encyclopedia of Genes and Genomes) is a knowledge base for systematic analysis of gene functions, linking genomic information with higher order functional information. The genomic information is stored in the GENES database, which is a collection of gene catalogs for all the completely sequenced genomes and some partial genomes with up-to-date annotation of gene functions. The higher order functional information is stored in the PATHWAY database, which contains graphical representations of cellular processes, such as metabolism, membrane transport, signal transduction and cell cycle. The PATHWAY database is supplemented by a set of ortholog group tables for the information about conserved subpathways (pathway motifs), which are often encoded by positionally coupled genes on the chromosome and which are especially useful in predicting gene functions. A third database in KEGG is LIGAND for the information about chemical compounds, enzyme molecules and enzymatic reactions. KEGG provides Java graphics tools for browsing genome maps, comparing two genome maps and manipulating expression maps, as well as computational tools for sequence comparison, graph comparison and path computation. The KEGG databases are daily updated and made freely available (http://www.genome.ad.jp/kegg/ ).

 Citations (10,756) *  

 

Analysis of the genome sequence of the flowering plant Arabidopsis thaliana.

Dec 14 2000, Nature, volume 408, issue 6814, pp 796-815

Authors:

Arabidopsis Genome Initiative (J. Craig Venter Institute) 

The flowering plant Arabidopsis thaliana is an important model system for identifying genes and determining their functions. Here we report the analysis of the genomic sequence of Arabidopsis. The sequenced regions cover 115.4 megabases of the 125-megabase genome and extend into centromeric regions. The evolution of Arabidopsis involved a whole-genome duplication, followed by subsequent gene loss and extensive local gene duplications, giving rise to a dynamic genome enriched by lateral gene transfer from a cyanobacterial-like ancestor of the plastid. The genome contains 25,498 genes encoding proteins from 11,000 families, similar to the functional diversity of Drosophila and Caenorhabditis elegans - the other sequenced multicellular eukaryotes. Arabidopsis has many families of new proteins but also lacks several common protein families, indicating that the sets of common proteins have undergone differential expansion and contraction in the three multicellular eukaryotes. This is the first complete genome sequence of a plant and provides the foundations for more comprehensive comparison of conserved processes in all eukaryotes, identifying a wide range of plant-specific gene functions and establishing rapid systematic ways to identify genes for crop improvement.

Citations (8,398) *  

Deciphering the biology of Mycobacterium tuberculosis from the complete genome sequence.

Jun 11 1998, Nature, volume 393, issue 6685, pp 537-544

Authors:

+37 others  

Countless millions of people have died from tuberculosis, a chronic infectious disease caused by the tubercle bacillus. The complete genome sequence of the best-characterized strain of Mycobacterium tuberculosis, H37Rv, has been determined and analysed in order to improve our understanding of the biology of this slow-growing pathogen and to help the conception of new prophylactic and therapeutic interventions. The genome comprises 4,411,529 base pairs, contains around 4,000 genes, and has a very high guanine + cytosine content that is reflected in the biased amino-acid content of the proteins. M. tuberculosis differs radically from other bacteria in that a very large portion of its coding capacity is devoted to the production of enzymes involved in lipogenesis and lipolysis, and to two new families of glycine-rich proteins with a repetitive structure that may represent a source of antigenic variation. 

 Citations (7,986) *

The Complete Genome Sequence of Escherichia coli K-12

Sep 5 1997, Science, volume 277, issue 5331, pp 1453-1462

Authors:

+12 others  

The 4,639,221-base pair sequence of Escherichia coli K-12 is presented. Of 4288 protein-coding genes annotated, 38 percent have no attributed function. Comparison with five other sequenced microbes reveals ubiquitous as well as narrowly distributed gene families; many families of similar genes within E. coli are also evident. The largest family of paralogous proteins contains 80 ABC transporters. The genome as a whole is strikingly organized with respect to the local direction of replication; guanines, oligonucleotides possibly related to replication and recombination, and most genes are so oriented. The genome also contains insertion sequence (IS) elements, phage remnants, and many other patches of unusual composition indicating genome plasticity through horizontal transfer.

 

Citations (7,758) *

Fingerprinting genomes using PCR with arbitrary primers

Dec 25 1990, Nucleic Acids Research, volume 18, issue 24, pp 7213-7218

 Simple and reproducible fingerprints of complex genomes can be generated using single arbitrarily chosen primers and the polymerase chain reaction (PCR). No prior sequence information is required. The method, arbitrarily primed PCR (AP-PCR), involves two cycles of low stringency amplification followed by PCR at higher stringency. We show that strains can be distinguished by comparing polymorphisms in genomic fingerprints. The generality of the method is demonstrated by application to twenty four strains from five species of Staphylococcus, eleven strains of Streptococcus pyogenes and three varieties of Oryza sativa (rice).

Citations (7,693) *

Mass spectrometry-based proteomics

Mar 13 2003, Nature, volume 422, issue 6928, pp 198-207

Recent successes illustrate the role of mass spectrometry-based proteomics as an indispensable tool for molecular and cellular biology and for the emerging field of systems biology. These include the study of protein–protein interactions via affinity-based isolations on a small and proteome-wide scale, the mapping of numerous organelles, the concurrent description of the malaria parasite genome and proteome, and the generation of quantitative protein profiles from diverse species. The ability of mass spectrometry to identify and, increasingly, to precisely quantify thousands of proteins from complex samples can be expected to impact broadly on biology and medicine.   

 Citations (7,139) *

Sequencing technologies - the next generation.

Jan 1 2010, Nature Reviews Genetics, volume 11, issue 1, pp 31-46

Authors:

Michael L. Metzker (Human Genome Sequencing Center) 

There is an increasing demand for next-generation sequencing technologies that rapidly deliver high volumes of accurate genome information at a low cost. This Review provides a guide to the features of the different platforms, and describes the recent advances in this fast-moving area.

Citations (7,055) *

2. Main authors

Eric S. Lander

Richard Wilson

David Haussler

Elaine R. Mardis

Ewan Birney

W. James Kent

Steven L. Salzberg

Peer Bork

Robert S. Fulton

J. Craig Venter

Evan E. Eichler

Francis S. Collins

Jean Weissenbach

Marco A. Marra

Huanming Yang

Mark S. Guyer

LaDeana W. Hillier

Tim Hubbard

Roderic Guigó

Jeremy Schmutz


3. Highly cited papers by Eric S. Lander (cited more than 5,000 times)

Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles

…, SL Pomeroy, TR Golub, ES Lander… - Proceedings of the …, 2005 - National Acad Sciences

Although genomewide RNA expression analysis has become a routine tool in biomedical
research, extracting biological insight from such information remains a major challenge.
Here, we describe a powerful analytical method called Gene Set Enrichment Analysis …

Cited by 13946

Molecular classification of cancer: class discovery and class prediction by gene expression monitoring

…, MA Caligiuri, CD Bloomfield, ES Lander - …, 1999 - science.sciencemag.org

Although cancer classification has improved over the past 30 years, there has been no
general approach for identifying new cancer classes (class discovery) or for assigning
tumors to known classes (class prediction). Here, a generic approach to cancer classification …

Cited by 12593

MAPMAKER: an interactive computer package for constructing primary genetic linkage maps of experimental and natural populations

ES Lander, P Green, J Abrahamson, A Barlow, MJ Daly… - Genomics, 1987 - Elsevier

With the advent of RFLPs, genetic linkage maps are now being assembled for a number of
organisms including both inbred experimental populations such as maize and outbred
natural populations such as humans. Accurate construction of such genetic maps requires …

Cited by 7456

Mapping mendelian factors underlying quantitative traits using RFLP linkage maps.

ES Lander, D Botstein - Genetics, 1989 - Genetics Soc America

The advent of complete genetic linkage maps consisting of codominant DNA markers
[typically restriction fragment length polymorphisms (RFLPs)] has stimulated interest in the
systematic genetic dissection of discrete Mendelian factors underlying quantitative traits in …

Cited by 5858

The structure of haplotype blocks in the human genome

…, A Adeyemo, R Cooper, R Ward, ES Lander… - …, 2002 - science.sciencemag.org

Haplotype-based methods offer a powerful approach to disease gene mapping, based on
the association between causal mutations and the ancestral haplotypes on which they
arose. As part of The SNP Consortium Allele Frequency Projects, we characterized …

Cited by 5406

4. Main research institutions

Broad Institute

Massachusetts Institute of Technology

Harvard University

National Institutes of Health

Wellcome Trust

Washington University in St. Louis

Baylor College of Medicine

University of Oxford

University of California, Santa Cruz

University of Washington

Stanford University

Cornell University

European Bioinformatics Institute

University of California, San Diego

University of Michigan

Max Planck Society

Cold Spring Harbor Laboratory

Affymetrix

Institute for Systems Biology

Case Western Reserve University



https://wap.sciencenet.cn/blog-212210-1113687.html
