Researchers already knew that 1.5 percent of the genome codes for proteins. ENCODE found that an additional 8.5 percent codes for regions where proteins stick to DNA, presumably regulating gene transcription. And, because ENCODE hasn’t looked at every possible type of cell or every possible protein that sticks to DNA, this figure is likely conservative. Birney estimates that the total proportion of the genome that either creates a protein or sticks to one is around 20 percent.
The rest of the functional elements in the ENCODE analysis cover other classes of sequence that were thought to be essentially functionless, including introns. “The idea that introns are definitely deadweight isn’t true,” said Birney. Even some repetitive sequences—small chunks of DNA that have the ability to copy themselves and are typically viewed as parasites—are likely to be functional, often containing sequences where proteins can bind to influence the activity of nearby genes. Perhaps their spread across the genome represents not the invasion of a parasite, but a way of spreading control. “These parasites can be subverted sometimes,” Birney said.
Birney expects that many skeptics will argue about the exact proportion—the 80 percent of the genome that ENCODE estimates to be doing something—and about the definition of “functional.” But, he said, “no matter how you cut it, we’ve got to get used to the fact that there’s a lot more going on with the genome than we knew.”
What’s in a gene?
The simplistic view of a gene is that it’s a stretch of DNA that is transcribed to make a protein. But with ENCODE’s data, this definition no longer makes sense. There are a lot of transcripts, probably more than anyone had realized, some of which connect two previously unconnected genes. This means that the boundaries for those genes have to widen, and the gaps between them shrink or disappear.
Gingeras says that this “intergenic” space has shrunk by a factor of four. “A region that was once called Gene X is now melded to Gene Y,” he says. With such blurring boundaries, Gingeras thinks that it no longer makes sense to think of a gene as a specific point in the genome, or as its basic unit. Instead, that honor falls to the RNA transcript. “The atom of the genome is the transcript,” says Gingeras. “They are the basic unit that’s affected by mutation and selection.”
New disease leads
For the last decade, geneticists have run a seemingly endless stream of genome-wide association studies (GWAS), and have thrown up a long list of single nucleotide polymorphisms (SNPs) that correlate with the risk of different conditions. The ENCODE team has mapped all of these GWAS-identified SNPs to their data.
The researchers found that just 12 percent of known SNPs lie within protein-coding areas. They also showed that compared to random SNPs, the disease-associated ones are 60 percent more likely to lie within the non-coding but functional regions that ENCODE identified, especially in promoters and enhancers. This suggests that many of these variants are controlling the activity of different genes, and provides many fresh leads for understanding how they affect our risk of disease. “It was one of those too good to be true moments,” said Birney. “Literally, I was in the room [when they got the result] and I went: Yes!”
The ENCODE researchers also found new links between disease-associated SNPs and specific DNA elements. For example, they found five SNPs that increase the risk of Crohn’s disease, and that are recognized by a group of transcription factors called GATA2. “That wasn’t something that the Crohn’s disease biologists had on their radar,” Birney said. “Suddenly we’ve made an unbiased association between a disease and a piece of basic biology.”
“We’re now working with lots of different disease biologists looking at their data sets,” he added. “In some sense, ENCODE is working from the genome out, while GWAS studies are working from disease in.” So far, the team has identified 400 such hotspots that are worth looking into.
The 3-D genome
Writing the genome out as a string of letters invites a common fallacy: that it’s a one-dimensional, linear entity. In reality, DNA is wrapped around proteins called histones like beads on a string. These are then twisted, folded and looped in an intricate three-dimensional way, so that distant parts of the genome can actually be physical neighbors, and can affect each other’s activity.
Job Dekker, a bioinformaticist at the University of Massachusetts Medical School, used ENCODE data to map these long-range interactions across just 1 percent of the genome in three different types of cell, and discovered more than 1,000 of them. “I like to say that nothing in the genome makes sense, except in 3D,” said Dekker. The availability of the new ENCODE data is “really a teaser for the future of genome science,” he added.
Reposted from: http://the-scientist.com/2012/09/05/getting-to-know-the-genome/