
"These non-coding RNAs have been called the 'dark matter' of the genome because, just like the dark matter of the universe, they are massive in terms of coverage—making up over 95 percent of the human genome," says B. Franklin Pugh. (Credit: Alex Pang/Flickr)


Team finds origin of genome’s ‘dark matter’

Non-coding RNA originates at the same kinds of sites along the human genome as coding RNA, researchers have discovered.

The genome’s “dark matter” does not contain the blueprint for making proteins, yet it makes up more than 95 percent of the human genome.

The team’s findings may eventually help pinpoint exactly where complex-disease traits reside, since the genetic origins of many diseases lie outside the coding region of the genome.


B. Franklin Pugh, chair in molecular biology at Penn State, and postdoctoral scholar Bryan Venters, now at Vanderbilt University, performed the research, which appears early online today in Nature.

In their research, Pugh and Venters set out to identify the precise location of the beginnings of transcription—the first step in the expression of genes into proteins.

“During transcription, DNA is copied into RNA—the single-stranded genetic material that is thought to have preceded the appearance of DNA on Earth—by an enzyme called RNA polymerase and, after several more steps, genes are encoded and proteins eventually are produced,” Pugh explains.

‘Initiation machines’

He adds that other scientists trying to learn just where transcription begins had looked directly at RNA. Pugh and Venters instead determined where along human chromosomes the proteins that initiate transcription of non-coding RNA were located.

“We took this approach because so many RNAs are rapidly destroyed soon after they are made, and this makes them hard to detect,” Pugh says. “So rather than look for the RNA product of transcription we looked for the ‘initiation machine’ that makes the RNA. This machine assembles RNA polymerase, which goes on to make RNA, which goes on to make a protein.”

Pugh adds that he and Venters were stunned to find 160,000 of these “initiation machines,” because humans have only about 30,000 genes. “This finding is even more remarkable, given that fewer than 10,000 of these machines actually were found right at the site of genes. Since most genes are turned off in cells, it is understandable why they are typically devoid of the initiation machinery.”

The remaining 150,000 initiation machines—those Pugh and Venters did not find right at genes—remained somewhat mysterious. “These initiation machines that were not associated with genes were clearly active since they were making RNA and aligned with fragments of RNA discovered by other scientists,” Pugh says. “In the early days, these fragments of RNA were generally dismissed as irrelevant since they did not code for proteins.”

Pugh adds that it was easy to dismiss these fragments because they lacked a feature called polyadenylation, a long tail of adenosine bases that protects the RNA from being destroyed.

Not just junk

Pugh and Venters further validated their surprising findings by showing that these non-coding initiation machines recognized the same DNA sequences as the ones at coding genes, indicating that the non-coding RNAs have a specific origin and that their production is regulated, just as it is at coding genes.

“These non-coding RNAs have been called the ‘dark matter’ of the genome because, just like the dark matter of the universe, they are massive in terms of coverage—making up over 95 percent of the human genome. However, they are difficult to detect and no one knows exactly what they all are doing or why they are there,” Pugh says.

“Now at least we know that they are real, and not just ‘noise’ or ‘junk.’ Of course, the next step is to answer the question, ‘what, in fact, do they do?’”

Pugh adds that this research could be one step toward solving the problem of “missing heritability,” a concept that describes how most traits, including many diseases, cannot be accounted for by individual genes and seem to have their origins in regions of the genome that do not code for proteins.

“It is difficult to pin down the source of a disease when the mutation maps to a region of the genome with no known function,” Pugh says. “However, if such regions produce RNA then we are one step closer to understanding that disease.”

The US National Institutes of Health funded the study.

Source: Penn State
