Wheat genome comes together like jigsaw puzzle

Researchers have sequenced the genome of the species of wheat most commonly grown for making bread, Triticum aestivum.

“After many years of trying, we’ve finally been able to produce a high-quality assembly of this very challenging genome,” says Steven Salzberg, professor of biomedical engineering and genetic medicine at Johns Hopkins University and author of the paper in the journal GigaScience outlining the near-complete DNA sequencing of the wheat.


The same research team was also involved in sequencing the bread wheat’s “ancestor,” Aegilops tauschii. The journal Nature published a separate report on that achievement.

Together, the wheat genome sequences can help biologists better understand the evolutionary history of wheat, and may also advance the quest for hardier, more pest- and drought-resistant wheat types to help feed the world’s growing population, the scientists say.

Bread wheat has one of the most complex genomes known to science, containing an estimated 16 billion base pairs of DNA and six copies each of seven chromosomes, the scientists say. The human genome is roughly one-fifth that size, with about 3 billion base pairs and two copies each of 23 chromosomes.

Previously published versions of the bread wheat genome have contained large gaps in its highly repetitive DNA sequence.

“The repetitive nature of this genome makes it difficult to fully sequence,” Salzberg says. “It’s like trying to put together a jigsaw puzzle of a landscape scene with a huge blue sky. There are lots of very similar small pieces to assemble.”

It took a year for the team to assemble 1.5 trillion bases of raw data into a final assembly of 15.34 billion base pairs.
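As a rough illustration of what those two figures imply, the short Python sketch below divides the raw data by the final assembly size to estimate average sequencing depth (how many times, on average, each position was read). The variable and function names are illustrative only, and the article does not say how the raw data was split between the two sequencing technologies, so this is a combined, back-of-the-envelope number.

```python
# Figures quoted in the article; everything else here is illustrative.
raw_bases = 1.5e12        # about 1.5 trillion bases of raw sequence data
assembly_size = 15.34e9   # final assembly, in base pairs


def coverage(total_bases: float, genome_size: float) -> float:
    """Average number of times each position in the genome was read."""
    return total_bases / genome_size


depth = coverage(raw_bases, assembly_size)
print(f"Approximate average coverage: {depth:.0f}x")
```

By this estimate, each base in the final assembly was read nearly 100 times on average.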

To do it, Salzberg and his team used two types of genome sequencing technology: high-throughput short-read sequencing and long-read, single-molecule sequencing.

As its name implies, high-throughput sequencing generates massive amounts of DNA sequence very quickly and cheaply, although the fragments it reads are very short, just 150 base pairs long for this project. To help assemble the repetitive areas, the team used real-time, single-molecule sequencing, which reads DNA as it is being synthesized in a tiny, nanoscale well on a chip. The technology enables scientists to read up to 20,000 base pairs at a time by measuring fluorescent signals that are emitted as each DNA base is copied.
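A toy Python sketch can show why read length matters in repetitive DNA. It is not the team's assembly software: the miniature "genome," the repeat, and the helper function are all made up for illustration, and real reads are 150 to 20,000 bases long rather than 5 to 16. A short read that falls entirely inside a repeated stretch fits in more than one place, like an identical piece of blue sky, while a longer read that reaches into the unique sequence on either side fits in only one.

```python
def count_placements(genome: str, read: str) -> int:
    """Count the positions where `read` matches `genome` exactly."""
    n = len(read)
    return sum(1 for i in range(len(genome) - n + 1) if genome[i:i + n] == read)


# Hypothetical miniature genome: the same 10-base repeat appears twice,
# separated and flanked by short unique stretches.
REPEAT = "CAGTCAGGAT"
genome = "GGTTA" + REPEAT + "TCGAT" + REPEAT + "CCTGA"

short_read = genome[7:12]   # 5 bases lying entirely inside the first repeat copy
long_read = genome[2:18]    # 16 bases spanning the repeat plus unique flanks

print("short read fits in", count_placements(genome, short_read), "places")
print("long read fits in", count_placements(genome, long_read), "place(s)")
```

The short read matches both copies of the repeat, so its position is ambiguous. The long read matches only once because it carries the unique sequence on each side of the repeat.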

Salzberg says that sequencing a genome of this size requires not only genetic expertise, but also very large computing resources available at relatively few research institutions around the world. The team used approximately 100 CPU years to put this genome together.

Salzberg and his team also participated in the collaborative effort to sequence Aegilops tauschii, commonly referred to as goatgrass and still found in parts of Asia and Europe. Its genome is approximately one-third the size of the bread wheat genome, but has similar levels of repetition. The work took about four years.

Salzberg’s colleagues on the bread wheat study are from Johns Hopkins University; Pacific Biosciences in Menlo Park, California; and the Earlham Institute in the United Kingdom. The National Science Foundation and National Human Genome Research Institute helped to fund the project.