Big data sets create ‘tree of life’ confusion
VANDERBILT (US) — The genomics revolution has given experts mountains of DNA data to reconstruct the evolution of all living beings, but the vast information has led to contradictory conclusions.
“It has become common for top-notch studies to report genealogies that strongly contradict each other in where certain organisms sprung from, such as the place of sponges on the animal tree or of snails on the tree of mollusks,” says Antonis Rokas, associate professor of biological sciences at Vanderbilt University.
In a study published online by the journal Nature, Rokas and graduate student Leonidas Salichos analyze the reasons for these differences and propose a suite of novel techniques that can resolve the contradictions and provide greater accuracy in deciphering the deep branches of life’s tree.
“The study by Salichos and Rokas comes at a critical time when scientists are grappling with how best to detect the signature of evolutionary history from a deluge of genetic data. These authors provide intriguing insights into our standard analytical toolbox, and suggest it may be time to abandon some of our most trusted tools when it comes to the analysis of big data sets. This significant work will certainly challenge the community of evolutionary biologists to rethink how best to reconstruct phylogeny,” says Michael F. Whiting, program director of systematics and biodiversity science at the National Science Foundation, which funded the study.
To gain insight into this paradox, Salichos assembled and analyzed more than 1,000 genes—approximately 20 percent of the entire yeast genome—from each of 23 yeast species. He quickly realized that the histories of the 1,000-plus genes were all slightly different from each other as well as different from the genealogy constructed from a simultaneous analysis of all the genes.
“I was quite surprised by this result,” Salichos says.
By adapting an algorithm from information theory, the researchers found that they could use these distinct gene genealogies to quantify the conflict and focus on those parts of the tree that are problematic.
In broad terms, Rokas and Salichos found that genetic data is less reliable for periods of rapid radiation, when many new species formed in quick succession. A case in point is the Cambrian explosion, the sudden appearance about 540 million years ago of a remarkable diversity of animal species, without apparent predecessors. Before about 580 million years ago, most organisms were very simple, consisting of single cells occasionally organized into colonies.
“A lot of the debate on the differences in the trees has been between studies concerning the ‘bushy’ branches that took place in these ‘radiations’,” Rokas says.
The researchers also found that the further back in time they went, the less reliable the genetic data became. “Radioactive dating methods are only accurate over a certain time span,” says Rokas. “We think that the value of DNA data might have a similar limit, posing considerable challenges to existing algorithms to resolve radiations that took place in deep time.”
The National Science Foundation supported the research.
Source: Vanderbilt University
You are free to share this article under the Creative Commons Attribution-NoDerivs 3.0 Unported license.