RICE (US) — A new tool is advancing the art of predicting the form and function of proteins.
Proteins are the workhorses that carry out the biological tasks essential to every living thing, but before they can go to work, they fold. Each protein has its own characteristic, folded shape, and various diseases, including cancer, have been linked to proteins that misfold or otherwise misbehave.
Amino acids link together in chains to form proteins; once in a chain, each amino acid is known as a residue. For the past three decades, scientists have developed many methods to predict how a particular chain of residues is likely to fold and what the resulting protein does.
Rice University biophysicist José Onuchic and his team developed the new tool, called direct coupling analysis-fold (DCA-fold), which enhances existing methods. Details of their research appear in the online version of the Proceedings of the National Academy of Sciences.
While most protein-folding researchers study the proteins themselves, either through X-ray crystallography of folded proteins or through computer simulations, Onuchic and colleagues stepped back to look at the DNA sequences that serve as the blueprints for those proteins.
By exploiting ever-larger databases of genomic sequence information, they are able to predict the structures of folded proteins more accurately.
They start by finding positions in protein-coding gene sequences that appear to change in tandem over evolution, even though those positions may be separated by great distances along the protein chain.
The implication, Onuchic says, is that at some point in the protein’s evolutionary history, the amino acids made contact, liked what they saw and kept in touch. In more technical terms, a benefit to the protein’s purpose was realized and conserved.
“A lot of the decisions made through the biological process don’t depend on a strong partner,” says Onuchic, a professor of physics and astronomy, of chemistry, and of biochemistry and cell biology. “It’s like the protein comes, kisses you once, and goes away; it’s what we call very weak interaction, which we’re never going to be able to see with current methods.
“But those weak interactions can cause a conformational change, transfer a phosphorus or start an entire cascade of signals,” he adds.
Onuchic and lead authors Joanna Sulkowska and Faruck Morcos, both postdoctoral researchers at the University of California, San Diego, worked with colleagues to look deeply into bacterial genomes for 15 test proteins. For each, they gathered about 1,000 distinct protein-coding sequences, enough for DCA-fold to produce statistically reliable results.
“When you look at the evolution of a particular protein in these bacteria and see just one residue change from one sequence to the next, that’s probably random,” Onuchic says. “But if two change at the same time, the probability is that they’re changing together. It’s a good sign that these things probably interact with each other.”
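To make the idea of positions that “change at the same time” concrete, here is a minimal Python sketch, not the team’s code. It assumes a toy alignment of equal-length sequences (the sequences, the function names and the `min_separation` cutoff are all illustrative) and scores each pair of positions with mutual information, a simple co-variation statistic of the kind used before DCA. Mutual information flags pairs that vary together but cannot tell direct contacts from indirect ones, which is the gap direct coupling analysis is designed to close.

```python
from collections import Counter
from math import log


def column(alignment, i):
    """Residues found at position i in every aligned sequence."""
    return [seq[i] for seq in alignment]


def mutual_information(alignment, i, j):
    """Mutual information (in nats) between positions i and j of an alignment.

    Zero means the two positions vary independently; larger values mean
    the residue at one position predicts the residue at the other.
    """
    n = len(alignment)
    col_i, col_j = column(alignment, i), column(alignment, j)
    f_i, f_j = Counter(col_i), Counter(col_j)
    f_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in f_ij.items():
        p_ab = count / n
        mi += p_ab * log(p_ab / ((f_i[a] / n) * (f_j[b] / n)))
    return mi


def ranked_pairs(alignment, min_separation=4):
    """Score all position pairs at least `min_separation` apart, best first.

    Skipping near neighbours along the chain is an assumed convention: they
    sit close together in any fold, so their co-variation says little.
    """
    length = len(alignment[0])
    scores = [
        (mutual_information(alignment, i, j), i, j)
        for i in range(length)
        for j in range(i + min_separation, length)
    ]
    return sorted(scores, reverse=True)


# Hypothetical toy alignment: positions 0 and 5 swap K/E together,
# position 2 varies on its own, and the other positions never change.
toy_alignment = ["KAGLVE", "EAGLVK", "KASLVE", "EATLVK"]
best_score, i, j = ranked_pairs(toy_alignment, min_separation=3)[0]
print(i, j, round(best_score, 3))  # 0 5 0.693
```

In this tiny example, position 2 also picks up a smaller score (about 0.35) with position 5 purely by chance, which is one reason real analyses depend on large collections like the 1,000 or so sequences the team gathered.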
Pull signal from the noise
Spotting those interactions is difficult in the proteins themselves, at least with current methods, he notes.
Crystallization, for instance, freezes a protein in time but provides no evidence of interactions that happened on the way to the finished product. And while computational methods that align protein sequences are improving, they are not yet as accurate as they need to be, he adds.
But direct coupling analysis of protein-coding genes conserved across many genomes spots pairs of positions that change together, a pattern best explained by the two residues being in physical contact.
DCA-fold finds those subtle interactions that other methods miss. Energy landscape theories developed by Onuchic and his team predict how those interactions nudge the protein through the folding process. Together, the two approaches rule out many of the forms a protein might otherwise take.
Onuchic sees that as a way to pull a signal from all the noise.
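Part of that noise is indirect correlation: if residue A touches B and B touches C, A and C will also appear to change together even though they never meet. Separating such pairs from true contacts is what the “direct” in direct coupling analysis refers to. The toy below is an illustration, not DCA-fold itself; it shows why inverting the correlation matrix helps, the idea behind mean-field versions of DCA, which take couplings as the negative of that inverse. It assumes just three positions with two possible states each (written as -1 and +1), couplings only between positions 0-1 and 1-2, and a coupling strength of 1.0, and it computes the exact statistics by enumerating all eight combinations. The function names are hypothetical; real analyses handle all 20 amino acids plus alignment gaps and every pair of positions in the protein.

```python
from itertools import product
from math import exp


def exact_covariance(couplings, n=3):
    """Exact covariance of n two-state positions (s = -1 or +1).

    Each configuration s gets weight exp(sum of J * s[i] * s[j]) over the
    (i, j) pairs listed in `couplings`, i.e. a simple Ising-style model.
    """
    states = list(product((-1, 1), repeat=n))
    weights = [exp(sum(J * s[i] * s[j] for (i, j), J in couplings.items()))
               for s in states]
    z = sum(weights)
    mean = [sum(w * s[k] for w, s in zip(weights, states)) / z for k in range(n)]
    return [
        [sum(w * s[a] * s[b] for w, s in zip(weights, states)) / z - mean[a] * mean[b]
         for b in range(n)]
        for a in range(n)
    ]


def inverse_3x3(m):
    """Invert a 3x3 matrix as its adjugate divided by its determinant."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    adjugate = [
        [e * i - f * h, c * h - b * i, b * f - c * e],
        [f * g - d * i, a * i - c * g, c * d - a * f],
        [d * h - e * g, b * g - a * h, a * e - b * d],
    ]
    return [[x / det for x in row] for row in adjugate]


# Hypothetical chain: position 0 touches 1, 1 touches 2, but 0 never touches 2.
cov = exact_covariance({(0, 1): 1.0, (1, 2): 1.0})
precision = inverse_3x3(cov)

# Raw correlation makes 0-2 look like a contact.
print(f"correlation 0-2:  {cov[0][2]:.3f}")             # 0.580
# The inverse matrix shows no direct link between 0 and 2 ...
print(f"inverse term 0-2: {abs(precision[0][2]):.3f}")  # 0.000
# ... but keeps the real contact (sign flipped, as couplings are minus the inverse).
print(f"inverse term 0-1: {precision[0][1]:.3f}")       # -1.813
```

On a simple chain like this the inverse is exactly zero for the indirect pair; with real protein data the separation is approximate, which is why DCA methods rank pairs by coupling strength rather than expecting clean zeros.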
“The entire game here is to show that by adding the genomic data to folding simulations, we can aid in structure prediction,” Onuchic says. “Here, we get at least 1,000 bacterial sequences that are part of the same protein family but are not necessarily structurally similar. Then we compare the sequences and figure out which pairs of amino acids change at the same time.
“Although previous correlation methods could give approximate answers, our model is much more accurate.
“Once we know there’s a high probability that these two amino acids came together at some point, we’re constrained. If the sequence data tells me we can have structure a, b or c, we can then look to see which is consistent with the pair of contacts we now know about from the genomic data and eliminate the wrong predictions.”
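The elimination step Onuchic describes can be sketched as a simple filter. The Python below is illustrative only: it assumes each candidate structure is given as one 3D coordinate per residue, treats a predicted pair as satisfied when the two residues lie within 8 angstroms (a commonly used contact cutoff, assumed here rather than taken from the study), and keeps only candidates consistent with the predicted contacts. The structures “a”, “b” and “c”, their coordinates and the function names are made up. The article indicates the team also feeds genomic contacts into folding simulations, which this after-the-fact filter does not attempt.

```python
from math import dist  # Python 3.8+

CONTACT_CUTOFF = 8.0  # angstroms; assumed threshold, not from the study


def satisfied_fraction(coords, contacts, cutoff=CONTACT_CUTOFF):
    """Fraction of predicted residue pairs (i, j) lying within `cutoff` in a model."""
    hits = sum(1 for i, j in contacts if dist(coords[i], coords[j]) <= cutoff)
    return hits / len(contacts)


def consistent_candidates(candidates, contacts, min_fraction=1.0):
    """Names of candidate structures satisfying at least `min_fraction` of the contacts."""
    return [
        name
        for name, coords in candidates.items()
        if satisfied_fraction(coords, contacts) >= min_fraction
    ]


# Hypothetical four-residue models, spaced 3.8 angstroms apart along the chain.
candidates = {
    "a": [(0, 0, 0), (3.8, 0, 0), (7.6, 0, 0), (11.4, 0, 0)],  # fully extended
    "b": [(0, 0, 0), (3.8, 0, 0), (3.8, 3.8, 0), (0, 3.8, 0)],  # folded back on itself
    "c": [(0, 0, 0), (3.8, 0, 0), (7.6, 0, 0), (7.6, 3.8, 0)],  # kinked at the end
}
# Pairs that genomic co-variation flagged as likely contacts.
predicted_contacts = [(0, 3), (0, 2)]

print(consistent_candidates(candidates, predicted_contacts))  # ['b']
```

Lowering `min_fraction` tolerates some false-positive contact predictions, a sensible allowance because co-variation signals are probabilistic rather than certain.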
The research was supported by the National Science Foundation through the Center for Theoretical Biological Physics.
More news from Rice University: http://news.rice.edu/