March 31, 2009 -- Scientists at Penn State and the National Institute of Genetics in
Japan have demonstrated that several statistical methods commonly used
by biologists to detect natural selection at the molecular level tend
to produce incorrect results. "Our finding means that hundreds of
published studies on natural selection may have drawn incorrect
conclusions," said Masatoshi Nei, Penn State Evan Pugh Professor of
Biology and the team's leader. The team's results will be published in
the Online Early Edition of the journal Proceedings of the National
Academy of Sciences during the week ending Friday, April 3, 2009, and also
in the journal's print edition at a later date.
Nei
said that many scientists who examine human evolution have used faulty
statistical methods in their studies and, as a result, their
conclusions could be wrong. For example, in one published study the
scientists used a statistical method to demonstrate pervasive natural
selection during human evolution. "This group documented adaptive
evolution in many genes expressed in the brain, thyroid, and placenta,
which are assumed to be important for human evolution," said Masafumi
Nozawa, a postdoctoral fellow at Penn State and one of the paper's
authors. "But if the statistical method that they used is not reliable,
then their results also might not be reliable," added Nei. "Of course,
we would never say that natural selection is not happening, but we are
saying that these statistical methods can lead scientists to make
erroneous inferences," he said.
The team examined the branch-site method and several types of
site-prediction methods commonly used for statistical analyses of
natural selection at the molecular level. The branch-site method
enables scientists to determine whether or not natural selection has
occurred within a particular gene, and the site-prediction method
allows scientists to predict the exact location on a gene in which
natural selection has occurred.
"Both of these methods are very
popular among biologists because they appear to give valuable results
about which genes have undergone natural selection," said Nei. "But
neither of the methods seems to give an accurate picture of what's
really going on."
Nei said that for many years he has suspected
that the statistical methods were faulty. "The methods assume that when
natural selection occurs the number of nucleotide substitutions that
lead to changes in amino acids is significantly higher than the number
of nucleotide substitutions that do not result in amino acid changes,"
he said. "But this assumption may be wrong. Actually, the majority of
amino acid substitutions do not lead to functional changes, and the
adaptive change of a protein often occurs by a rare amino acid
substitution. For this reason, statistical methods may give erroneous
conclusions." Nei also believes that the methods are inaccurate when
the number of nucleotide substitutions observed is small.
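The assumption Nei describes can be illustrated with a minimal sketch in Python. It counts the nucleotide differences between two aligned coding sequences and classifies each one as amino-acid-changing (nonsynonymous) or silent (synonymous) using the standard genetic code. The function name `count_changes` and the example sequences are hypothetical, and the sketch only handles codons that differ at a single position. It is not the branch-site method itself, only a picture of the two kinds of substitution that such methods compare.

```python
# Standard genetic code, with codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def count_changes(seq_a, seq_b):
    """Return (nonsynonymous, synonymous) counts for two aligned coding sequences.

    Hypothetical helper: only codons differing at exactly one position are
    classified; codons with two or three differences are skipped for simplicity.
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    nonsyn = syn = 0
    for pos in range(0, len(seq_a), 3):
        codon_a, codon_b = seq_a[pos:pos + 3], seq_b[pos:pos + 3]
        n_diff = sum(x != y for x, y in zip(codon_a, codon_b))
        if n_diff != 1:
            continue  # identical codon, or more than one change
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            syn += 1      # same amino acid: silent change
        else:
            nonsyn += 1   # different amino acid
    return nonsyn, syn


# Made-up example: GCT -> GCC keeps alanine, AAA -> AGA turns lysine into arginine.
print(count_changes("ATGGCTAAA", "ATGGCCAGA"))  # prints (1, 1)
```

A method built on this assumption flags selection only when the nonsynonymous count is significantly larger than the synonymous one, so a single rare adaptive substitution of the kind Nei mentions would not stand out.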
To
demonstrate the faultiness of the statistical methods, Nei's team
compiled data collected by their Emory University colleague, Shozo
Yokoyama, on the genes that control the abilities of fish to see light
at different water depths and on the genes that control color vision in
a variety of animals. The team used these data to compare statistically
predicted sites of natural selection with experimentally determined
sites. They found that the statistical methods rarely predicted the
actual sites of natural selection, which had been identified by
Yokoyama through experiments. "In some cases, the statistical methods
completely failed to identify the true sites where natural selection
occurred," said Nei. "This particular exercise demonstrated the
difficulty with which statistical methods are able to detect natural
selection."
To demonstrate how small sample sizes can lead to
incorrect results, the team used computer simulations to examine the
evolution of genes in three primates: humans, chimpanzees, and
macaques. The scientists mimicked the procedures used by the authors of
a 2007 paper, which applied the branch-site method to 14,000
orthologous genes -- genes in different species that descend from a
single gene in their common ancestor -- and which found that the method predicted
selection in 32 of the genes. Nei and his team also studied selection
using Fisher's exact test, but this test did not detect any selection.
"The results indicate that the number of nucleotide substitutions that
occurred was too small to detect any selection; therefore, all of the
32 cases obtained by the branch-site method must be false positives,"
said Nozawa.
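To see why a handful of substitutions cannot support a claim of selection, here is a minimal sketch of a one-sided Fisher's exact test in Python. The article does not say how the team set up its test, so the table layout (changed versus unchanged sites, split into nonsynonymous and synonymous) and all of the counts are assumptions made for illustration. The function name `fisher_exact_greater` is also hypothetical.

```python
from math import comb  # requires Python 3.8 or later


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns the probability, with all row and column totals held fixed,
    of seeing a top-left count of at least `a`.
    """
    row1 = a + b          # total in the first row
    col1 = a + c          # total in the first column
    n = a + b + c + d     # grand total
    total = comb(n, row1)
    p = 0
    for x in range(a, min(row1, col1) + 1):
        # Hypergeometric term: x of the first-column items land in row 1.
        p += comb(col1, x) * comb(n - col1, row1 - x)
    return p / total


# Made-up gene: 3 of 300 nonsynonymous sites changed, 1 of 100 synonymous sites changed.
p_value = fisher_exact_greater(3, 297, 1, 99)
print(p_value > 0.05)  # prints True: no evidence of excess nonsynonymous change
```

With only four substitutions in total, even a three-to-one split between nonsynonymous and synonymous changes is well within what chance alone produces, which is the situation Nozawa describes.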
"These statistical methods have led many scientists
to believe that natural selection acted on many more genes in humans
than it did in chimpanzees, and they conclude that this is the reason
why humans have developed large brains and other morphological
differences," said Nei. "But I believe that these scientists are wrong.
The number of genes that have undergone selection should be nearly the
same in humans and chimps. The differences that make us human are more
likely due to mutations that were favorable to us in the particular
environment into which we moved, and these mutations then accumulated
through time."
Nei said that to obtain a more realistic picture
of natural selection, biologists should pair experimental data with
their statistical data whenever possible. Scientists usually do not use
experimental data because such experiments can be difficult to conduct
and because they are very time-consuming.
Source: Penn State