This study strongly supports the introns-late hypothesis of the origin of spliceosomal …


Results
- Phase distribution of spliceosomal introns: implications for intron origin

Inference of the evolution of intron phase distribution

Figure 1 shows the evolution of intron phase distribution inferred from intron patterns in conserved regions of 684 gene orthologs from seven eukaryotes using an assumed ecdysozoa tree and the maximum likelihood method of estimating rates of intron gains and losses. There is a general trend toward an increasing proportion of phase-0 introns caused by gained introns. For two branches, one from the crown ancestor to Arabidopsis thaliana and the other from the ecdysozoa ancestor to Caenorhabditis elegans, the differences between phase distributions of gained introns and ancestral introns are statistically significant (P = 8.3 × 10⁻¹⁶ and 1.8 × 10⁻⁵, respectively). In contrast, differences between the phase distributions of lost introns and ancestral introns are not statistically significant for any branch that has data for lost introns. Our result for the evolution of intron phase distribution thus suggests that the nonuniformity of intron phase distribution is more likely to be due to the nonrandomness of intron insertions.
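The comparison of gained and ancestral phase distributions can be illustrated with a small sketch. The text above does not name the statistical test, so a chi-square test of homogeneity on a 2 × 3 table of phase counts is assumed here as one common choice; the function name and any counts passed to it are hypothetical and are not data from the study.

```python
import math


def chi_square_homogeneity(counts_a, counts_b):
    """Compare two sets of intron phase counts (phase 0, 1, 2).

    Returns the chi-square statistic and its p-value. This is an
    assumed test; the source does not state which one was used.
    """
    if len(counts_a) != 3 or len(counts_b) != 3:
        raise ValueError("expected counts for phases 0, 1 and 2")
    total_a, total_b = sum(counts_a), sum(counts_b)
    grand_total = total_a + total_b
    statistic = 0.0
    for a, b in zip(counts_a, counts_b):
        column_total = a + b
        if column_total == 0:
            continue  # an empty phase contributes nothing
        for observed, row_total in ((a, total_a), (b, total_b)):
            expected = row_total * column_total / grand_total
            statistic += (observed - expected) ** 2 / expected
    # A 2 x 3 table has (2 - 1) * (3 - 1) = 2 degrees of freedom, and
    # the chi-square survival function for 2 df is exactly exp(-x / 2).
    p_value = math.exp(-statistic / 2.0)
    return statistic, p_value
```

Calling `chi_square_homogeneity(gained_counts, ancestral_counts)` for one branch returns a small p-value when the two phase distributions differ, which is the pattern reported for the A. thaliana and C. elegans branches.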

Compilation of a genome-wide dataset

To test the introns-late prediction that intron phase distribution is non-uniform, we compiled a dataset from the entire genomes of 10 eukaryotes (Table 1). These 10 species were chosen because they cover a broad range of evolutionary distance and their genomes are well annotated. In this dataset, the average number of introns per gene varies from 1.0 in Schizosaccharomyces pombe to 8.1 in Homo sapiens. The GC content of the coding regions in the genomes ranges from 24% in Plasmodium falciparum to 56% in Neurospora crassa, and the proportion of phase-0 introns ranges from 38.2% in N. crassa to 57.6% in A. thaliana. In all species the intron phase distributions show an obvious pattern of phase-0 > phase-1 > phase-2; the only exception is A. thaliana, in which the proportion of phase-2 introns is slightly larger than that of phase-1 introns. These results are consistent with previously published results (e.g., ref. [10]).
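For readers who want to reproduce this kind of tally, the following is a minimal sketch of how intron phases can be derived from gene annotation. It relies on the standard definition of phase (phase 0 lies between two codons, phase 1 after the first base of a codon, phase 2 after the second) and assumes that the coding lengths of a gene's exons are available in transcript order. The function names are illustrative and are not taken from the study.

```python
from collections import Counter


def intron_phases(coding_exon_lengths):
    """Return the phase of each intron in one gene.

    coding_exon_lengths lists the number of coding bases in each exon,
    in transcript order. The phase of an intron is the number of
    coding bases that precede it, modulo 3.
    """
    phases = []
    coding_bases_so_far = 0
    # The last exon is not followed by an intron, so it is skipped.
    for length in coding_exon_lengths[:-1]:
        coding_bases_so_far += length
        phases.append(coding_bases_so_far % 3)
    return phases


def phase_proportions(phases):
    """Return the proportion of introns in phases 0, 1 and 2."""
    counts = Counter(phases)
    total = len(phases)
    if total == 0:
        return {0: 0.0, 1: 0.0, 2: 0.0}
    return {phase: counts.get(phase, 0) / total for phase in (0, 1, 2)}
```

Pooling the output of `intron_phases` over all genes of a genome and passing it to `phase_proportions` gives a phase distribution of the kind summarized above.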

Prediction of intron phase distribution for the all-pattern model

Figure 2 shows the intron phase distributions predicted by an intron insertion model (hereafter, the all-pattern model) in which introns can be inserted into any sequence pattern, but are inserted into different patterns with different frequencies. The predicted intron phase distributions matched the observed ones quite well for GC-rich species with GC content >50% (e.g., N. crassa and Drosophila melanogaster), but did not match for GC-poor species with GC content <50% (e.g., A. thaliana and C. elegans). For all GC-poor species, the largest errors in prediction occurred in phase-0 and phase-1 introns; the proportions of phase-0 introns were underestimated whereas those of phase-1 introns were overestimated. Note that although most Xenopus tropicalis introns are shared with H. sapiens introns (unpublished data), the GC content is 5% lower and the prediction error is much larger in X. tropicalis. Based on this observation, we speculated that the larger prediction errors in GC-poor species may be due to higher mutation rates.
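One possible reading of the all-pattern model can be sketched as follows. The sketch assumes that a "sequence pattern" is the four coding bases surrounding a potential insertion point (two on each side) and that each pattern carries a relative insertion weight. The pattern length, the weights, and the function name are assumptions made for illustration; the text above does not specify them.

```python
def predict_phase_distribution(cds_sequences, pattern_weights, flank=2):
    """Predict proportions of phase-0, -1 and -2 introns.

    Every position inside each coding sequence is treated as a
    potential insertion point. The pattern around it (flank bases on
    each side) is looked up in pattern_weights, and the weight is
    added to the phase of that position. Both the pattern definition
    and the weights are hypothetical.
    """
    totals = [0.0, 0.0, 0.0]
    for cds in cds_sequences:
        cds = cds.upper()
        # i is the number of coding bases before the insertion point.
        for i in range(flank, len(cds) - flank + 1):
            pattern = cds[i - flank:i + flank]
            totals[i % 3] += pattern_weights.get(pattern, 0.0)
    weight_sum = sum(totals)
    if weight_sum == 0:
        raise ValueError("no weighted pattern found in the sequences")
    return [t / weight_sum for t in totals]
```

Because the prediction depends only on how often each weighted pattern falls in each codon position, it shifts with the base composition of the coding sequences, which is in line with the dependence on GC content described above.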

Inference of the GC content and intron density in the RP gene dataset

To test our speculation that the prediction errors were due to high mutation rates, we compiled a smaller dataset containing 79 orthologs of ribosomal protein (RP) genes from four species: A. thaliana, Oryza sativa, Chlamydomonas reinhardtii, and H. sapiens, and inferred the evolution of GC content and intron density (Figure 3). The three plant species were chosen because A. thaliana had the largest prediction error using the all-pattern model (Figure 2). The outgroup H. sapiens was chosen due to its nearly neutral (52%) GC content and its high density of introns. The analysis indicated that 98% of A. thaliana introns already existed in its last common ancestor with O. sativa, and the inferred GC content for this ancestor was 54%. The result suggests that the large reduction in GC content (from 54% to 47%) in A. thaliana is likely to be the main cause for its large prediction error. (Note that although the GC content of RP genes is somewhat different from the average GC content in each whole genome, this does not affect the result significantly, as only the relative differences are important here.) It is possible that when introns are inserted, the exon junctions surrounding introns are subjected to a much lower mutation rate than the average mutation rate in the genes of fast-evolving species due to the need for efficient splicing. Consequently, the intron phase distributions predicted using current sequences in fast-evolving species would not match the observed data.

Prediction of intron phase distribution with mutation correction

To accommodate this source of error in fast-evolving species, we proposed a simple model for mutation correction and used it to re-predict the intron phase distributions for all species in the genome-wide dataset (Figure 4). The best mutation rate (the rate at which the prediction error is smallest), the corresponding GC content, the predicted intron phase distribution, the prediction error, and the standard deviation for each species are provided in Table 2. As shown in Figure 4, the differences between the predicted intron phase distributions and the observed ones were now not statistically significant (i.e., P > 0.05) for H. sapiens, N. crassa, Fusarium graminearum, Cryptococcus neoformans, A. thaliana, and X. tropicalis. There are several lines of evidence for the validity of our mutation correction model. First, for A. thaliana, the GC content at the best mutation rate was 57.6% (Table 2 [see Additional file 1]), a value very close to the inferred 54% of the last common ancestor of A. thaliana and O. sativa in the 79 orthologs of RP genes (Figure 3). It is possible that this value was the average GC content of A. thaliana during the period when most of its introns were gained. Second, the best prediction errors and GC contents of H. sapiens and X. tropicalis were close to each other, in agreement with the fact that most H. sapiens introns are shared with those of X. tropicalis and their divergence is quite recent (unpublished data). (The small difference between the two inferred best GC contents is likely due to a difference in the GC content of the second bases of codons, because our model does not correct for mutations at these bases.) Third, the inferred best GC contents of the two other animals, D. melanogaster and C. elegans, were also very close to those of H. sapiens and X. tropicalis. Finally, our result suggests that the human genome is evolving toward a lower GC content, consistent with the result of Meunier and Duret [16].
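The notion of a "best mutation rate" can be made concrete with a small search sketch. It assumes that some function returns a predicted phase distribution for a given mutation rate and that the prediction error is the sum of absolute differences from the observed proportions. Neither the predictor nor the error measure is specified in the text above, so both are placeholders, as are the function names.

```python
def prediction_error(predicted, observed):
    """Sum of absolute differences between two phase distributions.

    This error measure is an assumption; the source does not define it.
    """
    return sum(abs(p - o) for p, o in zip(predicted, observed))


def best_mutation_rate(predict, observed, candidate_rates):
    """Return the candidate rate with the smallest prediction error.

    predict is any callable that maps a mutation rate to a predicted
    distribution over phases 0, 1 and 2 (a stand-in for the
    mutation-correction model, which is not reproduced here).
    """
    best_rate, best_error = None, float("inf")
    for rate in candidate_rates:
        error = prediction_error(predict(rate), observed)
        if error < best_error:
            best_rate, best_error = rate, error
    if best_rate is None:
        raise ValueError("no candidate rates were supplied")
    return best_rate, best_error
```

A simple scan over candidate rates is used here because it makes no assumption about the shape of the error curve; a finer grid or a numerical optimizer could replace it if the error is known to vary smoothly with the rate.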



Updated on: 17 Dec 2006