



Discussion
- Phase distribution of spliceosomal introns: implications for intron origin

The introns-early theory explains the excess of phase-0 introns by predicting that a fraction of present-day introns are ancient and that these ancient introns were in phase 0. If this explanation is correct, the excess of phase-0 introns should generally decrease during eukaryotic evolution as new introns are inserted into random positions. A direct test of this explanation is therefore to infer the evolution of the intron phase distribution from observed data. This test was first performed by Roy et al. [17]. Using a dataset of 280 ancient genes (unpublished), they divided the present-day introns into two categories: lineage-specific introns and widely phylogenetically distributed introns, which are thought to be rough estimates of recently gained introns and ancestral introns, respectively. They found that the presumed ancestral introns had a stronger phase-0 bias than the lineage-specific introns (Table 3 of ref. [17]). In contrast, our results (Figure 1) show a general trend over evolution toward an increase in the excess of phase-0 introns. We believe that this discrepancy is more likely due to different datasets than to different classification methods, because when a classification method similar to that used in ref. [17] was applied to the current dataset, a stronger phase-0 bias in lineage-specific introns was obtained [18]. Another reason for this discrepancy may be that all of the 280 gene families in the dataset used in ref. [17] are ancient, and these gene families may show a different pattern of evolution of intron phase distribution than younger gene families. However, when we used a smaller dataset of 79 RP gene families, all of which are believed to be ancient, from the same seven species studied here [19], the result was still inconsistent with that in ref. [17] (data not shown).

Sverdlov et al. [18] suggested that the stronger phase-0 bias in lineage-specific introns than in widely distributed introns refuted the explanation offered by the introns-early hypothesis. However, it should be stressed that this conclusion does not necessarily follow from this result: the introns-early explanation may still be correct even when lineage-specific introns have a stronger phase-0 bias than widely distributed introns. Consider the following example. Suppose a species has 200 current introns with a phase distribution of 100:50:50, and 100 of these are widely distributed introns with a phase distribution of 40:30:30. The species therefore also has 100 lineage-specific introns with a phase distribution of 60:20:20. Suppose further that all 100 lineage-specific introns were gained recently and that another 100 introns specific to this species have been lost. The ancestral introns are then the widely distributed introns plus the lost introns. If the phase distribution of the lost introns is 40:30:30, the phase distribution of the ancestral introns will be 80:60:60, which has less phase-0 bias than the current introns. However, if the phase distribution of the lost introns is 80:10:10, the phase distribution of the ancestral introns will be 120:40:40, which has more phase-0 bias than the current introns. Thus, no decisive conclusion can be reached by comparing intron phase distributions between lineage-specific introns and widely distributed introns. In contrast, by using the maximum likelihood method to infer a set of most reliable events (>90% probability of occurrence), we were able to estimate the intron phase distribution at each ancestral node.
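The arithmetic of this hypothetical example can be checked with a short Python sketch. The counts are the ones given in the paragraph above; the helper names (`add`, `subtract`, `phase0_fraction`) are illustrative and not part of the original study.

```python
# Phase distributions are (phase-0, phase-1, phase-2) intron counts.

def add(a, b):
    """Element-wise sum of two phase distributions."""
    return tuple(x + y for x, y in zip(a, b))

def subtract(a, b):
    """Element-wise difference of two phase distributions."""
    return tuple(x - y for x, y in zip(a, b))

def phase0_fraction(dist):
    """Share of introns that are in phase 0."""
    return dist[0] / sum(dist)

current = (100, 50, 50)              # all 200 present-day introns
widely_distributed = (40, 30, 30)    # 100 widely distributed introns
lineage_specific = subtract(current, widely_distributed)

# Ancestral introns = widely distributed introns + introns lost since.
ancestral_a = add(widely_distributed, (40, 30, 30))  # lost introns, case A
ancestral_b = add(widely_distributed, (80, 10, 10))  # lost introns, case B

for name, dist in [("current", current),
                   ("lineage-specific", lineage_specific),
                   ("ancestral, case A", ancestral_a),
                   ("ancestral, case B", ancestral_b)]:
    print(f"{name}: {dist}, phase-0 fraction {phase0_fraction(dist):.2f}")
```

The same current and widely distributed counts are compatible with an ancestral phase-0 fraction either below (case A) or above (case B) the current one, depending only on the unobserved lost introns.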

Our result for the evolution of intron phase distribution suggests that the excess of phase-0 introns is more likely to be caused by nonrandom intron gain. However, previous studies did not demonstrate this satisfactorily [10,15]. Therefore, we decided to re-test this prediction on a large scale using genome-wide data from 10 model species. We first used the fixed-pattern intron insertion model, in which introns are inserted only into proto-splice sites. Our results (data not shown) were consistent with previous results [10]: the intron phase distributions predicted from the distributions of four potential proto-splice sites (G|G, AG|G, AG|GT, and MAG|R) did not match the observed ones.
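As an illustration of how a fixed-pattern prediction can be derived, the following Python sketch counts the occurrences of a proto-splice site in a coding sequence and bins them by the phase of the insertion point (marked by `|`). This is a minimal sketch under stated assumptions, not the procedure used in the study: the function name `site_phase_counts` and the toy sequence are hypothetical, and the phase is taken as the number of coding nucleotides upstream of the insertion point modulo 3.

```python
import re

# IUPAC codes appearing in the four motifs: M = A or C, R = A or G.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "M": "[AC]", "R": "[AG]"}

def site_phase_counts(cds, motif):
    """Count proto-splice sites in an in-frame coding sequence, by phase.

    `motif` marks the insertion point with '|', e.g. 'AG|G'.
    Returns (phase-0, phase-1, phase-2) counts.
    """
    left, right = motif.split("|")
    pattern = "".join(IUPAC[base] for base in left + right)
    # A lookahead lets overlapping occurrences be found as well.
    regex = re.compile(f"(?={pattern})")
    counts = [0, 0, 0]
    for match in regex.finditer(cds.upper()):
        # Number of coding nucleotides upstream of the insertion point.
        insertion_point = match.start() + len(left)
        counts[insertion_point % 3] += 1
    return tuple(counts)

# Hypothetical toy sequence, used only to exercise the function.
toy_cds = "ATGCAGGTAAGGAAA"
for motif in ("G|G", "AG|G", "AG|GT", "MAG|R"):
    print(motif, site_phase_counts(toy_cds, motif))
```

Summing such counts over all coding sequences of a genome and normalizing gives the phase distribution that a given proto-splice site would predict if every site were equally likely to receive an intron.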

Another model of intron insertion has been proposed in which introns are either inserted randomly into sequences but fixed at different rates, or inserted preferentially into a consensus sequence [14,20,21]. We therefore tested the all-pattern intron insertion model, in which introns can be inserted into any sequence pattern but are inserted into different patterns with different frequencies. Since the frequencies of intron insertion may vary from species to species, these frequencies were obtained from the observed data separately for each species. The results (Figure 2) show that the model predicted intron phase distributions well in GC-rich species but not in GC-poor species. Analysis of a smaller dataset of 79 orthologs of RP genes shows that higher mutation rates are very likely the main cause of the higher prediction errors in GC-poor species (Figure 3). Therefore, we proposed a simple model for mutation correction and used it to predict intron phase distributions for all species again. As expected, the predicted intron phase distributions now matched the observed data for both GC-rich and GC-poor species; in six of the ten species, the differences were not statistically significant (Figure 4 and Table 2).
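The idea behind the all-pattern model can be sketched in Python as follows. This is only one plausible reading of the description above, not the authors' implementation: the pattern is assumed to be the two nucleotides on each side of the insertion point (`FLANK = 2`), the insertion frequency of a pattern is assumed to be the fraction of its occurrences that carry an intron, and all names and sequences are hypothetical. Mutation correction is not modeled.

```python
from collections import Counter

FLANK = 2  # assumed pattern half-width; the source does not state one

def pattern_at(cds, pos, flank=FLANK):
    """Sequence pattern surrounding the insertion point `pos`."""
    return cds[pos - flank:pos + flank]

def learn_frequencies(genes, flank=FLANK):
    """Estimate per-pattern insertion frequencies.

    `genes` is an iterable of (cds, intron_positions), where each
    position is the number of coding nucleotides upstream of an intron.
    """
    sites, hits = Counter(), Counter()
    for cds, intron_positions in genes:
        introns = set(intron_positions)
        for pos in range(flank, len(cds) - flank + 1):
            pat = pattern_at(cds, pos, flank)
            sites[pat] += 1
            if pos in introns:
                hits[pat] += 1
    return {pat: hits[pat] / n for pat, n in sites.items()}

def predict_phase_distribution(cds_list, freqs, flank=FLANK):
    """Expected fraction of introns in phases 0, 1 and 2."""
    expected = [0.0, 0.0, 0.0]
    for cds in cds_list:
        for pos in range(flank, len(cds) - flank + 1):
            expected[pos % 3] += freqs.get(pattern_at(cds, pos, flank), 0.0)
    total = sum(expected)
    return [e / total for e in expected] if total else expected

# Hypothetical toy data: one gene with one intron after nucleotide 6.
training = [("ATGCAGGTAAGGAAA", [6])]
freqs = learn_frequencies(training)
print(predict_phase_distribution(["ATGCAGGTAAGGAAA"], freqs))
print(predict_phase_distribution(["CCAGGTAA"], freqs))
```

Because the frequencies and the sequences are separate inputs, frequencies learned from one species can be applied to the sequences of another, which separates the effect of sequence composition from the effect of insertion preferences.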

Although the predicted intron phase distributions of the four remaining species (D. melanogaster, C. elegans, S. pombe, and P. falciparum) account quite well for the observed distributions (Figure 4), the differences were still statistically significant. It is possible that our mutation correction model, which assumes that amino acid sequences do not change, did not fully compensate for the mutation effect in S. pombe and P. falciparum, which have very low GC contents. The larger errors in D. melanogaster and C. elegans may be partly due to nonuniform intron loss, because both species experienced high rates of intron loss after their divergence from H. sapiens [22]. Moreover, since other factors such as annotation errors in exon/intron structures may also affect the results, we should not put too much weight on statistical tests. Therefore, we conclude that the all-pattern intron insertion model may explain intron phase distributions even when statistical equivalence is not reached.
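The caution about statistical tests can be illustrated with a short Python sketch. The text does not name the test that was used, so a chi-square goodness-of-fit test is assumed here, and the predicted fractions and observed counts are hypothetical numbers chosen only for illustration. With three phases there are two degrees of freedom, for which the chi-square tail probability is exactly exp(-x/2).

```python
import math

def chi_square_gof(observed_counts, predicted_fractions):
    """Chi-square goodness of fit for three phase classes (2 d.f.)."""
    n = sum(observed_counts)
    stat = sum((o - n * f) ** 2 / (n * f)
               for o, f in zip(observed_counts, predicted_fractions))
    # Survival function of the chi-square distribution with 2 d.f.
    p_value = math.exp(-stat / 2.0)
    return stat, p_value

# Hypothetical values: the same one-point deviation in phase 0 and
# phase 1 at two different sample sizes.
predicted = [0.44, 0.32, 0.24]
stat_small, p_small = chi_square_gof([450, 310, 240], predicted)
stat_large, p_large = chi_square_gof([45000, 31000, 24000], predicted)

print(f"n = 1,000:   chi2 = {stat_small:.2f}, p = {p_small:.3g}")
print(f"n = 100,000: chi2 = {stat_large:.2f}, p = {p_large:.3g}")
```

An identical relative deviation is far from significant for a thousand introns but highly significant for a hundred thousand, so in genome-wide data a significant test result does not by itself imply a poor prediction.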

The intron phase distributions are lineage-specific and may be affected by two factors: changes in DNA sequences and changes in intron insertion frequencies. The latter may reflect changes in the efficiency with which the splicing machinery splices out introns. When the intron insertion frequencies learned from H. sapiens were used to predict the intron phase distribution of N. crassa sequences, the predicted distribution was 44:32:24, much closer to the distribution observed in H. sapiens (45:31:24) than to that observed in N. crassa (38:34:28). This indicates that the change in intron insertion frequencies has a stronger effect on the intron phase distribution than the change in DNA sequences.
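How much closer the cross-species prediction is to the H. sapiens distribution can be quantified with a simple distance. The sketch below uses the total variation distance (half the sum of absolute differences between fractions); this metric is an assumption chosen for illustration and is not necessarily the comparison used in the study. The three distributions are the ones quoted above.

```python
def to_fractions(dist):
    """Convert a phase distribution (percentages or counts) to fractions."""
    total = sum(dist)
    return [x / total for x in dist]

def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

# Prediction for N. crassa sequences using H. sapiens insertion frequencies.
predicted = to_fractions([44, 32, 24])
observed_h_sapiens = to_fractions([45, 31, 24])
observed_n_crassa = to_fractions([38, 34, 28])

d_human = total_variation(predicted, observed_h_sapiens)
d_crassa = total_variation(predicted, observed_n_crassa)
print(f"distance to H. sapiens: {d_human:.2f}")
print(f"distance to N. crassa:  {d_crassa:.2f}")
```

Under this measure the prediction is several times closer to the distribution of the species that supplied the insertion frequencies than to that of the species that supplied the sequences.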


Updated on: 17 Dec 2006
