The first complete sequence of a 23 megabase (Mb) euchromatic (non-heterochromatic) portion of a human Y chromosome, from a single male, has recently been published by David Page's group [7]. This is a considerable achievement, given the difficulties of sequencing chromosomes that are rich in repeats, as even the euchromatin of the human Y is. The researchers exploited the fact that a single Y chromosome provided the material for sequencing. Slight sequence differences between similar tracts of DNA could therefore be attributed to divergence between copies on the same chromosome rather than to variation between individuals, which facilitated the identification of repeats.
The results provide a uniquely detailed picture of the organization of a Y chromosome, one not yet available for any other species, which confirms many previous findings but modifies others. The sequenced region comprises nearly the entire euchromatic portion of the Y chromosome that does not cross over with the X, together with some of the repeat-rich heterochromatic part. Skaletsky et al. [7] call this the 'male-specific region of the Y' (MSY), because the small part of the Y that does cross over is essentially the same as the corresponding part of the X and is common to males and females. The MSY contains 158 putative transcription units, considerably more than was previously thought, but still a small fraction of the 1,000 or so genes on the 160 Mb X chromosome [6]. Many of the Y-linked transcription units probably do not code for proteins.
The identifiable coding sequences among the 158 predicted transcription units on the Y chromosome fall into two categories. The first comprises 27 genes with clear signs of homology to genes on the X chromosome, betraying the common origin of the two chromosomes; 13 of these have degenerated into pseudogenes. The remaining 14 active Y-linked genes in this category tend to be expressed broadly, in many different tissues. The notable exception is the male-determining gene, Sry, which is expressed early in development in the somatic cells of the developing male gonad and has an X-linked counterpart, Sox3. The silent-site sequence divergence (Ks) [8] between the X and Y copies varies considerably among these 14 genes, although it is not clear that they can be sharply divided into four 'evolutionary strata', representing discrete phases in the differentiation of the X and Y chromosomes, as was previously proposed for a subset of these genes [9].
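As a rough illustration of what a Ks value measures: estimators such as the Nei-Gojobori method count silent (synonymous) sites and the differences at those sites codon by codon, and then correct the raw proportion of differences, p, for multiple substitutions at the same site, commonly with the Jukes-Cantor formula d = -(3/4) ln(1 - 4p/3). The minimal sketch below applies only that final correction step; it assumes the site and difference counts are already available, and the function name `jukes_cantor` is hypothetical.

```python
import math


def jukes_cantor(differences: int, sites: float) -> float:
    """Jukes-Cantor corrected divergence per site (hypothetical helper).

    differences: observed differences at the sites compared
    sites: number of sites compared (e.g. silent sites, assumed pre-counted)
    """
    if sites <= 0:
        raise ValueError("sites must be positive")
    p = differences / sites  # raw proportion of differing sites
    if p >= 0.75:
        # The correction is undefined once p reaches 0.75 (saturation).
        raise ValueError("p >= 0.75: too saturated for the Jukes-Cantor correction")
    # Corrects for multiple hits at the same site.
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

For example, 30 differences at 300 silent sites (p = 0.1) give a corrected divergence of about 0.107. The correction grows faster than p itself, so it matters most for the most highly diverged X-Y pairs.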
There are two possible explanations for this range of divergence. One is that the more highly diverged genes have been isolated from recombination with the X chromosome for longer than the less highly diverged genes; the other is that different genes experience different rates of genetic exchange between X and Y via gene conversion. The first hypothesis is consistent with the fact that genes with similar Ks values tend to cluster in contiguous blocks on the X chromosome, with Ks values increasing from the distal short arm to the distal long arm of the X. The order of these genes differs greatly between the X and Y chromosomes, suggesting that there have been rearrangements involving chromosomal inversions, which would have helped to suppress crossing-over between the evolving X and Y chromosomes. The evolutionary advantages of such suppression of crossing-over have long been discussed [1,3,10]. Sry and Sox3 have the highest Ks value, as would be predicted for the descendants of a gene that must have been involved in the earliest stages of the evolution of the Y chromosome; their divergence corresponds to isolation more than 250 million years ago. The second hypothesis, that different genes undergo different rates of genetic exchange between X and Y, is supported by the detection of gene conversion between X- and Y-linked genes in the cat family [11], and gene conversion also seems to occur at a high rate within the human Y chromosome (discussed below). This hypothesis does not, however, explain the ordering of Ks values along the X chromosome or the very ancient origin of several Y chromosome genes.
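One simple way to ask whether Ks rises steadily along the X, as the first hypothesis predicts, is a rank correlation between each gene's position on the X and its X-Y Ks value. The sketch below computes a Spearman correlation in plain Python; the inputs (positions measured, say, from the tip of the short arm, and per-gene Ks values) and the function names are hypothetical, and no real data are implied.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks of values, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over a run of tied values starting at sorted position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(positions, ks_values):
    """Spearman rank correlation between gene position on the X and Ks."""
    if len(positions) != len(ks_values) or len(positions) < 2:
        raise ValueError("need two equal-length lists with at least two genes")
    rx, ry = ranks(positions), ranks(ks_values)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("all positions or all Ks values are tied")
    return cov / (vx * vy) ** 0.5
```

A value near +1 would fit Ks increasing from distal Xp toward Xq. Note that a rank correlation alone cannot tell discrete strata from a continuous gradient, which is exactly the point left open about the four proposed strata.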
The second category of Y-linked genes consists of nine gene families, most of which bear no resemblance to genes on the X chromosome. Each family is organized into repeats of two or more units, and the genes show testis-specific expression. They seem to comprise genes whose functions are important for males but possibly deleterious for females; seven of the nine families originated by transposition from an autosome, and two derive from the X chromosome. Interestingly, a parallel case of transposition of a gene from an autosome onto a Y chromosome has been found in the plant Silene latifolia, where the Y-linked copy is specifically expressed in male reproductive tissue [12]. These observations suggest that the Y chromosome may be continually acquiring genes with male-specific functions: if higher levels of expression of such genes are advantageous for males but disadvantageous for females, then their location on the Y, which is never present in females, would be favored by selection [1,10].
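The selective argument can be made concrete with a deliberately minimal model, assuming an infinite population, no drift, and a Y-linked variant (for example, a Y copy of a male-beneficial gene) with fitness 1 + s in males relative to other Y chromosomes. Because the Y is carried only by males, any cost the gene would impose on females never enters the recursion p' = p(1 + s) / (1 + ps). The function names below are hypothetical.

```python
def next_freq_y(p: float, s: float) -> float:
    """One generation of selection on a Y-linked variant.

    p: frequency of the variant among Y chromosomes
    s: selective advantage in males (fitness 1 + s versus 1)
    Female fitness does not appear: females carry no Y chromosome.
    """
    return p * (1 + s) / (1 + p * s)


def trajectory(p0: float, s: float, generations: int) -> list:
    """Deterministic frequency trajectory, including the starting value."""
    freqs = [p0]
    p = p0
    for _ in range(generations):
        p = next_freq_y(p, s)
        freqs.append(p)
    return freqs
```

Under this model the odds p/(1 - p) are multiplied by exactly 1 + s each generation, so `trajectory(0.01, 0.05, 500)` ends very close to fixation whatever the hypothetical female cost would have been.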
Another unusual feature of the human Y chromosome is a 3.4 Mb tract of predominantly non-coding sequence that was derived from the X chromosome by a transposition event of some kind. Its sequence divergence from the corresponding region of the X chromosome suggests that the transposition occurred about 3-4 million years ago. An inversion of part of the short arm of the Y subsequently split the transposed region into two non-contiguous blocks. Only two genes are present in this part of the Y chromosome [7].
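Dating an event from sequence divergence, whether this transposition or the isolation of X-Y gene pairs discussed above, rests on a molecular-clock relation. A sketch, assuming neutral substitution rates that are constant over time: after the two copies separate, the X copy and the Y copy accumulate substitutions independently, at rates written here as the hypothetical symbols \mu_X and \mu_Y (substitutions per site per year), so the divergence K per site grows with their sum.

```latex
K = (\mu_X + \mu_Y)\,T
\quad\Longrightarrow\quad
T = \frac{K}{\mu_X + \mu_Y}
\;\approx\; \frac{K}{2\mu}
\quad \text{when } \mu_X \approx \mu_Y = \mu
```

Keeping the two rates separate avoids assuming they are equal; if mutation rates are higher in the male germline, Y-linked sequence would evolve faster than X-linked sequence, and using a single rate \mu would bias the estimate of T.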