VI. EXPRESSION PROFILING OF EMBRYONIC STEM CELLS
It is generally assumed that ES cell biology is regulated through transcriptional mechanisms, but the definition of a stem cell remains largely functional (see sects. II and IV). The developmental capacity of ES cell lines requires a set of genes that are not expressed in other cell types, yet knowledge of the intricate mechanisms regulating ES cell pluripotentiality and differentiation potential is currently limited to a few signaling pathways (e.g., LIF, BMP, Wnt) and regulatory factors (e.g., Oct-3/4, Nanog). Theoretically, a comprehensive analysis of a cellular transcriptome (i.e., all the RNAs present in a cell type) should be sufficient to define the molecular phenotype of stem cells and establish the determinants of ES cell choice. The hypothesis underlying these assumptions is that some mRNAs will be uniquely or more abundantly expressed in embryonic and/or adult stem cells than in any other cell type and that comparisons among cell populations will reveal these differences. Although several transcriptome-based studies (microarray or SAGE) have now been published that claim to have identified potential stemness-associated factors, closer inspection of the data indicates that "stemness" factors have proved elusive (109). This is true for both mouse and human ES cells. The reasons most frequently cited for variation among studies include differences in cell lines, culture conditions, array and hybridization protocols, and data analysis, as well as potentially contaminating cells. Additionally, many of the mouse studies compared ES cells with adult stem cells, because earlier studies had suggested that adult stem cells possess a broader potential, or plasticity, than previously believed (34); however, this broader plasticity of primary isolates of many adult stem cells has recently been called into question (see review in Ref. 379).
The identification of "stemness" genes by these approaches, therefore, remains a topic of lively debate and much conjecture. Finally, the phenotype of ES cells must also involve complex processes that alter protein abundance, both as a consequence of gene activation and RNA processing (transcription, splicing, etc.) and through regulatory events associated with translation and posttranslational modifications (PTM). Proteomic approaches are therefore required to visualize and interpret the phenotype of undifferentiated ES cells.
A. Microarrays
Ramalho-Santos et al. (284) and Ivanova et al. (169) were the first to use microarrays to compare mouse ES cells with hematopoietic (HSCs) and neural (NPCs) stem/progenitor cells. They identified 216 and 283 transcripts, respectively, that were enriched in all three stem cell populations. Remarkably, only six genes overlapped between the two lists, but when the stemness-associated transcripts were grouped by function, a common theme emerged. Stem cells expressed a large number of transcripts encoding signaling factors, transcription/translation factors, and proteins associated with DNA repair, protein degradation, and protein folding. The stem cells also expressed a prominent set of transcripts of unknown function, suggesting that many unique transcripts, either from novel genes or in the form of splice variants, remain to be identified from embryos (42). Furthermore, some of the stemness-associated factors clustered on chromosome 17, suggesting that characterizing the genomic regions that regulate stem cell-associated factors will further our understanding of the regulatory networks required to maintain undifferentiated stem cell populations. At about the same time, Tanaka et al. (356) compared ES and trophoblastic stem cells and identified Esg-1 (Dppa5) as an ES cell-restricted transcript that is exclusively associated with pluripotency.
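The logic of these comparisons can be illustrated with a minimal sketch, assuming that each study reduces its data to a set of transcripts called "enriched" in each stem cell population: the study's stemness list is then the intersection of those sets, and agreement between two studies can be summarized by their overlap. The function names and the toy gene identifiers are hypothetical, and neither study's actual enrichment criteria are reproduced here.

```python
from typing import Dict, Set


def enriched_in_all(enriched: Dict[str, Set[str]]) -> Set[str]:
    """Transcripts called enriched in every population (set intersection)."""
    populations = list(enriched.values())
    if not populations:
        return set()
    common = set(populations[0])
    for genes in populations[1:]:
        common &= genes
    return common


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Shared fraction of the union of two lists: 0 = disjoint, 1 = identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical enrichment calls from two independent studies (toy gene IDs).
study_1 = enriched_in_all({
    "ES": {"g1", "g2", "g3", "g4"},
    "HSC": {"g1", "g2", "g3"},
    "NPC": {"g1", "g2", "g5"},
})
study_2 = enriched_in_all({
    "ES": {"g2", "g6", "g7"},
    "HSC": {"g2", "g6"},
    "NPC": {"g2", "g6", "g8"},
})
shared = study_1 & study_2
print(sorted(study_1), sorted(study_2), sorted(shared), jaccard(study_1, study_2))
```

Because each list is an intersection, a transcript that narrowly misses the enrichment cutoff in any one population drops out entirely, which is one way that modest differences in platform or analysis can yield largely disjoint lists.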
Fortunel et al. (121) subsequently identified 385 transcripts that were highly expressed in mouse ES cells, neural progenitor cells, and retinal stem/progenitor cells. Of these, only one transcript (α6-integrin) was present in the lists of stemness-associated transcripts published by Ramalho-Santos et al. (284) and Ivanova et al. (169). Most of the commonly enriched transcripts were not exclusively expressed in stem cells, suggesting that transcripts abundant in stem cells may simply be elevated relative to differentiated cells (60); further analyses comparing stem cell lines with tissues therefore seemed warranted. In 2003, Sharov et al. (327) compared transcript abundances among mouse oocytes, blastocysts, stem cells, postimplantation embryos, and newborn tissues. This comparison identified groups of genes expressed in preimplantation embryos and in various stem cell lines (i.e., ES, EG, trophoblastic stem cells, mesenchymal stem cells, neural stem cells, osteoblasts, and hematopoietic stem cells). Importantly, ES and EG cells were shown to have a distinct genetic program relative to the other cell types, and a set of 88 genes was identified whose expression decreased with loss of developmental potential, i.e., in more differentiated cell types. These results were consistent with the notion that adult stem cells acquire or retain pluripotency with characteristics of less defined cell types and that ES and EG cells contain a limited but unique set of transcripts that differ from the signature molecules of adult stem cells. Because development is often considered to involve the sequential activation and repression of genes, it is likely that differences in transcript abundance were indicative of defined differentiation or developmental stages.
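One way to express the criterion used by Sharov et al. (expression that declines as developmental potential is lost) is as a rank correlation between each gene's expression and an ordinal potency score assigned to the cell types. The sketch below assumes such a score and a correlation cutoff, both hypothetical; Spearman correlation is a stand-in chosen for illustration, not necessarily the statistic used in the original study.

```python
from typing import Dict, List, Sequence


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation (Pearson correlation of the ranks)."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5 if vx and vy else 0.0


def potency_associated(expr: Dict[str, List[float]], potency: List[float],
                       cutoff: float = 0.8) -> List[str]:
    """Genes whose expression falls as potency falls (rho >= cutoff)."""
    return sorted(g for g, levels in expr.items() if spearman(potency, levels) >= cutoff)


# Hypothetical potency scores for four cell types, most to least potent.
potency = [4, 3, 2, 1]
expr = {
    "g1": [90, 60, 20, 5],   # declines with potency
    "g2": [10, 12, 50, 80],  # rises as cells differentiate
    "g3": [30, 5, 40, 10],   # no consistent trend
}
print(potency_associated(expr, potency))  # ['g1']
```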
Global expression profiles for hES cells have now been published by several groups (31, 48, 103, 138, 315, 339). A common finding among these studies is the existence of transcripts present at significantly higher levels in undifferentiated than in fully differentiated cells; however, as in the mouse, many of the findings vary widely among studies. Carpenter et al. (70) had previously shown by FACS analysis that hES cell lines derived in the same laboratory using similar techniques consisted of heterogeneous populations of cells, making it difficult to quantify their transcriptomes under standard culture conditions. Many of the cell lines available for study may also have been isolated at slightly different stages of blastocyst maturation and under different conditions. For these reasons, transcriptome comparisons among hES cell lines are open to interpretation.
Sato et al. (315) published the first analysis of differentiated and undifferentiated human ES cells (line H1). A set of 918 genes was enriched in undifferentiated cells, including numerous ligand/receptor pairs and secreted inhibitors of the FGF, TGF-β/BMP, and Wnt pathways, which the authors suggested are important for the regulation of hES cells. Two hundred twenty-seven of these transcripts were shared with the list of mES cell-enriched transcripts reported by Ramalho-Santos et al. (284). This is noteworthy because it suggests that the molecular programs underlying ES cell identity are, at least partially, evolutionarily conserved. Subsequent analyses, however, suggested that the genes implicated in the "stemness" of mouse embryonic and adult stem cells differ from the gene sets identified in hES cells (103). Sperger et al. (339) compared the expression profiles of hES cell lines with those of human germ cell tumor cell lines, tumor samples, somatic cell lines, and testicular tissue samples, with the goal of identifying genes expressed at higher levels specifically in pluripotent cell types. In the microarray data, the five hES cell lines examined clustered together and, at the next level, joined a branch containing the EC cell lines, indicating that their expression patterns were more similar to each other than to any of the other cell types analyzed. The authors further suggested that EC cells most closely resemble transformed ICM or primitive ectoderm cells.
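The kind of result reported by Sperger et al. can be illustrated with a small hierarchical clustering sketch. It assumes average linkage with 1 minus Pearson correlation as the distance, a common choice for expression profiles; the clustering parameters of the original study are not given here, and the sample names and four-gene profiles are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; assumes neither profile is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def leaves(tree: Tree) -> List[str]:
    return [tree] if isinstance(tree, str) else leaves(tree[0]) + leaves(tree[1])


def average_linkage(profiles: Dict[str, List[float]]) -> Tree:
    """Merge the two closest clusters until one remains.

    Sample distance is 1 - Pearson r; cluster distance is the mean distance
    over all cross-cluster sample pairs (UPGMA-style average linkage).
    """
    dist = {}
    for a, b in combinations(profiles, 2):
        dist[a, b] = dist[b, a] = 1 - pearson(profiles[a], profiles[b])
    clusters: List[Tree] = list(profiles)
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            la, lb = leaves(clusters[i]), leaves(clusters[j])
            d = sum(dist[a, b] for a in la for b in lb) / (len(la) * len(lb))
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        merged = (clusters[i], clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0]


profiles = {
    "hES_1": [10, 9, 2, 1],
    "hES_2": [9, 10, 1, 2],
    "EC_1": [8, 6, 4, 2],
    "somatic_1": [1, 2, 9, 10],
    "somatic_2": [2, 1, 10, 9],
}
tree = average_linkage(profiles)
print(tree)  # (('somatic_1', 'somatic_2'), ('EC_1', ('hES_1', 'hES_2')))
```

In this toy example the two hES profiles merge first and are then joined by the EC profile, mirroring the "cluster together, then branch with EC lines" pattern described above.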
A few general findings were consistent among the studies. These included the presence of transcripts encoding Oct-3/4, Nanog, Tdgf1, Utf1, and lin-28 in undifferentiated hES cells; remarkably, however, Sox2, Dnmt3B, gp130, and Rex-1 (ZFP42) were inconsistently or poorly expressed among several lines (31, 103, 138). Differentially regulated transcripts included components of several signaling pathways (48) that have been suggested to play key roles in hES cell growth and/or differentiation: Wnt, BMP, FGF receptor, and Nodal (Lefty A and B, Nodal, and Pitx2) signaling, but not LIF receptor/gp130 signaling. Even though FGF receptors are relatively abundant in these cells, the distribution of receptor subtypes was highly heterogeneous (70), as is likely to be the case for most other signaling components commonly associated with hES cells.
B. Serial Analysis of Gene Expression
In the first attempt to quantify the functionally active genome of ES cells, we employed serial analysis of gene expression (SAGE; see Fig. 8), a sequence-based technique that relies on short sequence tags to identify the transcripts present in a cell (373). Although we initially used SAGE to define the transcriptomes of the P19 EC and R1 ES cell lines (9, 10), only two other mouse SAGE libraries were available at that time for comparison, precluding a clear analysis of the molecular basis of the embryonic stem cell phenotype. Recently, two SAGE libraries were constructed from hES cells (296). As with the microarray data presented earlier, the human data suffered from considerable heterogeneity among cell lines. In one of the cell lines, for example, transcripts encoding Rex-1 were highly abundant, but they were absent in the second. Although the authors suggested that Rex-1 might be dispensable for the derivation of human ES cells, it is more likely that the hES cell line lacking Rex-1 more closely resembled primitive ectoderm (339), which, at least in the mouse, does not normally express Rex-1. Comparison with the mouse R1 ES cell SAGE library indicated considerable differences between the transcriptomes of mouse and human ES cells. Members of the LIF signaling pathway (STAT3, LIFR, and gp130) were expressed at much higher levels in mouse than in human ES cells, whereas Oct-3/4 and Sox2 were more abundant in human than in mouse ES cells.
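The principle of tag generation can be sketched in silico. In classic SAGE, the anchoring enzyme NlaIII (recognition site CATG) cuts each cDNA, and the short sequence immediately 3' of the 3'-most site becomes that transcript's tag; counting tags then gives each transcript's relative abundance. The sketch below assumes a 10-bp tag, error-free sequences, and one cDNA molecule per list entry, and it omits the ditag and concatemer steps of the real protocol. The function names and sequences are hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional

ANCHOR = "CATG"  # NlaIII recognition site (anchoring enzyme in classic SAGE)
TAG_LEN = 10     # assumed tag length, typical of classic SAGE


def sage_tag(cdna: str, tag_len: int = TAG_LEN) -> Optional[str]:
    """Tag = the tag_len bases immediately 3' of the 3'-most anchoring site."""
    seq = cdna.upper()
    pos = seq.rfind(ANCHOR)
    start = pos + len(ANCHOR)
    if pos == -1 or start + tag_len > len(seq):
        return None  # no site, or too little sequence downstream to form a tag
    return seq[start:start + tag_len]


def sage_library(cdnas: Iterable[str]) -> Counter:
    """Count tags over a collection of cDNA molecules; counts track abundance."""
    return Counter(tag for tag in map(sage_tag, cdnas) if tag is not None)


# Two copies of one hypothetical transcript plus one molecule lacking a CATG site.
transcript = "AAACATG" + "G" * 10 + "TTCATG" + "ACGTACGTAA" + "AAAA"
print(sage_library([transcript, transcript, "TTTTGGGGAAAA"]))
# Counter({'ACGTACGTAA': 2})
```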
Because SAGE data are quantitative, we were able to use the R1 mouse SAGE dataset to estimate the total number of transcripts present in ES cells. For statistical reasons, it proved difficult to estimate the total number of unique transcripts accurately, but a simple correction indicated that >54,000 unique transcripts must be present, and model simulations indicated that 130,000 unique transcripts were compatible with the R1 ES cell sampling profile (343). Because ~10% of the tags in this SAGE library did not map to any previously described EST dataset, we estimated that the number of unique transcripts (splice variants or novel gene transcripts) not yet identified in ES cells remains quite high (~6,000–13,000), underscoring a potential limitation in our ability to define the molecular basis of ES cell identity.
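The "simple correction" used above is not specified here, so as an illustration only, a widely used nonparametric richness estimator, Chao1, shows why the number of distinct transcripts must exceed the number actually observed: tags seen only once (singletons, F1) relative to tags seen twice (doubletons, F2) indicate how many transcript types remain unsampled, via Ŝ = S_obs + F1²/(2F2). The function name and tag counts below are hypothetical, and this is a swapped-in technique rather than the method of Ref. 343.

```python
from collections import Counter
from typing import Iterable


def chao1(tag_counts: Iterable[int]) -> float:
    """Chao1 lower-bound estimate of the number of distinct transcript types.

    tag_counts: how many times each distinct tag was observed in the library.
    """
    counts = [c for c in tag_counts if c > 0]
    s_obs = len(counts)
    f1 = sum(1 for c in counts if c == 1)  # singletons
    f2 = sum(1 for c in counts if c == 2)  # doubletons
    if f2 > 0:
        return s_obs + f1 * f1 / (2 * f2)
    return s_obs + f1 * (f1 - 1) / 2  # common fallback when there are no doubletons


# Hypothetical library: 8 distinct tags, 4 seen once and 2 seen twice.
library = Counter({"t1": 1, "t2": 1, "t3": 1, "t4": 1, "t5": 2, "t6": 2, "t7": 5, "t8": 10})
print(chao1(library.values()))  # 12.0
```

A library dominated by singletons thus implies many more transcript types than were sampled, which is consistent with the large gap between observed and estimated transcript numbers noted above.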
Since our initial SAGE analysis of mouse R1 ES cells, over 40 mouse SAGE libraries, including two additional ES cell lines (D3 and ESF 116) and one EG cell line (EG-1), have been deposited in the public domain; these have permitted us to identify transcripts with expression patterns similar to that of Oct-3/4 (unpublished data). We have been able to exploit the comparative power of SAGE (http://www.ncbi.nlm.nih.gov/SAGE), which increases with the number of publicly available libraries, to confirm or refute the authenticity of other stemness-associated transcripts. As an example, we took a subset of known and putative stemness factors identified from microarray analyses and compared the abundance (tags per million) of each transcript among 40 SAGE libraries. Based on these analyses, we would conclude that Mdr1 and the LIF receptor are not stemness-restricted factors, but that factors such as UTF-1, Dppa-5, Sox2, and Tdgf (in addition to Oct-3/4 and Nanog) are authentic embryonic stemness-related transcripts, whereas other transcripts, such as those encoding Thy1 (see Table 2), would be excluded from our stemness list because of their elevated expression in testes and cerebellum.
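The cross-library comparison can be sketched as follows, assuming each library is a mapping from tag (labeled here with a gene name for readability) to raw count. Counts are first converted to tags per million so that libraries of different sizes are comparable; a tag is then retained as a candidate only if its highest abundance in any non-ES library is a small fraction of its mean abundance in the ES libraries. The threshold `max_other_fraction`, the function names, and all counts are hypothetical toy values, not the data or criteria behind Table 2.

```python
from typing import Dict, List

Library = Dict[str, int]  # tag -> raw count


def tags_per_million(library: Library) -> Dict[str, float]:
    """Scale raw tag counts so that each library sums to one million."""
    total = sum(library.values())
    return {tag: 1e6 * n / total for tag, n in library.items()}


def stemness_candidates(libraries: Dict[str, Library], es_libs: List[str],
                        max_other_fraction: float = 0.1) -> List[str]:
    """Tags abundant in the ES libraries but low in every other library."""
    tpm = {name: tags_per_million(lib) for name, lib in libraries.items()}
    others = [name for name in libraries if name not in es_libs]
    candidates = set().union(*(libraries[name] for name in es_libs))
    kept = []
    for tag in sorted(candidates):
        es_mean = sum(tpm[n].get(tag, 0.0) for n in es_libs) / len(es_libs)
        other_max = max((tpm[n].get(tag, 0.0) for n in others), default=0.0)
        if es_mean > 0 and other_max <= max_other_fraction * es_mean:
            kept.append(tag)
    return kept


# Toy counts, invented for illustration only.
libraries = {
    "ES_R1": {"Pou5f1": 50, "Thy1": 20, "Gapdh": 930},
    "ES_D3": {"Pou5f1": 40, "Thy1": 10, "Gapdh": 950},
    "testis": {"Pou5f1": 1, "Thy1": 100, "Gapdh": 899},
    "cerebellum": {"Thy1": 80, "Gapdh": 920},
}
print(stemness_candidates(libraries, ["ES_R1", "ES_D3"]))  # ['Pou5f1']
```

In this toy data, the Thy1-like tag is rejected because of its high abundance in the testis and cerebellum libraries, the same reasoning applied to Thy1 above.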
Based on all available transcriptome (microarrays and SAGE) evidence, it is likely that ES cells contain a relatively small set of novel molecular markers/transcripts implicated in stemness. It is also likely that molecular determinants of pluripotentiality versus differentiation will involve a constellation of factors working in concert to regulate a stem cell's choice, but functional studies similar to those described for Nanog (233) and Wnt signaling (314) will be required before any specific signature factor can be unequivocally associated with stemness or a defined progeny.
C. Proteomic Analyses
The molecular phenotype of ES cells, and their ability to differentiate into distinct cell lineages, depend on complex processes that alter protein abundance through changes in gene expression (transcription, polyadenylation, splicing, etc.) as well as through regulatory events associated with translation (initiation, elongation, termination) and PTMs. Proteomic approaches have therefore been deemed essential for visualizing and interpreting the cellular phenotype of undifferentiated ES cells. As a first step in this analysis, Elliott et al. (106) established a proteomic database of mouse R1 ES cells analyzed by two-dimensional gel electrophoresis coupled with mass spectrometry. Of the 700 spots analyzed, 241 distinct protein species were identified, corresponding to 218 unique proteins, approximately one-half of which were specifically associated with DNA maintenance, transcription, translation, and protein processing. Almost 21% of the proteins exhibited some form of PTM (e.g., phosphorylation, palmitoylation), and several of these proteins (e.g., peptidyl prolyl cis-trans isomerase A and FK506-binding protein 4) had not previously been associated with PTMs in other tissues. Although it is difficult to determine how widespread these events are until comparisons have been made among ES cell lines of mouse and human origin, these data confirm that highly abundant proteins in mouse ES cell lines in vitro undergo substantial PTMs and that transcriptome analyses alone are insufficient to account for the molecular and cellular basis of embryonic stemness.
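The distinction among gel spots, protein species, and unique proteins can be made concrete with a small sketch, assuming that each identified spot is annotated with a protein accession and, where detected, a modification. Distinct (accession, modification) pairs are counted as species and distinct accessions as proteins, so a protein observed in both modified and unmodified forms contributes two species but one protein. The record layout, function name, and accessions are hypothetical and do not reproduce the data of Elliott et al.

```python
from typing import Dict, List, NamedTuple, Optional


class SpotCall(NamedTuple):
    spot: int
    accession: Optional[str]  # None when the spot could not be identified
    ptm: Optional[str]        # e.g. "phospho"; None when no modification detected


def summarize(calls: List[SpotCall]) -> Dict[str, float]:
    """Collapse spot identifications into protein species and unique proteins."""
    identified = [c for c in calls if c.accession is not None]
    species = {(c.accession, c.ptm) for c in identified}
    proteins = {acc for acc, _ in species}
    modified = {acc for acc, ptm in species if ptm is not None}
    return {
        "spots": len(calls),
        "species": len(species),
        "proteins": len(proteins),
        "fraction_modified": len(modified) / len(proteins) if proteins else 0.0,
    }


calls = [
    SpotCall(1, "P1", None),
    SpotCall(2, "P1", "phospho"),  # same protein, different species
    SpotCall(3, "P2", None),
    SpotCall(4, None, None),       # unidentified spot
    SpotCall(5, "P3", None),
]
print(summarize(calls))
# {'spots': 5, 'species': 4, 'proteins': 3, 'fraction_modified': 0.3333333333333333}
```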