Nematode strains
Nematodes were grown as described [104]. Strains were maintained on nematode growth media plates inoculated with the E. coli strain OP50 [105]. Strains used to isolate transcripts via mRNA-tagging were N2 (wild type), SD1241 (gaIs153, F25B3.3::FLAG::PAB-1), and NC694 (wdEx257, unc-4::3XFLAG::PAB-1) [37]. GFP-tagged embryonic neurons were isolated from NW1229 (evIs111, F25B3.3::GFP) [47] (J Culotti, personal communication) for MAPCeL analysis.
Molecular biology
To create pPRSK29 (F25B3.3::FLAG::PAB-1), 4 kb of the F25B3.3 promoter upstream of the predicted ATG start was amplified using the following primers: Dp-5 (5'-GTC AAC TAG TGT ATG ATT CCT CG-3') and Dp-3 (5'-TCG GGG TAC CTA TCG TCG TCG TCG TCG ATG CCG TCT TCA CGA-3'). The predicted ATG start of F25B3.3 was replaced with an Asp718 site in the 3' primer. This PCR fragment was cloned into pCR2.1-TOPO (Invitrogen, Carlsbad, California, USA) to generate pPRSK29.1. pPRSK29.1 was digested with BamHI and Asp718 to obtain the promoter fragment. pPRSK9 (myo-3::FLAG::PAB-1) [11] was digested with Asp718 and SacI to obtain the FLAG::PAB-1 fragment. pBluescript SK was digested with SacI and BamHI, and a three-way ligation was performed to obtain pPRSK29 (F25B3.3::FLAG::PAB-1).
Transgenic generation
pPRSK29 (60 ng/μl) was co-injected with pTG99 (sur-5::GFP, 20 ng/μl) using standard injection protocols [106]. The resulting transgenic array was integrated using a Stratalinker (Stratagene) at 300 Joules/m2 [107] (Shohei Mitani, personal communication). GFP reporters were selected at random from a subset of plasmids received from the Promoterome project [108]. Microparticle bombardment was conducted as described [5].
Generating synchronized populations of L2 larvae for mRNA-tagging
Strains were grown to 'starvation' (that is, all dauer larvae) on ten 60 mm nematode growth media plates at 25°C. Half of each 60 mm plate was split into four pieces and placed on a 150 mm 8P plate [109] inoculated with the E. coli strain Na22. The resultant twenty 8P plates were incubated at 25°C until a majority of the food was depleted and most animals were gravid adults (a 'line' of worms is usually found at the retreating edge of the bacteria). The worms were removed from the plates with ice-cold M9 buffer (22 mM KH2PO4, 22 mM Na2HPO4, 85 mM NaCl, 1 mM MgSO4) and collected by centrifugation. Washes were repeated until the supernatant was clear of bacteria. A sucrose float (30 ml ice-cold M9 buffer, 20 ml cold 70% sucrose) was performed to create an axenic nematode suspension. Animals were washed twice in ice-cold M9 buffer, then resuspended in 75 ml bleach solution (15 ml Clorox, 3.75 ml 10 N NaOH, 56.25 ml water). Worms were transferred to a 125 ml glass beaker with a stir bar and incubated for 5-6 minutes while stirring rapidly (the solution turns dark yellow when nearing completion). When a majority of adults had burst, the solution was passed through a 53 μm nylon mesh (Fisher #08670201, Pittsburgh, Pennsylvania, USA) to separate intact embryos from worm carcasses. Embryos were harvested by centrifugation and washed at least three times with M9 buffer. Embryos were resuspended in room temperature (RT) M9 buffer and incubated on a nutator for 12-16 hours at 20°C to allow L1 larvae to hatch and arrest.
Arrested L1 larvae were collected by centrifugation. Animals were resuspended in 1 ml RT M9 buffer and split equally over six 150 mm 8P plates. L1s were grown at 20°C for 22-25 hours to reach mid-L2, as shown by the appearance of the post-deirid sensory organ (approximately 80%) [1]. L2s (approximately 0.3-1 ml) were harvested from 8P plates and sucrose floated as above. Worms were resuspended in 30 ml cold M9.
mRNA-tagging
Methods are identical to those previously described [11] with the following modifications. Synchronized L2 larvae were resuspended in 2-3 ml homogenization buffer (HB; 50 mM HEPES, pH 7.6; 150 mM NaCl; 10 mM MgCl2; 1 mM EGTA, pH 8.0; 15 mM EDTA, pH 8.0; 0.6 mg/ml Heparin; 10% glycerol) and passed through a French press at 6,000 psi. Total RNA was isolated from 100 μl of lysate. An amount of lysate equivalent to 200 μg total RNA was used for co-immunoprecipitation. Following co-immunoprecipitation, beads were washed three times by brief treatment with 2 ml low-salt homogenization buffer (LSHB; 20 mM HEPES, pH 7.6; 25 mM NaCl; 1 mM EGTA, pH 8.0; 1 mM EDTA, pH 8.0; 0.6 mg/ml Heparin; 10% glycerol). Beads were then washed three times for 30 minutes each in 2 ml LSHB. The LSHB treatment substantially reduced nonspecific RNA binding to the agarose beads (data not shown). Elution and mRNA extraction were performed as described [11] (see detailed protocol in Additional data file 20).
Isolation of RNA from embryonic neurons for MAPCeL analysis
In the MAPCeL method, GFP-labeled cells are isolated by FACS for microarray analysis. Primary cultures of embryonic cells were prepared [12] from a transgenic line expressing GFP throughout the nervous system, NW1229 (evIs111, F25B3.3::GFP) [47] (J Culotti, personal communication). After 24 hours in culture, GFP-labeled neurons were obtained by FACS and total RNA was isolated as described [5,110]. Muscle profiling data used in Figures 4 and 7 were obtained by MAPCeL of embryonic muscle cells after 24 hours in culture (M24 dataset) (RMF, DMM, unpublished data). The top 50 enriched genes in this dataset were selected on the basis of statistical rank.
RNA amplification and microarray data analysis
A C. elegans Affymetrix chip was used for all microarray experiments [111]. For mRNA-tagging experiments, 25 ng of co-immunoprecipitated RNA was amplified and labeled as previously described [5]. Larval pan-neural (F25B3.3::FLAG::PAB-1) profiles were obtained in triplicate. Four independent larval A-class motor neuron (unc-4::3XFLAG::PAB-1) profiles were obtained. Reference profiles were generated from low levels of non-specifically bound RNA obtained from mock immunoprecipitations of synchronized populations of wild type (N2) L2 larvae. Five independent reference datasets were obtained. Total RNA (100 ng) was amplified and labeled for the MAPCeL sample, F25B3.3::GFP, isolated in triplicate. A previously obtained profile of total RNA isolated from all viable embryonic cells in culture was used as a MAPCeL reference [5].
Hybridization intensities for each experiment were scaled by reference to a global average signal from the same array (Additional data files 25 and 26) and normalized by robust multi-array analysis (RMA; Additional data files 27 and 28). We identified transcripts in two categories: EGs, or transcripts that are reliably detected in a given sample; and enriched genes, or transcripts with intensity values that are significantly higher than reference samples. EGs were estimated for the mRNA-tagging samples as follows. Expressed transcripts in the F25B3.3::FLAG::PAB-1 (larval pan-neural) and the unc-4::3XFLAG::PAB-1 (larval A-class motor neuron) samples were initially identified on the basis of a 'present' call in a majority (for example, two-thirds) of experiments as determined by Affymetrix MAS 5.0. In this approach, genes are called 'absent' and, therefore, excluded when the mismatch (MM) value exceeds the perfect match (PM) intensity for a given gene. This analysis initially identified 8,084 'present' transcripts in the larval pan-neural sample and 7,578 transcripts in the larval A-class motor neuron sample (Additional data file 21). These lists, however, are likely to include mRNAs that are non-specifically bound to the anti-FLAG sepharose beads at low levels relative to bona fide neuronal transcripts (see above). We reasoned that transcripts included in the experimental samples that are actually derived from this non-specific pool should be generally detected in the reference sample at higher intensity values. Therefore, to exclude these non-specific mRNAs from the list of predicted neuronal genes, the average RMA-normalized intensity for each transcript in the reference sample was subtracted from the RMA value of the corresponding gene in the experimental sample. Transcripts with resultant positive values were considered EGs whereas transcripts with negative values after this operation were removed.
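The two filtering steps described above (a majority 'present' call, followed by subtraction of the reference intensity) can be illustrated with a minimal Python sketch. This is not the analysis code used for the study; the function name, the dictionary layout, and the decision to average the experimental replicates before subtraction are all assumptions made for illustration.

```python
from fractions import Fraction
from statistics import mean

def expressed_genes(present_calls, exp_rma, ref_rma, min_fraction=Fraction(2, 3)):
    """Hypothetical helper illustrating the EG filter.

    present_calls: probe ID -> list of bool (MAS 5.0 'present' call per replicate)
    exp_rma:       probe ID -> list of RMA intensities, experimental replicates
    ref_rma:       probe ID -> list of RMA intensities, reference replicates
    Returns probe ID -> (experimental mean - reference mean) for retained probes.
    """
    kept = {}
    for probe, calls in present_calls.items():
        # Step 1: require a 'present' call in a majority of replicates.
        if Fraction(sum(calls), len(calls)) < min_fraction:
            continue
        # Step 2: subtract the average reference intensity.
        # Averaging the experimental replicates is an assumption of this sketch.
        diff = mean(exp_rma[probe]) - mean(ref_rma[probe])
        # Keep only transcripts with a positive difference.
        if diff > 0:
            kept[probe] = diff
    return kept
```

Exact fractions are used for the present-call threshold so that two of three replicates passes the two-thirds cutoff without floating-point rounding concerns.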
In a final adjustment, a limited number of transcripts that are detected as neuronally enriched (see below) but not scored as present by MAS 5.0 were restored to the lists. This treatment identified 4,033 EGs in the larval pan-neural dataset and 3,320 EGs in the larval A-class motor neuron profile (Additional data file 13). EGs (7,953) for the MAPCeL embryonic pan-neural dataset were identified as previously described (Additional data file 12) [5]. Our treatment is relatively stringent as it is likely to exclude at least some transcripts that may be ubiquitously expressed (for example, 'housekeeping' genes) or potentially more highly expressed in another tissue relative to the nervous system. This prediction is consistent with the finding that approximately 20% (509/2,422; Additional data file 15) of transcripts identified in independent microarray experiments as highly enriched in GMIc (GMI plus the genes common to all three groups) remain in the list of larval pan-neural EGs (Additional data file 13). In contrast, 48% (1,172/2,422; Additional data file 15) of transcripts enriched in these other tissues are included in the list of 6,342 EGs in the larval reference dataset (Additional data file 13).
The data discussed in this publication have been deposited in NCBI's Gene Expression Omnibus [112-114] and are accessible through GEO series accession number GSE8004 (embryonic pan-neural, larval pan-neural, larval A-class) and GSE8159 (embryonic A-class).
To detect neuronally enriched transcripts, RMA-normalized intensities for experimental versus reference samples were statistically analyzed using Significance Analysis of Microarrays software (SAM) [115]. A two-class unpaired analysis of the data was performed to identify genes that differ by ≥ 1.5-fold from the reference at a FDR of
RMA normalized intensity values for all datasets were imported into GeneSpring GX 7.3 (Agilent Technologies, Santa Clara, California, USA) to generate the line graphs shown in Figures 4 and 7. Each experimental dataset was paired to its corresponding reference dataset for these diagrams.
Annotation of datasets
We utilized Perl scripts and hand annotation to identify all known neuronally expressed C. elegans transcripts (WormBase Release 146 (WS146)). First, WormMart was used to identify all transcripts with expression patterns. This list was filtered for genes represented on the Affymetrix microarray. For genes that have multiple spots on the microarray, only one representative spot was kept in the list (3,044). Genes with expression patterns with no spatial information or exclusive to males were eliminated (2,837). Each gene was then placed into two categories based on its known expression pattern - neural (1,612) versus non-neural (1,225) - using the following criteria. We used a Perl script ('keyword_search.pl', Additional data file 22) to search descriptions of 2,837 genes with known expression patterns for genes with defined neural expression. To reduce the number of false positives identified, we first searched under the term 'cell group', which provides simple, but clear, spatial expression information. Using this strategy, the majority of neuronally expressed genes were separated from the full dataset. Several genes in WormBase, however, had no cell group, or contained insufficient data in the cell group description to determine neural expression. Therefore, WormBase was also searched for terms associated with neuronal expression. This list was hand-annotated to ensure its validity (for a full list of search terms, see Additional data file 23).
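The two-pass keyword strategy can be sketched as follows. The original analysis used a Perl script ('keyword_search.pl', Additional data file 22); the Python below is only an illustration of the idea, and the record layout, function name, and short term list are hypothetical (the actual search terms are given in Additional data file 23).

```python
import re

# Hypothetical, abbreviated term list; the real list is in Additional data file 23.
NEURAL_TERMS = ("neuron", "nerve", "nervous system")

def classify(record, terms=NEURAL_TERMS):
    """Return (category, field that matched) for one expression-pattern record.

    record is assumed to be a dict with 'cell_group' and 'description' strings.
    The 'cell_group' field is searched first; the free-text description is
    searched only if the cell group gives no match.
    """
    for field in ("cell_group", "description"):
        text = record.get(field, "").lower()
        for term in terms:
            # Match the term at a word boundary so 'neurons' matches 'neuron'.
            if re.search(r"\b" + re.escape(term), text):
                return ("neural", field)
    return ("non-neural", None)
```

Returning the matching field makes it easy to send description-only hits to hand annotation, since those are the more likely false positives.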
Hypergeometric calculations
Overlap statistics were calculated using web-based software designed by Jim Lund (University of Kentucky) [116]. The number of genes in the genome was set at 18,666 (total number of genes represented on the C. elegans Affymetrix array). When using this calculation, a representation factor below 1.0 indicates under-representation, while a value above 1.0 indicates over-representation.
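For orientation, the quantities involved can be sketched in Python. This is not the web-based software itself. It assumes the representation factor is the observed overlap divided by the overlap expected for two independent gene lists, and that significance is taken from the upper tail of the hypergeometric distribution; the function names are hypothetical.

```python
from math import comb

def representation_factor(overlap, n1, n2, total=18666):
    """Observed overlap divided by the overlap expected by chance (n1 * n2 / total)."""
    return overlap / (n1 * n2 / total)

def hypergeom_upper_tail(overlap, n1, n2, total=18666):
    """Probability of drawing at least `overlap` shared genes when a list of
    n2 genes is sampled from `total` genes, n1 of which belong to the first list."""
    denom = comb(total, n2)
    upper = min(n1, n2)
    numer = sum(comb(n1, k) * comb(total - n1, n2 - k)
                for k in range(overlap, upper + 1))
    return numer / denom
```

The sum uses exact integer arithmetic, which is simple but slow for lists of several thousand genes; a statistics library would be preferable for routine use.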
Microscopy and identification of GFP-expressing cells
GFP-expressing animals were visualized by differential interference contrast (DIC) and epifluorescence microscopy using either a Zeiss Axioplan or Axiovert compound microscope. Digital images were recorded with CCD cameras (ORCA I, ORCA ER, Hamamatsu Corporation, Bridgewater, NJ, USA).
Identification of mouse homologs of uncharacterized conserved C. elegans pan-neural genes described in the Allen Brain Atlas
Twenty-six mouse homologs of the 27 uncharacterized conserved C. elegans genes (Additional data file 9) found in both embryonic and larval pan-neural enriched datasets were identified in Ensembl [117]. Mouse homolog gene names were then used to query the Allen Brain Atlas [118] for expression in the mouse brain. A gene was scored as 'expressed in the brain' if it had an intensity value of 10 or higher (normalized scale 0-100) in at least one brain region on the summary graph interface. The one exception was 1500041B16Rik, which did not have a summary graph; expression in the brain in this case was confirmed by direct visualization of the in situ photographs available in the Brain Atlas.
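The scoring rule amounts to a simple threshold test, sketched below in Python. The function name and the dictionary of per-region intensities are hypothetical; the values were read from the Allen Brain Atlas summary graphs rather than retrieved programmatically.

```python
def expressed_in_brain(region_intensity, threshold=10):
    """Return True if any brain region has a normalized intensity (0-100 scale)
    at or above the threshold.

    region_intensity: hypothetical dict mapping region name -> intensity value.
    """
    return any(value >= threshold for value in region_intensity.values())
```

A gene without a summary graph would yield an empty dictionary and a result of False, so such cases need separate inspection, as was done for 1500041B16Rik.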
C. elegans interactome
Genes enriched in both the larval and embryonic pan-neural datasets were used to seed the C. elegans interactome [67,119]. The map was trimmed to exclude genes with one interacting partner. The initial dataset consisted of 711 genes (Additional data file 10), of which 17% (124) were listed in the Interactome database. One large cluster of 34 interactors was identified and contains 17 proteins from the original seed. The additional 17 genes were categorized as enriched, expressed, or not present in the pan-neural datasets. Genes were assigned to categories based on known or predicted functions in C. elegans or other organisms.
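The trimming step can be illustrated with a short Python sketch. It assumes the interaction map is available as a list of gene pairs and that trimming is a single pass that removes genes with only one interacting partner; whether the original trimming was applied once or repeatedly is not stated, and the function name is hypothetical.

```python
from collections import defaultdict

def trim_network(edges, min_partners=2):
    """Drop genes with fewer than `min_partners` distinct interacting partners.

    edges: list of (gene_a, gene_b) pairs; self-interactions are ignored.
    Returns the edges whose two genes both survive the (single-pass) trim.
    """
    partners = defaultdict(set)
    for a, b in edges:
        if a != b:
            partners[a].add(b)
            partners[b].add(a)
    keep = {gene for gene, p in partners.items() if len(p) >= min_partners}
    return [(a, b) for a, b in edges if a != b and a in keep and b in keep]
```

Because removing a gene can leave one of its neighbors with a single partner, a stricter variant would repeat the trim until no further genes are removed.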