We completed an EST sequencing project to characterize genes expressed in the cricket nerve cord that underlie the pulse rate of male song in L. kohalensis. By constructing a cDNA library from nymphal and adult crickets, we aimed to enhance the discovery of genes involved in the construction of the central pattern generating circuit (CPG) underlying rhythmic singing behavior. In addition, we enriched for full-length cDNA by utilizing a template-switching reverse transcriptase (SMART™ technology; BD Clontech, Mountain View, CA). Furthermore, we increased the representation of genes expressed at low copy number by normalizing our amplified cDNA using a duplex-specific nuclease (Trimmer-Direct Kit; Evrogen, Moscow). Sequencing of ~22,000 clones from this library by The Institute for Genomic Research (TIGR) produced 14,502 high-quality ESTs with an average length greater than 700 bases (Tables 1, 2, 3). Assembly of these ESTs produced 8,607 unique sequences. We were then able to annotate 5,225 of these sequences based on BLAT protein comparisons against a comprehensive non-redundant protein database maintained by the Dana-Farber Cancer Institute (DFCI). Of these annotated genes, we could assign gene ontology (GO) terms to 408. The diversity of our library is reflected in the large number of different GO terms assigned to these genes, including 572 Biological Process, 275 Molecular Function, and 212 Cellular Component GO terms, and suggests that we were successful in our attempt to normalize cDNA representation in our library.
Cricket Gene Index
A Gene Index based on our EST sequencing project was assembled and is publicly available at [85]. This electronic resource consists of a description of the cricket EST library, including a summary of the number of unique sequences, the distribution of tentative consensus (TC) sequences, gene annotations, GO terms, and a set of 70-mer oligonucleotide probes. The cricket Gene Index thus joins more than 30 other animal gene indices hosted by DFCI and represents the second largest EST resource for Orthoptera available online. While the cricket EST project sequenced roughly one third as many ESTs as the Locusta migratoria project (45,754 ESTs, [86]), this disparity is not reflected in the total number of unique sequences identified by the two projects (L. migratoria = 12,161 unique sequences versus L. kohalensis = 8,607 unique sequences).
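The comparison above can be made concrete with a small calculation. The following is a minimal sketch that uses only the EST and unique-sequence counts reported in the text; the helper name unique_fraction and the dictionary layout are illustrative assumptions, not part of either project's pipeline.

```python
# Counts are taken directly from the text; the helper name is hypothetical.
def unique_fraction(n_ests, n_unique):
    """Return the number of unique assembled sequences per EST sequenced."""
    return n_unique / n_ests

# (total ESTs, unique sequences) for each project
projects = {
    "L. kohalensis": (14502, 8607),
    "L. migratoria": (45754, 12161),
}

fractions = {name: unique_fraction(*counts) for name, counts in projects.items()}

for name, frac in fractions.items():
    print(f"{name}: {frac:.2f} unique sequences per EST")

# Ratio of the two projects' sizes, by ESTs and by unique sequences
est_ratio = projects["L. kohalensis"][0] / projects["L. migratoria"][0]
unique_ratio = projects["L. kohalensis"][1] / projects["L. migratoria"][1]
print(f"EST ratio: {est_ratio:.2f}, unique-sequence ratio: {unique_ratio:.2f}")
```

On these counts the cricket project sequenced about 0.32 times as many ESTs as the locust project but recovered about 0.71 times as many unique sequences. This ratio is only a rough descriptive measure and does not by itself separate the effect of normalization from that of sequencing depth.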
Crickets as models for behavioral genomics
Species of Orthoptera have long served as neurophysiological models of behavior. Our analysis of 14,502 EST sequences and subsequent production of 8,607 singletons and tentative consensus sequences from a nerve cord-derived library represents a major advance in the available genomic resources for the study of cricket neurophysiology and behavior. This resource will provide valuable tools with which to examine the underlying genetic basis of cricket stridulation, a model for the study of central pattern generation (Table 4). The resources presented here represent the first opportunity to analyze the neurophysiological process of stridulation at the genomic scale.
Developing additional genomic resources for Laupala
We are utilizing multiple approaches to dissect the genetic basis of pulse rate variation in Laupala. In addition to ongoing QTL mapping efforts [64] (Shaw et al. in press), the Laupala Gene Index is a first step towards two additional genetic approaches to our study of pulse rate evolution. First, the oligonucleotide probe set developed from our Gene Index is the backbone of an oligonucleotide microarray being constructed to study gene expression in Laupala. These microarrays will be used to study patterns of gene expression across multiple species [87] to identify candidate genes whose expression varies with pulse rate. Second, the ESTs are being screened for variation that can be used in a linkage analysis. Placing these ESTs on the Laupala linkage map will facilitate comparisons between the QTL analysis and the study of gene expression. The identification of candidate genes that fall within QTL regions will strengthen the support for these candidate genes and guide our choice of which genes to use in functional studies. Furthermore, estimating the linkage relationships of ESTs within Laupala and comparing them with known orthologs in model systems will allow us to identify regions of synteny across multiple species. Establishing such areas of synteny is another powerful approach to identifying strong candidate genes [88-90]. Given the now rich genomic resources available in Laupala, the extensive divergence of the male song CPG and its influence on reproductive isolation, and the fairly limited genetic divergence within this genus, Laupala represents an excellent system in which to study the evolutionary genomics of CPG diversification.
In addition, the genomic resources developed in Laupala can be used to tackle some of the most urgent topics in evolutionary biology. Few other systems offer both the genomic tools and the evolutionary power needed to understand how gene expression evolves in recently diverged taxa [91]. Furthermore, because male pulse rate plays a critical role in reproductive isolation in this genus, identifying the genes whose expression contributes to the construction of this phenotype will provide insight into how the evolution of gene expression contributes to reproductive isolation during the course of speciation [92].
Comparative genomics in insects
In the last 15 years, there has been a proliferation of genomic resources available for model organisms. As technology has improved, whole genome sequences have become available for a growing number of species, and for the first time comparative studies of entire genomes have become possible [93-96]. However, the phylogenetic breadth of insect species in which genomic tools have been developed is extremely limited. For example, of the 37 insect genome sequencing projects currently completed or under way, 22 (~60%) involve species of Drosophila. The remaining species are either directly related to human health (the mosquitoes Aedes aegypti and Culex pipiens, the tsetse fly Glossina morsitans, the human louse Pediculus humanus humanus, and the hemipteran vector of Chagas disease Rhodnius prolixus) [97], or are of agricultural importance (the red flour beetle Tribolium castaneum, the honey bee Apis mellifera, the silkworm moth Bombyx mori, the pea aphid Acyrthosiphon pisum, and the parasitoid wasp Nasonia vitripennis). The only species with significant genomic tools that is not of biomedical or agricultural importance is the African butterfly (Bicyclus anynana), an evo-devo model for wing pattern development [98]. The vast majority of these insects are holometabolous and possess relatively small genomes [99,100]. This severe phylogenetic and genome-size bias limits comparative studies of insect and arthropod evolution (Figures 1 and 2). The cricket Gene Index presented here represents a significant contribution to the genomic resources available for comparative molecular studies of basal insect lineages (Table 5). Based on our preliminary comparative analysis, Laupala, a representative of the orthopteran suborder Ensifera, is as distinct from Locusta, a representative of the orthopteran suborder Caelifera, as it is from other insect orders.