
Pair-wise genome comparison
- Bioinformatics in microbial biotechnology – a mini review

After gene functions have been identified, a natural next step is pair-wise genome comparison. Comparing a genome against itself reveals paralogous genes (duplicated genes that have similar sequences with some variation in function). Comparing a genome against other genomes has been used to identify a wealth of information, including: orthologous genes (functionally equivalent genes that diverged in two genomes due to speciation); different types of gene-groups (adjacent genes that are constrained to occur in close proximity because of their involvement in some common higher-level function); lateral gene-transfer (gene transfer from an evolutionarily distant microorganism); gene-fusion and gene-fission; gene-group duplication; gene-duplication; difference analysis to identify genes specific to a group of genomes, such as pathogens; and conserved genes [11,13].
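As a rough illustration of the difference analysis mentioned above, the sketch below treats each genome as a set of gene-family identifiers and uses plain set operations. It assumes that homologous genes have already been clustered into families; the genome names, family labels, and the function names `conserved_families` and `group_specific_families` are hypothetical and are not taken from the cited studies.

```python
from typing import Dict, Iterable, Set


def conserved_families(genomes: Dict[str, Set[str]]) -> Set[str]:
    """Gene families present in every genome (assumes at least one genome)."""
    return set.intersection(*genomes.values())


def group_specific_families(genomes: Dict[str, Set[str]],
                            group: Iterable[str]) -> Set[str]:
    """Families shared by all genomes in `group` and absent from all others."""
    group = set(group)
    shared = set.intersection(*(genomes[name] for name in group))
    outside = set().union(*(fams for name, fams in genomes.items()
                            if name not in group))
    return shared - outside


# Hypothetical toy data: family labels are made up for illustration.
genomes = {
    "pathogen_1": {"f1", "f2", "f3"},
    "pathogen_2": {"f1", "f2", "f4"},
    "commensal": {"f1", "f4", "f5"},
}
print(conserved_families(genomes))
print(group_specific_families(genomes, ["pathogen_1", "pathogen_2"]))
```

In this toy data, the family shared by both pathogens and missing from the commensal is the kind of candidate that a difference analysis would flag for further study.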

To derive orthologs and sets of gene-groups, each genome is modeled as an ordered set of genes, and a pair of genomes is modeled as a bipartite graph in which each node in one set is connected to its homologous nodes (similar genes, as determined by pair-wise gene alignment) in the second set. Orthologs are derived as the best-matching homologs. To identify a homologous gene-group, two neighboring genes in one genome that are homologous to two neighboring genes in the other genome are first identified. A window of neighboring genes is then created in both genomes and slid along until the next gene in the first genome has no homologous gene in the corresponding neighborhood window of the second genome. Once such a non-matching gene is found, the matching genes are collected as one gene-group.
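The procedure above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the cited studies: "best matching" is read here as a reciprocal best hit, the neighborhood is taken as a fixed number of positions (`window`) around the last matched gene in the second genome, gene names are assumed unique within a genome, and the function names, scores, and gene labels are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def reciprocal_best_hits(scores: Dict[Tuple[str, str], float]) -> Dict[str, str]:
    """Pick orthologs as reciprocal best-scoring edges of the bipartite graph.

    `scores` maps (gene_in_A, gene_in_B) to an alignment score (assumed input).
    Ties are broken by whichever pair is seen first.
    """
    best_a: Dict[str, Tuple[float, str]] = {}
    best_b: Dict[str, Tuple[float, str]] = {}
    for (a, b), s in scores.items():
        if a not in best_a or s > best_a[a][0]:
            best_a[a] = (s, b)
        if b not in best_b or s > best_b[b][0]:
            best_b[b] = (s, a)
    return {a: b for a, (_, b) in best_a.items() if best_b[b][1] == a}


def gene_groups(genome_a: List[str], genome_b: List[str],
                homologs: Dict[str, Set[str]],
                window: int = 2) -> List[List[Tuple[str, str]]]:
    """Greedy sketch of homologous gene-group detection.

    Walks genome A in order; a group is extended while the next gene in A has
    an unused homolog within `window` positions of the last match in genome B.
    """
    pos_b = {gene: idx for idx, gene in enumerate(genome_b)}
    groups = []
    i = 0
    while i < len(genome_a):
        best_run: List[Tuple[str, str]] = []
        seeds = sorted(pos_b[h] for h in homologs.get(genome_a[i], ()) if h in pos_b)
        for seed in seeds:
            run, used, last = [(genome_a[i], genome_b[seed])], {seed}, seed
            for gene in genome_a[i + 1:]:
                near = sorted(
                    (abs(pos_b[h] - last), pos_b[h])
                    for h in homologs.get(gene, ())
                    if h in pos_b and pos_b[h] not in used
                    and abs(pos_b[h] - last) <= window
                )
                if not near:
                    break  # a non-matching gene closes the group
                last = near[0][1]
                used.add(last)
                run.append((gene, genome_b[last]))
            if len(run) > len(best_run):
                best_run = run
        if len(best_run) >= 2:  # a group needs at least two neighboring matches
            groups.append(best_run)
            i += len(best_run)
        else:
            i += 1
    return groups


# Hypothetical toy genomes: a1..a3 match a shuffled neighborhood in B.
genome_a = ["a1", "a2", "a3", "a4", "a5"]
genome_b = ["b1", "b2", "b3", "b4", "b5", "b6"]
homologs = {"a1": {"b3"}, "a2": {"b2"}, "a3": {"b4"}, "a5": {"b6"}}
print(gene_groups(genome_a, genome_b, homologs))
```

Matching on distance rather than strict order is a deliberate choice in this sketch, so that groups whose genes appear in a different order in the two genomes are still collected.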

Detailed comparative studies [11,12,14] have shown that: (i) a large percentage of these gene-groups are co-transcribed or co-regulated [11,26]; (ii) there are multiple types of gene-groups in a genome; (iii) the order of homologous genes in a gene-group is not always the same in two microorganisms; (iv) gene-groups are frequently duplicated; (v) all the genes in an ordered gene-group are embedded in the same pathway, whereas unordered gene-groups occur at the junction points of adjacent pathways [12]; (vi) larger genomes share more gene-groups even when they are not evolutionarily close; (vii) gene-duplication and gene-insertion/gene-deletion are common means of genome restructuring, and horizontal gene-transfer and gene fusion are not uncommon; and (viii) gene duplication occurs mainly in genes involved in cell surface interaction, nutrient transport, and sensor proteins. The rationale for duplication is the need to adapt to different external conditions and the use of similar mechanisms for multiple sensor and transport proteins. Knowledge of genes specific to pathogens, of genes inserted into or deleted from pathways that are homologous to genes in plasmids, and of conserved genes is very useful for identifying candidates for vaccine development and anti-microbial agents [11,56,73].

An interesting observation from pair-wise genome comparison studies is that genome restructuring occurs through a combination of insertion/deletion, duplication, and fusion of domains as well as of genes. However, domain-level comparative analysis tools are still in their infancy because of computational complexity and the limited availability of domain-level functional information about genes from wet labs.


Updated on: 31 Oct 2006
