Automated identification of genes
- Bioinformatics in microbial biotechnology – a mini review

After the contigs are joined, the next task is to identify the protein-coding regions, or open reading frames (ORFs), in the genome. ORFs can be identified in three ways: (1) with Hidden Markov Model (HMM) based tools such as GLIMMER [24] and GeneMark [41]; (2) by searching a database of known genes, such as GenBank (ftp://ftp.ncbi.nih.gov/genbank/), for matching genes; and (3) with decision-tree-based algorithms that identify the start codons [64] and stop codons of coding regions. HMM-based techniques build multiple probabilistic state machines, each capable of identifying an ORF. Each machine predicts the next nucleotide by taking the state transition with the highest probability, then compares the predicted nucleotide with the nucleotide actually present at that position in the sequence. The state-transition probabilities are derived by statistical training on known sample sequences. For microbial genomes, HMM-based software such as GLIMMER has achieved 95% to 97% accuracy.
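The predict-and-compare idea described above can be illustrated with a small sketch. This is not the GLIMMER or GeneMark algorithm: it assumes a plain first-order Markov chain over the four nucleotides (no hidden states), and the function names `train_transitions`, `predict_next`, and `match_fraction`, as well as the toy training sequences, are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict

NUCLEOTIDES = "ACGT"


def train_transitions(sequences):
    """Estimate P(next | current) by counting transitions in known sample sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for cur, nxt in zip(seq, seq[1:]):
            counts[cur][nxt] += 1
    probs = {}
    for cur, ctr in counts.items():
        total = sum(ctr.values())
        probs[cur] = {n: ctr[n] / total for n in NUCLEOTIDES}
    return probs


def predict_next(probs, current):
    """Return the nucleotide reached by the most probable transition.

    Returns None when `current` never appeared in the training data.
    """
    row = probs.get(current)
    if row is None:
        return None
    return max(NUCLEOTIDES, key=lambda n: row[n])


def match_fraction(probs, seq):
    """Fraction of positions where the predicted nucleotide equals the actual one."""
    pairs = list(zip(seq, seq[1:]))
    if not pairs:
        return 0.0
    hits = sum(1 for cur, actual in pairs if predict_next(probs, cur) == actual)
    return hits / len(pairs)


# Toy training data (an assumption, not real genomic sequence).
training = ["ATGATGATG", "ATGATG"]
model = train_transitions(training)

# A sequence resembling the training data matches well; an unrelated one does not.
print(match_fraction(model, "ATGATG"))
print(match_fraction(model, "ATGCCC"))
```

A sequence that scores a high match fraction under a model trained on coding regions is a better candidate ORF than one that scores low. Real gene finders use richer models (longer context, separate models per reading frame) rather than this single-step chain.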


Updated on: 31 Oct 2006
