Comparing sequences without using alignments: application to HIV/SIV subtyping
Gilles Didier1, Laurent Debomy2, Maude Pupin2, Ming Zhang3,4, Alexander Grossmann5, Claudine Devauchelle5 and Ivan Laprevotte5
1Institut Mathématique de Luminy, UMR 6206, Campus de Luminy, Case 907, 13288 Marseille Cedex 9, France
2Equipe Bioinfo, LIFL, USTL, cité scientifique, Batiment M3, 59655 Villeneuve d'Ascq, France
3Department of Bioinformatics, Institute of Microbiology and Genetics, University of Goettingen, Goettingen 37077, Germany
4Theoretical Biology and Biophysics Group, Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM, 87545, USA
5Laboratoire Statistique et Génome, UMR 8071, Tour Evry 2, 523 Place des Terrasses, 91034 Evry, France
BMC Bioinformatics 2007, 8:1. doi:10.1186/1471-2105-8-1. [Open Access]
Abstract
Background
In general, the construction of trees is based on sequence
alignments. This procedure, however, leads to loss of information when
parts of sequence alignments (for instance ambiguous regions) are
deleted before tree building. To overcome this difficulty, one of us
previously introduced a new and rapid algorithm that calculates
dissimilarity matrices between sequences without preliminary alignment.
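
To make the idea of an alignment-free dissimilarity matrix concrete, the following is a minimal sketch of one generic approach: comparing sequences by their k-mer frequency profiles. It is not the algorithm evaluated in this paper, whose details are not given in the abstract. The function names, the choice of Euclidean distance, and the word length `k` are all assumptions made for illustration only.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def kmer_profile(seq, k):
    """Relative frequency of each overlapping k-mer in seq (hypothetical helper)."""
    if len(seq) < k:
        raise ValueError("sequence is shorter than k")
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


def dissimilarity(p, q):
    """Euclidean distance between two k-mer frequency profiles."""
    keys = set(p) | set(q)
    return sqrt(sum((p.get(x, 0.0) - q.get(x, 0.0)) ** 2 for x in keys))


def dissimilarity_matrix(seqs, k):
    """Symmetric matrix of pairwise dissimilarities; no alignment is computed."""
    profiles = [kmer_profile(s, k) for s in seqs]
    n = len(seqs)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = dissimilarity(profiles[i], profiles[j])
    return d


if __name__ == "__main__":
    # Toy sequences, invented for illustration (not HIV/SIV data).
    toy = ["ACGTACGT", "ACGTACGA", "TTTTTTTT"]
    for row in dissimilarity_matrix(toy, k=3):
        print(["%.3f" % v for v in row])
```

A matrix of this kind can be passed to any distance-based tree-building method, such as neighbor joining. In this sketch, the word length `k` is the only value the user has to choose.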
Results
In this paper, HIV (Human Immunodeficiency Virus) and SIV (Simian
Immunodeficiency Virus) sequence data are used to evaluate this method.
The program produces tree topologies that are identical to those
obtained by a combination of standard methods detailed in the HIV
Sequence Compendium. Manual alignment editing is not necessary at any
stage. Furthermore, only one user-specified parameter is needed for
constructing trees.
Conclusion
Extensive tests on HIV/SIV subtyping showed that the virus
classifications produced by our method are in good agreement with our
best taxonomic knowledge, even in non-coding LTR (Long Terminal Repeat)
regions that are not tractable by regular alignment methods due to
frequent duplications/insertions/deletions. Our method, however, is not
limited to HIV/SIV subtyping. It provides an alternative way to
construct trees without a time-consuming alignment procedure.