Login

Join for Free!
25579 members
table of contents table of contents

Neighbor-Net is a novel method for phylogenetic analysis that is currently being …


Biology Articles » Molecular Biology » Consistency of the Neighbor-Net Algorithm » Description of the Neighbor-Net algorithm

Description of the Neighbor-Net algorithm
- Consistency of the Neighbor-Net Algorithm

3 Description of the Neighbor-Net algorithm

In this section we present a detailed description of the Neighbor-Net algorithm, as implemented in the current version of SplitsTree [9]. The Neighbor-Net algorithm was originally described in [2], where the reader may find a more informal description for how it works. For the convenience of the reader we will use the same notation as in [2] where possible.

In Figure 2 we present pseudo-code for the Neighbor-Net algorithm. The aim of the algorithm is, for a given input distance function d, to compute a circular split weight function ω so that the distance function dω gives a good approximation to d. The resulting distance function dω can then be represented by a planar phylogenetic network as indicated in the last section.

To this end, NEIGHBOR-NET first computes an ordering Θ of X, and then applies a non-negative least-squares procedure to find a best fit for d within the set of distance functions {d|:Ϭ(X) → ≥0, Ϭ⊆ ϬΘ}. More details concerning the least-squares procedure may be found in [2]: Here we will concentrate on the description of the key computation for finding an ordering Θ of X, which is detailed in the procedure FINDORDERING.

An (ordered) cluster is a non-empty finite set C together with an ordering ΘC = c1, ..., ck of the elements in C, k = |C|. Two elements a, b C are called neighbors if there exists i ∈ {1, ..., k - 1} such that a = ci and b = ci+1, or b = ci and a = ci+1. The input of the procedure FINDORDERING consists of a set of mutually disjoint clusters, together with a distance function d on the set . The ordering Θ = y1, ..., yn of Y that is returned by FINDORDERING must be compatible with the collection of ordered clusters, that is, for every cluster C there must exist i, j ∈ {1, ..., n}, i j, with the property that ΘC = yi, ..., yj or ΘC = yj, ..., yi.

The procedure FINDORDERING calls itself recursively. Apart from the base case (line 5 of Figure 2), where the recursion bottoms out, two different cases are considered – the reduction and selection cases (lines 7–15 and lines 17–22 of Figure 2, respectively). In the reduction case a cluster C with k = |C| ≥ 3 is replaced by a smaller cluster C'. In particular, in lines 7–11 we let ΘC = c1, ..., ck be the ordering of C with c1 = x, c2 = y, c3 = z, and put C' = (C\{x, y, z}) ∪ {u, v} and ΘC'= u, v, c4, ..., ck, where u and v are two new elements not contained in Y. Then, in lines 12–14, we define a distance function d' on the set Y' = (Y\{x, y, z}) ∪ {u, v} using the formulae:

  (1)

where α, β and γ are positive real numbers satisfying α + β + γ = 1 (note that these formulae slightly differ from the ones given in [2] in which there is a typographical error). In the current implementation of Neighbor-Net the values α = β = γ = 1/3 are used.

When FINDORDERING is recursively called with the new collection of clusters and distance function d' it returns an ordering of Y' that is compatible with . Thus, there exists i ∈ {1, ..., n - 2} such that either u = and v = or v = and u = . The resulting ordering Θ of Y is then defined (in line 14) as follows:

  (2)

This completes the description of the reduction case.

We now describe the selection case. Note that in view of line 6 this case only applies if every cluster in contains at most two elements. In lines 17–18, two clusters C1, C2 are selected and replaced by the single cluster C' = C1 C2. The clusters C1 and C2 are selected as follows: We define a distance function on the set of clusters by

and select C1, C2 , C1 C2 that minimize the quantity

  (3)

where m is the number of clusters in . The function Q that is used to select pairs of clusters is called the Q-criterion. Note that this is a direct generalization of the selection criterion used in the NJ algorithm [2]. However, using only this criterion yields a method that is not consistent as illustrated in Figure 3. So, once the clusters C1 and C2 have been selected we use a second criterion to determine an ordering ΘC' in line 19 for the new cluster C'. In particular, for every x C1 C2 we define

put = m + |C1| + |C2| - 2, and select x1 C1 and x2 C2 that minimize the quantity

[d](x1, x2) = ( - 2)d(x1, x2) - R(x1) - R(x2).   (4)

We then choose an ordering ΘC' in which x1 and x2 are neighbors and for which every two elements that were neighbors in C1 or C2 remain neighbors. This completes the description of the selection case, and hence the description of the procedure FINDORDERING.


rating: 1.00 from 1 votes | updated on: 1 Sep 2007 | views: 2601 |

Rate article:







excellent!bad…