
Methods
[Equation (1)]
[Equation (2)]
for each pair of alignment positions i, j.
The total SD matrix TA(i, j) also takes into account the estimates of coordinate uncertainty. Each residue i in model ak has an associated coordinate uncertainty. Neglecting the covariance, one can estimate the distance uncertainty as:
[Equation (3)]
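As a rough illustration of the error propagation described above, the sketch below assumes the standard independent-error form, in which the variance of a distance is the sum of the coordinate variances of its two end points. The function names and the example values are hypothetical, and the exact expression used in Equation (3) may differ.

```python
import math


def distance_uncertainty(sigma_i, sigma_j):
    """Uncertainty of the distance between residues i and j.

    Assumption: the errors on the two residues are independent
    (covariance neglected), so their variances add.
    """
    return math.sqrt(sigma_i ** 2 + sigma_j ** 2)


def uncertainty_matrix(sigmas):
    """Pairwise distance uncertainties for one model from per-residue values."""
    n = len(sigmas)
    return [
        [0.0 if i == j else distance_uncertainty(sigmas[i], sigmas[j])
         for j in range(n)]
        for i in range(n)
    ]


# Hypothetical coordinate uncertainties (in angstroms) for three residues.
sigmas = [0.3, 0.4, 0.1]
matrix = uncertainty_matrix(sigmas)
```

The diagonal is set to zero because a residue has no distance to itself; the off-diagonal elements are symmetric by construction.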
[Equation (4)]
which combine the contributions to the variance from the distribution of distances and from the uncertainties:
[Equation (5)]
The relative SD matrix RA(i, j) provides a measure of significant variability as the ratio of SET and SUS:
[Equation (6)]
The maximum relative difference matrix X describes the structural outliers. It is based on the maximal differences between the distances, as previously proposed (Schneider, 2000).
[Equation (7)]
2.5 Comparison matrices
The comparison matrices are used for identifying the structural differences between two subsets A = {a1, ... , ak, ... , am} and B = {b1, ... , bk, ... , bn}. The method provides three types of comparisons, namely between two subsets, between a single entry and a subset, and between two single entries. The backbone conformations are compared using Cα atom distances, and side-chain conformations are compared using distances between the centroid of the side-chain and Cα atoms.
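The distance-based comparison described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `distance_matrix` and the coordinates are hypothetical, and each entry is assumed to be already reduced to one point per aligned position (for example its C-alpha atoms).

```python
import math


def distance_matrix(coords):
    """Return the symmetric matrix of pairwise Euclidean distances.

    coords: list of (x, y, z) tuples, one per aligned position.
    """
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)]
            for i in range(n)]


# Hypothetical coordinates for two entries aligned over three positions.
entry_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)]
entry_b = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]

d_a = distance_matrix(entry_a)
d_b = distance_matrix(entry_b)

# Element-wise difference between the two distance matrices. No superposition
# is needed, because internal distances do not depend on the reference frame.
diff = [[d_b[i][j] - d_a[i][j] for j in range(3)] for i in range(3)]
```

Working with internal distances rather than coordinates is what allows entries to be compared before any superposition has been computed.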
For each pair of positions (i, j) in the alignment, the agreement between the distance distributions of the two subsets A and B, relative to the variance of the distributions, is given by the value of the Welch statistic (Welch, 1938). The variance estimate has two components: one resulting from the variance of the distances, and one from the distance uncertainties. If only the distance distributions are considered, then:
[Equation (8)]
[Equation (9)]
[Equation (10)]
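For orientation, the sketch below computes the textbook form of the Welch statistic for every pair of positions, using only the variance of the distance distributions. It is an assumption that this matches the equations above term by term; the function names and the input layout (one distance matrix per entry) are hypothetical.

```python
import math
from statistics import mean, variance


def welch_statistic(sample_a, sample_b):
    """Welch statistic for two samples with possibly unequal variances."""
    m, n = len(sample_a), len(sample_b)
    # Squared standard error of the difference between the two means.
    se2 = variance(sample_a) / m + variance(sample_b) / n
    if se2 == 0.0:
        raise ValueError("both samples have zero variance")
    return (mean(sample_a) - mean(sample_b)) / math.sqrt(se2)


def comparison_matrix(dists_a, dists_b):
    """Welch statistic for each pair of positions (i, j).

    dists_a, dists_b: lists of distance matrices, one per entry of
    subsets A and B (each subset needs at least two entries).
    """
    size = len(dists_a[0])
    result = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(i + 1, size):
            a = [d[i][j] for d in dists_a]
            b = [d[i][j] for d in dists_b]
            result[i][j] = result[j][i] = welch_statistic(a, b)
    return result


# Hypothetical two-position example with three entries per subset.
dists_a = [[[0.0, d], [d, 0.0]] for d in (5.0, 5.2, 4.8)]
dists_b = [[[0.0, d], [d, 0.0]] for d in (8.0, 8.3, 7.7)]
w = comparison_matrix(dists_a, dists_b)
```

A large absolute value at (i, j) indicates that the distance between positions i and j differs between the two subsets by much more than it varies within them.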
Extending the formalism to the comparison of a single entry a to a subset B, and not considering the coordinate uncertainties, we define:
[Equation (11)]
[Equation (12)]
[Equation (13)]
[Equation (14)]
[Equation (15)]
If only two entries a and b are compared, the variance is derived from the distance uncertainties as previously proposed (Schneider, 2000):
[Equation (16)]
[Equation (17)]
2.6 Identification of hinges, variable and invariant regions
The relative orientation of the backbone is preserved in the invariant regions. Invariant regions can be composed of more than one segment of contiguous residues (invariant segments). Invariant segments in invariant regions are structurally conserved relative to each other. The backbone structure is not preserved in variable segments. Hinge segments are associated with short flexible fragments on the protein backbone and with transitions between different invariant and variable segments.
Invariant backbone regions, hinges and variable segments are identified from the variation matrices or from the comparison matrices using a data smoothing approach. First the hinge segments are identified; the remaining inter-hinge segments are then classified as either variable or invariant. Finally, invariant segments that preserve their orientation relative to each other are grouped into invariant regions. See the Supplementary Material for a detailed explanation.
A clustering approach is used to identify regions with invariant side chains, based on the variation or comparison matrices computed with side-chain centroid distances. The matrix elements are used as distances for hierarchical clustering with group average agglomeration. The resulting tree is cut at a specified cutoff (s_cutoff).
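The clustering step can be illustrated with a small pure-Python sketch of group-average agglomeration. It is not the R routine used by the authors; the function name, the example matrix and the cutoff value are hypothetical. Because merge heights never decrease under group-average linkage, stopping as soon as the closest pair of clusters exceeds the cutoff gives the same groups as cutting the full tree at that height.

```python
def average_linkage_clusters(dist, cutoff):
    """Agglomerative clustering with group-average linkage.

    dist: symmetric matrix of pairwise dissimilarities.
    cutoff: merging stops once the closest pair of clusters is farther
            apart than this value.
    Returns a sorted list of clusters, each a sorted list of item indices.
    """
    clusters = [[i] for i in range(len(dist))]

    def linkage(c1, c2):
        # Group average: mean of all dissimilarities between the two clusters.
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Find the closest pair of clusters.
        d, p, q = min(
            (linkage(clusters[p], clusters[q]), p, q)
            for p in range(len(clusters))
            for q in range(p + 1, len(clusters))
        )
        if d > cutoff:
            break
        merged = sorted(clusters[p] + clusters[q])
        clusters = [c for k, c in enumerate(clusters) if k not in (p, q)]
        clusters.append(merged)
    return sorted(clusters)


# Hypothetical dissimilarities for four residues forming two tight groups.
dist = [
    [0.0, 0.2, 2.0, 2.1],
    [0.2, 0.0, 1.9, 2.0],
    [2.0, 1.9, 0.0, 0.3],
    [2.1, 2.0, 0.3, 0.0],
]
groups = average_linkage_clusters(dist, cutoff=1.0)
```

This naive version recomputes every linkage at each step, which is acceptable for a demonstration but slow for large matrices.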
2.7 Superposition
For an invariant backbone region, an optimal superposition between all structural entries and a representative entry is computed. The representative structure is the one with the lowest sum of backbone clustering dissimilarity values in the invariant region. The superposition between each entry and the representative is computed based on the invariant regions. Superpositions were performed with the PDB module of Biopython (http://biopython.org/; Hamelryck and Manderick, 2003).
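One possible reading of the representative selection is sketched below. It assumes that a dissimilarity value is available for every pair of entries over the invariant region (a hypothetical entry-by-entry matrix) and picks the entry whose summed dissimilarity to all others is lowest. The function name and the values are illustrative only.

```python
def choose_representative(dissimilarity):
    """Index of the entry with the lowest summed dissimilarity to all entries.

    dissimilarity: symmetric entry-by-entry matrix (an assumption about how
    the dissimilarity values are organised).
    """
    totals = [sum(row) for row in dissimilarity]
    return min(range(len(totals)), key=totals.__getitem__)


# Hypothetical dissimilarities between three entries over one invariant region.
dissimilarity = [
    [0.0, 1.0, 4.0],
    [1.0, 0.0, 2.0],
    [4.0, 2.0, 0.0],
]
representative = choose_representative(dissimilarity)
```

Every other entry would then be superposed onto the chosen representative using only the atoms of the invariant region.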
2.8 Implementation and visualization tools
The methods were implemented in Python (http://www.python.org/) using the Biopython library. Version 2.1.0 of the R environment for statistical computing (R Development Core Team, 2005) was used for clustering, data smoothing and visualization. PyMOL (http://www.pymol.org) was used for molecular rendering and visualization.
Updated on: 1 Dec 2007
