
This paper will provide some examples of misleading annotations with regard to …



Proteins rich in particular amino acids
- Bioinformatics as a critical prerequisite to transcriptome and proteome studies

Cell wall structural proteins provide interesting examples of poor-quality annotation because their sequences are rich in particular amino acids. Three classes of structural proteins have been clearly defined: extensins, characterized by numerous Ser-(Pro)n (n≥3) motifs separated by Tyr-, Lys-, His-, and Val-rich regions (Kieliszewski and Lamport, 1994); Hydroxyproline/Proline-Rich proteins (H/PRPs), characterized by a high Pro content and by Pro-Pro-X-Y-Lys motifs, where X, Y = Val, Tyr, His, or Glu (Showalter, 1993); and Glycine-Rich proteins (GRPs), characterized by a high Gly content (up to 70%) organized in repeats of the (Gly-X) motif, where X = Gly, Ala, or Ser (Showalter, 1993).

Numerous proteins predicted to have a signal peptide by PSORT (http://psort.nibb.ac.jp/form.html) and TargetP (http://www.cbs.dtu.dk/services/TargetP/) and showing only short stretches of Pro or Gly have been wrongly annotated as extensin-like, PRP, or GRP. This is notably the case for At2g33790 (14.6% Pro), At5g26070 (23.5% Pro), and At4g28300 (13.6% Pro), which are annotated as extensins or PRPs in the UniProt, NCBI, TAIR, and TIGR databases. At4g34300 (14.7% Gly), At4g33930 (14.6% Gly), and At2g15340 (17.6% Gly) are presently annotated as GRPs in the NCBI, TAIR, and TIGR databases, but as putative or unknown proteins in the UniProt and MIPS databases.

Other examples are provided by a recent transcriptome study on peach by Trainotti et al. (2003). Contig 010 shows homology to the cotton fiber protein 6, S65062 (John, 1996). Since this protein has only one short Ser-Gly motif, it cannot be classified among structural proteins as suggested by the authors. In the same way, contig 125 shows homology to Arabidopsis thaliana NP_176440 (At1g62510). The primary sequence of the encoded protein has only one short X-Pro domain (with X = His, Lys, Asn, Thr, or Ser), which again is not sufficient to classify it among the structural proteins mentioned in the MIPS database. It actually contains a Pfam domain (PF00234) defining a protease inhibitor/seed storage/LTP family (http://hits.isb-sib.ch/cgi-bin/PFSCAN), which is clearly indicated in the NCBI, TAIR, and TIGR databases.
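The compositional criteria discussed above (percentage of Pro or Gly, and presence of the defining repeat motifs) can be checked mechanically before an annotation is accepted. The following is a minimal Python sketch under stated assumptions: sequences are plain one-letter amino acid strings, the function names are hypothetical, and the example sequences are invented toy strings, not the Arabidopsis or peach proteins cited here. Apart from n≥3 for the Ser-(Pro)n motif, the text gives no numeric cut-offs, so the sketch only reports measurements and does not classify.

```python
import re


def percent_residue(seq: str, residue: str) -> float:
    """Return the percentage of one residue (one-letter code) in a sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * seq.count(residue.upper()) / len(seq)


def count_ser_pro_motifs(seq: str, min_pro: int = 3) -> int:
    """Count non-overlapping Ser-(Pro)n runs with n >= min_pro."""
    pattern = re.compile(r"SP{%d,}" % min_pro)
    return len(pattern.findall(seq.upper()))


def longest_gly_x_repeat(seq: str) -> int:
    """Return the longest run of consecutive (Gly-X) units, X = Gly, Ala or Ser."""
    runs = re.findall(r"(?:G[GAS])+", seq.upper())
    return max((len(run) // 2 for run in runs), default=0)


if __name__ == "__main__":
    # Invented toy sequences, for illustration only (assumption, not real data).
    toy_sequences = {
        "toy_extensin_like": "SPPPPYKSPPPPVHSPPPK",
        "toy_short_pro_stretch": "MKTAYLLSPPAGDEKLV",
        "toy_gly_repeat": "GGGAGSGGAY",
    }
    for name, seq in toy_sequences.items():
        print(
            name,
            "Pro%%=%.1f" % percent_residue(seq, "P"),
            "Gly%%=%.1f" % percent_residue(seq, "G"),
            "Ser-(Pro)n motifs=%d" % count_ser_pro_motifs(seq),
            "longest (Gly-X) run=%d" % longest_gly_x_repeat(seq),
        )
```

A sequence with a single short Pro or Gly stretch yields zero or very few motif hits in such a check, which is the situation described for the wrongly annotated proteins. A domain search (for example against Pfam) would still be needed to propose an alternative annotation.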


updated on: 31 Oct 2006
