




JOURNAL

Evolutionary Bioinformatics

Performance of Hidden Markov Models in Recovering the Standard Classification of Glycoside Hydrolases



Evolutionary Bioinformatics 2017:13 1176934317703401

Original Research

Published on 20 Apr 2017

DOI: 10.1177/1176934317703401






Abstract

Glycoside hydrolases (GHs) are carbohydrate-active enzymes that assist in hydrolyzing the glycosidic bonds of complex sugars into carbohydrates. The current standard GH family classification is available in the CAZy database; it is based on amino acid sequence similarity and is curated semi-automatically. However, with the exponential increase in available genome sequence data, automated classification methods are required for the fast annotation of coding sequences. Currently, the dbCAN database offers automatic annotation of signature domains from CAZy-defined classifications using a statistical approach, hidden Markov models (HMMs). However, dbCAN does not contain the entire set of CAZy GH families. Moreover, no evaluation has so far been conducted of whether HMM profiles are a viable means of automatically assigning GH amino acid sequences to the standard CAZy GH family classification itself. In this work, we performed a meta-analysis in which amino acid sequences from CAZy-defined GH families were used to build family-specific HMM profiles. We then queried a set of ~300 000 GH sequences against our database of HMM profiles estimated from CAZy families, and conducted the same evaluation against the available dbCAN HMM profiles. Our analyses recovered 65% of matches with the standard CAZy classification, whereas the dbCAN HMMs recovered 61%. We also provide an analysis of the types of errors commonly found when HMMs are used to recover CAZy-based classifications. Although the HMMs performed well, further development is necessary for a fully automated classification of GHs, which would allow GH classification to be standardized among protein databases.
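The evaluation described in the abstract can be illustrated with a minimal Python sketch. It assumes that each queried sequence is assigned to the family of its highest-scoring HMM hit and that a "match" means this family equals the CAZy family; the abstract does not state the exact assignment rule, so both are assumptions. The function names, the bit-score-like values, and the toy data are hypothetical and are not taken from the study.

```python
from collections import Counter


def best_family(hits):
    """Return the family of the highest-scoring hit, or None if there are no hits.

    `hits` is a list of (family, score) tuples; higher score = better hit (assumed).
    """
    if not hits:
        return None
    return max(hits, key=lambda h: h[1])[0]


def recovery_rate(cazy_labels, hmm_hits):
    """Compare best-hit families with CAZy labels.

    Returns (fraction of sequences that match, Counter of outcomes).
    """
    outcomes = Counter()
    for seq_id, cazy_family in cazy_labels.items():
        predicted = best_family(hmm_hits.get(seq_id, []))
        if predicted is None:
            outcomes["no_hit"] += 1      # no profile matched this sequence
        elif predicted == cazy_family:
            outcomes["match"] += 1       # best hit agrees with CAZy
        else:
            outcomes["mismatch"] += 1    # best hit is a different family
    total = sum(outcomes.values())
    rate = outcomes["match"] / total if total else 0.0
    return rate, outcomes


# Toy, made-up data for illustration only (not results from the paper).
cazy_labels = {"seq1": "GH1", "seq2": "GH5", "seq3": "GH13", "seq4": "GH2"}
hmm_hits = {
    "seq1": [("GH1", 250.0), ("GH5", 40.0)],
    "seq2": [("GH30", 120.0), ("GH5", 90.0)],
    "seq3": [("GH13", 310.5)],
    # seq4 has no hits at all
}

rate, outcomes = recovery_rate(cazy_labels, hmm_hits)
print(f"recovered: {rate:.0%}", dict(outcomes))
```

Separating the outcomes into matches, mismatches, and sequences with no hit mirrors the kind of error breakdown the abstract mentions, although the error categories used in the paper may differ.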







