Prediction of Four Kinds of Supersecondary Structures in Enzymes by Using Ensemble Classifier Based on SVM
Abstract: Enzymes are proteins with catalytic function, and studying their supersecondary structures is important for understanding enzyme structure and function. Using enzyme sequence information, this work studies four kinds of supersecondary structures in enzymes for the first time. Site-specific amino acids and site-specific dipeptide components were selected as parameters. For five choices of the best fixed-length pattern, the scoring function method alone gave unsatisfactory predictions in 7-fold cross-validation. The scores were therefore used as input parameters to a support vector machine (SVM), and the SVM results were fused with weighting factors in an ensemble classifier. This improved performance: the overall prediction accuracy was 72.64% and the Matthews correlation coefficient was above 0.57. An SVM-based ensemble classifier is therefore an effective method for predicting the four kinds of supersecondary structures in enzymes.
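The fusion and evaluation steps named in the abstract can be illustrated with a minimal sketch. The abstract does not state the fusion rule or the weight values, so this assumes a plain weighted sum of per-class scores from each base classifier followed by an argmax; the function names (`fuse_scores`, `predict`, `mcc_one_vs_rest`) and the toy numbers are hypothetical and are not taken from the paper.

```python
from math import sqrt


def fuse_scores(per_classifier_scores, weights):
    """Weighted sum of class-score vectors from several base classifiers.

    Assumption: each base classifier (e.g. one SVM per fixed-length pattern)
    returns one score per class, and fusion is a simple weighted sum.
    """
    n_classes = len(per_classifier_scores[0])
    fused = [0.0] * n_classes
    for scores, weight in zip(per_classifier_scores, weights):
        for k, score in enumerate(scores):
            fused[k] += weight * score
    return fused


def predict(per_classifier_scores, weights):
    """Return the index of the class with the highest fused score."""
    fused = fuse_scores(per_classifier_scores, weights)
    return max(range(len(fused)), key=fused.__getitem__)


def mcc_one_vs_rest(y_true, y_pred, cls):
    """Matthews correlation coefficient for one class against all others."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == cls and p == cls)
    tn = sum(1 for t, p in pairs if t != cls and p != cls)
    fp = sum(1 for t, p in pairs if t != cls and p == cls)
    fn = sum(1 for t, p in pairs if t == cls and p != cls)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: report 0.0 when the coefficient is undefined.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy example: three base classifiers scoring four structure classes.
toy_scores = [
    [0.6, 0.2, 0.1, 0.1],
    [0.1, 0.7, 0.1, 0.1],
    [0.2, 0.5, 0.2, 0.1],
]
toy_weights = [0.5, 0.3, 0.2]  # illustrative weights, not the paper's
predicted_class = predict(toy_scores, toy_weights)

toy_mcc = mcc_one_vs_rest([0, 0, 1, 1], [0, 1, 1, 1], cls=1)
```

In this toy case the second and third classifiers together outweigh the first, so the fused decision differs from the prediction of the single most heavily weighted classifier. In practice the weights would be tuned on training folds only, so that the cross-validated accuracy and Matthews correlation coefficient stay unbiased.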
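Cite this article: Gao, S.J. and Hu, X.Z. (2014) Prediction of Four Kinds of Supersecondary Structures in Enzymes by Using Ensemble Classifier Based on SVM. Hans Journal of Computational Biology, 4, 1-11. doi: 10.12677/HJCB.2014.41001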