A Predictor of Protein Secondary Structure Based on a Continuously Updated Templet Library
Abstract: Protein secondary structure prediction is an important field of computational biology. Although existing state-of-the-art approaches achieve accuracies above 80%, they share a common limitation: they cannot learn new structural knowledge from newly determined proteins, nor can they change the model they use or its parameters. They therefore fall short of expectations in a constantly changing world. Here, we present SIPSS, a predictor of protein secondary structure based on a continuously updated template library. The cornerstone of our approach is structural similarity based on sequence homology. First, a continuously updated template library is constructed, which automatically downloads newly determined protein structure data from the PDB every month. After screening, the new protein sequence and structure information is added to the template library. Then a query sequence is aligned against the template library using PSI-BLAST, and a new variable, the SPSSM variable, is obtained. Finally, the SPSSM variable is used in a conditional random field algorithm for modelling and prediction. Our experiments showed that SIPSS can learn new protein structure information online, and that its prediction accuracy (80.6%) on the secondary structure of recently determined proteins is significantly better than that of state-of-the-art approaches. SIPSS is available free of charge at http://cheminfo.tongji.edu.cn/SIPSS/.
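The alignment step described in the abstract can be illustrated with a minimal sketch. It assumes that each hit from the template library has already been parsed into a pair of equal-length alignment rows (the query residues and the template's three-state secondary structure), and it simply counts, for every query position, how often the aligned template residues are helix (H), strand (E), or coil (C). The function name `ss_profile`, the tuple layout of `hits`, and the `STATES` alphabet are hypothetical choices made for illustration; the abstract does not give the exact definition of the SPSSM variable, so this should not be read as the authors' formula.

```python
from collections import Counter

STATES = "HEC"  # helix, strand, coil (three-state alphabet, assumed here)


def ss_profile(query_len, hits):
    """Build a per-position secondary-structure frequency profile.

    hits: iterable of (query_start, query_aln, template_ss_aln), where
    query_start is the 0-based query index of the first aligned residue
    and the two strings are equal-length alignment rows ('-' marks a gap).
    This input layout is an assumption, not a PSI-BLAST output format.
    """
    counts = [Counter() for _ in range(query_len)]
    for query_start, query_aln, template_ss_aln in hits:
        if len(query_aln) != len(template_ss_aln):
            raise ValueError("alignment rows must have equal length")
        pos = query_start
        for q_char, ss in zip(query_aln, template_ss_aln):
            if q_char == "-":
                continue  # gap in the query: no query position is consumed
            if ss in STATES and pos < query_len:
                counts[pos][ss] += 1  # template residue votes for its state
            pos += 1  # a template gap still consumes one query position

    profile = []
    for c in counts:
        total = sum(c.values())
        # Positions with no template coverage get an all-zero row.
        profile.append([c[s] / total if total else 0.0 for s in STATES])
    return profile


if __name__ == "__main__":
    # Toy hits for a five-residue query; the values are illustrative only.
    toy_hits = [
        (1, "AC-D", "HHEE"),
        (0, "MACD", "CH-E"),
        (1, "AC", "EH"),
    ]
    for row in ss_profile(5, toy_hits):
        print([round(x, 2) for x in row])
```

In a pipeline like the one the abstract outlines, rows of this kind would serve as per-residue features for the sequence labelling model, and rerunning the profile after each monthly library update is what would let new structures influence the prediction without retraining from scratch.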
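Cite this article: Zhou, P., Wen, M., Cong, P. and Li, T. (2017) A Predictor of Protein Secondary Structure Based on a Continuously Updated Templet Library. Hans Journal of Computational Biology, 7, 13-22. doi: 10.12677/HJCB.2017.72002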