A Predictor of Protein Secondary Structure Based on a Continuously Updated Template Library

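Authors: Pengjie Zhou, Peisheng Cong*, Tonghua Li: School of Chemical Science and Engineering, Tongji University, Shanghai; Ming Wen: College of Chemistry and Chemical Engineering, Central South University, Changsha, Hunan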

Keywords: Protein Secondary Structure Prediction; Continuously Updated; SPSSM Variable; Conditional Random Field

Abstract: Protein secondary structure prediction is an important field of computational biology. Although the accuracy of existing state-of-the-art machine learning approaches exceeds 80%, these methods share a common limitation: they cannot promptly learn from newly determined protein structures, and they cannot continuously revise their models and parameters, so they fall behind as new structural data accumulate. Here we present SIPSS, a predictor of protein secondary structure based on a continuously updated template library. The approach rests on the structural conservation of homologous sequences. First, we construct a continuously updated template library that automatically downloads newly determined protein structure data from the PDB every month; after screening, the new sequence and structure information is added to the library. Then a query sequence is aligned against the template library using PSI-BLAST, which yields a new variable, the SPSSM variable. Finally, the SPSSM variable is used as the input to a conditional random field for modelling and prediction. Our experiments show that SIPSS can learn new protein structure information online, and that its prediction accuracy on recently determined proteins (80.6%) is clearly higher than that of existing predictors. SIPSS is freely available at http://cheminfo.tongji.edu.cn/SIPSS/.
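To make the middle step of the pipeline more concrete, the following is a minimal sketch of how aligned template hits could be turned into a per-residue structural profile. It is an illustration only, not the authors' SPSSM definition: the function name `spssm_like_profile`, the hit format, the three-state alphabet (H, E, C) and the plain frequency normalisation are all assumptions made for this example. The hits are written by hand here; in practice they would come from an alignment of the query against the template library.

```python
from collections import Counter

STATES = "HEC"  # assumed three-state alphabet: helix, strand, coil


def spssm_like_profile(query_len, hits):
    """Accumulate per-position secondary-structure frequencies from hits.

    Each hit is a hypothetical tuple (query_start, query_aln, ss_aln):
      query_start: 0-based index of the first aligned query residue
      query_aln:   query row of the alignment, '-' marks a gap in the query
      ss_aln:      template secondary structure per column, '-' marks a gap
    Returns one [P(H), P(E), P(C)] row per query residue.
    """
    counts = [Counter() for _ in range(query_len)]
    for query_start, query_aln, ss_aln in hits:
        pos = query_start
        for q_char, ss_char in zip(query_aln, ss_aln):
            if q_char == "-":
                continue  # template insertion: no query residue to credit
            if ss_char in STATES:
                counts[pos][ss_char] += 1
            pos += 1  # advance even when the template has a gap here

    profile = []
    for c in counts:
        total = sum(c.values())
        if total == 0:
            # No template covers this residue: fall back to a uniform row.
            profile.append([1.0 / len(STATES)] * len(STATES))
        else:
            profile.append([c[s] / total for s in STATES])
    return profile


# Hand-written toy hits for a six-residue query (illustrative data only).
toy_hits = [
    (0, "MKV-LA", "HHHHHC"),
    (1, "KVLAG", "HEECC"),
    (2, "VLA", "E-C"),
]
for i, row in enumerate(spssm_like_profile(6, toy_hits)):
    print(i, [round(p, 2) for p in row])
```

Each row of the resulting matrix could then serve as the feature vector for one residue in a sequence labeller such as a conditional random field. Because the profile is recomputed from whatever templates are in the library at query time, adding new structures to the library changes the features without retraining the alignment step.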

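Citation: Pengjie Zhou, Ming Wen, Peisheng Cong, Tonghua Li (2017) A Predictor of Protein Secondary Structure Based on a Continuously Updated Template Library. Hans Journal of Computational Biology, 7, 13-22. doi: 10.12677/HJCB.2017.72002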

