Computational Biology (计算生物学)

Vol.1 No.2 (December 2011)

Research of Protein Named Entity Recognition Based on SVMs

 

Authors:

Lejun Gong (龚乐君), Yaxing Fu (付亚星), Xiao Sun (孙啸), Jianming Xie (谢建明), Shuangxin Yu (于双鑫)

 

Keywords:

Support Vector Machines (SVMs); Protein Entity Recognition; Feature Selection

 

Abstract:

This paper develops a method for recognizing protein named entities using Support Vector Machines (SVMs) and evaluates four groups of features in recognition experiments on a protein corpus. The experiments show that, relative to the baseline system, context features yield only a small improvement, whereas the combined feature of the current word's part of speech (POS) and word type achieves the best performance of all experiments, reaching 78.43% accuracy. These results indicate that POS and word-type features play an important role in protein entity recognition.
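As a rough illustration of the kind of features the abstract refers to, the sketch below builds a combined POS and word-type feature for a token, plus simple context-word features. The abstract does not specify the exact feature encoding, so the shape scheme (uppercase to `A`, lowercase to `a`, digit to `0`, with repeated symbols collapsed), the function names `word_shape` and `token_features`, and the window size are all assumptions made for illustration, not the authors' implementation.

```python
import re


def word_shape(token: str) -> str:
    """Map a token to a coarse shape (hypothetical scheme, not from the paper).

    Uppercase letters become 'A', lowercase letters 'a', digits '0';
    other characters are kept. Runs of the same symbol are collapsed.
    """
    symbols = []
    for ch in token:
        if ch.isupper():
            symbols.append("A")
        elif ch.islower():
            symbols.append("a")
        elif ch.isdigit():
            symbols.append("0")
        else:
            symbols.append(ch)
    # Collapse consecutive identical symbols, e.g. "AA-0" -> "A-0".
    return re.sub(r"(.)\1+", r"\1", "".join(symbols))


def token_features(tokens, pos_tags, i, window=1):
    """Build a feature dict for tokens[i] (illustrative feature set only)."""
    shape = word_shape(tokens[i])
    feats = {
        "word": tokens[i].lower(),
        "pos": pos_tags[i],
        "shape": shape,
        # Combined POS and word-type feature of the current word.
        "pos+shape": f"{pos_tags[i]}|{shape}",
    }
    # Context features: neighbouring words within the window.
    for off in range(-window, window + 1):
        if off == 0:
            continue
        j = i + off
        in_range = 0 <= j < len(tokens)
        feats[f"word[{off:+d}]"] = tokens[j].lower() if in_range else "<PAD>"
    return feats
```

Feature dictionaries of this form could then be vectorized and passed to any SVM implementation; which toolkit and kernel the authors used is not stated in the abstract.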

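Cite this article:

Lejun Gong, Yaxing Fu, Xiao Sun, Jianming Xie, Shuangxin Yu (2011) Research of Protein Named Entity Recognition Based on SVMs. Computational Biology, 1, 5-10. doi: 10.12677/hjcb.2011.12002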

 
