Algorithm Optimization about Textual Case Retrieval Based on Topic Words

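Authors: Zhen Sun (Peking University, Beijing; National Administration for Code Allocation to Organizations, Beijing); Hui Yuan, Tai Sun, Zheng Gong, Jie Zhao (National Administration for Code Allocation to Organizations, Beijing); Lei Tang (Chinese Academy of Surveying and Mapping, Beijing)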

Keywords: Boolean Retrieval; Topic Words; Semantic Distance; Improved Algorithm; Precision Rate; Recall Rate

Abstract:
An analysis of Boolean retrieval, the traditional text retrieval method, reveals two shortcomings: the retrieval algorithm ignores the semantic relations between words, and it cannot rank the retrieved results by importance. To address both, an improved retrieval algorithm based on topic words is proposed. Topic words are used to enrich a keyword library, and on the resulting semantic information retrieval framework the semantic distance and similarity between keywords are computed. Finally, the improved algorithm is applied to a disaster case retrieval system and its retrieval results are analyzed for performance. Experiments show that the algorithm improves both the precision rate and the recall rate of text retrieval.
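The abstract does not give the paper's exact distance or similarity formulas, so the following Python sketch only illustrates the general mechanism on toy data. Everything in it is a hypothetical assumption rather than the authors' method: the small topic-word tree (`PARENT`), the four documents (`DOCS`), the relevance judgments, semantic distance taken as the number of edges between two words through their lowest common ancestor, and the distance-to-similarity conversion sim = α/(d + α) with an arbitrary α = 1.6. It contrasts an unranked Boolean AND query with a ranked, similarity-scored query and computes precision and recall for both.

```python
import math

# Hypothetical topic-word tree (child -> parent); NOT the paper's keyword library.
PARENT = {
    "earthquake": "geological disaster",
    "landslide": "geological disaster",
    "flood": "meteorological disaster",
    "rainstorm": "meteorological disaster",
    "geological disaster": "disaster",
    "meteorological disaster": "disaster",
}

ALPHA = 1.6  # assumed constant for sim = ALPHA / (d + ALPHA)

# Hypothetical case documents, each reduced to a set of keywords.
DOCS = {
    "d1": {"earthquake", "rescue"},
    "d2": {"landslide", "rescue"},
    "d3": {"flood", "evacuation"},
    "d4": {"earthquake", "landslide", "rescue"},
}


def ancestors(word):
    """Return [word, parent, grandparent, ...] up to the root of the tree."""
    path = [word]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def semantic_distance(a, b):
    """Edges between a and b via their lowest common ancestor (inf if none)."""
    pa, pb = ancestors(a), ancestors(b)
    for i, node in enumerate(pa):  # first shared node = lowest common ancestor
        if node in pb:
            return i + pb.index(node)
    return math.inf


def similarity(a, b):
    """Map distance to [0, 1]: identical words -> 1.0, unrelated -> 0.0."""
    d = semantic_distance(a, b)
    return 0.0 if math.isinf(d) else ALPHA / (d + ALPHA)


def boolean_and(query, docs):
    """Classic Boolean AND: exact keyword match, returns an unranked set."""
    return {doc for doc, terms in docs.items() if query <= terms}


def ranked(query, docs, threshold=0.5):
    """Score each doc by the mean, over query words, of the best similarity
    to any doc keyword; keep docs scoring >= threshold, best first."""
    if not query:
        return []
    scores = {}
    for doc, terms in docs.items():
        score = sum(max(similarity(q, t) for t in terms) for q in query) / len(query)
        if score >= threshold:
            scores[doc] = score
    return sorted(scores, key=lambda doc: (-scores[doc], doc))


def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved, recall = hits / relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    query = {"earthquake", "rescue"}
    relevant = {"d1", "d2", "d4"}  # hypothetical relevance judgments
    bool_hits = boolean_and(query, DOCS)
    ranked_hits = ranked(query, DOCS)
    print(sorted(bool_hits), precision_recall(bool_hits, relevant))
    # ['d1', 'd4'] (1.0, 0.6666666666666666)
    print(ranked_hits, precision_recall(ranked_hits, relevant))
    # ['d1', 'd4', 'd2'] (1.0, 1.0)
```

On this toy data the Boolean AND query misses `d2`, which mentions a landslide rather than an earthquake, while the similarity-based ranking retrieves it (below `d1` and `d4`) because both words sit under the same hypothetical "geological disaster" topic. The example only shows how topic-word distances can address the two shortcomings of Boolean retrieval; it does not reproduce the paper's experiments.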

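Citation: Sun, Z., Yuan, H., Sun, T., Gong, Z., Zhao, J. and Tang, L. (2013) Algorithm Optimization about Textual Case Retrieval Based on Topic Words. Computer Science and Application, 3, 354-359. doi: 10.12677/CSA.2013.38062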

