Improved TF-IDF Keyword Extraction Algorithm
Abstract: Building on the TF-IDF extraction algorithm, this paper proposes a new keyword extraction algorithm based on word frequency statistics. Using section-marking techniques, the algorithm assigns a position weight to each word according to where it appears in the document. Among the words that occur frequently in the word segmentation result, it computes the similarity between words with the same part of speech and merges those whose similarity is high. Finally, the TF-IWF algorithm ranks the candidate keywords by weight. The method improves on the traditional Chinese keyword extraction algorithm, which largely ignores highly similar words and therefore has low accuracy. The results show that the new approach outperforms the previous TF-IDF algorithm and that the extracted keyword set generally expresses the content of the article.
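The position-weighting and TF-IWF ranking steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the position labels and weight values, the function names `weighted_tf` and `rank_keywords`, and the fallback count of 1 for words missing from the corpus are all assumptions made for illustration. The sketch uses one common formulation of inverse word frequency, the logarithm of the total word count in the corpus divided by the count of the given word, and it omits the similarity-based merging step.

```python
import math
from collections import Counter

# Hypothetical position weights; the abstract does not state the actual values.
POSITION_WEIGHTS = {"title": 3.0, "abstract": 2.0, "body": 1.0}


def weighted_tf(sections):
    """Return a normalized, position-weighted term frequency.

    `sections` is a list of (position_label, tokens) pairs, where tokens
    are the output of a word segmenter (assumed to be done beforehand).
    """
    scores = Counter()
    for label, tokens in sections:
        weight = POSITION_WEIGHTS.get(label, 1.0)
        for token in tokens:
            scores[token] += weight
    total = sum(scores.values())
    return {t: s / total for t, s in scores.items()} if total else {}


def rank_keywords(sections, corpus_counts, top_k=5):
    """Rank words by weighted TF times IWF, highest score first.

    `corpus_counts` maps each word to its number of occurrences in a
    non-empty background corpus. Words missing from the corpus are
    treated as if they occurred once (an assumption of this sketch).
    """
    tf = weighted_tf(sections)
    corpus_total = sum(corpus_counts.values())
    scores = {
        t: f * math.log(corpus_total / corpus_counts.get(t, 1))
        for t, f in tf.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


if __name__ == "__main__":
    sections = [
        ("title", ["keyword", "extraction"]),
        ("body", ["the", "keyword", "the", "the", "algorithm"]),
    ]
    corpus_counts = {"the": 1000, "keyword": 5, "extraction": 5, "algorithm": 20}
    print(rank_keywords(sections, corpus_counts, top_k=2))
```

In this toy input, a word in the title counts three times as much as a word in the body, and a word that is very frequent in the background corpus receives an IWF close to zero, so it drops to the bottom of the ranking even when it is frequent in the document.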
Cite this article: Wang Xiaolin, Yang Lin, Wang Dong, Zhen Lihua (2013) Improved TF-IDF Keyword Extraction Method. Computer Science and Application, 3, 64-68. doi: 10.12677/CSA.2013.31012