Computer Science and Application

Vol.6 No.9 (September 2016)

Research on K-Means Algorithm Analysis and Improvement

Authors:

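Zang Chuanyu (藏传宇), Shen Yong (沈勇), Zhang Yuhao (张宇昊), Chen Changgeng (陈长庚), Zhang Hao (张浩), Yang Zhendi (杨真谛): School of Software, Yunnan University, Kunming, Yunnan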

Keywords:

Machine Learning; Cluster Analysis; K-Means Algorithm; p-K-Means Algorithm

Abstract:

The traditional k-means algorithm initializes its cluster centers with random numbers. The main advantage of this approach is that the initial cluster centers are produced quickly. The main drawback is that several initial centers may fall within the same cluster, which leads to an excessive number of iterations and can even trap the algorithm in a local optimum that yields an incorrect clustering result. To address this shortcoming, this paper proposes the p-K-means algorithm, which uses a geometric distance method to correct the uneven distribution of initial cluster centers, in particular the case where multiple centers appear in the same cluster. This method can keep k-means from falling into a local optimum during clustering, and it also reduces the number of iterations the clustering process requires. Experiments comparing the two algorithms show that the improved algorithm converges faster than the traditional k-means algorithm and is less prone to local optima.
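The abstract does not spell out the p-K-means procedure, so the following is not the authors' method. It is only a minimal sketch of the general idea of distance-based seeding, using farthest-point selection as a stand-in and comparing it with random seeding. All names here (`random_init`, `farthest_point_init`, `kmeans`, and the toy `points` data) are assumptions made for illustration.

```python
import math
import random

# Toy data (an assumption for illustration): three well-separated 3x3 grids.
points = [
    (cx + dx, cy + dy)
    for (cx, cy) in [(0, 0), (10, 10), (20, 0)]
    for dx in (-1, 0, 1)
    for dy in (-1, 0, 1)
]


def random_init(data, k, rng):
    """Traditional seeding: pick k distinct points at random."""
    return rng.sample(data, k)


def farthest_point_init(data, k):
    """Stand-in distance-based seeding (NOT the paper's p-K-means):
    start from the first point, then repeatedly add the point whose
    distance to its nearest already-chosen center is largest."""
    centers = [data[0]]
    while len(centers) < k:
        nxt = max(data, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(nxt)
    return centers


def kmeans(data, init_centers, max_iter=100):
    """Plain Lloyd iterations; returns (centers, iterations used)."""
    centers = [tuple(float(x) for x in c) for c in init_centers]
    for iteration in range(1, max_iter + 1):
        # Assignment step: attach every point to its nearest center.
        clusters = [[] for _ in centers]
        for p in data:
            idx = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            clusters[idx].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        new_centers = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centers == centers:  # converged: no center moved
            return centers, iteration
        centers = new_centers
    return centers, max_iter


rng = random.Random(0)  # fixed seed so the random run is repeatable
random_result = kmeans(points, random_init(points, 3, rng))
spread_result = kmeans(points, farthest_point_init(points, 3))
```

Each result is a `(centers, iterations)` pair, so the two seeding strategies can be compared by iteration count and by whether the final centers match the three groups in the toy data. On real data the comparison would need repeated random runs, since a single random seed says little about average behavior.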

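Cite this article:

Zang Chuanyu, Shen Yong, Zhang Yuhao, Chen Changgeng, Zhang Hao, Yang Zhendi (2016) Research on K-Means Algorithm Analysis and Improvement. Computer Science and Application, 6, 551-564. doi: 10.12677/CSA.2016.69069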
