A Clustering Algorithm Based on Manifold Distance and Its Application to Aurora Classification

Authors: Sun Yangzi, Wang Xuan (School of Physics and Information Technology, Shaanxi Normal University, Xi'an, Shaanxi)

Keywords: Spectral Clustering (SC); K-Means Algorithm; Manifold Distance; Laplacian Matrix

Abstract: This paper proposes a new spectral clustering algorithm based on manifold distance. The algorithm can cluster sample spaces of arbitrary, irregular shape and can obtain a globally optimal solution. Taking the similarity measure as the starting point, we improve on the traditional approach by replacing the Euclidean-distance-based similarity used in the traditional spectral clustering algorithm (NJW-SC) with a similarity based on manifold distance, and then cluster the sample set on that basis. We compare the new algorithm with the K-Means algorithm, NJW-SC, and the fuzzy C-means clustering algorithm (FCM) on artificial data sets; the new algorithm achieves better results on non-convex data sets and in terms of global consistency. On UCI data sets, clustering quality is evaluated with the F-measure, an index computed against manually assigned labels, and the new algorithm again outperforms the other methods. Finally, to check whether the method also performs well on real data, we apply it to aurora images; the classification results indicate that spectral clustering is also well suited to aurora classification.
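The substitution described in the abstract can be illustrated with a short sketch. The abstract does not state the paper's manifold distance formula, so the code below assumes a definition often used in density-sensitive clustering work: an edge between two points has length rho**d - 1 (d is the Euclidean distance, rho > 1), the manifold distance D is the shortest-path length over such edges, and the similarity is 1 / (1 + D). The function names, the value of rho, the farthest-first k-means initialisation, and the two-ring test data are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def manifold_distance(X, rho=20.0):
    """All-pairs manifold distance under the ASSUMED edge length rho**d - 1."""
    # Pairwise Euclidean distances d(x_i, x_j).
    diff = X[:, None, :] - X[None, :, :]
    euclid = np.sqrt((diff ** 2).sum(axis=-1))
    # Assumed edge length: grows exponentially with d, so a chain of short
    # hops through a dense region is cheaper than one long jump.
    D = rho ** euclid - 1.0
    # Floyd-Warshall: relax every pair through each intermediate point k.
    for k in range(len(X)):
        D = np.minimum(D, D[:, k:k + 1] + D[k:k + 1, :])
    return D


def spectral_embed(A, k):
    """NJW-style embedding: top-k eigenvectors of D^-1/2 A D^-1/2, row-normalised."""
    d = A.sum(axis=1)
    L = A / np.sqrt(np.outer(d, d))
    _, vecs = np.linalg.eigh(L)          # eigenvalues come back in ascending order
    Y = vecs[:, -k:]                     # keep the k largest
    norms = np.linalg.norm(Y, axis=1, keepdims=True)
    return Y / np.maximum(norms, 1e-12)


def kmeans(Y, k, n_iter=50):
    """Plain Lloyd iterations with a deterministic farthest-first start."""
    centers = [Y[0]]
    for _ in range(1, k):
        dist = np.min([np.linalg.norm(Y - c, axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        gaps = np.linalg.norm(Y[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(gaps, axis=1)
        for c in range(k):
            if np.any(labels == c):      # leave an empty cluster's centre as is
                centers[c] = Y[labels == c].mean(axis=0)
    return labels


def manifold_spectral_clustering(X, k, rho=20.0):
    """Spectral clustering with the Euclidean affinity swapped for 1 / (1 + D)."""
    D = manifold_distance(X, rho)
    A = 1.0 / (1.0 + D)                  # assumed similarity from manifold distance
    np.fill_diagonal(A, 0.0)             # NJW uses a zero diagonal
    return kmeans(spectral_embed(A, k), k)


# Illustrative non-convex data: two concentric rings of 40 points each.
theta = np.linspace(0.0, 2.0 * np.pi, 40, endpoint=False)
ring = np.column_stack([np.cos(theta), np.sin(theta)])
X = np.vstack([ring, 3.0 * ring])
labels = manifold_spectral_clustering(X, k=2)
```

The rings are a case where Euclidean-distance methods such as K-Means tend to struggle, because the two clusters share a centre. With the assumed edge length, walking around one ring in small steps costs far less than jumping the gap between rings, which is what lets the affinity matrix become nearly block-diagonal. The Floyd-Warshall loop is cubic in the number of points, so this sketch only suits small data sets.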

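Cite this article: Sun, Y. and Wang, X. (2016) Clustering Algorithm and Its Application in the Classification of Aurora Based on the Manifold Distance. Computer Science and Application, 6, 303-316. doi: 10.12677/CSA.2016.65037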
