﻿ 基于拉普拉斯约束的半监督模糊C均值算法

# 基于拉普拉斯约束的半监督模糊C均值算法Semi-Supervised Fuzzy C-Means Algorithm Based on Laplace Constraint

Abstract: As one of the classical unsupervised algorithms, fuzzy clustering algorithm is easy to fall into local optimum without providing prior information. In order to combine supervised learning with unsupervised learning and use both labeled and unlabeled data for training learning, this paper proved the effectiveness of the algorithm through Laplace constraint on the objective function and verification that the range of membership is always greater than or equal to zero. On this basis, prior information is added to mine a lot of useful information, so that the algorithm can reasonably and effectively use the category information of part of the identified samples to affect the unidentified samples, so as to improve the clustering performance of the semi-clustering algorithm. Finally, the two improved algorithms proposed in this paper are compared with the original Fuzzy C-means (FCM) for clustering index, and the results show that the proposed algorithm has good clustering effect.

1. 引言

2. 基于拉普拉斯约束的模糊c均值(FCML)

2.1. 目标函数设计

$\mathrm{min}\underset{i=1}{\overset{n}{\sum }}\underset{j=1}{\overset{c}{\sum }}{‖{x}_{i}-{v}_{j}‖}_{2}^{2}{u}_{ij}^{2}$

$\text{st}\text{.}\text{\hspace{0.17em}}\underset{i=1}{\overset{c}{\sum }}{u}_{ij}=1$ (1)

$\mathrm{min}\underset{i=1}{\overset{n}{\sum }}\underset{j=1}{\overset{c}{\sum }}{‖{x}_{i}-{v}_{j}‖}_{2}^{2}{u}_{ij}^{2}+2\lambda Tr\left({U}^{\text{T}}{L}_{S}U\right)$ 2)

2.2. 理论分析

US固定时，更新V可得到

${V}_{j}=\frac{\underset{i=1}{\overset{n}{\sum }}{u}_{ij}^{2}{x}_{i}}{\underset{i=1}{\overset{n}{\sum }}{u}_{ij}^{2}}$ (3)

VS固定时，更新U，由目标函数可知与U有关的函数为：

$\mathrm{min}\underset{i=1}{\overset{c}{\sum }}\underset{j=1}{\overset{n}{\sum }}{u}_{ij}^{2}{d}_{ij}^{2}+2\lambda Tr\left(U{L}_{S}{U}^{\text{T}}\right)$ (4)

$2Tr\left(U{L}_{S}{U}^{\text{T}}\right)=\underset{i=1}{\overset{n}{\sum }}\underset{j=1}{\overset{n}{\sum }}{‖{u}_{i}-{u}_{j}‖}_{F}^{2}{s}_{ij}$ (5)

${U}^{\text{T}}{L}_{S}U={U}^{\text{T}}\left(D-S\right)U={U}^{\text{T}}DU-{U}^{\text{T}}SU$

$\begin{array}{c}2Tr\left({U}^{\text{T}}{L}_{S}U\right)=\underset{i=1}{\overset{n}{\sum }}{u}_{i}^{2}{d}_{i}-\underset{i,j=1}{\overset{n}{\sum }}{u}_{i}{u}_{j}{s}_{ij}=\frac{1}{2}\left(\underset{i=1}{\overset{n}{\sum }}{u}_{i}^{2}{d}_{i}-2\underset{i,j=1}{\overset{n}{\sum }}{u}_{i}{u}_{j}{s}_{ij}+\underset{i=1}{\overset{n}{\sum }}{u}_{j}^{2}{d}_{j}\right)\\ =\underset{i,j=1}{\overset{n}{\sum }}{‖{u}_{i}-{u}_{j}‖}_{F}^{2}{s}_{ij}\end{array}$

$J=\mathrm{min}\underset{i=1}{\overset{n}{\sum }}\underset{j=1}{\overset{c}{\sum }}{‖{x}_{i}-{v}_{j}‖}_{2}^{2}{u}_{ij}^{2}+\lambda \underset{i=1}{\overset{n}{\sum }}\underset{j=1}{\overset{c}{\sum }}{‖{u}_{i}-{u}_{j}‖}_{F}^{2}{s}_{ij}+\mu \underset{i=1}{\overset{n}{\sum }}\underset{j=1}{\overset{c}{\sum }}{‖{x}_{i}-{x}_{j}‖}_{2}^{2}{s}_{ij}^{2}$ (6)

${J}_{u}=\mathrm{min}\underset{i=1}{\overset{c}{\sum }}\underset{j=1}{\overset{n}{\sum }}{u}_{ij}^{2}{d}_{ij}^{2}+\lambda \underset{i=1}{\overset{n}{\sum }}\underset{j=1}{\overset{n}{\sum }}{‖{u}_{i}-{u}_{j}‖}_{F}^{2}{s}_{ij}$ (7)

$\begin{array}{l}{u}_{ij}^{2}{d}_{ij}^{2}+2\lambda \underset{l\ne j}{\sum }{‖{u}_{l}-{u}_{j}‖}_{F}^{2}{s}_{lj}\\ ={u}_{ij}^{2}{d}_{ij}^{2}+2\lambda \left({‖{u}_{1}-{u}_{j}‖}_{F}^{2}{s}_{1j}+\cdots +{‖{u}_{j-1}-{u}_{j}‖}_{F}^{2}{s}_{j-1,j}+{‖{u}_{j+1}-{u}_{j}‖}_{F}^{2}{s}_{j+1,j}+\cdots +{‖{u}_{n}-{u}_{j}‖}_{F}^{2}{s}_{nj}\right)\end{array}$ (8)

${u}_{ij}^{2}{d}_{ij}^{2}+2\lambda \underset{l\ne j}{\sum }{\left({u}_{ij}-{u}_{il}\right)}^{2}{s}_{lj}$ (9)

$2{u}_{ij}{d}_{ij}^{2}+4\lambda \underset{l\ne j}{\sum }{u}_{ij}{s}_{lj}-4\lambda \underset{l\ne j}{\sum }{u}_{il}{s}_{lj}-{\xi }_{j}=0$

${u}_{ij}=\frac{4\lambda \underset{l\ne j}{\sum }{u}_{il}{s}_{lj}+{\xi }_{j}}{2{d}_{ij}^{2}+4\lambda \underset{l\ne j}{\sum }{s}_{lj}}$ (10)

$\underset{k=1}{\overset{c}{\sum }}{u}_{kj}=\underset{k=1}{\overset{c}{\sum }}\frac{4\lambda \underset{l\ne j}{\sum }{u}_{kl}{s}_{lj}+{\xi }_{j}}{2{d}_{kj}^{2}+4\lambda \underset{l\ne j}{\sum }{s}_{lj}}=1$ (11)

${\xi }_{j}=\frac{1-\underset{k=1}{\overset{c}{\sum }}\frac{4\lambda \underset{l\ne j}{\sum }{u}_{kl}{s}_{lj}}{2{d}_{kj}^{2}+4\lambda \underset{l\ne j}{\sum }{s}_{lj}}}{\underset{k=1}{\overset{c}{\sum }}\frac{1}{2{d}_{kj}^{2}+4\lambda \underset{l\ne j}{\sum }{s}_{lj}}}$ (12)

${u}_{ij}=\frac{4\lambda \underset{l\ne j}{\sum }{u}_{il}{s}_{lj}+\frac{1-\underset{k=1}{\overset{c}{\sum }}\frac{4\lambda \underset{l\ne j}{\sum }{u}_{kl}{s}_{lj}}{2{d}_{kj}^{2}+4\lambda \underset{l\ne j}{\sum }{s}_{lj}}}{\underset{k=1}{\overset{c}{\sum }}\frac{1}{2{d}_{kj}^{2}+4\lambda \underset{l\ne j}{\sum }{s}_{lj}}}}{2{d}_{ij}^{2}+4\lambda \underset{l\ne j}{\sum }{s}_{lj}}$ (13)

$\underset{k=1}{\overset{c}{\sum }}\frac{4\lambda \underset{l\ne j}{\sum }{u}_{kl}{s}_{lj}}{2{d}_{kj}^{2}+4\lambda \underset{l\ne j}{\sum }{s}_{lj}}\le 1$

$\frac{4\lambda \underset{l\ne j}{\sum }{u}_{kl}{s}_{lj}}{2{d}_{kj}^{2}+4\lambda \underset{l\ne j}{\sum }{s}_{lj}}\le \frac{4\lambda \underset{l\ne j}{\sum }{u}_{kl}{s}_{lj}}{4\lambda \underset{l\ne j}{\sum }{s}_{lj}}=\frac{\underset{l\ne j}{\sum }{u}_{kl}{s}_{lj}}{\underset{l\ne j}{\sum }{s}_{lj}}$

$\underset{k=1}{\overset{c}{\sum }}\frac{4\lambda \underset{l\ne j}{\sum }{u}_{kl}{s}_{lj}}{2{d}_{kj}^{2}+4\lambda \underset{l\ne j}{\sum }{s}_{lj}}\le \underset{k=1}{\overset{c}{\sum }}\frac{\underset{l\ne j}{\sum }{u}_{kl}{s}_{lj}}{\underset{l\ne j}{\sum }{s}_{lj}}$

$\begin{array}{c}\underset{k=1}{\overset{c}{\sum }}\frac{\underset{l\ne j}{\sum }{u}_{kl}{s}_{lj}}{\underset{l\ne j}{\sum }{s}_{lj}}=\underset{k=1}{\overset{c}{\sum }}\frac{{u}_{k1}{s}_{1j}+{u}_{k2}{s}_{2j}+\cdots +{u}_{k,j-1}{s}_{j-1,j}+{u}_{k,j+1}{s}_{j+1,j}+\cdots +{u}_{kn}{s}_{nj}}{{s}_{1j}+{s}_{2j}+\cdots +{s}_{j-1,j}+{s}_{j+1,j}+\cdots +{s}_{nj}}\\ =\frac{{s}_{1j}\underset{k=1}{\overset{c}{\sum }}{u}_{k1}+{s}_{2j}\underset{k=1}{\overset{c}{\sum }}{u}_{k2}+\cdots +{s}_{j-1,j}\underset{k=1}{\overset{c}{\sum }}{u}_{k,j-1}+{s}_{j+1,j}\underset{k=1}{\overset{c}{\sum }}{u}_{k,j+1}+\cdots +{s}_{nj}\underset{k=1}{\overset{c}{\sum }}{u}_{kn}}{{s}_{1j}+{s}_{2j}+\cdots +{s}_{j-1,j}+{s}_{j+1,j}+\cdots +{s}_{nj}}\end{array}$ (14)

$\lambda Tr\left({U}^{\text{T}}{L}_{S}U\right)+\mu \underset{i=1}{\overset{n}{\sum }}\underset{j=1}{\overset{c}{\sum }}{‖{x}_{i}-{x}_{j}‖}_{2}^{2}{s}_{ij}^{2}=\lambda \underset{i=1}{\overset{n}{\sum }}\underset{j=1}{\overset{c}{\sum }}{‖{u}_{i}-{u}_{j}‖}_{2}^{2}{s}_{ij}+\mu \underset{i=1}{\overset{n}{\sum }}\underset{j=1}{\overset{c}{\sum }}{‖{x}_{i}-{x}_{j}‖}_{2}^{2}{s}_{ij}^{2}$ (15)

${s}_{i}^{\text{T}}D{s}_{i}+\frac{\lambda }{\mu }{s}_{i}^{\text{T}}{V}_{i}$ (16)

2.3. 算法流程

3. 基于拉普拉斯约束的半监督模糊c均值算法(SFCML)

3.1. 目标函数设计

$\begin{array}{l}J=\mathrm{min}\underset{i=1}{\overset{n}{\sum }}\underset{j=1}{\overset{c}{\sum }}{‖{x}_{i}-{v}_{j}‖}_{2}^{2}{u}_{ij}^{2}+\lambda \underset{i=1}{\overset{n}{\sum }}\underset{j=1}{\overset{c}{\sum }}{‖{u}_{i}-{u}_{j}‖}_{F}^{2}{s}_{ij}+\alpha \underset{i=1}{\overset{n}{\sum }}\underset{j=1}{\overset{c}{\sum }}{\left({u}_{ij}-{f}_{ij}{b}_{i}\right)}^{2}{d}_{ij}^{2}\\ \text{st}\text{.}\text{\hspace{0.17em}}\underset{i=1}{\overset{n}{\sum }}{u}_{ij}=1,{u}_{ij}\ge 0\end{array}$ (17)

$\left\{\begin{array}{l}{b}_{i}=1,{x}_{i}被标记\\ {b}_{i}=0,其他\end{array}$ (18)

3.2. 理论分析

US固定时，更新V可得到

${v}_{j}=\frac{\underset{i=1}{\overset{n}{\sum }}{u}_{ij}^{2}{x}_{i}}{\underset{i=1}{\overset{n}{\sum }}{u}_{ij}^{2}}$ (19)

VS固定时，更新U，由目标函数可知与U有关的函数为：

$J=\mathrm{min}\underset{i=1}{\overset{n}{\sum }}\underset{j=1}{\overset{c}{\sum }}{‖{x}_{i}-{v}_{j}‖}_{2}^{2}{u}_{ij}^{2}+\lambda \underset{i=1}{\overset{n}{\sum }}\underset{j=1}{\overset{c}{\sum }}{‖{u}_{i}-{u}_{j}‖}_{F}^{2}{s}_{ij}+\alpha \underset{i=1}{\overset{n}{\sum }}\underset{j=1}{\overset{c}{\sum }}{\left({u}_{ij}-{f}_{ij}{b}_{i}\right)}^{2}{d}_{ij}^{2}$ (20)

$\begin{array}{l}{u}_{ij}^{2}{d}_{ij}^{2}+2\lambda \underset{l\ne j}{\sum }{‖{u}_{l}-{u}_{j}‖}_{F}^{2}{s}_{lj}+\alpha \underset{i=1}{\overset{n}{\sum }}\underset{j=1}{\overset{c}{\sum }}{\left({u}_{ij}-{f}_{ij}{b}_{i}\right)}^{2}{d}_{ij}^{2}\\ ={u}_{ij}^{2}{d}_{ij}^{2}+2\lambda \left({‖{u}_{1}-{u}_{j}‖}_{F}^{2}{s}_{1j}+\cdots +{‖{u}_{j-1}-{u}_{j}‖}_{F}^{2}{s}_{j-1,j}+{‖{u}_{j+1}-{u}_{j}‖}_{F}^{2}{s}_{j+1,j}+\cdots +{‖{u}_{n}-{u}_{j}‖}_{F}^{2}{s}_{nj}\right)\\ \text{}+\alpha \left({\left({u}_{11}-{f}_{11}{b}_{1}\right)}^{2}{d}_{11}^{2}+\cdots +{\left({u}_{1j}-{f}_{1j}{b}_{1}\right)}^{2}{d}_{1j}^{2}+\cdots +{\left({u}_{ij}-{f}_{ij}{b}_{i}\right)}^{2}{d}_{ij}^{2}+\cdots +{\left({u}_{nc}-{f}_{nc}{b}_{n}\right)}^{2}{d}_{nc}^{2}\right)\end{array}$ (21)

${u}_{ij}^{2}{d}_{ij}^{2}+2\lambda \underset{l\ne j}{\sum }{\left({u}_{ij}-{u}_{il}\right)}^{2}{s}_{lj}+\alpha \underset{i=1}{\overset{n}{\sum }}\underset{j=1}{\overset{c}{\sum }}{\left({u}_{ij}-{f}_{ij}{b}_{i}\right)}^{2}{d}_{ij}^{2}$ (22)

$L\left({u}_{ij},\eta ,{\beta }_{ij}\right)={u}_{ij}^{2}{d}_{ij}^{2}+2\lambda \underset{l\ne j}{\sum }{\left({u}_{ij}-{u}_{il}\right)}^{2}{s}_{lj}+\alpha \underset{i=1}{\overset{n}{\sum }}\underset{j=1}{\overset{c}{\sum }}{\left({u}_{ij}-{f}_{ij}{b}_{i}\right)}^{2}{d}_{ij}^{2}-\eta \left(\underset{i=1}{\overset{n}{\sum }}{u}_{ij}-1\right)-{\beta }_{ij}{u}_{ij}$ (23)

$\frac{\partial {J}_{u}}{\partial u}=2{u}_{ij}{d}_{ij}^{2}+4\lambda \underset{l\ne j}{\sum }\left({u}_{ij}-{u}_{il}\right){s}_{lj}+2\alpha \left({u}_{ij}-{f}_{ij}{b}_{i}\right){d}_{ij}^{2}-\eta -{\beta }_{ij}$

$2{u}_{ij}{d}_{ij}^{2}+4\lambda \underset{l\ne j}{\sum }{u}_{ij}{s}_{lj}-4\lambda \underset{l\ne j}{\sum }{u}_{il}{s}_{lj}-2\alpha \left({u}_{ij}-{f}_{ij}{b}_{i}\right){d}_{ij}^{2}-\eta -{\beta }_{ij}=0$

${u}_{ij}={\left(\frac{4\lambda \underset{l\ne j}{\sum }{u}_{il}{s}_{lj}-2\alpha {f}_{ij}{b}_{i}{d}_{ij}^{2}+\eta +{\beta }_{ij}}{2{d}_{ij}^{2}-2\alpha {d}_{ij}^{2}+4\lambda \underset{l\ne j}{\sum }{s}_{lj}}\right)}_{+}$ (24)

$\underset{k=1}{\overset{c}{\sum }}{u}_{kj}=\underset{k=1}{\overset{c}{\sum }}\frac{4\lambda \underset{l\ne j}{\sum }{u}_{kl}{s}_{lj}-2\alpha {f}_{kj}{b}_{k}{d}_{kj}^{2}+{\xi }_{j}}{2{d}_{kj}^{2}-2\alpha {d}_{kj}^{2}+4\lambda \underset{l\ne j}{\sum }{s}_{lj}}=1$ (25)

${\xi }_{j}=\frac{1-\underset{k=1}{\overset{c}{\sum }}\frac{4\lambda \underset{l\ne j}{\sum }{u}_{kl}{s}_{lj}}{2{d}_{kj}^{2}-2\alpha {d}_{kj}^{2}+4\lambda \underset{l\ne j}{\sum }{s}_{lj}}+\underset{k=1}{\overset{c}{\sum }}\frac{2\alpha {f}_{kj}{b}_{k}{d}_{kj}^{2}}{2{d}_{kj}^{2}-2\alpha {d}_{kj}^{2}+4\lambda \underset{l\ne j}{\sum }{s}_{lj}}}{\underset{k=1}{\overset{c}{\sum }}\frac{1}{2{d}_{kj}^{2}-2\alpha {d}_{kj}^{2}+4\lambda \underset{l\ne j}{\sum }{s}_{lj}}}$ (26)

${u}_{ij}=\frac{4\lambda \underset{l\ne j}{\sum }{u}_{il}{s}_{lj}-2\alpha {f}_{ij}{b}_{i}{d}_{ij}^{2}+\frac{1-\underset{k=1}{\overset{c}{\sum }}\frac{4\lambda \underset{l\ne j}{\sum }{u}_{kl}{s}_{lj}}{2{d}_{kj}^{2}-2\alpha {d}_{kj}^{2}+4\lambda \underset{l\ne j}{\sum }{s}_{lj}}+\underset{k=1}{\overset{c}{\sum }}\frac{2\alpha {f}_{kj}{b}_{k}{d}_{kj}^{2}}{2{d}_{kj}^{2}-2\alpha {d}_{kj}^{2}+4\lambda \underset{l\ne j}{\sum }{s}_{lj}}}{\underset{k=1}{\overset{c}{\sum }}\frac{1}{2{d}_{kj}^{2}-2\alpha {d}_{kj}^{2}+4\lambda \underset{l\ne j}{\sum }{s}_{lj}}}}{2{d}_{ij}^{2}-2\alpha {d}_{ij}^{2}+4\lambda \underset{l\ne j}{\sum }{s}_{lj}}$ (27)

3.3. 算法2

4. 实验结果分析

4.1. 实验设计

$ACC=\frac{\underset{i=1}{\overset{n}{\sum }}\delta \left(map\left({r}_{i}\right),{q}_{i}\right)}{n}$

NMI指标用于确定聚类的质量，给定一个聚类结果，则

$NMI=\frac{\underset{i=1}{\overset{n}{\sum }}\underset{j=1}{\overset{n}{\sum }}{n}_{ij}\mathrm{lg}\frac{{n}_{ij}}{{n}_{i}{\stackrel{^}{n}}_{j}}}{\sqrt{\left(\underset{i=1}{\overset{n}{\sum }}{n}_{i}\mathrm{lg}\frac{{n}_{i}}{n}\right)\left(\underset{j=1}{\overset{n}{\sum }}{\stackrel{^}{n}}_{j}\mathrm{lg}\frac{{\stackrel{^}{n}}_{j}}{n}\right)}}$

4.2. 真实数据集实验结果

Table 1. Accuracy of FCM, FCML and SFCML algorithms on Iris data set

Table 2. Clustering effect comparison of FCM, FCML and SFCML on Iris data set

5. 总结与展望

[1] Johnson, S.C. (1967) Hierarchical Clustering Schemes. Psychometrika, 32, 241-254.
https://doi.org/10.1007/BF02289588

[2] Ng, A.Y., Jordan, M.I. and Weiss, Y. (2002) On Spectral Clustering: Analysis and an Algorithm. The Conference and Workshop on Neural Information Processing Systems, Vol. 14, 849-856.

[3] Bezdek, J.C. (1981) Pattern Recognition with Fuzzy Objective Function Algorithms. Plenum, New York.
https://doi.org/10.1007/978-1-4757-0450-1

[4] Wu, L., Hoi, S., Jin, R., et al. (2010) Learning Bregman Distance Functions for Semi-Supervised Clustering. IEEE Transactions on Knowledge & Data Engineering, 24, 478-491.
https://doi.org/10.1109/TKDE.2010.215

[5] 白福均, 高建瓴, 宋文慧, 等. 半监督模糊聚类算法的研究与改进[J]. 通信技术, 2018, 317(5): 71-75.

[6] Brodinova, S., Filzmoser, P., Ortner, T., et al. (2019) Robust and Sparse K-Means Clustering for High-Dimensional Data. Advances in Data Analysis & Classification, 13, 905-932.
https://doi.org/10.1007/s11634-019-00356-9

[7] 朱乐为. 模糊C-means聚类算法的拓展研究[J]. 云南民族大学学报(自然科学版), 2019, 28(3): 64-70.

[8] Pedrycz, W. and Waletzky, J. (1997) Fuzzy Clustering with Partial Supervision. IEEE Transactions on Systems Man and Cybernetics Part B—Cybernetics, 27, 787-795.
https://doi.org/10.1109/3477.623232

[9] Tari, L., Baral, C. and Kim, S. (2009) Fuzzy c-Means Clustering with Prior Biological Knowledge. Journal of Biomedical Informatics, 42, 74-81.
https://doi.org/10.1016/j.jbi.2008.05.009

[10] Zhang, H.X. and Lu, J. (2009) Semi-Supervised Fuzzy Clustering: A Kernel-Based Approach. Knowledge-Based Systems, 22, 477-481.
https://doi.org/10.1016/j.knosys.2009.06.009

[11] Zhang, D.Q., Zhou, Z.H. and Chen, S.C. (2007) Semi-Supervised Dimensionality Reduction. Proceedings of the Seventh Siam International Conference on Data Mining, Minneapolis, 26-28 April 2007, 629-634.
https://doi.org/10.1137/1.9781611972771.73

[12] Zhang, R., Nie, F. and Li, X. (2017) Self-Weighted Spectral Clustering with Parameter-Free Constraint. Neurocomputing, 241, 164-170.
https://doi.org/10.1016/j.neucom.2017.01.085

[13] Zhang, R., Nie, F., Guo, M., et al. (2019) Joint Learning of Fuzzy k-Means and Nonnegative Spectral Clustering with Side Information. IEEE Transactions on Image Processing, 28, 2152-2162.
https://doi.org/10.1109/TIP.2018.2882925

[14] Wang, D., Nie, F. and Huang, H. (2015) Feature Selection via Global Redundancy Minimization. IEEE Transactions on Knowledge & Data Engineering, 27, 2743-2755.
https://doi.org/10.1109/TKDE.2015.2426703

[15] 李龙龙, 何东健, 王美丽. 模糊半监督加权聚类算法的有效性评价研究[J]. 计算机技术与发展, 2016, 26(6): 65-68.

[16] Li, L.L., Jonathan, G., He, D.J., et al. (2015) Semi-Supervised Fuzzy Clustering with Feature Discrimination. PLoS ONE, 10, 131-160.
https://doi.org/10.1371/journal.pone.0131160

[17] 郭新辰, 郗仙田, 樊秀玲, 等. 基于半监督的模糊C-均值聚类算法[J]. 吉林大学学报: 理学版, 2015, 53(4): 705-709.

Top