
# Auxiliary Diagnosis of Breast Cancer Based on Kernel Principal Component Analysis Support Vector Machine

Abstract: Kernel principal component analysis (KPCA) was used to extract feature factors from breast cancer data, and the resulting principal components served as the feature vectors of a support vector machine (SVM) model. The SVM parameters were selected and optimized by particle swarm optimization (PSO) and by a genetic algorithm (GA) respectively, yielding a KPCA-PSO-SVM model and a KPCA-GA-SVM model for classifying breast masses as benign or malignant. The experimental results show that both models improve classification accuracy and running speed compared with the PSO-SVM and GA-SVM models, indicating that kernel principal component analysis combined with support vector machines is applicable to the auxiliary diagnosis of breast cancer and can provide strong decision support for its diagnosis in medical institutions.

1. Introduction

2. SVM Classifier

How SVM works: taking binary classification as an example, let the sample set be $\left\{ \left( x_i, y_i \right), i = 1, 2, \cdots, l \right\}$, where $x_i \in R^l$ is the input variable, $y_i \in \left\{ -1, 1 \right\}$ is the output label, and $l$ is the number of samples. A nonlinear mapping $\phi(x)$ maps the input variables from the low-dimensional input space into a high-dimensional feature space, in which the optimal separating hyperplane is

$f(x) = w \cdot \phi(x) + b = 0,$ (1)

subject to the constraint that all samples are correctly classified:

$y_i \left( w \cdot \phi(x_i) + b \right) \ge 1, \quad i = 1, 2, \cdots, l.$ (2)

Introducing slack variables $\xi_i$ and a penalty factor $C$ gives the soft-margin primal problem

$\min \frac{1}{2} \left\| w \right\|^2 + C \sum_{i=1}^{l} \xi_i \quad \text{s.t.} \; y_i \left( w \cdot \phi(x_i) + b \right) \ge 1 - \xi_i, \; \xi_i \ge 0, \; i = 1, 2, \cdots, l.$ (3)

Introducing Lagrange multipliers $\alpha_i$ yields the dual problem

$\begin{array}{l} \min \frac{1}{2} \sum_{i,j=1}^{l} y_i y_j \alpha_i \alpha_j \left( \phi(x_i) \cdot \phi(x_j) \right) - \sum_{j=1}^{l} \alpha_j \\ \text{s.t.} \; \sum_{i=1}^{l} y_i \alpha_i = 0, \quad 0 \le \alpha_i \le C, \; i = 1, 2, \cdots, l \end{array}$ (4)

Replacing the inner product with a kernel function $K(x_i, x_j) = \phi(x_i) \cdot \phi(x_j)$ gives

$\begin{array}{l} \min \frac{1}{2} \sum_{i,j=1}^{l} y_i y_j \alpha_i \alpha_j K(x_i, x_j) - \sum_{j=1}^{l} \alpha_j \\ \text{s.t.} \; \sum_{i=1}^{l} y_i \alpha_i = 0, \quad 0 \le \alpha_i \le C, \; i = 1, 2, \cdots, l \end{array}$ (5)

and the resulting separating surface is

$f(x) = \sum_{i=1}^{n} \alpha_i y_i K(x_i, x) + b = 0$ (6)

Using different kernel functions yields different SVM classifiers. The kernels commonly used with support vector machines mainly include the radial basis function (RBF) kernel, the polynomial kernel, and the sigmoid kernel.
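As a concrete illustration of the kernels listed above, the following sketch evaluates them in plain NumPy; the parameter values `degree`, `coef0` and `gamma` are illustrative assumptions, not the settings used in this paper's experiments:

```python
import numpy as np

def linear_kernel(x, z):
    # Linear kernel: plain inner product
    return float(np.dot(x, z))

def polynomial_kernel(x, z, degree=3, coef0=1.0):
    # Polynomial kernel: (x . z + coef0) ** degree
    return float((np.dot(x, z) + coef0) ** degree)

def rbf_kernel(x, z, gamma=0.5):
    # Radial basis function kernel: exp(-gamma * ||x - z||^2)
    return float(np.exp(-gamma * np.sum((x - z) ** 2)))

x = np.array([1.0, 2.0])
z = np.array([1.0, 2.0])
print(rbf_kernel(x, z))  # 1.0 for identical inputs
```

The RBF kernel maps any pair of identical inputs to 1 and decays toward 0 as the points move apart, which is one reason it is a common default choice for SVM classifiers.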

3. The KPCA Algorithm

The mapped samples are first centered in the feature space:

$\tilde{\Phi}(x_i) = \Phi(x_i) - \bar{\Phi}$ (7)

The covariance matrix in the feature space is

$\bar{C} = \frac{1}{n} \sum_{k=1}^{n} \tilde{\Phi}(x_k) \tilde{\Phi}(x_k)^{\text{T}} = \frac{1}{n} \tilde{\Phi}(X) \tilde{\Phi}(X)^{\text{T}}$ (8)

and its eigenvalue problem is

$\bar{C} V = \lambda V$ (9)

Let $\alpha_k = \left( \alpha_{k1}, \cdots, \alpha_{kn} \right)^{\text{T}}$ denote the $k$-th eigenvector of the centered kernel matrix $\tilde{K}$, with corresponding eigenvalue $\tilde{\lambda}_k$. Then $\lambda_k = \frac{\tilde{\lambda}_k}{n}$ and $v_k = \tilde{\Phi} \alpha_k$.

After normalizing $\alpha_k$, we have:

$\left\| \alpha_k \right\| = \frac{1}{\sqrt{\tilde{\lambda}_k}}$ (10)

The projection of a sample onto the $k$-th eigenvector is

$v_k^{\text{T}} \tilde{\Phi}(x) = \sum_{i=1}^{n} \frac{\alpha_{ki}}{\sqrt{\tilde{\lambda}_k}} \left[ \tilde{\Phi}(x_i)^{\text{T}} \cdot \tilde{\Phi}(x) \right] = \sum_{i=1}^{n} \frac{\alpha_{ki}}{\sqrt{\tilde{\lambda}_k}} \tilde{K}(x_i, x)$ (11)

so the $k$-th kernel principal component of $x$ is

$t_k = v_k^{\text{T}} \tilde{\Phi}(x) = \sum_{i=1}^{n} \frac{\alpha_{ki}}{\sqrt{\tilde{\lambda}_k}} \tilde{K}(x_i, x)$ (12)
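The KPCA steps of Equations (7)-(12) can be sketched in NumPy as follows. This is a minimal illustration on random data, assuming an RBF kernel with an illustrative `gamma`; it is not the paper's actual experimental setup:

```python
import numpy as np

def rbf_kernel_matrix(X, gamma=0.5):
    # Pairwise RBF kernel: K[i, j] = exp(-gamma * ||x_i - x_j||^2)
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def kpca(X, n_components=2, gamma=0.5):
    n = X.shape[0]
    K = rbf_kernel_matrix(X, gamma)
    # Center the kernel matrix (the kernel-space form of Eq. (7))
    one_n = np.ones((n, n)) / n
    K_tilde = K - one_n @ K - K @ one_n + one_n @ K @ one_n
    # Eigendecomposition (Eq. (9)); eigh returns eigenvalues in ascending order
    eigvals, eigvecs = np.linalg.eigh(K_tilde)
    idx = np.argsort(eigvals)[::-1][:n_components]
    lam, alpha = eigvals[idx], eigvecs[:, idx]
    # Scale eigenvectors so that ||alpha_k|| = 1 / sqrt(lambda_k)  (Eq. (10))
    alpha = alpha / np.sqrt(lam)
    # Kernel principal components t_k of the training samples (Eq. (12))
    return K_tilde @ alpha

T = kpca(np.random.RandomState(0).randn(20, 5), n_components=3)
print(T.shape)  # (20, 3)
```

Each column of `T` is one kernel principal component of the training samples; the columns are ordered by decreasing eigenvalue, so the leading components carry the most variance.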

4. The KPCA-SVM Model

4.1. Detailed Procedure

4.2. Description of the KPCA-SVM Model

In the KPCA-PSO-SVM model, the KPCA algorithm is first applied to the breast cancer data to extract its nonlinear feature information; the extracted feature vectors remove noise from the original data and thus improve data quality. This feature information is then used as the feature vectors of the support vector machine, and the SVM is built on these vectors, with the PSO algorithm iteratively optimizing the SVM parameters during model building. The KPCA-GA-SVM model is analogous, except that the GA algorithm is used to iteratively optimize the SVM parameters.
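The role PSO plays here, an iterative search over SVM parameters such as the penalty factor $C$ and the kernel width, can be sketched with a minimal particle swarm optimizer. In this sketch a toy quadratic objective stands in for the SVM's cross-validation error, and all settings (`w`, `c1`, `c2`, swarm size, iteration count) are illustrative assumptions rather than the paper's values:

```python
import numpy as np

def pso(objective, bounds, n_particles=20, n_iters=50, w=0.7, c1=1.5, c2=1.5, seed=0):
    # Minimal particle swarm optimizer minimizing `objective` over box `bounds`.
    rng = np.random.RandomState(seed)
    lo = np.array([b[0] for b in bounds], dtype=float)
    hi = np.array([b[1] for b in bounds], dtype=float)
    pos = rng.uniform(lo, hi, size=(n_particles, len(bounds)))
    vel = np.zeros_like(pos)
    pbest = pos.copy()                                   # per-particle best positions
    pbest_val = np.array([objective(p) for p in pos])
    g = pbest[np.argmin(pbest_val)].copy()               # global best position
    g_val = pbest_val.min()
    for _ in range(n_iters):
        r1 = rng.rand(n_particles, len(bounds))
        r2 = rng.rand(n_particles, len(bounds))
        # Velocity update: inertia + cognitive pull + social pull
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (g - pos)
        pos = np.clip(pos + vel, lo, hi)
        vals = np.array([objective(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved] = pos[improved]
        pbest_val[improved] = vals[improved]
        if vals.min() < g_val:
            g = pos[np.argmin(vals)].copy()
            g_val = vals.min()
    return g, g_val

# Toy stand-in for SVM cross-validation error over two parameters,
# minimized at (1, -2); in the real model this would train and score an SVM.
best, val = pso(lambda p: (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2, [(-5, 5), (-5, 5)])
print(best, val)
```

Swapping the toy objective for a function that trains an SVM on the KPCA features and returns its cross-validation error yields the parameter-search loop of the KPCA-PSO-SVM model; the GA variant replaces this optimizer with selection, crossover, and mutation.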

5. Simulation Experiments

5.1. Data Source

5.2. Experimental Setup

5.3. Analysis of Experimental Results

Table 1. Comparison of experimental results

Table 2. Comparison of experimental results

$Sen=\frac{TP}{TP+FN}$ (13)

$Spe=\frac{TN}{TN+FP}$ (14)

$F\text{-score} = 2 \times \frac{TP}{TP+FP} \times \frac{TP}{TP+FN} \Big/ \left( \frac{TP}{TP+FP} + \frac{TP}{TP+FN} \right)$ (15)
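Equations (13)-(15) can be computed directly from the confusion-matrix counts; the counts below are made-up numbers for illustration only:

```python
def diagnostic_metrics(TP, FN, TN, FP):
    # Sensitivity (Eq. 13): fraction of malignant cases correctly detected
    sen = TP / (TP + FN)
    # Specificity (Eq. 14): fraction of benign cases correctly detected
    spe = TN / (TN + FP)
    # F-score (Eq. 15): harmonic mean of precision and sensitivity
    precision = TP / (TP + FP)
    f_score = 2 * precision * sen / (precision + sen)
    return sen, spe, f_score

# Hypothetical confusion-matrix counts, not results from this paper
sen, spe, f = diagnostic_metrics(TP=90, FN=10, TN=85, FP=15)
print(round(sen, 3), round(spe, 3), round(f, 3))  # 0.9 0.85 0.878
```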

The classification test results of SVM, KPCA-PSO-SVM and KPCA-GA-SVM are shown in the tables, where accuracy refers to the average classification accuracy on the test data.

6. Conclusion

NOTES

*Corresponding author.

