# Auxiliary Diagnosis of Breast Cancer Based on Kernel Principal Component Analysis Support Vector Machine

Abstract: Kernel principal component analysis (KPCA) was used to extract features from breast cancer data. The resulting principal components served as the feature vectors of a support vector machine (SVM) model, whose parameters were selected and optimized by particle swarm optimization (PSO) and a genetic algorithm (GA), respectively. The resulting KPCA-PSO-SVM and KPCA-GA-SVM models were used to classify breast masses as benign or malignant. The experimental results show that both models improve classification accuracy and running speed compared with the PSO-SVM and GA-SVM models, which indicates that the kernel principal component analysis support vector machine can be used in the auxiliary diagnosis of breast cancer and can provide strong decision support for breast cancer diagnosis in medical institutions.

1. Introduction

2. SVM Classifier

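How an SVM works: taking binary classification as an example, let the sample set be $\left\{\left({x}_{i},{y}_{i}\right),i=1,2,\cdots ,l\right\}$, where ${x}_{i}\in {R}^{l}$ denotes the input variable, ${y}_{i}\in \left\{-1,1\right\}$ denotes the output label, and $l$ is the number of samples. By introducing a nonlinear mapping function $\phi \left(x\right)$, the input variables in the low-dimensional space are mapped into a high-dimensional feature space, where the separating hyperplane is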

$f\left(x\right)=\omega \cdot \phi \left(x\right)+b=0,$ (1)

${y}_{i}\left(\left(\omega \cdot \phi \left({x}_{i}\right)\right)+b\right)\ge 1,\quad i=1,2,\cdots ,l.$ (2)

$\min \frac{1}{2}{\left\|\omega \right\|}^{2}+C\sum_{i=1}^{l}{\xi }_{i}\quad \text{s.t.}\quad {y}_{i}\left(\omega \cdot \phi \left({x}_{i}\right)+b\right)+{\xi }_{i}\ge 1,\ {\xi }_{i}\ge 0,\quad i=1,2,\cdots ,l.$ (3)
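As a rough illustration of Eq. (3), the sketch below evaluates the soft-margin objective for one fixed candidate hyperplane. It assumes the identity map $\phi(x)=x$. The function name `soft_margin_objective`, the toy samples, and the value of `C` are invented for illustration and do not come from the paper. The code only evaluates the objective; it does not solve the optimization problem.

```python
def soft_margin_objective(w, b, X, y, C):
    """Evaluate 0.5 * ||w||^2 + C * sum(xi) for a fixed (w, b).

    Assumes phi(x) = x (a linear SVM). Each slack xi_i is the smallest
    value satisfying y_i * (w . x_i + b) + xi_i >= 1 and xi_i >= 0.
    """
    slacks = []
    for x_i, y_i in zip(X, y):
        margin = y_i * (sum(w_j * x_j for w_j, x_j in zip(w, x_i)) + b)
        slacks.append(max(0.0, 1.0 - margin))
    norm_sq = sum(w_j * w_j for w_j in w)
    return 0.5 * norm_sq + C * sum(slacks), slacks


# Hypothetical toy data: the second sample lies inside the margin.
X_toy = [(2.0, 0.0), (0.5, 0.0), (-2.0, 0.0)]
y_toy = [1, 1, -1]
objective, slacks = soft_margin_objective(w=(1.0, 0.0), b=0.0, X=X_toy, y=y_toy, C=10.0)
```

Only the sample that violates the margin contributes a nonzero slack, so a larger `C` penalizes margin violations more heavily relative to the $\frac{1}{2}\|\omega\|^2$ term.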

$\begin{array}{l}\min \frac{1}{2}\sum_{i,j=1}^{l}{y}_{i}{y}_{j}{\alpha }_{i}{\alpha }_{j}\left(\phi \left({x}_{i}\right)\cdot \phi \left({x}_{j}\right)\right)-\sum_{j=1}^{l}{\alpha }_{j}\\ \text{s.t.}\ \sum_{i=1}^{l}{y}_{i}{\alpha }_{i}=0\\ \left(0\le {\alpha }_{i}\le C,\ i=1,2,\cdots ,l\right)\end{array}$ (4)

$\begin{array}{l}\min \frac{1}{2}\sum_{i,j=1}^{l}{y}_{i}{y}_{j}{\alpha }_{i}{\alpha }_{j}K\left({x}_{i},{x}_{j}\right)-\sum_{j=1}^{l}{\alpha }_{j}\\ \text{s.t.}\ \sum_{i=1}^{l}{y}_{i}{\alpha }_{i}=0\\ \left(0\le {\alpha }_{i}\le C,\ i=1,2,\cdots ,l\right)\end{array}$ (5)

$f\left(x\right)=\sum_{i=1}^{l}{\alpha }_{i}{y}_{i}K\left({x}_{i},x\right)+b=0$ (6)
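To make Eq. (6) concrete, the following minimal sketch evaluates the kernel expansion $\sum_i \alpha_i y_i K(x_i, x) + b$ and classifies a point by its sign. The Gaussian RBF kernel, the value of `gamma`, the two support vectors, and the multipliers `alphas` are hypothetical values chosen for illustration (they satisfy $\sum_i y_i \alpha_i = 0$); they are not results from the paper.

```python
import math


def rbf_kernel(x, z, gamma=0.5):
    """Gaussian RBF kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def decision_value(x, support_vectors, labels, alphas, b, kernel=rbf_kernel):
    """Left-hand side of Eq. (6): sum_i alpha_i * y_i * K(x_i, x) + b."""
    return sum(a * y * kernel(sv, x)
               for sv, y, a in zip(support_vectors, labels, alphas)) + b


def classify(x, support_vectors, labels, alphas, b):
    """Assign +1 or -1 according to the side of the hyperplane x falls on."""
    return 1 if decision_value(x, support_vectors, labels, alphas, b) >= 0 else -1


# Hypothetical trained model with one support vector per class.
svs = [(1.0, 1.0), (-1.0, -1.0)]
ys = [1, -1]
alphas = [1.0, 1.0]
bias = 0.0

label_a = classify((0.8, 0.9), svs, ys, alphas, bias)    # near the +1 support vector
label_b = classify((-1.2, -0.7), svs, ys, alphas, bias)  # near the -1 support vector
```

Points for which `decision_value` is exactly zero lie on the separating surface defined by Eq. (6).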

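Different kernel functions yield different SVM classifiers. The kernel functions commonly used with support vector machines at present include the radial basis function (RBF) kernel.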

3. KPCA Algorithm

$\tilde{\Phi }\left({x}_{i}\right)=\Phi \left({x}_{i}\right)-\bar{\Phi }$ (7)

$\bar{C}=\frac{1}{n}\sum_{k=1}^{n}\tilde{\Phi }\left({x}_{k}\right)\tilde{\Phi }{\left({x}_{k}\right)}^{\text{T}}=\frac{1}{n}\tilde{\Phi }\left(X\right)\tilde{\Phi }{\left(X\right)}^{\text{T}}$ (8)

$\bar{C}V=\lambda V$ (9)

Let ${\alpha }_{k}={\left({\alpha }_{k1},\cdots ,{\alpha }_{kn}\right)}^{\text{T}}$ be the $k$-th eigenvector of the centered kernel matrix $\tilde{K}$ (with corresponding eigenvalue ${\tilde{\lambda }}_{k}$). Then ${\lambda }_{k}=\frac{{\tilde{\lambda }}_{k}}{n},\ {v}_{k}=\tilde{\Phi }\left(X\right){\alpha }_{k}$.

Normalizing ${\alpha }_{k}$ so that ${v}_{k}$ has unit norm gives:

$\left\|{\alpha }_{k}\right\|=\frac{1}{\sqrt{{\tilde{\lambda }}_{k}}}$ (10)

${v}_{k}^{\text{T}}\tilde{\Phi }\left(x\right)=\sum_{i=1}^{n}\frac{{\alpha }_{ki}}{\sqrt{{\tilde{\lambda }}_{k}}}\left[\tilde{\Phi }{\left({x}_{i}\right)}^{\text{T}}\cdot \tilde{\Phi }\left(x\right)\right]=\sum_{i=1}^{n}\frac{{\alpha }_{ki}}{\sqrt{{\tilde{\lambda }}_{k}}}\tilde{K}\left({x}_{i},x\right)$ (11)

${t}_{k}={v}_{k}^{\text{T}}\tilde{\Phi }\left(x\right)=\sum_{i=1}^{n}\frac{{\alpha }_{ki}}{\sqrt{{\tilde{\lambda }}_{k}}}\tilde{K}\left({x}_{i},x\right)$ (12)
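The KPCA steps in Eqs. (7) to (12) can be sketched in NumPy as follows. This is a minimal illustration, not the paper's implementation: the RBF kernel, the value of `gamma`, the random data, and the helper names `kpca_fit` and `kpca_transform` are all assumptions. Centering is done directly on the kernel matrix, which corresponds to Eq. (7) without computing $\Phi$ explicitly. The unit-norm eigenvectors returned by `numpy.linalg.eigh` are divided by $\sqrt{\tilde{\lambda}_k}$ as in Eq. (12).

```python
import numpy as np


def rbf_kernel_matrix(A, B, gamma):
    """K[i, j] = exp(-gamma * ||A[i] - B[j]||^2)."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def kpca_fit(X, n_components, gamma):
    """Eigendecompose the centered kernel matrix and keep the top components."""
    n = X.shape[0]
    K = rbf_kernel_matrix(X, X, gamma)
    one_n = np.full((n, n), 1.0 / n)
    # Centered kernel matrix: inner products of the centered maps in Eq. (7).
    K_c = K - one_n @ K - K @ one_n + one_n @ K @ one_n
    eigvals, eigvecs = np.linalg.eigh(K_c)            # ascending order
    order = np.argsort(eigvals)[::-1][:n_components]  # largest eigenvalues first
    lam, alpha = eigvals[order], eigvecs[:, order]    # alpha columns have unit norm
    coeffs = alpha / np.sqrt(lam)                     # alpha_ki / sqrt(lambda_k), Eq. (12)
    return {"X": X, "K": K, "gamma": gamma, "eigvals": lam, "coeffs": coeffs}


def kpca_transform(model, X_new):
    """Project samples onto the kernel principal components, Eq. (12)."""
    K = model["K"]
    K_new = rbf_kernel_matrix(X_new, model["X"], model["gamma"])
    # Center the new kernel rows using the training-set statistics.
    K_new_c = (K_new - K_new.mean(axis=1, keepdims=True)
               - K.mean(axis=0)[None, :] + K.mean())
    return K_new_c @ model["coeffs"]


# Hypothetical data: 20 samples with 3 features each.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(20, 3))
model = kpca_fit(X_train, n_components=2, gamma=0.5)
T_train = kpca_transform(model, X_train)
```

The columns of `T_train` are the components $t_k$ that would be passed to the SVM as feature vectors. On the training set they are mutually uncorrelated, with squared norms equal to the retained eigenvalues $\tilde{\lambda}_k$.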

4. KPCA-SVM Model

4.1. Detailed Procedure

4.2. Introduction to the KPCA-SVM Model

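The KPCA-PSO-SVM model first applies the KPCA algorithm to the breast cancer data to extract the nonlinear feature information in the raw data. The extracted feature vectors remove noise from the raw data, so data quality improves. The feature information extracted by KPCA is then used as the feature vectors of the support vector machine. The SVM is built on these feature vectors, and during model building the PSO algorithm iteratively searches for the optimal SVM parameters. The KPCA-GA-SVM model is similar to the KPCA-PSO-SVM model, except that it uses the GA algorithm to iteratively optimize the SVM parameters.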
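The parameter search described above can be sketched with a basic particle swarm optimizer. Everything specific here is an assumption rather than the paper's configuration: the search runs over $(\log_2 C, \log_2 \gamma)$, the inertia and acceleration coefficients are common textbook defaults, and `toy_validation_error` is a hypothetical stand-in for the cross-validated error of an SVM trained on KPCA features.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iters=60,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` over a box with a basic global-best PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # best position seen by each particle
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # best position seen by the swarm

    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move the particle and clamp it to the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def toy_validation_error(params):
    """Hypothetical smooth error surface with its minimum at C = 2^3, gamma = 2^-4."""
    log2_c, log2_gamma = params
    return 0.03 + 0.01 * ((log2_c - 3.0) ** 2 + (log2_gamma + 4.0) ** 2)


best_params, best_error = pso_minimize(toy_validation_error,
                                       bounds=[(-5.0, 10.0), (-10.0, 5.0)])
best_C, best_gamma = 2.0 ** best_params[0], 2.0 ** best_params[1]
```

In a real pipeline the objective would train and validate an SVM for each candidate `(C, gamma)`. A GA-based search would replace the velocity update with selection, crossover, and mutation while keeping the same objective.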

5. Simulation Experiments

5.1. Data Source

5.2. Experimental Setup

5.3. Analysis of Experimental Results

Table 1. Comparison of experimental results

Table 2. Comparison of experimental results

$Sen=\frac{TP}{TP+FN}$ (13)

$Spe=\frac{TN}{TN+FP}$ (14)

$F\text{-score}=\frac{2\times \frac{TP}{TP+FP}\times \frac{TP}{TP+FN}}{\frac{TP}{TP+FP}+\frac{TP}{TP+FN}}$ (15)
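Eqs. (13) to (15) translate directly into code. The sketch below computes sensitivity, specificity, and F-score from confusion-matrix counts. The counts are hypothetical and are not taken from the paper's tables, and the function assumes that none of the denominators is zero.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, F-score) per Eqs. (13)-(15).

    Assumes tp + fn, tn + fp, and tp + fp are all nonzero.
    """
    sen = tp / (tp + fn)          # Eq. (13): true positive rate
    spe = tn / (tn + fp)          # Eq. (14): true negative rate
    precision = tp / (tp + fp)
    f_score = 2 * precision * sen / (precision + sen)  # Eq. (15)
    return sen, spe, f_score


# Hypothetical counts: 50 malignant and 50 benign test cases.
sen, spe, f_score = diagnostic_metrics(tp=40, fn=10, tn=45, fp=5)
```

The F-score is the harmonic mean of precision and sensitivity, so it is high only when both are high.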

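The classification test results of SVM, KPCA-PSO-SVM, and KPCA-GA-SVM are shown in the table, where accuracy refers to the average classification accuracy on the test data.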

6. Conclusion

