
# A Uniformly Smoothing Newton Method for Support Vector Machine

Abstract: To improve the classification performance of smooth support vector machines, a new uniformly smooth approximation function is introduced to replace the plus function. It not only overcomes the non-differentiability that arises in solving the support vector machine model, but also achieves a better convergence rate than other smoothing functions. Based on this uniformly smooth approximation function, we design a corresponding Newton algorithm and establish its convergence. Finally, numerical simulations demonstrate the superior accuracy, efficiency and generalization ability of the function for solving the smooth support vector machine model.

1. Introduction

In 2005, Yuan Yubo et al. [4] proposed two polynomial smoothing functions; by smoothing the unconstrained objective function of the support vector machine model they derived the PSSVM model, and showed that PSSVM has better classification performance than SSVM. A third-order spline smoothing function was proposed later [5], leading to the new TSSVM model; the convergence analysis of TSSVM and the corresponding experimental results show that TSSVM achieves better classification performance. In 2008, Xiong Jinzhi et al. [6] went further and proposed a sixth-order smoothing function whose approximation precision is an order of magnitude higher than that of the third-order spline function.

2. Smooth Support Vector Machines

2.1. Support Vector Machine

Given a training set $\{(x_1,y_1),(x_2,y_2),\cdots,(x_m,y_m)\}$ with labels $y_i\in\{-1,+1\}$, a classifier $h$ that labels every sample correctly satisfies

$\forall i,\ y_i h(x_i)=1$ (1)

$h(x_i):=\operatorname{sign}(w^{\mathrm{T}}x_i+b)$ (2)

$\forall i,\ y_i(w^{\mathrm{T}}x_i+b)>0$ (3)

$y_i h(x_i)=1 \Leftrightarrow y_i\operatorname{sign}(w^{\mathrm{T}}x_i+b)=1 \Leftrightarrow y_i(w^{\mathrm{T}}x_i+b)>0$ (4)

$\min_{\omega,b}\ \frac{1}{2}\omega^{\mathrm{T}}\omega + C\sum_{i=1}^{m} I\bigl(y_i\ne\operatorname{sign}(\omega^{\mathrm{T}}\phi(x_i)+b)\bigr)$ (5)

$\xi_i=\begin{cases}0, & \text{if } y_i(\omega^{\mathrm{T}}\phi(x_i)+b)\ge 1\\ 1-y_i(\omega^{\mathrm{T}}\phi(x_i)+b), & \text{otherwise}\end{cases}$ (6)

$\min_{\omega,b}\ \frac{1}{2}\omega^{\mathrm{T}}\omega + C\sum_{i=1}^{m}\xi_i \quad \text{s.t. } y_i(\omega^{\mathrm{T}}\phi(x_i)+b)\ge 1-\xi_i,\ \xi_i\ge 0,\ i=1,2,\cdots,m$ (7)

$\min_{\alpha}\ \frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m}\alpha_i\alpha_j y_i y_j \phi(x_i)^{\mathrm{T}}\phi(x_j)-\sum_{i=1}^{m}\alpha_i \quad \text{s.t. } \sum_{i=1}^{m}\alpha_i y_i=0,\ 0\le\alpha_i\le C,\ i=1,2,\cdots,m$ (8)

2.2. SSVM

$f(x)=\operatorname{sign}\Bigl[\sum_{i=1}^{m}y_i\alpha_i^{*}\bigl\langle\phi(x_i),\phi(x)\bigr\rangle+b^{*}\Bigr]=\operatorname{sign}\Bigl[\sum_{i=1}^{m}y_i\alpha_i^{*}K(x_i,x)+b^{*}\Bigr]$ (9)

where $b^{*}=\frac{1}{m}\sum_{i=1}^{m}\Bigl(y_i-\sum_{j=1}^{m}y_j\alpha_j^{*}\bigl\langle\phi(x_j),\phi(x_i)\bigr\rangle\Bigr)$.

Let $X=(x_1,x_2,\cdots,x_m)^{\mathrm{T}}$ with $x_i=(x_{i1},x_{i2},\cdots,x_{in})$ $(i=1,2,\cdots,m)$, let $e=\operatorname{ones}(m,1)$ denote the all-ones column vector, $D=\operatorname{diag}(y_1,y_2,\cdots,y_m)$, and $A=DK(X,X^{\mathrm{T}})D$.

$\min_{\alpha,b}\ \frac{1}{2}\alpha^{\mathrm{T}}A\alpha+Ce^{\mathrm{T}}\xi \quad \text{s.t. } D\bigl(K(X,X^{\mathrm{T}})D\alpha+eb\bigr)-e+\xi\ge 0,\ \xi\ge 0$ (10)

$\min_{\alpha,b}\ \frac{1}{2}\bigl(\alpha^{\mathrm{T}}\alpha+b^{2}\bigr)+\frac{c}{2}\xi^{\mathrm{T}}\xi \quad \text{s.t. } D\bigl(K(X,X^{\mathrm{T}})D\alpha+eb\bigr)-e+\xi\ge 0,\ \xi\ge 0$ (11)

$\xi=\bigl(e-D\bigl(K(X,X^{\mathrm{T}})D\alpha+eb\bigr)\bigr)_{+}$

$\min_{\alpha,b}\ \frac{1}{2}\bigl(\alpha^{\mathrm{T}}\alpha+b^{2}\bigr)+\frac{c}{2}\bigl\|\bigl(e-D(K(X,X^{\mathrm{T}})D\alpha+eb)\bigr)_{+}\bigr\|^{2}$ (12)

Let $z=\binom{\alpha}{b}$, $r=(r_1,r_2,\cdots,r_m)^{\mathrm{T}}$, $(r_{+})_i=\max\{r_i,0\}$, $r_i=1-H_i z$, and $H_i=y_i[K_i\;\;1]$,

where $K_i$ is the $i$-th row of the kernel matrix $K(X,X^{\mathrm{T}})$. Problem (12) can then be rewritten in the general form

$\min_{z} f(z)=\frac{1}{2}z^{\mathrm{T}}z+\frac{c}{2}\|r_{+}\|_2^{2}$ (13)

$\min_{z} f_p(z)=\frac{1}{2}z^{\mathrm{T}}z+\frac{c}{2}\bigl(p(r,k)\bigr)^{\mathrm{T}}p(r,k)$ (14)

3. The Uniformly Smooth Approximation Function and Its Properties

$h_\lambda(t)=\mu\ln\bigl(\mathrm{e}^{t/\mu}+\mathrm{e}^{-t/\mu}\bigr)+(\lambda-1)\mu\ln 2,\quad \mu>0,\ \lambda\in[0,1]$ (15)

As $\mu\to0$, $h_\lambda(t)$ converges to $|t|$ uniformly in $t$: from the definition,

$h_\lambda(t)-|t|=\mu\ln\bigl(1+\mathrm{e}^{-2|t|/\mu}\bigr)+(\lambda-1)\mu\ln 2$,

so $(\lambda-1)\mu\ln 2 < h_\lambda(t)-|t| \le \lambda\mu\ln 2$, and therefore $\bigl|h_\lambda(t)-|t|\bigr|\le\mu\ln 2$ for all $t$. Since $t_{+}=(t+|t|)/2$, a uniformly smooth approximation of the plus function in (13) is obtained from $h_\lambda$ in the same way.
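As a quick numerical check of how (15) approximates $|t|$, the sketch below (plain NumPy; the grid and parameter values are our own choices) evaluates $h_\lambda$ stably via `np.logaddexp` and verifies the uniform bound $|h_\lambda(t)-|t||\le\mu\ln 2$, which follows directly from the definition:

```python
import numpy as np

def h(t, mu, lam):
    # Eq. (15): h_lam(t) = mu*ln(e^{t/mu} + e^{-t/mu}) + (lam - 1)*mu*ln(2).
    # np.logaddexp keeps the evaluation stable when |t|/mu is large.
    return mu * np.logaddexp(t / mu, -t / mu) + (lam - 1.0) * mu * np.log(2.0)

t = np.linspace(-5.0, 5.0, 2001)
for mu in (1.0, 0.1, 0.01):
    for lam in (0.0, 0.5, 1.0):
        err = np.max(np.abs(h(t, mu, lam) - np.abs(t)))
        # uniform approximation: the error never exceeds mu*ln(2)
        assert err <= mu * np.log(2.0) + 1e-12
```

For $\lambda=1$ the approximation lies above $|t|$, while for $\lambda=0$ it lies below; in both cases the gap shrinks linearly with $\mu$.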

Figure 1. The approximation degree of smooth functions

2) For $\lambda=1$, $h_1(t)\ge|t|$, and for fixed $t$ its value decreases monotonically as the parameter $\mu$ decreases; moreover, when $\mu\to0$, $h_1(t)\to|t|$.

3) For every $\lambda\in[0,1]$, $\bigl|h_\lambda(t)-|t|\bigr|\le\mu\ln 2$, and hence $h_\lambda(t)$ converges to $|t|$ uniformly in $t$.

The monotone decrease in 2) can be proved following the derivations of Li Xingsi [8] [9] on the properties of the aggregate function. Combining this with $h_1(t)\ge|t|$ shows that the family is bounded; by the theorem that a monotone bounded sequence has a limit, together with the squeeze criterion, $\lim_{\mu\to0}h_1(t)=|t|$.

For $\lambda=0$ the behaviour is reversed: $|t|-\mu\ln 2\le h_0(t)\le|t|$, and for fixed $t$ the value increases monotonically as $\mu$ decreases, again with $\lim_{\mu\to0}h_0(t)=|t|$.

Figure 2. The approximation degree of y1 and y2

The proof of the monotone increase can be found in the work of Yang Qingzhi et al. [10] [11] on the aggregate function method; boundedness together with the squeeze criterion then gives the limit as before.


Figure 3. The approximation degree of y1 and y3

Figure 4. The approximation degree of y1 and y4

Figure 5. The approximation degree of y1 and y4

4. The Newton-Armijo Algorithm

The Newton-Armijo algorithm is a refinement of Newton's method that globalizes the Newton step with an Armijo line search. Write the gradient of the objective of (13) as

$\nabla f(z)=z-cH^{\mathrm{T}}r_{+}$,

with generalized Hessian $\nabla^{2}f(z)=I+cH^{\mathrm{T}}\operatorname{diag}\bigl(\operatorname{sign}(r_{+})\bigr)H$, where $H=(H_1^{\mathrm{T}},H_2^{\mathrm{T}},\cdots,H_m^{\mathrm{T}})^{\mathrm{T}}$. The gradient of the smoothed objective of (14) is

$\nabla f_p(z)=z-cH^{\mathrm{T}}\bigl(p(r,k)\odot p'(r,k)\bigr)$,

with Hessian $\nabla^{2}f_p(z)=I+cH^{\mathrm{T}}\operatorname{diag}\bigl(p'(r,k)^{2}+p(r,k)\,p''(r,k)\bigr)H$, where $p'$ and $p''$ act componentwise on $r$.
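These expressions can be checked numerically. The sketch below (plain NumPy; the random $H$, the parameter values, and the choice of smoothing $p(t)=\tfrac12\bigl(t+h_1(t)\bigr)$ built from (15) with $\lambda=1$ are our own illustrative assumptions) compares the analytic gradient of the smoothed objective with central finite differences:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 40, 3
mu, c = 0.1, 10.0

H = rng.standard_normal((m, n))   # stand-in for the rows H_i
e = np.ones(m)

def p(t):
    # smoothed plus function: t_+ = (t + |t|)/2 with |t| replaced by
    # h_1(t) = mu*ln(e^{t/mu} + e^{-t/mu}) from Eq. (15), lambda = 1
    return 0.5 * (t + mu * np.logaddexp(t / mu, -t / mu))

def dp(t):
    # p'(t) = (1 + tanh(t/mu)) / 2, since d/dt h_1(t) = tanh(t/mu)
    return 0.5 * (1.0 + np.tanh(t / mu))

def f(z):                          # smoothed objective of (14)
    r = e - H @ z
    return 0.5 * z @ z + 0.5 * c * p(r) @ p(r)

def grad(z):                       # analytic gradient: z - c H^T (p(r) * p'(r))
    r = e - H @ z
    return z - c * H.T @ (p(r) * dp(r))

z = rng.standard_normal(n)
g = grad(z)
eps = 1e-6
g_fd = np.array([(f(z + eps * np.eye(n)[j]) - f(z - eps * np.eye(n)[j])) / (2 * eps)
                 for j in range(n)])
assert np.allclose(g, g_fd, atol=1e-4)   # analytic and numeric gradients agree
```

Differentiating once more gives the stated Hessian $I+cH^{\mathrm{T}}\operatorname{diag}(p'^2+p\,p'')H$, which is positive definite since the diagonal weights are nonnegative.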

Step 1 (Initialization). Choose an initial point $z^{0}$ and a tolerance $\varepsilon>0$; set $i=0$. The remaining parameters are generated from the input data.

Step 2. Compute $\nabla f_p(z^{i})$. If $\|\nabla f_p(z^{i})\|\le\varepsilon$, stop; otherwise turn to Step 3.

Step 3. Compute the Hessian matrix $\nabla^{2}f_p(z^{i})$ and solve $\nabla^{2}f_p(z^{i})\,d^{i}=-\nabla f_p(z^{i})$ to get the Newton direction $d^{i}$.

Step 4 (Armijo line search). Select the largest $\lambda_i\in\{1,\tfrac12,\tfrac14,\cdots\}$ such that $f_p(z^{i})-f_p(z^{i}+\lambda_i d^{i})\ge-\delta\lambda_i\,\nabla f_p(z^{i})^{\mathrm{T}}d^{i}$, where $\delta\in(0,\tfrac12)$, and let $z^{i+1}=z^{i}+\lambda_i d^{i}$.

Step 5. Set $i=i+1$ and turn to Step 2.

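To put Steps 1-5 together, here is a minimal end-to-end sketch on synthetic two-class data with a linear kernel. The data set, the parameter values, and the $\lambda=1$ plus-function smoothing $p(t)=\tfrac12\bigl(t+h_1(t)\bigr)$ are illustrative assumptions of ours, not the paper's experimental setup:

```python
import numpy as np

rng = np.random.default_rng(1)

# Two Gaussian clouds in R^2, labels +1 / -1
m = 80
X = np.vstack([rng.normal(+1.5, 1.0, (m // 2, 2)),
               rng.normal(-1.5, 1.0, (m // 2, 2))])
y = np.hstack([np.ones(m // 2), -np.ones(m // 2)])

K = X @ X.T                                        # linear kernel matrix
H = y[:, None] * np.hstack([K, np.ones((m, 1))])   # rows H_i = y_i [K_i  1]
e = np.ones(m)
mu, c = 0.1, 1.0

def p(t):    # smoothed plus function built from Eq. (15) with lambda = 1
    return 0.5 * (t + mu * np.logaddexp(t / mu, -t / mu))

def dp(t):   # first derivative
    return 0.5 * (1.0 + np.tanh(t / mu))

def ddp(t):  # second derivative
    return (1.0 - np.tanh(t / mu) ** 2) / (2.0 * mu)

def f(z):
    r = e - H @ z
    return 0.5 * z @ z + 0.5 * c * p(r) @ p(r)

def grad(z):
    r = e - H @ z
    return z - c * H.T @ (p(r) * dp(r))

def hess(z):
    r = e - H @ z
    w = dp(r) ** 2 + p(r) * ddp(r)            # nonnegative weights -> SPD Hessian
    return np.eye(m + 1) + c * (H.T * w) @ H

# Newton-Armijo iteration (Steps 1-5), z = (alpha; b), delta in (0, 1/2)
z, delta = np.zeros(m + 1), 1e-4
for it in range(50):
    g = grad(z)
    if np.linalg.norm(g) <= 1e-6:             # Step 2: stopping test
        break
    d = np.linalg.solve(hess(z), -g)          # Step 3: Newton direction
    step = 1.0                                # Step 4: Armijo backtracking
    while f(z + step * d) > f(z) + delta * step * (g @ d):
        step *= 0.5
    z = z + step * d                          # Step 5

alpha, b = z[:m], z[m]
acc = np.mean(np.sign(K @ alpha + b) == y)    # training accuracy
```

Because the smoothed objective is strongly convex, the line search always finds an acceptable step and the iteration typically stops after a handful of Newton steps; the trained classifier is recovered as $\operatorname{sign}\bigl(K(x,X^{\mathrm{T}})\alpha+b\bigr)$.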

5. Numerical Results and Comparison

5.1. Solving with the Uniformly Smooth Newton Method

5.2. Analysis and Comparison

1) In general, the more closely the smoothing function approximates the objective function, the more accurate the SVM solution becomes. When the number of iterations is limited, the choice of smoothing function determines whether the required experimental accuracy can be reached at all; otherwise that accuracy is only attained at the maximum iteration count. The parameter choice of the newly selected smoothing function also determines how well it fits the practical problem.

Figure 6. The result (1) of classification

Figure 7. The result (2) of classification

Figure 8. The result (3) of classification

Figure 9. The result (4) of classification

Table 1. Experimental results of different solving algorithms for different smooth functions

2) In general, a tighter smoothing approximation can drive the training error rate steadily down, even to zero, but the training time also grows. For large-scale problems we therefore need a smoothing function with good adaptability, striking a balance between the quality of its approximation and the complexity of the function itself. How to maximize computational efficiency and solution speed while keeping the amount of computation low, and how to tune the algorithm parameters jointly with the smoothing parameter, are questions worth further consideration.

6. Conclusion

NOTES

*First author.

#Corresponding author.

[1] Vapnik, V.N. (1995) The Nature of Statistical Learning Theory. Springer-Verlag, New York, 1-299. https://doi.org/10.1007/978-1-4757-2440-0

[2] Lee, Y.J. and Mangasarian, O.L. (2001) SSVM: A Smooth Support Vector Machine for Classification. Computational Optimization and Applications, 20, 5-22. https://doi.org/10.1023/A:1011215321374

[3] Lee, Y.J. and Mangasarian, O.L. (2001) RSVM: Reduced Support Vector Machine. Proceedings of the SIAM International Conference on Data Mining, Chicago, 5-7 April 2001, 1-17. https://doi.org/10.1137/1.9781611972719.13

[4] Yuan, Y.B., Yan, J. and Xu, C.X. (2005) Polynomial Smooth Support Vector Machine. 计算机学报, 28(1), 9-17. (In Chinese)

[5] Yuan, Y., Fan, W. and Pu, D. (2007) Spline Function Smooth Support Vector Machine for Classification. Journal of Industrial and Management Optimization, 3, 529-542.

[6] Huang, R.T. and Xiong, J.Z. (2008) A New Smooth Function for Support Vector Machines and Its Performance Analysis. 计算机工程与应用, 44(18), 64-65. (In Chinese)

[7] Yong, L.Q. (2015) Uniform Smooth Approximation Functions of the Absolute Value Function. 数学的实践与认识, 45(20), 250-255. (In Chinese)

[8] Li, X.S. (1991) An Aggregate Function Method for Nonlinear Programming. 中国科学(A辑), 12(12), 1283-1288. (In Chinese)

[9] Li, X.S. (1994) An Efficient Method for a Class of Non-Differentiable Optimization Problems. 中国科学(A辑), 24(4), 371-377. (In Chinese)

[10] Yang, Q.Z., Yang, D.Z. and Zhang, M.H. (2001) The Adjusted Entropy Function Method. 计算数学, 23(1), 81-86. (In Chinese)

[11] Yang, Q.Z. (1996) An Analysis of the Aggregate Function Method. 计算数学, 18(4), 405-410. (In Chinese)
