﻿ Armijo搜索下求解无约束优化问题的扰动BFGS方法

# Armijo搜索下求解无约束优化问题的扰动BFGS方法The Perturbed BFGS Method for the Unconstrained Optimization with the Armijo Line Search

Abstract: In [1], a perturbed BFGS method was proposed to solve the unconstrained optimization and was proved to be globally convergent when the Wolfe line search was used. In this paper, we show that the perturbed BFGS method also possesses global convergence for nonconvex problems with the relatively weaker Armijo line search. Numerical results show that this method with the Armijo search is also promising.

1. 引言

$\mathrm{min}f\left(x\right),\text{\hspace{0.17em}}\text{\hspace{0.17em}}x\in {R}^{n},$ (1.1)

${x}_{k+1}={x}_{k}+{\alpha }_{k}{d}_{k},\text{\hspace{0.17em}}\text{\hspace{0.17em}}k\ge 1,$ (1.2)

${B}_{k}{d}_{k}+\nabla f\left({x}_{k}\right)=0,$ (1.3)

${B}_{k+1}={B}_{k}-\frac{{B}_{k}{s}_{k}{s}_{k}^{\text{T}}{B}_{k}}{{s}_{k}^{\text{T}}{B}_{k}{s}_{k}}+\frac{{y}_{k}{y}_{k}^{\text{T}}}{{y}_{k}^{\text{T}}{s}_{k}},$ (1.4)

Powell在文献 [4] 中证明了求解凸优化问题的BFGS方法只有全局收敛性。但对非凸优化问题，Dai [5] 和Mascarenhas [6] 给出例子说明采用Wolfe型非精确线性搜索和精确线性搜索的BFGS方法不一定具有全局收敛性。在文献 [1] 中，Liu提出了一种扰动的BFGS方法，简称PBFGS，在该算法中，搜索方向 ${d}_{k}$ 由以下线性方程组确定。

$\left({B}_{k}+{\mu }_{k}Q\right){d}_{k}+\nabla f\left({x}_{k}\right)=0,$ (1.5)

${B}_{k+1}=\left\{\begin{array}{l}{B}_{k}-\frac{{B}_{k}{s}_{k}{s}_{k}^{\text{T}}{B}_{k}}{{s}_{k}^{\text{T}}{B}_{k}{s}_{k}}+\frac{{y}_{k}{y}_{k}^{\text{T}}}{{y}_{k}^{\text{T}}{s}_{k}}, \text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}如果\text{ }{y}_{k}^{\text{T}}{s}_{k}>0\hfill \\ {B}_{k}, \text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}其他\hfill \end{array}$ (1.6)

2. 算法与收敛性分析

$\left({B}_{k}+{\mu }_{k}Q\right){d}_{k}+\nabla f\left({x}_{k}\right)=0,$ (2.1)

$f\left({x}_{k}+{\alpha }_{k}{d}_{k}\right)-f\left({x}_{k}\right)\le {\sigma }_{1}{\alpha }_{k}\nabla f{\left({x}_{k}\right)}^{\text{T}}{d}_{k},$ (2.2)

${B}_{k+1}=\left\{\begin{array}{l}{B}_{k}-\frac{{B}_{k}{s}_{k}{s}_{k}^{\text{T}}{B}_{k}}{{s}_{k}^{\text{T}}{B}_{k}{s}_{k}}+\frac{{y}_{k}{y}_{k}^{\text{T}}}{{y}_{k}^{\text{T}}{s}_{k}},\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}如果\text{ }{y}_{k}^{\text{T}}{s}_{k}>0\hfill \\ {B}_{k}, \text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}其他\hfill \end{array}$ (2.3)

${\mu }_{k+1}=\left\{\begin{array}{l}{\epsilon }_{k+1}{‖{B}_{k+1}‖}_{F},如果{‖{B}_{k+1}‖}_{F}\ge {M}_{k+1}\hfill \\ {\epsilon }_{k+1},\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}其它\hfill \end{array}$ (2.4)

${\epsilon }_{k+1}=\left\{\begin{array}{l}\tau {\epsilon }_{k},\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}如果‖\nabla f\left({x}_{k+1}\right)‖/\delta \le \eta \\ {\epsilon }_{k},\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}否则\end{array}$ (2.5)

$\underset{k\to \infty }{\mathrm{lim}\mathrm{inf}}‖\nabla f\left({x}_{k}\right)‖=0$ (2.6)

$‖\nabla f\left(x\right)-\nabla f\left(y\right)‖\le L‖x-y‖,\text{\hspace{0.17em}}\text{\hspace{0.17em}}\left(\forall x,y\in \Omega \right).$ (2.7)

$\begin{array}{l}f\left({x}_{1}+{\alpha }_{1}{d}_{1}\right)-f\left({x}_{1}\right)\le {\sigma }_{1}{\alpha }_{1}\nabla f{\left({x}_{1}\right)}^{\text{T}}{d}_{1}\\ \text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}⋮\\ f\left({x}_{k}+{\alpha }_{k}{d}_{k}\right)-f\left({x}_{k}\right)\le {\sigma }_{1}{\alpha }_{k}\nabla f{\left({x}_{k}\right)}^{\text{T}}{d}_{k}\end{array}$

$f\left({x}_{k}+{\alpha }_{k}{d}_{k}\right)-f\left({x}_{1}\right)\le {\sigma }_{1}\underset{k=1}{\overset{\infty }{\sum }}{\alpha }_{k}\nabla f{\left({x}_{k}\right)}^{\text{T}}{d}_{k}.$

$-\underset{k=1}{\overset{\infty }{\sum }}{\alpha }_{k}\nabla f{\left({x}_{k}\right)}^{\text{T}}{d}_{k}\le -\frac{{C}_{1}}{{\delta }_{1}}<+\infty .$

$-\underset{k\to +\infty }{\mathrm{lim}}{\alpha }_{k}\nabla f{\left({x}_{k}\right)}^{\text{T}}{d}_{k}=-\underset{k\to +\infty }{\mathrm{lim}}\nabla f{\left({x}_{k}\right)}^{\text{T}}{s}_{k}=0.$ (2.8)

${K}_{\tau }=\left\{k:{\epsilon }_{k+1}=\tau {\epsilon }_{k}\right\},$ ${K}_{B}=\left\{k:{‖{B}_{k}‖}_{F}\ge {M}_{k}\right\}.$ (2.9)

$-\nabla f{\left({x}_{k}\right)}^{\text{T}}{d}_{k}={d}_{k}^{\text{T}}\left({B}_{k}+{\mu }_{k}I\right){d}_{k}\ge {\epsilon }_{k}{‖{B}_{k}‖}_{F}{‖{d}_{k}‖}^{2}\ge {\epsilon }_{k}\mathrm{min}\left\{1,{M}_{B}\right\}{‖{d}_{k}‖}^{2}.$ (2.10)

$f\left({x}_{k}+{\rho }^{-1}{\alpha }_{k}{d}_{k}\right)-f\left({x}_{k}\right)>{\sigma }_{1}{\rho }^{-1}{\alpha }_{k}\nabla f{\left({x}_{k}\right)}^{\text{T}}{d}_{k}.$ (2.11)

$\begin{array}{l}f\left({x}_{k}+{\rho }^{-1}{\alpha }_{k}{d}_{k}\right)-f\left({x}_{k}\right)\\ ={\rho }^{-1}{\alpha }_{k}\nabla f{\left({x}_{k}+{\theta }_{k}{\rho }^{-1}{\alpha }_{k}{d}_{k}\right)}^{\text{T}}{d}_{k}\\ ={\rho }^{-1}{\alpha }_{k}\nabla f{\left({x}_{k}\right)}^{\text{T}}{d}_{k}+{\rho }^{-1}{\alpha }_{k}\left[\nabla f{\left({x}_{k}+{\theta }_{k}{\rho }^{-1}{\alpha }_{k}{d}_{k}\right)}^{\text{T}}-\nabla f\left({x}_{k}\right)\right]{d}_{k}\\ \le {\rho }^{-1}{\alpha }_{k}\nabla f{\left({x}_{k}\right)}^{\text{T}}{d}_{k}+L{\rho }^{-2}{\alpha }_{k}^{2}{‖{d}_{k}‖}^{2}\end{array}$(2.12)

$\begin{array}{l}{\alpha }_{k}\ge \frac{-\left(1-{\sigma }_{1}\right)\rho \nabla f{\left({x}_{k}\right)}^{\text{T}}{d}_{k}}{L{‖{d}_{k}‖}^{2}}=\frac{\left(1-{\sigma }_{1}\right)\rho {d}_{k}^{\text{T}}\left({B}_{k}+{\mu }_{k}I\right){d}_{k}}{L{‖{d}_{k}‖}^{2}}\\ \text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\ge \frac{\left(1-{\sigma }_{1}\right)\rho {\epsilon }_{k}{}_{{}_{1}}}{L}\mathrm{min}\left\{1,{M}_{B}\right\}>0.\end{array}$ (2.13)

${\alpha }_{k}=1$ 时，显然可得 ${\alpha }_{k}>0$ 。又由(2.8)，(2.10)可得

$0=-\underset{k\to +\infty }{\mathrm{lim}}{\alpha }_{k}\nabla f{\left({x}_{k}\right)}^{\text{T}}{d}_{k}=\underset{k\to +\infty }{\mathrm{lim}}{\alpha }_{k}{d}_{k}^{\text{T}}\left({B}_{k}+{\mu }_{k}I\right){d}_{k}\ge \underset{k\to +\infty }{\mathrm{lim}}{\alpha }_{k}{\epsilon }_{k}\mathrm{min}\left\{1,{M}_{B}\right\}{‖{d}_{k}‖}^{2}>0,$

$\underset{k\to \infty }{\mathrm{lim}}\mathrm{inf}‖\nabla f\left({x}_{k}\right)‖=0.$ (2.14)

${B}_{k}{d}_{k}+{\epsilon }_{{k}_{1}}{‖{B}_{k}‖}_{F}{d}_{k}+\nabla f\left({x}_{k}\right)=0.$

$-\nabla f{\left({x}_{k}\right)}^{\text{T}}{\stackrel{^}{d}}_{k}={d}_{k}^{\text{T}}{B}_{k}{\stackrel{^}{d}}_{k}+{\epsilon }_{{k}_{1}}{‖{\stackrel{^}{d}}_{k}‖}^{2}.$ (2.15)

$L{\rho }^{-2}{\alpha }_{k}^{2}{‖{d}_{k}‖}^{2}\ge \left(\sigma -1\right){\rho }^{-1}{\alpha }_{k}\nabla f{\left({x}_{k}\right)}^{\text{T}}{d}_{k}.$

$L{\rho }^{-2}{\stackrel{^}{\alpha }}_{k}^{2}{‖{\stackrel{^}{d}}_{k}‖}^{2}\ge \left(\sigma -1\right){\rho }^{-1}{\stackrel{^}{\alpha }}_{k}\nabla f{\left({x}_{k}\right)}^{\text{T}}{\stackrel{^}{d}}_{k},$

${\stackrel{^}{\alpha }}_{k}\ge \frac{-\left(1-{\sigma }_{1}\right)\rho \nabla f{\left({x}_{k}\right)}^{\text{T}}{\stackrel{^}{d}}_{k}}{L{‖{\stackrel{^}{d}}_{k}‖}^{2}}=\frac{\left(1-{\sigma }_{1}\right)\rho \left({d}_{k}^{\text{T}}{B}_{k}{\stackrel{^}{d}}_{k}+{\epsilon }_{{k}_{1}}{‖{\stackrel{^}{d}}_{k}‖}^{2}\right)}{L{‖{\stackrel{^}{d}}_{k}‖}^{2}}\ge \frac{\left(1-{\sigma }_{1}\right)\rho {\epsilon }_{{k}_{1}}}{L}>0.$

$0=-\underset{k\to +\infty }{\mathrm{lim}}{\stackrel{^}{\alpha }}_{k}\nabla f{\left({x}_{k}\right)}^{\text{T}}{\stackrel{^}{d}}_{k}=\underset{k\to +\infty }{\mathrm{lim}}\stackrel{^}{\alpha }\left({d}_{k}^{\text{T}}{B}_{k}{\stackrel{^}{d}}_{k}+{\epsilon }_{{k}_{1}}{‖{\stackrel{^}{d}}_{k}‖}^{2}\right)\ge \underset{k\to +\infty }{\mathrm{lim}}{\stackrel{^}{\alpha }}_{k}{\epsilon }_{{k}_{1}}{‖{\stackrel{^}{d}}_{k}‖}^{2}>0.$

$\underset{k\to \infty ,k\in {K}_{B}}{\mathrm{lim}}{\stackrel{^}{d}}_{k}=0.$ (2.16)

$‖\nabla f\left({x}_{k}\right)‖\le ‖{B}_{k}‖‖{\stackrel{^}{d}}_{k}‖/{‖{B}_{k}‖}_{F}+{\epsilon }_{{k}_{1}}‖{\stackrel{^}{d}}_{k}‖\le \left(1+{\epsilon }_{{k}_{1}}\right)‖{\stackrel{^}{d}}_{k}‖.$

$\underset{k\to \infty }{\mathrm{lim}}\mathrm{inf}‖\nabla f\left({x}_{k}\right)‖=0.$

$\underset{k\to \infty }{\mathrm{lim}}\mathrm{inf}‖\nabla f\left({x}_{k}\right)‖=0.$

${‖{B}_{k}‖}_{F}<\mathrm{max}\left\{{M}_{B},1/‖\nabla f\left({x}_{k}\right)‖\right\}\le \mathrm{max}\left\{{M}_{B},1/\gamma \right\}.$ (2.17)

${K}_{\tau }$ 为有限集时，对所有的 $k\ge {k}_{1}$ 有(2.10)式成立。由引理1和(2.8)可知，当 $k\to \infty$ 时， ${d}_{k}\to 0$ 。进一步由修正后的拟牛顿公式(2.1)以及(2.17)式可得，对所有的 $k\notin {K}_{B}$ 有下面不等式

$‖\nabla f\left({x}_{k}\right)‖\le \left(‖{B}_{k}‖+{\epsilon }_{k}\right)‖{d}_{k}‖\le \left({‖{B}_{k}‖}_{F}+{\epsilon }_{1}\right)‖{d}_{k}‖\le \left(\mathrm{max}\left\{{M}_{B},1/\gamma \right\}+{\epsilon }_{1}\right)‖{d}_{k}‖$

$\underset{k\to \infty ,k\notin {K}_{B}}{\mathrm{lim}}\mathrm{inf}‖\nabla f\left({x}_{k}\right)‖=0,$

$\underset{k\to \infty }{\mathrm{lim}}\mathrm{inf}‖\nabla f\left({x}_{k}\right)‖=0.$ (2.18)

3. 数值试验及结果分析

$\theta \left({x}_{1},{x}_{2}\right)=\left\{\begin{array}{l}\frac{1}{\text{2π}}\mathrm{arctan}\left(\frac{{x}_{2}}{{x}_{1}}\right),\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}{x}_{1}>0\\ \frac{1}{\text{2π}}\mathrm{arctan}\left(\frac{{x}_{2}}{{x}_{1}}\right)+0.5,\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}{x}_{1}<0\end{array}$

Table 1. Experimental results of algorithm

[1] 刘陶文. BFGS方法及其在求解约束优化问题中的应用[D]: [博士学位论文]. 长沙: 湖南大学, 2006.

[2] 李董辉, 童小娇, 万中. 数值最优化算法与理论[M]. 北京: 科学出版社, 2010: 51-63.

[3] 袁亚湘, 孙文瑜. 最优化理论与方法[M]. 北京: 中国科学出版社, 2001: 1-627.

[4] Powell, M.J.D., Cottle, R.W. and Lemke, C.E. (1976) Some Convergence Properties of a Variable Metric Algorithm for Minimization without Exact Line Search. SIAM Publications, 53-72.

[5] Dai, Y.H. (2002) Convergence Properties of the BFGS Algorithm. SIAM Journal on Optimization, 13, 693-701.
https://doi.org/10.1137/S1052623401383455

[6] Mascarenhas, W.F. (2004) The BFGS Method with Exact Line Searches Fails for Non-Convex Objective Functions. Mathematical Programming, 99, 49-61.
https://doi.org/10.1007/s10107-003-0421-7

[7] Li, D.H. and Fukushima, M. (2001) On the Global Conver-gence of the BFGS Method for Nonconvex Unconstrained Problems. SIAM Journal on Optimization, 11, 1054-1064.
https://doi.org/10.1137/S1052623499354242

[8] Armijo, L. (1966) Minimization of Functions Having Lipschitz Continuous Partial Derivatives. Pacific Journal of Mathematics, 16, 1-3.
https://doi.org/10.2140/pjm.1966.16.1

[9] Li, D.H. and Fukushima, M. (2001) A Modified BFGS Method and Its Global Convergence in Nonconvex Minimization. Journal of Computational and Applied Mathematics, 192, 15-35.
https://doi.org/10.1016/S0377-0427(00)00540-9

[10] More, J.J., Garbow, B.S. and Hillstrom, K.E. (1981) Testing Unconstrained Optimization Software. ACM Transactions on Mathematical Software, 7, 17-41.
https://doi.org/10.1145/355934.355936

Top