最优精确线搜索梯度下降法的最佳收敛率分析

Research

arXiv

The Average and Essential Best Rate of Convergence of the Exact Line Search Gradient Descent Method

摘要 Abstract

大家都知道，当精确线搜索梯度下降法应用于凸二次目标函数时，最坏情况下的收敛率（ROC），在所有初始向量中，随着目标函数Hessian矩阵条件数的增长而恶化。由于H. Akaike的精妙分析，普遍认为——但未被证明——在病态条件下，几乎所有初始向量的ROC以及因此的平均ROC接近最坏情况的ROC。我们利用中心流形定理完成了Akaike的分析。我们的分析还通过以下有趣结果揭示了Hessian矩阵中间特征值的影响：在没有中间特征值的情况下，随着Hessian矩阵变得越来越病态，平均收敛率变得任意“快”——而非慢。我们顺便讨论了一些精确线搜索梯度下降法在成像和数据科学中的多项式优化问题中的当代应用。特别地，我们观察到，针对相位检索问题中产生的POP（多项式优化问题）定制的精确线搜索梯度下降算法，每迭代的计算成本仅比其固定步长的对应算法高50%，而其收敛率仅能由最优调整的（固定）步长匹配，但这种最优调整在实践中几乎无法实现。

It is very well known that when the exact line search gradient descent method is applied to a convex quadratic objective, the worst-case rate of convergence (ROC), among all seed vectors, deteriorates as the condition number of the Hessian of the objective grows. By an elegant analysis due to H. Akaike, it is generally believed -- but not proved -- that in the ill-conditioned regime the ROC for almost all initial vectors, and hence also the average ROC, is close to the worst case ROC. We complete Akaike's analysis using the theorem of center and stable manifolds. Our analysis also makes apparent the effect of an intermediate eigenvalue in the Hessian by establishing the following somewhat amusing result: In the absence of an intermediate eigenvalue, the average ROC gets arbitrarily \emph{fast} -- not slow -- as the Hessian gets increasingly ill-conditioned. We discuss in passing some contemporary applications of exact line search GD to polynomial optimization problems arising from imaging and data sciences. In particular, we observe that a tailored exact line search GD algorithm for a POP arising from the phase retrieval problem is only 50\% more expensive per iteration than its constant step size counterpart, while promising a ROC only matched by the optimally tuned (constant) step size which can almost never be achieved in practice.