In IEEE Transactions on Neural Networks and Learning Systems
Proximal inertial gradient descent (PIGD) is efficient for composite minimization and applicable to a broad range of machine learning problems. In this article, we revisit the computational complexity of this algorithm and present further novel results, especially on the convergence rates of the objective function values. A nonergodic O(1/k) rate is proved for PIGD with constant step size when the objective function is coercive. When the objective function is not guaranteed to be coercive, we prove a sublinear rate with diminishing inertial parameters. When the objective function satisfies the Polyak-Łojasiewicz (PŁ) property, linear convergence is proved with much larger and more general step sizes than in the previous literature. We also extend our results to the multiblock version and present its computational complexity. Both cyclic and stochastic index selection strategies are considered.
Sun Tao, Qiao Linbo, Li Dongsheng
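
As a rough illustration of the kind of iteration the abstract refers to, the sketch below applies a proximal inertial gradient step to a composite problem F(x) = f(x) + g(x). Everything specific here is an assumption made for illustration and is not taken from the article: the update is written in the commonly used form x_{k+1} = prox_{gamma*g}(x_k - gamma*grad f(x_k) + beta*(x_k - x_{k-1})), the test problem is the lasso (f(x) = 0.5*||Ax - b||^2, g(x) = lam*||x||_1), and the names `pigd_lasso` and `soft_threshold`, the constant inertial parameter `beta`, and the step size gamma = (1 - beta)/L are hypothetical choices.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (elementwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def pigd_lasso(A, b, lam, beta=0.3, iters=500):
    """Illustrative proximal inertial gradient descent on the lasso.

    Assumed update (not quoted from the article):
        x_{k+1} = prox_{gamma*g}(x_k - gamma*grad_f(x_k) + beta*(x_k - x_{k-1}))
    with f(x) = 0.5*||Ax - b||^2 and g(x) = lam*||x||_1.
    """
    # Lipschitz constant of grad f is the squared spectral norm of A.
    L = np.linalg.norm(A, 2) ** 2
    # Assumed conservative constant step size; the article's admissible
    # step-size range may differ.
    gamma = (1.0 - beta) / L

    x_prev = np.zeros(A.shape[1])
    x = x_prev.copy()
    history = []
    for _ in range(iters):
        grad = A.T @ (A @ x - b)
        # Gradient step plus inertial (momentum) term, then the prox of g.
        y = x - gamma * grad + beta * (x - x_prev)
        x_prev, x = x, soft_threshold(y, gamma * lam)
        # Record the objective value F(x_k); the convergence rates in the
        # abstract concern this quantity.
        history.append(0.5 * np.sum((A @ x - b) ** 2) + lam * np.sum(np.abs(x)))
    return x, history


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((40, 20))
    x_true = np.zeros(20)
    x_true[:3] = [1.0, -2.0, 0.5]
    b = A @ x_true
    x_hat, hist = pigd_lasso(A, b, lam=0.1)
    print("first objective:", hist[0], "last objective:", hist[-1])
```

The constant `beta` corresponds loosely to the constant-parameter setting mentioned in the abstract. A diminishing schedule for the inertial parameter could be tried by replacing `beta` with a value that depends on the iteration counter, but the specific schedule analyzed in the article is not reproduced here.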