In IEEE transactions on neural networks and learning systems
Stochastic approximation (SA) algorithms have been widely applied in minimization problems when the loss functions and/or the gradient information are only accessible through noisy evaluations. Stochastic gradient (SG) descent-a first-order algorithm and a workhorse of much machine learning-is perhaps the most famous form of SA. Among all SA algorithms, the second-order simultaneous perturbation stochastic approximation (2SPSA) and the second-order stochastic gradient (2SG) are particularly efficient in handling high-dimensional problems, covering both gradient-free and gradient-based scenarios. However, due to the necessary matrix operations, the per-iteration floating-point-operations (FLOPs) cost of the standard 2SPSA/2SG is O(p³), where p is the dimension of the underlying parameter. Note that the O(p³) FLOPs cost is distinct from the classical SPSA-based per-iteration O(1) cost in terms of the number of noisy function evaluations. In this work, we propose a technique to efficiently implement the 2SPSA/2SG algorithms via the symmetric indefinite matrix factorization and show that the FLOPs cost is reduced from O(p³) to O(p²). The formal almost sure convergence and rate of convergence for the newly proposed approach are directly inherited from the standard 2SPSA/2SG. The improvement in efficiency and numerical stability is demonstrated in two numerical studies.
Zhu Jingyi, Wang Long, Spall James C