
arc_lanczos

PURPOSE

Subproblem solver for ARC based on a Lanczos process.

SYNOPSIS

function [eta, Heta, hesscalls, stop_str, stats] = arc_lanczos(problem, x, grad, gradnorm, sigma, options, storedb, key)

DESCRIPTION

 Subproblem solver for ARC based on a Lanczos process.

 [eta, Heta, hesscalls, stop_str, stats] = 
     arc_lanczos(problem, x, grad, gradnorm, sigma, options, storedb, key)

 This routine approximately solves the following problem:

   min_{eta in T_x M}  m(eta),  where

       m(eta) = <eta, g> + .5 <eta, H[eta]> + (sigma/3) ||eta||^3

 where eta is a tangent vector at x on the manifold given by problem.M,
 g = grad is a tangent vector at x, H[eta] is the result of applying the
 Hessian of the problem at x along eta and the inner product and norm
 are those from the Riemannian structure on the tangent space T_x M.
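
 As an illustration only (not part of this file), the model value at a
 given tangent vector eta could be evaluated with Manopt primitives as in
 the sketch below, assuming problem, x, grad, sigma and eta are in scope:

     M = problem.M;
     Heta = getHessian(problem, x, eta);        % Hessian-vector product
     m_eta = M.inner(x, eta, grad) ...          % <eta, g>
             + .5*M.inner(x, eta, Heta) ...     % .5 <eta, H[eta]>
             + (sigma/3)*M.norm(x, eta)^3;      % cubic regularization term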

 The solve is approximate in the sense that the returned eta only ought
 to satisfy the following conditions:

     ||gradient of m at eta|| <= theta*||eta||^2   and   m(eta) <= m(0),

 where theta is specified in options.theta (see below for its default
 value). Since the gradient of the model at 0 is g, if g is zero, then
 eta = 0 is returned. This is the only scenario where eta = 0 is returned.

 Numerical errors can perturb the described expected behavior.
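
 In code, the stopping test above corresponds to the following sketch
 (illustration only; it assumes M = problem.M, eta and Heta = H[eta] are
 available, and uses the fact that the gradient of the model at eta is
 g + H[eta] + sigma*||eta||*eta):

     g_model = M.lincomb(x, 1, grad, 1, Heta);
     g_model = M.lincomb(x, 1, g_model, sigma*M.norm(x, eta), eta);
     satisfied = M.norm(x, g_model) <= options.theta * M.norm(x, eta)^2;

 The solver below checks an equivalent condition in the coordinates of
 the Lanczos basis, which is cheaper.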

 Inputs:
     problem: Manopt optimization problem structure
     x: point on the manifold problem.M
     grad: gradient of the cost function of the problem at x
     gradnorm: norm of the gradient, often available to the caller
     sigma: cubic regularization parameter (positive scalar)
     options: structure containing options for the subproblem solver
     storedb, key: caching data for problem at x

 Options specific to this subproblem solver:
   theta (.5)
     Stopping criterion parameter for subproblem solver: the gradient of
     the model at the returned step should have norm no more than theta
     times the squared norm of the step.
   maxinner (M.dim())
     Maximum number of iterations of the Lanczos process, which is nearly
     the same as the maximum number of calls to the Hessian.
   maxiter_newton (100)
     Maximum number of iterations of the Newton root finder to solve each
     tridiagonal cubic problem.
   tol_newton (1e-16)
     Tolerance for the Newton root finder.
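
 These options are normally passed through the outer solver. A minimal
 sketch, assuming the usual Manopt calling pattern for arc and that arc
 accepts a subproblemsolver option:

     options = struct();
     options.subproblemsolver = @arc_lanczos;  % use this subsolver in arc
     options.theta = .25;                      % tighter inner tolerance
     options.maxinner = 50;                    % cap on Lanczos iterations
     [x, cost, info] = arc(problem, [], options);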

 Outputs:
     eta: approximate solution to the cubic regularized subproblem at x
     Heta: Hess f(x)[eta] -- this is necessary in the outer loop, and it
           is often naturally available to the subproblem solver at the
           end of execution, so that it may be cheaper to return it here.
     hesscalls: number of Hessian calls during execution
     stop_str: string describing why the subsolver stopped
     stats: a structure specifying some statistics about inner work
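
 For a direct call outside of arc (mostly useful for debugging), the
 caching structures must be supplied as well. A rough sketch, with sigma
 chosen arbitrarily and verbosity silenced (the subsolver reads
 options.verbosity):

     storedb = StoreDB();
     key = storedb.getNewKey();
     grad = getGradient(problem, x, storedb, key);
     gradnorm = problem.M.norm(x, grad);
     sigma = 1;                                % arbitrary for this sketch
     opts = struct('verbosity', 0);
     [eta, Heta, hesscalls, stop_str, stats] = ...
         arc_lanczos(problem, x, grad, gradnorm, sigma, opts, storedb, key);
     fprintf('%s (%d Hessian calls)\n', stop_str, hesscalls);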

 See also: arc minimize_cubic_newton

CROSS-REFERENCE INFORMATION

This function calls:
This function is called by:

SOURCE CODE

0001 function [eta, Heta, hesscalls, stop_str, stats] = arc_lanczos(problem, x, grad, gradnorm, sigma, options, storedb, key)
0002 % Subproblem solver for ARC based on a Lanczos process.
0003 %
0004 % [eta, Heta, hesscalls, stop_str, stats] =
0005 %     arc_lanczos(problem, x, grad, gradnorm, sigma, options, storedb, key)
0006 %
0007 % This routine approximately solves the following problem:
0008 %
0009 %   min_{eta in T_x M}  m(eta),  where
0010 %
0011 %       m(eta) = <eta, g> + .5 <eta, H[eta]> + (sigma/3) ||eta||^3
0012 %
0013 % where eta is a tangent vector at x on the manifold given by problem.M,
0014 % g = grad is a tangent vector at x, H[eta] is the result of applying the
0015 % Hessian of the problem at x along eta and the inner product and norm
0016 % are those from the Riemannian structure on the tangent space T_x M.
0017 %
0018 % The solve is approximate in the sense that the returned eta only ought
0019 % to satisfy the following conditions:
0020 %
0021 %     ||gradient of m at eta|| <= theta*||eta||^2   and   m(eta) <= m(0),
0022 %
0023 % where theta is specified in options.theta (see below for its default
0024 % value). Since the gradient of the model at 0 is g, if g is zero, then
0025 % eta = 0 is returned. This is the only scenario where eta = 0 is returned.
0026 %
0027 % Numerical errors can perturb the described expected behavior.
0028 %
0029 % Inputs:
0030 %     problem: Manopt optimization problem structure
0031 %     x: point on the manifold problem.M
0032 %     grad: gradient of the cost function of the problem at x
0033 %     gradnorm: norm of the gradient, often available to the caller
0034 %     sigma: cubic regularization parameter (positive scalar)
0035 %     options: structure containing options for the subproblem solver
0036 %     storedb, key: caching data for problem at x
0037 %
0038 % Options specific to this subproblem solver:
0039 %   theta (.5)
0040 %     Stopping criterion parameter for subproblem solver: the gradient of
0041 %     the model at the returned step should have norm no more than theta
0042 %     times the squared norm of the step.
0043 %   maxinner (M.dim())
0044 %     Maximum number of iterations of the Lanczos process, which is nearly
0045 %     the same as the maximum number of calls to the Hessian.
0046 %   maxiter_newton (100)
0047 %     Maximum number of iterations of the Newton root finder to solve each
0048 %     tridiagonal cubic problem.
0049 %   tol_newton (1e-16)
0050 %     Tolerance for the Newton root finder.
0051 %
0052 % Outputs:
0053 %     eta: approximate solution to the cubic regularized subproblem at x
0054 %     Heta: Hess f(x)[eta] -- this is necessary in the outer loop, and it
0055 %           is often naturally available to the subproblem solver at the
0056 %           end of execution, so that it may be cheaper to return it here.
0057 %     hesscalls: number of Hessian calls during execution
0058 %     stop_str: string describing why the subsolver stopped
0059 %     stats: a structure specifying some statistics about inner work
0060 %
0061 % See also: arc minimize_cubic_newton
0062 
0063 % This file is part of Manopt: www.manopt.org.
0064 % Original authors: May 1, 2018,
0065 %    Naman Agarwal, Brian Bullins, Nicolas Boumal and Coralia Cartis.
0066 % Contributors:
0067 % Change log:
0068 %   Aug 16, 2019 (NB):
0069 %       Default value for theta changed from 50 to 0.5.
0070 %       Option maxiter_lanczos renamed to maxinner to match trustregions.
0071 
0072 % TODO: think whether we can save the Lanczos basis in the storedb at the
0073 % given key in case we get a rejection, and simply "start where we left
0074 % off" with the updated sigma.
0075 
0076 % TODO: Lanczos is notoriously numerically unstable, with loss of
0077 % orthogonality being a main hurdle. Look into the literature (Paige,
0078 % Parlett), for possible numerical fixes.
0079 
0080     % Some shortcuts
0081     M = problem.M;
0082     n = M.dim();
0083     inner = @(u, v) M.inner(x, u, v);
0084     rnorm = @(u) M.norm(x, u);
0085     tangent = @(u) problem.M.tangent(x, u);
0086     Hess = @(u) getHessian(problem, x, u, storedb, key);
0087     
0088     % Counter for Hessian calls issued
0089     hesscalls = 0;
0090     
0091     % If the gradient has norm zero, return a zero step
0092     if gradnorm == 0
0093         eta = M.zerovec(x);
0094         Heta = eta;
0095         stop_str = 'Cost gradient is zero';
0096         stats = struct('newton_iterations', 0, 'gradnorms', 0, 'func_values', 0);
0097         return;
0098     end
0099     
0100     % Set local defaults here
0101     localdefaults.theta = .5;
0102     localdefaults.maxinner = n;
0103     % The following are here for the Newton solver called below
0104     localdefaults.maxiter_newton = 100;
0105     localdefaults.tol_newton = 1e-16;
0106     
0107     % Merge local defaults with user options, if any
0108     if ~exist('options', 'var') || isempty(options)
0109         options = struct();
0110     end
0111     options = mergeOptions(localdefaults, options);
0112     
0113     % Vectors where we keep track of the Newton root finder's work, the
0114     % gradient norm, and the function values at each inner iteration
0115     newton_iterations = zeros(n, 1);
0116     gradnorms = zeros(n, 1);
0117     func_values = zeros(n, 1);
0118     
0119     % Lanczos iteratively produces an orthonormal basis of tangent vectors
0120     % which tridiagonalize the Hessian. The corresponding tridiagonal
0121     % matrix is preallocated here as a sparse matrix.
0122     T = spdiags(ones(n, 3), -1:1, n, n);
0123     
0124     % The orthonormal basis (n tangent vectors at x) is stored in this cell
0125     Q = cell(n, 1);
0126     
0127     % Initialize Lanczos along the gradient direction (it is nonzero)
0128     q = M.lincomb(x, 1/gradnorm, grad);
0129     Q{1} = q;
0130     Hq = Hess(q);
0131     hesscalls = hesscalls + 1;
0132     alpha = inner(Hq, q);
0133     T(1, 1) = alpha;
0134     Hq_perp = M.lincomb(x, 1, Hq, -alpha, q);
0135     
0136     % Minimizing the cubic restricted to the one-dimensional space spanned
0137     % by Q{1} is easy: it amounts to minimizing a univariate cubic. Indeed,
0138     % with eta = y*q where y is a scalar, we minimize (since g = ||g||q):
0139     %  h(y) = <y*q, g> + .5 <y*q, H[y*q]> + (sigma/3) ||y*q||^3
0140     %       = ||g||*y + .5*alpha*y^2 + (sigma/3) |y|^3.
0141     % The sign of y affects only the linear term, hence it is clear we need
0142     % to pick y nonpositive. In that case, h becomes a cubic polynomial:
0143     %  h(y) = ||g||*y + .5*alpha*y^2 - (sigma/3) y^3
0144     % The derivative is a quadratic polynomial:
0145     %  h'(y) = ||g|| + alpha*y - sigma*y^2.
0146     % Since ||g|| and sigma are positive, h' has two real roots, one
0147     % positive and one negative (strictly). The negative root is the only
0148     % root of interest. It necessarily identifies a minimizer since
0149     % h(0) = 0, h(-inf) = inf and h'(0) > 0.
0150     %
0151     % We take the real part only to be safe.
0152     y = min(real(roots([-sigma, alpha, gradnorm])));
0153     
0154     % Main Lanczos iteration
0155     gradnorm_reached = false;
0156     for j = 1 : min(options.maxinner, n) - 1
0157 
0158         % Knowing that j Lanczos steps have been executed completely, now
0159         % execute the j+1st step to produce Q{j+1} and populate the
0160         % tridiagonal matrix T over the whole principal submatrix of size
0161         % j+1. This involves one Hessian call.
0162         %
0163         % In effect, we are computing this one step ahead. The reason is
0164         % that this makes it cheaper to compute the norm of the gradient of
0165         % the model, which is needed to check the stopping criterion (see
0166         % below).
0167         beta = rnorm(Hq_perp);
0168         % TODO: Figure out a sensible relative threshold
0169         if beta > 1e-12
0170             q = M.lincomb(x, 1/beta, Hq_perp);
0171         else
0172             % It appears the Krylov space maxed out (Hq is very nearly in
0173             % the space spanned by the existing Lanczos vectors). In order
0174             % to continue, the standard procedure is to generate a random
0175             % vector, and to orthogonalize it against the current basis.
0176             % This event is supposed to be rare.
0177             v = M.randvec(x);
0178             % Orthogonalize in the style of a modified Gram-Schmidt.
0179             for k = 1 : j
0180                 v = M.lincomb(x, 1, v, -inner(v, Q{k}), Q{k});
0181             end
0182             q = M.lincomb(x, 1/rnorm(v), v);
0183         end
0184         
0185         q = tangent(q);
0186         
0187         Hq = Hess(q);
0188         hesscalls = hesscalls + 1;
0189         Hqm = M.lincomb(x, 1, Hq, -beta, Q{j});
0190         % In exact arithmetic, alpha = <Hq, q>. Doing the computations in
0191         % this order amounts to a modified GS, which may help numerically.
0192         alpha = inner(Hqm, q);
0193         Hq_perp = M.lincomb(x, 1, Hqm, -alpha, q);
0194         Q{j+1} = q;
0195         T(j, j+1) = beta;     %#ok<SPRIX>
0196         T(j+1, j) = beta;     %#ok<SPRIX>
0197         T(j+1, j+1) = alpha;  %#ok<SPRIX>
0198         % End of the Lanczos procedure for step j.
0199 
0200         % Computing the norm of the gradient of the model at the computed
0201         % step 'Qy' (linear combination of the Q's with coefficients y.)
0202         % We actually compute the norm of a vector of coordinates for the
0203         % gradient of the model in the basis Q{1}, ..., Q{j+1}.
0204         model_gradnorm = norm(gradnorm*eye(j+1, 1) + ...
0205                               T(1:j+1, 1:j)*y + ...
0206                               sigma*norm(y)*[y ; 0]);
0207         gradnorms(j) = model_gradnorm;
0208         func_values(j) = y(1)*gradnorm + 0.5 * dot(y, T(1:j, 1:j)*y) + (sigma/3) * norm(y)^3;
0209 
0210         if options.verbosity >= 4
0211             fprintf('\nModel grad norm: %.16e, Iterate norm: %.16e', model_gradnorm, norm(y));
0212         end
0213         
0214         % Check termination condition
0215         if model_gradnorm <= options.theta*norm(y)^2
0216             stop_str = 'Model grad norm condition satisfied';
0217             gradnorm_reached = true;
0218             break;
0219         end
0220         
0221         % Minimize the cubic model restricted to the subspace spanned by
0222         % the available Lanczos vectors. In its current form, this solver
0223         % cannot reuse prior work from earlier Lanczos iterations: this may
0224         % be a point to improve.
0225         [y, newton_iter] = minimize_cubic_newton(T(1:j+1, 1:j+1), ...
0226                                      gradnorm*eye(j+1, 1), sigma, options);
0227         newton_iterations(j+1) = newton_iter;
0228     end
0229     
0230     % Check why we stopped iterating
0231     points = numel(y);
0232     if ~gradnorm_reached
0233         stop_str = sprintf(['Reached max number of Lanczos iterations ' ...
0234                '(options.maxinner = %d)'], options.maxinner);
0235         points = points - 1;
0236     end
0237     
0238     % Construct the tangent vector eta as a linear combination of the basis
0239     % vectors and make sure the result is tangent up to numerical accuracy.
0240     eta = lincomb(M, x, Q(1:numel(y)), y);
0241     eta = tangent(eta);
0242     % We could easily return the norm of eta as the norm of the coefficient
0243     % vector y here, but numerical errors might accumulate.
0244     
0245     % In principle we could avoid this call by computing an appropriate
0246     % linear combination of available vectors. For now at least, we favor
0247     % this numerically safer approach.
0248     Heta = Hess(eta);
0249     hesscalls = hesscalls + 1;
0250     
0251     stats = struct('newton_iterations', newton_iterations(1:points), ...
0252                    'gradnorms', gradnorms(1:points), ...
0253                    'func_values', func_values(1:points));
0254     
0255     if options.verbosity >= 4
0256         fprintf('\n');
0257     end
0258         
0259 end
