function [eta, Heta, hesscalls, stop_str, stats] = arc_lanczos(problem, x, grad, gradnorm, sigma, options, storedb, key)
% Subproblem solver for ARC based on a Lanczos process.
%
% [eta, Heta, hesscalls, stop_str, stats] =
%     arc_lanczos(problem, x, grad, gradnorm, sigma, options, storedb, key)
%
% This routine approximately solves the following problem:
%
%   min_{eta in T_x M}  m(eta),  where
%
%       m(eta) = <eta, g> + .5 <eta, H[eta]> + (sigma/3) ||eta||^3
%
% where eta is a tangent vector at x on the manifold given by problem.M,
% g = grad is a tangent vector at x, H[eta] is the result of applying the
% Hessian of the problem at x along eta, and the inner product and norm
% are those from the Riemannian structure on the tangent space T_x M.
%
% The solve is approximate in the sense that the returned s only ought to
% satisfy the following conditions:
%
%     ||gradient of m at s|| <= theta*||s||^2   and   m(s) <= m(0),
%
% where theta is specified in options.theta (see below for the default
% value). Since the gradient of the model at 0 is g, if it is zero, then
% s = 0 is returned. This is the only scenario where s = 0 is returned.
%
% Numerical errors can perturb the described expected behavior.
%
% Inputs:
%     problem: Manopt optimization problem structure
%     x: point on the manifold problem.M
%     grad: gradient of the cost function of the problem at x
%     gradnorm: norm of the gradient, often available to the caller
%     sigma: cubic regularization parameter (positive scalar)
%     options: structure containing options for the subproblem solver
%     storedb, key: caching data for problem at x
%
% Options specific to this subproblem solver:
%     theta (.5)
%       Stopping criterion parameter for the subproblem solver: the
%       gradient of the model at the returned step should have norm no
%       more than theta times the squared norm of the step.
%     maxinner (M.dim())
%       Maximum number of iterations of the Lanczos process, which is
%       nearly the same as the maximum number of calls to the Hessian.
%     maxiter_newton (100)
%       Maximum number of iterations of the Newton root finder used to
%       solve each tridiagonal cubic problem.
%     tol_newton (1e-16)
%       Tolerance for the Newton root finder.
%
% Outputs:
%     eta: approximate solution to the cubic regularized subproblem at x
%     Heta: Hess f(x)[eta] -- this is necessary in the outer loop, and it
%           is often naturally available to the subproblem solver at the
%           end of execution, so it may be cheaper to return it here.
%     hesscalls: number of Hessian calls during execution
%     stop_str: string describing why the subsolver stopped
%     stats: a structure with some statistics about the inner work
%
% See also: arc minimize_cubic_newton

% This file is part of Manopt: www.manopt.org.
% Original authors: May 1, 2018,
%    Naman Agarwal, Brian Bullins, Nicolas Boumal and Coralia Cartis.
% Contributors:
% Change log:
%   Aug 16, 2019 (NB):
%       Default value for theta changed from 50 to 0.5.
%       Option maxiter_lanczos renamed to maxinner to match trustregions.

% TODO: think about whether we can save the Lanczos basis in the storedb
% at the given key in case we get a rejection, and simply "start where we
% left off" with the updated sigma.

% TODO: Lanczos is notoriously numerically unstable, with loss of
% orthogonality being a main hurdle. Look into the literature (Paige,
% Parlett) for possible numerical fixes.
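% Example (an added, illustrative sketch, not part of the original file):
% this subsolver is normally invoked by arc itself. Assuming arc reads
% the inner solver from options.subproblemsolver, as Manopt solvers do
% for their subproblem solvers, selecting it could look like the
% following, where the Rayleigh quotient problem is purely hypothetical:
%
%     n = 100;
%     problem.M = spherefactory(n);
%     A = randn(n); A = .5*(A + A');
%     problem.cost  = @(x) x'*(A*x);
%     problem.egrad = @(x) 2*(A*x);
%     problem.ehess = @(x, u) 2*(A*u);
%     opts.subproblemsolver = @arc_lanczos;
%     opts.theta = .25;   % tighter inner stopping criterion than default
%     x = arc(problem, [], opts);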
% Some shortcuts
M = problem.M;
n = M.dim();
inner = @(u, v) M.inner(x, u, v);
rnorm = @(u) M.norm(x, u);
tangent = @(u) M.tangent(x, u);
Hess = @(u) getHessian(problem, x, u, storedb, key);

% Counter for Hessian calls issued
hesscalls = 0;

% If the gradient has norm zero, return a zero step
if gradnorm == 0
    eta = M.zerovec(x);
    Heta = eta;
    stop_str = 'Cost gradient is zero';
    stats = struct('newton_iterations', 0, 'gradnorms', 0, ...
                   'func_values', 0);
    return;
end

% Set local defaults here
localdefaults.theta = .5;
localdefaults.maxinner = n;
% The following are here for the Newton solver called below
localdefaults.maxiter_newton = 100;
localdefaults.tol_newton = 1e-16;

% Merge local defaults with user options, if any
if ~exist('options', 'var') || isempty(options)
    options = struct();
end
options = mergeOptions(localdefaults, options);

% Vectors where we keep track of the Newton root finder's work, the
% gradient norms, and the function values at each inner iteration
newton_iterations = zeros(n, 1);
gradnorms = zeros(n, 1);
func_values = zeros(n, 1);

% Lanczos iteratively produces an orthonormal basis of tangent vectors
% which tridiagonalize the Hessian. The corresponding tridiagonal matrix
% is preallocated here as a sparse matrix.
T = spdiags(ones(n, 3), -1:1, n, n);

% The orthonormal basis (n tangent vectors at x) is stored in this cell.
Q = cell(n, 1);

% Initialize Lanczos along the gradient direction (it is nonzero).
q = M.lincomb(x, 1/gradnorm, grad);
Q{1} = q;
Hq = Hess(q);
hesscalls = hesscalls + 1;
alpha = inner(Hq, q);
T(1, 1) = alpha;
Hq_perp = M.lincomb(x, 1, Hq, -alpha, q);

% Minimizing the cubic restricted to the one-dimensional space spanned
% by Q{1} is easy: it amounts to minimizing a univariate cubic. Indeed,
% with eta = y*q where y is a scalar, we minimize (since g = ||g||*q):
%
%     h(y) = <y*q, g> + .5 <y*q, H[y*q]> + (sigma/3) ||y*q||^3
%          = ||g||*y + .5*alpha*y^2 + (sigma/3) |y|^3.
%
% The sign of y affects only the linear term, hence it is clear we need
% to pick y nonpositive. In that case, h becomes a cubic polynomial:
%
%     h(y) = ||g||*y + .5*alpha*y^2 - (sigma/3) y^3.
%
% Its derivative is a quadratic polynomial:
%
%     h'(y) = ||g|| + alpha*y - sigma*y^2.
%
% Since ||g|| and sigma are positive, h' has two real roots, one
% (strictly) positive and one (strictly) negative. The negative root is
% the only root of interest. It necessarily identifies a minimizer since
% h(0) = 0, h(-inf) = inf and h'(0) > 0.
%
% We take the real part only to be safe.
y = min(real(roots([-sigma, alpha, gradnorm])));
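% For reference (an added note, easy to verify from h' above): the
% selected negative root has the closed form
%
%     y = (alpha - sqrt(alpha^2 + 4*sigma*gradnorm)) / (2*sigma),
%
% since the square root strictly exceeds |alpha| when sigma and gradnorm
% are positive. The roots() call above computes the same quantity up to
% floating point error; an equivalent direct computation would be:
%
%     y = (alpha - sqrt(alpha^2 + 4*sigma*gradnorm)) / (2*sigma);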
% Main Lanczos iteration
gradnorm_reached = false;
for j = 1 : min(options.maxinner, n) - 1

    % Knowing that j Lanczos steps have been executed completely, now
    % execute the (j+1)st step to produce Q{j+1} and populate the
    % tridiagonal matrix T over the whole principal submatrix of size
    % j+1. This involves one Hessian call.
    %
    % In effect, we are computing this one step ahead. The reason is
    % that this makes it cheaper to compute the norm of the gradient of
    % the model, which is needed to check the stopping criterion (see
    % below).
    beta = rnorm(Hq_perp);
    % TODO: figure out a sensible relative threshold.
    if beta > 1e-12
        q = M.lincomb(x, 1/beta, Hq_perp);
    else
        % It appears the Krylov space maxed out (Hq is very nearly in
        % the space spanned by the existing Lanczos vectors). In order
        % to continue, the standard procedure is to generate a random
        % vector and to orthogonalize it against the current basis.
        % This event is supposed to be rare.
        v = M.randvec(x);
        % Orthogonalize in the style of a modified Gram-Schmidt.
        for k = 1 : j
            v = M.lincomb(x, 1, v, -inner(v, Q{k}), Q{k});
        end
        q = M.lincomb(x, 1/rnorm(v), v);
    end

    q = tangent(q);

    Hq = Hess(q);
    hesscalls = hesscalls + 1;
    Hqm = M.lincomb(x, 1, Hq, -beta, Q{j});
    % In exact arithmetic, alpha = <Hq, q>. Doing the computations in
    % this order amounts to a modified Gram-Schmidt, which may help
    % numerically.
    alpha = inner(Hqm, q);
    Hq_perp = M.lincomb(x, 1, Hqm, -alpha, q);
    Q{j+1} = q;
    T(j, j+1) = beta;    %#ok<SPRIX>
    T(j+1, j) = beta;    %#ok<SPRIX>
    T(j+1, j+1) = alpha; %#ok<SPRIX>
    % End of the Lanczos procedure for step j.

    % Compute the norm of the gradient of the model at the computed
    % step 'Qy' (linear combination of the Q's with coefficients y).
    % We actually compute the norm of a vector of coordinates for the
    % gradient of the model in the basis Q{1}, ..., Q{j+1}.
    model_gradnorm = norm(gradnorm*eye(j+1, 1) + ...
                          T(1:j+1, 1:j)*y + ...
                          sigma*norm(y)*[y ; 0]);
    gradnorms(j) = model_gradnorm;
    func_values(j) = y(1)*gradnorm + .5*dot(y, T(1:j, 1:j)*y) + ...
                     (sigma/3)*norm(y)^3;

    if options.verbosity >= 4
        fprintf('\nModel grad norm: %.16e, Iterate norm: %.16e', ...
                model_gradnorm, norm(y));
    end

    % Check the termination condition.
    if model_gradnorm <= options.theta*norm(y)^2
        stop_str = 'Model grad norm condition satisfied';
        gradnorm_reached = true;
        break;
    end

    % Minimize the cubic model restricted to the subspace spanned by
    % the available Lanczos vectors. In its current form, this solver
    % cannot reuse prior work from earlier Lanczos iterations: this may
    % be a point to improve.
    [y, newton_iter] = minimize_cubic_newton(T(1:j+1, 1:j+1), ...
                                 gradnorm*eye(j+1, 1), sigma, options);
    newton_iterations(j+1) = newton_iter;

end

% Check why we stopped iterating.
points = numel(y);
if ~gradnorm_reached
    stop_str = sprintf(['Reached max number of Lanczos iterations ' ...
                        '(options.maxinner = %d)'], options.maxinner);
    points = points - 1;
end

% Construct the tangent vector eta as a linear combination of the basis
% vectors, and make sure the result is tangent up to numerical accuracy.
eta = lincomb(M, x, Q(1:numel(y)), y);
eta = tangent(eta);
% We could easily return the norm of eta as the norm of the coefficient
% vector y here, but numerical errors might accumulate.
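% For clarity (an added note): the lincomb tool above computes
% eta = sum_i y(i)*Q{i}. A minimal hand-rolled equivalent, assuming y is
% nonempty, would be:
%
%     eta = M.lincomb(x, y(1), Q{1});
%     for i = 2 : numel(y)
%         eta = M.lincomb(x, 1, eta, y(i), Q{i});
%     end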
% In principle we could avoid this call by computing an appropriate
% linear combination of available vectors. For now at least, we favor
% this numerically safer approach.
Heta = Hess(eta);
hesscalls = hesscalls + 1;

stats = struct('newton_iterations', newton_iterations(1:points), ...
               'gradnorms', gradnorms(1:points), ...
               'func_values', func_values(1:points));

if options.verbosity >= 4
    fprintf('\n');
end

end
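% A minimal sketch of a direct (standalone) call, for testing purposes
% only; this is an added illustration, not part of the original file.
% Note that options.verbosity is read in the main loop but has no local
% default here, so a direct caller should supply it (arc normally does).
% The sigma value of 1 below is an arbitrary example; StoreDB and
% getGradient are the standard Manopt caching and gradient tools:
%
%     storedb = StoreDB(2);
%     key = storedb.getNewKey();
%     opts = struct('theta', .5, 'verbosity', 0);
%     g = getGradient(problem, x, storedb, key);
%     [eta, Heta, hesscalls, stop_str] = arc_lanczos(problem, x, g, ...
%         problem.M.norm(x, g), 1, opts, storedb, key);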