
arc

PURPOSE

Adaptive regularization by cubics (ARC) minimization algorithm for Manopt

SYNOPSIS

function [x, cost, info, options] = arc(problem, x, options)

DESCRIPTION

 Adaptive regularization by cubics (ARC) minimization algorithm for Manopt

 function [x, cost, info, options] = arc(problem)
 function [x, cost, info, options] = arc(problem, x0)
 function [x, cost, info, options] = arc(problem, x0, options)
 function [x, cost, info, options] = arc(problem, [], options)

 Apply the ARC minimization algorithm to the problem defined in the
 problem structure, starting at x0 if it is provided (otherwise, at a
 random point on the manifold). To specify options whilst not specifying
 an initial guess, give x0 as [] (the empty matrix).

 In most of the examples bundled with the toolbox (see link below), the
 solver used there can be replaced by this one as is.
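
 As an editorial illustration (not part of the original help), here is a
 minimal sketch of a complete call to arc: it computes a dominant
 eigenvector of a symmetric matrix A by minimizing the Rayleigh quotient
 -x'*A*x over the unit sphere. The matrix A and size n are assumed inputs.

    n = 100;
    A = randn(n); A = (A + A')/2;          % assumed: a symmetric matrix
    problem.M = spherefactory(n);          % unit sphere in R^n
    problem.cost  = @(x) -x'*A*x;          % cost to minimize
    problem.egrad = @(x) -2*A*x;           % Euclidean gradient
    problem.ehess = @(x, xdot) -2*A*xdot;  % Euclidean Hessian along xdot
    % Optional sanity checks of the derivatives before solving:
    % checkgradient(problem); checkhessian(problem);
    [x, xcost, info, options] = arc(problem);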

 The outputs x and cost are the last reached point on the manifold and its
 cost. The struct-array info contains information about the iterations:
    iter (integer)
       The (outer) iteration number, i.e., the number of steps considered
       so far (whether accepted or rejected). At the initial guess, iter is 0.
    cost (double)
       The corresponding cost value.
    gradnorm (double)
       The (Riemannian) norm of the gradient.
    hessiancalls (integer)
       The number of Hessian calls issued by the subproblem solver to
       compute this iterate.
    time (double)
       The total elapsed time in seconds to reach the corresponding cost.
    rho (double)
       The regularized performance ratio for the iterate.
       See options.rho_regularization and the note following this list.
    rhonum, rhoden (double)
       Numerator and denominator of the performance ratio, before
       regularization.
    accepted (boolean)
       Whether the proposed iterate was accepted or not.
    stepsize (double)
       The (Riemannian) norm of the vector returned by the subproblem
       solver, which is retracted to obtain the proposed next iterate.
       If accepted = true for the corresponding iterate, this is the size
       of the step from the previous to the new iterate. If accepted is
       false, the step was not executed and this is the size of the
       rejected step.
    sigma (double)
       The cubic regularization parameter at the outer iteration.
   And possibly additional information logged by options.statsfun or by
   the subproblem solver.
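
 Editorial note: in terms of the quantities appearing in the source code
 below, the reported ratio rho is computed as

    rho_num = cost - newcost                                % actual decrease
    rho_den = -<eta, grad + .5*Heta>                        % model decrease
    rho_reg = options.rho_regularization * eps * max(1, |cost|)
    rho     = (rho_num + rho_reg) / (rho_den + rho_reg)

 where eta is the tentative step and Heta is the Hessian at x applied to
 eta. The denominator is the decrease predicted by the quadratic part of
 the model (the cubic term is excluded, as for rho in trustregions), and
 the small regularizer rho_reg guards against cancellation errors when
 numerator and denominator are both nearly zero.
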
 For example, type [info.gradnorm] to obtain a vector of the successive
 gradient norms reached, and [info.time] to obtain a vector of the
 corresponding computation times.
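
 For instance, a quick convergence plot (an editorial sketch):

    semilogy([info.iter], [info.gradnorm], '.-');
    xlabel('Outer iteration'); ylabel('Gradient norm');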

 The options structure is used to override the default values. All
 options have a default value and are hence optional. To set an option,
 pass an options structure with a field options.optionname, where
 optionname is one of the following. The default value is indicated in
 parentheses. The subproblem solver may also accept options. An
 illustrative snippet follows the option list.

   tolgradnorm (1e-6)
       The algorithm terminates if the norm of the gradient drops below this.
   maxiter (1000)
       The algorithm terminates if maxiter (outer) iterations have been executed.
   maxtime (Inf)
       The algorithm terminates once maxtime seconds have elapsed.
   sigma_0 (100 / trust-regions default maximum radius)
       Initial regularization parameter.
   sigma_min (1e-7)
       Minimum regularization parameter.
   eta_1 (0.1)
       If rho is below this, the step is unsuccessful (rejected).
   eta_2 (0.9)
       If rho exceeds this, the step is very successful.
   gamma_1 (0.1)
       Shrinking factor for the regularization parameter (very successful steps).
   gamma_2 (1.5)
       Expansion factor for the regularization parameter (unsuccessful steps).
   subproblemsolver (@arc_lanczos)
       Function handle to a subproblem solver. The subproblem solver will
       also see this options structure, so that parameters can be passed
       to it through here as well.
   rho_regularization (1e3)
       See help for the same parameter in the trustregions solver.
   statsfun (none)
       Function handle to a function that will be called after each
       iteration to provide the opportunity to log additional statistics.
       They will be returned in the info struct. See the generic Manopt
       documentation about solvers for further information.
   stopfun (none)
       Function handle to a function that will be called at each iteration
       to provide the opportunity to specify additional stopping criteria.
       See the generic Manopt documentation about solvers for further
       information.
   verbosity (3)
       Integer number used to tune the amount of output the algorithm
       generates during execution (mostly as text in the command window).
       The higher, the more output. 0 means silent.
   storedepth (2)
       Maximum number of different points x of the manifold for which a
       store structure will be kept in memory in the storedb. If the
       caching features of Manopt are not used, this is irrelevant. As of
       version 5.0, this is not particularly important.
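
 As an editorial illustration, a few options can be set as follows,
 assuming a problem structure as in the sketch near the top of this help:

    options = struct();
    options.tolgradnorm = 1e-8;               % tighter gradient tolerance
    options.maxiter = 500;
    options.verbosity = 1;                    % quieter output
    options.subproblemsolver = @arc_lanczos;  % the default subproblem solver
    % Log an extra statistic at each iteration: this hypothetical statsfun
    % assumes the points x are matrices, so the Frobenius norm applies.
    options.statsfun = @(problem, x, stats) ...
                          setfield(stats, 'xnorm', norm(x, 'fro'));
    [x, xcost, info] = arc(problem, [], options);

 After the run, [info.xnorm] collects the logged values.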


 Please cite the Manopt paper as well as the research paper:
 @article{agarwal2018arcfirst,
   author  = {Agarwal, N. and Boumal, N. and Bullins, B. and Cartis, C.},
   title   = {Adaptive regularization with cubics on manifolds with a first-order analysis},
   journal = {arXiv preprint arXiv:1806.00065},
   year    = {2018}
 }
 

 See also: trustregions conjugategradient manopt/examples arc_lanczos


SUBFUNCTIONS

function stats = savestats(problem, x, storedb, key, options)

SOURCE CODE

0001 function [x, cost, info, options] = arc(problem, x, options)
0002 % Adaptive regularization by cubics (ARC) minimization algorithm for Manopt
0003 %
0004 % function [x, cost, info, options] = arc(problem)
0005 % function [x, cost, info, options] = arc(problem, x0)
0006 % function [x, cost, info, options] = arc(problem, x0, options)
0007 % function [x, cost, info, options] = arc(problem, [], options)
0008 %
0009 % Apply the ARC minimization algorithm to the problem defined in the
0010 % problem structure, starting at x0 if it is provided (otherwise, at a
0011 % random point on the manifold). To specify options whilst not specifying
0012 % an initial guess, give x0 as [] (the empty matrix).
0013 %
0014 % In most of the examples bundled with the toolbox (see link below), the
0015 % solver used there can be replaced by this one as is.
0016 %
0017 % The outputs x and cost are the last reached point on the manifold and its
0018 % cost. The struct-array info contains information about the iterations:
0019 %    iter (integer)
0020 %       The (outer) iteration number, i.e., the number of steps considered
0021 %       so far (whether accepted or rejected). At the initial guess, iter is 0.
0022 %    cost (double)
0023 %       The corresponding cost value.
0024 %    gradnorm (double)
0025 %       The (Riemannian) norm of the gradient.
0026 %    hessiancalls (integer)
0027 %       The number of Hessian calls issued by the subproblem solver to
0028 %       compute this iterate.
0029 %    time (double)
0030 %       The total elapsed time in seconds to reach the corresponding cost.
0031 %    rho (double)
0032 %       The regularized performance ratio for the iterate.
0033 %       See options.rho_regularization.
0034 %    rhonum, rhoden (double)
0035 %       Numerator and denominator of the performance ratio, before
0036 %       regularization.
0037 %    accepted (boolean)
0038 %       Whether the proposed iterate was accepted or not.
0039 %    stepsize (double)
0040 %       The (Riemannian) norm of the vector returned by the subproblem
0041 %       solver, which is retracted to obtain the proposed next iterate.
0042 %       If accepted = true for the corresponding iterate, this is the size
0043 %       of the step from the previous to the new iterate. If accepted is
0044 %       false, the step was not executed and this is the size of the
0045 %       rejected step.
0046 %    sigma (double)
0047 %       The cubic regularization parameter at the outer iteration.
0048 %   And possibly additional information logged by options.statsfun or by
0049 %   the subproblem solver.
0050 % For example, type [info.gradnorm] to obtain a vector of the successive
0051 % gradient norms reached, and [info.time] to obtain a vector of the
0052 % corresponding computation times.
0053 %
0054 % The options structure is used to override the default values. All
0055 % options have a default value and are hence optional. To set an option,
0056 % pass an options structure with a field options.optionname, where
0057 % optionname is one of the following. The default value is indicated in
0058 % parentheses. The subproblem solver may also accept options.
0059 %
0060 %   tolgradnorm (1e-6)
0061 %       The algorithm terminates if the norm of the gradient drops below this.
0062 %   maxiter (1000)
0063 %       The algorithm terminates if maxiter (outer) iterations have been executed.
0064 %   maxtime (Inf)
0065 %       The algorithm terminates once maxtime seconds have elapsed.
0066 %   sigma_0 (100 / trust-regions default maximum radius)
0067 %       Initial regularization parameter.
0068 %   sigma_min (1e-7)
0069 %       Minimum regularization parameter.
0070 %   eta_1 (0.1)
0071 %       If rho is below this, the step is unsuccessful (rejected).
0072 %   eta_2 (0.9)
0073 %       If rho exceeds this, the step is very successful.
0074 %   gamma_1 (0.1)
0075 %       Shrinking factor for the regularization parameter (very successful steps).
0076 %   gamma_2 (1.5)
0077 %       Expansion factor for the regularization parameter (unsuccessful steps).
0078 %   subproblemsolver (@arc_lanczos)
0079 %       Function handle to a subproblem solver. The subproblem solver will
0080 %       also see this options structure, so that parameters can be passed
0081 %       to it through here as well.
0082 %   rho_regularization (1e3)
0083 %       See help for the same parameter in the trustregions solver.
0084 %   statsfun (none)
0085 %       Function handle to a function that will be called after each
0086 %       iteration to provide the opportunity to log additional statistics.
0087 %       They will be returned in the info struct. See the generic Manopt
0088 %       documentation about solvers for further information.
0089 %   stopfun (none)
0090 %       Function handle to a function that will be called at each iteration
0091 %       to provide the opportunity to specify additional stopping criteria.
0092 %       See the generic Manopt documentation about solvers for further
0093 %       information.
0094 %   verbosity (3)
0095 %       Integer number used to tune the amount of output the algorithm
0096 %       generates during execution (mostly as text in the command window).
0097 %       The higher, the more output. 0 means silent.
0098 %   storedepth (2)
0099 %       Maximum number of different points x of the manifold for which a
0100 %       store structure will be kept in memory in the storedb. If the
0101 %       caching features of Manopt are not used, this is irrelevant. As of
0102 %       version 5.0, this is not particularly important.
0103 %
0104 %
0105 % Please cite the Manopt paper as well as the research paper:
0106 % @article{agarwal2018arcfirst,
0107 %   author  = {Agarwal, N. and Boumal, N. and Bullins, B. and Cartis, C.},
0108 %   title   = {Adaptive regularization with cubics on manifolds with a first-order analysis},
0109 %   journal = {arXiv preprint arXiv:1806.00065},
0110 %   year    = {2018}
0111 % }
0112 %
0113 %
0114 % See also: trustregions conjugategradient manopt/examples arc_lanczos
0115 
0116 % This file is part of Manopt: www.manopt.org.
0117 % Original authors: May 1, 2018,
0118 %    Naman Agarwal, Brian Bullins, Nicolas Boumal and Coralia Cartis.
0119 % Contributors:
0120 % Change log:
0121 
0122     M = problem.M;
0123     
0124     % Verify that the problem description is sufficient for the solver.
0125     if ~canGetCost(problem)
0126         warning('manopt:getCost', ...
0127                 'No cost provided. The algorithm will likely abort.');
0128     end
0129     if ~canGetGradient(problem) && ~canGetApproxGradient(problem)
0130         % Note: we do not give a warning if an approximate gradient is
0131         % explicitly given in the problem description, as in that case the
0132         % user seems to be aware of the issue.
0133         warning('manopt:getGradient:approx', ['No gradient provided. ' ...
0134                 'Using an FD approximation instead (slow).\n' ...
0135                 'It may be necessary to increase options.tolgradnorm.\n'...
0136                 'To disable this warning: ' ...
0137                 'warning(''off'', ''manopt:getGradient:approx'')']);
0138         problem.approxgrad = approxgradientFD(problem);
0139     end
0140     if ~canGetHessian(problem) && ~canGetApproxHessian(problem)
0141         % Note: we do not give a warning if an approximate Hessian is
0142         % explicitly given in the problem description, as in that case the
0143         % user seems to be aware of the issue.
0144         warning('manopt:getHessian:approx', ['No Hessian provided. ' ...
0145                 'Using an FD approximation instead.\n' ...
0146                 'To disable this warning: ' ...
0147                 'warning(''off'', ''manopt:getHessian:approx'')']);
0148         problem.approxhess = approxhessianFD(problem);
0149     end
0150 
0151     % Set local defaults here
0152     localdefaults.tolgradnorm = 1e-6;
0153     localdefaults.maxiter = 1000;
0154     localdefaults.maxtime = inf;
0155     localdefaults.sigma_min = 1e-7;
0156     localdefaults.eta_1 = 0.1;
0157     localdefaults.eta_2 = 0.9;
0158     localdefaults.gamma_1 = 0.1;
0159     localdefaults.gamma_2 = 1.5;
0160     localdefaults.storedepth = 2;
0161     localdefaults.subproblemsolver = @arc_lanczos;
0162     localdefaults.rho_regularization = 1e3;
0163     
0164     % Merge global and local defaults, then merge w/ user options, if any.
0165     localdefaults = mergeOptions(getGlobalDefaults(), localdefaults);
0166     if ~exist('options', 'var') || isempty(options)
0167         options = struct();
0168     end
0169     options = mergeOptions(localdefaults, options);
0170     
0171     % Default initial sigma_0 is based on the initial Delta_bar of the
0172     % trustregions solver.
0173     if ~isfield(options, 'sigma_0')
0174         if isfield(M, 'typicaldist')
0175             options.sigma_0 = 100/M.typicaldist();
0176         else
0177             options.sigma_0 = 100/sqrt(M.dim());
0178         end 
0179     end
0180     
0181     
0182     timetic = tic();
0183     
0184     % If no initial point x is given by the user, generate one at random.
0185     if ~exist('x', 'var') || isempty(x)
0186         x = M.rand();
0187     end
0188 
0189     % Create a store database and get a key for the current x.
0190     storedb = StoreDB(options.storedepth);
0191     key = storedb.getNewKey();
0192 
0193     % Compute objective-related quantities for x.
0194     [cost, grad] = getCostGrad(problem, x, storedb, key);
0195     gradnorm = M.norm(x, grad);
0196     
0197     % Initialize regularization parameter.
0198     sigma = options.sigma_0;
0199 
0200     % Iteration counter.
0201     % At any point, iter is the number of fully executed iterations so far.
0202     iter = 0;
0203     
0204     % Save stats in a struct array info, and preallocate.
0205     stats = savestats(problem, x, storedb, key, options);
0206     info(1) = stats;
0207     info(min(10000, options.maxiter+1)).iter = [];
0208     
0209     if options.verbosity >= 2
0210         fprintf(' iter\t\t\t\t\tcost val\t\t grad norm       sigma   #Hess\n');
0211     end
0212     
0213     % Iterate until stopping criterion triggers.
0214     while true
0215 
0216         % Display iteration information.
0217         if options.verbosity >= 2
0218             fprintf('%5d\t %+.16e\t %.8e   %.2e', ...
0219                     iter, cost, gradnorm, sigma);
0220         end
0221         
0222         % Start timing this iteration.
0223         timetic = tic();
0224         
0225         % Run standard stopping criterion checks.
0226         [stop, reason] = stoppingcriterion(problem, x, options, ...
0227                                                              info, iter+1);
0228         
0229         if stop
0230             if options.verbosity >= 1
0231                 fprintf(['\n' reason '\n']);
0232             end
0233             break;
0234         end
0235 
0236         % Solve the ARC subproblem.
0237         % Outputs: eta is the tentative step (it is a tangent vector at x);
0238         % Heta is the result of applying the Hessian at x along eta (this
0239         % is often a natural by-product of the subproblem solver);
0240         % hesscalls is the number of Hessian calls issued in the solver;
0241         % stop_str is a string describing why the solver terminated; and
0242         % substats is some statistics about the solver's work to be logged.
0243         [eta, Heta, hesscalls, stop_str, substats] = ...
0244              options.subproblemsolver(problem, x, grad, gradnorm, ...
0245                                              sigma, options, storedb, key);
0246         
0247         etanorm = M.norm(x, eta);
0248 
0249         % Get a tentative next x by retracting the proposed step.
0250         newx = M.retr(x, eta);
0251         newkey = storedb.getNewKey();
0252 
0253         % Compute the new cost-related quantities for proposal x.
0254         % We could just compute the cost here, as the gradient is only
0255         % necessary if the step is accepted; but we expect most steps are
0256         % accepted, and sometimes the gradient can be computed faster if it
0257         % is computed in conjunction with the cost.
0258         [newcost, newgrad] = getCostGrad(problem, newx, storedb, newkey);
0259 
0260         % Compute a regularized ratio between actual and model improvement.
0261         rho_num = cost - newcost;
0262         vec_rho = M.lincomb(x, 1, grad, .5, Heta);
0263         rho_den = -M.inner(x, eta, vec_rho);
0264         rho_reg = options.rho_regularization*eps*max(1, abs(cost));
0265         rho = (rho_num+rho_reg) / (rho_den+rho_reg);
0266         
0267         % Determine if the tentative step should be accepted or not.
0268         if rho >= options.eta_1
0269             accept = true;
0270             arc_str = 'acc ';
0271             % We accepted this step: erase cache of the previous point.
0272             storedb.removefirstifdifferent(key, newkey);
0273             x = newx;
0274             key = newkey;
0275             cost = newcost;
0276             grad = newgrad;
0277             gradnorm = M.norm(x, grad);
0278         else
0279             accept = false;
0280             arc_str = 'REJ ';
0281             % We rejected this step: erase cache of the tentative point.
0282             storedb.removefirstifdifferent(newkey, key);
0283         end
0284         
0285         % Update the regularization parameter.
0286         if rho >= options.eta_2
0287             % Very successful step
0288             arc_str(4) = '-';
0289             sigma = max(options.sigma_min, options.gamma_1 * sigma);
0290         elseif rho >= options.eta_1
0291             % Successful step
0292             arc_str(4) = ' ';
0293         else
0294             % Unsuccessful step
0295             arc_str(4) = '+';
0296             sigma = options.gamma_2 * sigma;
0297         end
0298 
0299         % iter is the number of iterations we have completed.
0300         iter = iter + 1;
0301 
0302         % Make sure we don't use too much memory for the store database.
0303         storedb.purge();
0304         
0305         % Log statistics for freshly executed iteration.
0306         stats = savestats(problem, x, storedb, key, options);
0307         info(iter+1) = stats;
0308 
0309         if options.verbosity >= 2
0310             fprintf('   %5d  %s\n', hesscalls, [arc_str ' ' stop_str]);
0311         end
0312         
0313     end
0314     
0315     % Truncate the struct-array to the part we populated
0316     info = info(1:iter+1);
0317 
0318     if options.verbosity >= 1
0319         fprintf('Total time is %f [s] (excludes statsfun)\n', info(end).time);
0320     end
0321     
0322     
0323     % Routine in charge of collecting the current iteration statistics
0324     function stats = savestats(problem, x, storedb, key, options)
0325         
0326         stats.iter = iter;
0327         stats.cost = cost;
0328         stats.gradnorm = gradnorm;
0329         stats.sigma = sigma;
0330         
0331         if iter == 0
0332             stats.hessiancalls = 0;
0333             stats.stepsize = NaN;
0334             stats.time = toc(timetic);
0335             stats.rho = inf;
0336             stats.rhonum = NaN;
0337             stats.rhoden = NaN;
0338             stats.accepted = true;
0339             stats.subproblem = struct();
0340         else
0341             stats.hessiancalls = hesscalls;
0342             stats.stepsize = etanorm;
0343             stats.time = info(iter).time + toc(timetic);
0344             stats.rho = rho;
0345             stats.rhonum = rho_num;
0346             stats.rhoden = rho_den;
0347             stats.accepted = accept;
0348             stats.subproblem = substats;
0349         end
0350         
0351         % Similar to statsfun with trustregions: the x and store passed to
0352         % statsfun are that of the most recently accepted point after the
0353         % iteration fully executed.
0354         stats = applyStatsfun(problem, x, storedb, key, options, stats);
0355         
0356     end
0357     
0358 end

Generated on Mon 10-Sep-2018 11:48:06 by m2html © 2005