
arc

PURPOSE

Adaptive regularization by cubics (ARC) minimization algorithm for Manopt

SYNOPSIS

function [x, cost, info, options] = arc(problem, x, options)

DESCRIPTION

 Adaptive regularization by cubics (ARC) minimization algorithm for Manopt

 function [x, cost, info, options] = arc(problem)
 function [x, cost, info, options] = arc(problem, x0)
 function [x, cost, info, options] = arc(problem, x0, options)
 function [x, cost, info, options] = arc(problem, [], options)

 Apply the ARC minimization algorithm to the problem defined in the
 problem structure, starting at x0 if it is provided (otherwise, at a
 random point on the manifold). To specify options whilst not specifying
 an initial guess, give x0 as [] (the empty matrix).

 In most of the examples bundled with the toolbox (see link below), the
 solver used there can be replaced by this one as is.
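
 As an editorial illustration (not part of the original help), here is a
 minimal sketch of a complete call to arc: it computes a dominant
 eigenvector of a symmetric matrix A by minimizing the Rayleigh quotient
 -x'*A*x over the unit sphere. The matrix A and size n are assumed inputs.

    n = 100;
    A = randn(n); A = (A + A')/2;          % assumed: a symmetric matrix
    problem.M = spherefactory(n);          % unit sphere in R^n
    problem.cost  = @(x) -x'*A*x;          % cost to minimize
    problem.egrad = @(x) -2*A*x;           % Euclidean gradient
    problem.ehess = @(x, xdot) -2*A*xdot;  % Euclidean Hessian along xdot
    % Optional sanity checks of the derivatives before solving:
    % checkgradient(problem); checkhessian(problem);
    [x, xcost, info, options] = arc(problem);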

 The outputs x and cost are the last reached point on the manifold and its
 cost. The struct-array info contains information about the iterations:
    iter (integer)
       The (outer) iteration number, i.e., the number of steps considered
       so far (whether accepted or rejected). At the initial guess, iter is 0.
    cost (double)
       The corresponding cost value.
    gradnorm (double)
       The (Riemannian) norm of the gradient.
    hessiancalls (integer)
       The number of Hessian calls issued by the subproblem solver to
       compute this iterate.
    time (double)
       The total elapsed time in seconds to reach the corresponding cost.
    rho (double)
       The regularized performance ratio for the iterate.
       See options.rho_regularization and the note following this list.
    rhonum, rhoden (double)
       Numerator and denominator of the performance ratio, before
       regularization.
    accepted (boolean)
       Whether the proposed iterate was accepted or not.
    stepsize (double)
       The (Riemannian) norm of the vector returned by the subproblem
       solver, which is retracted to obtain the proposed next iterate.
       If accepted = true for the corresponding iterate, this is the size
       of the step from the previous to the new iterate. If accepted is
       false, the step was not executed and this is the size of the
       rejected step.
    sigma (double)
       The cubic regularization parameter at the outer iteration.
   And possibly additional information logged by options.statsfun or by
   the subproblem solver.
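
 Editorial note: in terms of the quantities appearing in the source code
 below, the reported ratio rho is computed as

    rho_num = cost - newcost                                % actual decrease
    rho_den = -<eta, grad + .5*Heta>                        % model decrease
    rho_reg = options.rho_regularization * eps * max(1, |cost|)
    rho     = (rho_num + rho_reg) / (rho_den + rho_reg)

 where eta is the tentative step and Heta is the Hessian at x applied to
 eta. The denominator is the decrease predicted by the quadratic part of
 the model (the cubic term is excluded, as for rho in trustregions), and
 the small regularizer rho_reg guards against cancellation errors when
 numerator and denominator are both nearly zero.
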
 For example, type [info.gradnorm] to obtain a vector of the successive
 gradient norms reached, and [info.time] to obtain a vector of the
 corresponding computation times.
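
 For instance, a quick convergence plot (an editorial sketch):

    semilogy([info.iter], [info.gradnorm], '.-');
    xlabel('Outer iteration'); ylabel('Gradient norm');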

 The options structure is used to override the default values. All
 options have a default value and are hence optional. To set an option,
 pass an options structure with a field options.optionname, where
 optionname is one of the following. The default value is indicated in
 parentheses. The subproblem solver may also accept options. An
 illustrative snippet follows the option list.

   tolgradnorm (1e-6)
       The algorithm terminates if the norm of the gradient drops below this.
   maxiter (1000)
       The algorithm terminates if maxiter (outer) iterations have been executed.
   maxtime (Inf)
       The algorithm terminates once maxtime seconds have elapsed.
   sigma_0 (100 / trust-regions default maximum radius)
       Initial regularization parameter.
   sigma_min (1e-7)
       Minimum regularization parameter.
   eta_1 (0.1)
       If rho is below this, the step is unsuccessful (rejected).
   eta_2 (0.9)
       If rho exceeds this, the step is very successful.
   gamma_1 (0.1)
       Shrinking factor for the regularization parameter (very successful steps).
   gamma_2 (1.5)
       Expansion factor for the regularization parameter (unsuccessful steps).
   subproblemsolver (@arc_lanczos)
       Function handle to a subproblem solver. The subproblem solver will
       also see this options structure, so that parameters can be passed
       to it through here as well.
   rho_regularization (1e3)
       See help for the same parameter in the trustregions solver.
   statsfun (none)
       Function handle to a function that will be called after each
       iteration to provide the opportunity to log additional statistics.
       They will be returned in the info struct. See the generic Manopt
       documentation about solvers for further information.
   stopfun (none)
       Function handle to a function that will be called at each iteration
       to provide the opportunity to specify additional stopping criteria.
       See the generic Manopt documentation about solvers for further
       information.
   verbosity (3)
       Integer number used to tune the amount of output the algorithm
       generates during execution (mostly as text in the command window).
       The higher, the more output. 0 means silent.
   storedepth (2)
       Maximum number of different points x of the manifold for which a
       store structure will be kept in memory in the storedb. If the
       caching features of Manopt are not used, this is irrelevant. As of
       version 5.0, this is not particularly important.
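
 As an editorial illustration, a few options can be set as follows,
 assuming a problem structure as in the sketch near the top of this help:

    options = struct();
    options.tolgradnorm = 1e-8;               % tighter gradient tolerance
    options.maxiter = 500;
    options.verbosity = 1;                    % quieter output
    options.subproblemsolver = @arc_lanczos;  % the default subproblem solver
    % Log an extra statistic at each iteration: this hypothetical statsfun
    % assumes the points x are matrices, so the Frobenius norm applies.
    options.statsfun = @(problem, x, stats) ...
                          setfield(stats, 'xnorm', norm(x, 'fro'));
    [x, xcost, info] = arc(problem, [], options);

 After the run, [info.xnorm] collects the logged values.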


 Please cite the Manopt paper as well as the research paper:
 @article{agarwal2018arcfirst,
   author  = {Agarwal, N. and Boumal, N. and Bullins, B. and Cartis, C.},
   title   = {Adaptive regularization with cubics on manifolds with a first-order analysis},
   journal = {arXiv preprint arXiv:1806.00065},
   year    = {2018}
 }
 

 See also: trustregions conjugategradient manopt/examples arc_lanczos


SUBFUNCTIONS

function stats = savestats(problem, x, storedb, key, options)

SOURCE CODE

0001 function [x, cost, info, options] = arc(problem, x, options)
0002 % Adaptive regularization by cubics (ARC) minimization algorithm for Manopt
0003 %
0004 % function [x, cost, info, options] = arc(problem)
0005 % function [x, cost, info, options] = arc(problem, x0)
0006 % function [x, cost, info, options] = arc(problem, x0, options)
0007 % function [x, cost, info, options] = arc(problem, [], options)
0008 %
0009 % Apply the ARC minimization algorithm to the problem defined in the
0010 % problem structure, starting at x0 if it is provided (otherwise, at a
0011 % random point on the manifold). To specify options whilst not specifying
0012 % an initial guess, give x0 as [] (the empty matrix).
0013 %
0014 % In most of the examples bundled with the toolbox (see link below), the
0015 % solver used there can be replaced by this one as is.
0016 %
0017 % The outputs x and cost are the last reached point on the manifold and its
0018 % cost. The struct-array info contains information about the iterations:
0019 %    iter (integer)
0020 %       The (outer) iteration number, i.e., the number of steps considered
0021 %       so far (whether accepted or rejected). At the initial guess, iter is 0.
0022 %    cost (double)
0023 %       The corresponding cost value.
0024 %    gradnorm (double)
0025 %       The (Riemannian) norm of the gradient.
0026 %    hessiancalls (integer)
0027 %       The number of Hessian calls issued by the subproblem solver to
0028 %       compute this iterate.
0029 %    time (double)
0030 %       The total elapsed time in seconds to reach the corresponding cost.
0031 %    rho (double)
0032 %       The regularized performance ratio for the iterate.
0033 %       See options.rho_regularization.
0034 %    rhonum, rhoden (double)
0035 %       Numerator and denominator of the performance ratio, before
0036 %       regularization.
0037 %    accepted (boolean)
0038 %       Whether the proposed iterate was accepted or not.
0039 %    stepsize (double)
0040 %       The (Riemannian) norm of the vector returned by the subproblem
0041 %       solver, which is retracted to obtain the proposed next iterate.
0042 %       If accepted = true for the corresponding iterate, this is the size
0043 %       of the step from the previous to the new iterate. If accepted is
0044 %       false, the step was not executed and this is the size of the
0045 %       rejected step.
0046 %    sigma (double)
0047 %       The cubic regularization parameter at the outer iteration.
0048 %   And possibly additional information logged by options.statsfun or by
0049 %   the subproblem solver.
0050 % For example, type [info.gradnorm] to obtain a vector of the successive
0051 % gradient norms reached, and [info.time] to obtain a vector of the
0052 % corresponding computation times.
0053 %
0054 % The options structure is used to override the default values. All
0055 % options have a default value and are hence optional. To set an option,
0056 % pass an options structure with a field options.optionname, where
0057 % optionname is one of the following. The default value is indicated in
0058 % parentheses. The subproblem solver may also accept options.
0059 %
0060 %   tolgradnorm (1e-6)
0061 %       The algorithm terminates if the norm of the gradient drops below this.
0062 %   maxiter (1000)
0063 %       The algorithm terminates if maxiter (outer) iterations have been executed.
0064 %   maxtime (Inf)
0065 %       The algorithm terminates once maxtime seconds have elapsed.
0066 %   sigma_0 (100 / trust-regions default maximum radius)
0067 %       Initial regularization parameter.
0068 %   sigma_min (1e-7)
0069 %       Minimum regularization parameter.
0070 %   eta_1 (0.1)
0071 %       If rho is below this, the step is unsuccessful (rejected).
0072 %   eta_2 (0.9)
0073 %       If rho exceeds this, the step is very successful.
0074 %   gamma_1 (0.1)
0075 %       Shrinking factor for the regularization parameter (very successful steps).
0076 %   gamma_2 (1.5)
0077 %       Expansion factor for the regularization parameter (unsuccessful steps).
0078 %   subproblemsolver (@arc_lanczos)
0079 %       Function handle to a subproblem solver. The subproblem solver will
0080 %       also see this options structure, so that parameters can be passed
0081 %       to it through here as well.
0082 %   rho_regularization (1e3)
0083 %       See help for the same parameter in the trustregions solver.
0084 %   statsfun (none)
0085 %       Function handle to a function that will be called after each
0086 %       iteration to provide the opportunity to log additional statistics.
0087 %       They will be returned in the info struct. See the generic Manopt
0088 %       documentation about solvers for further information.
0089 %   stopfun (none)
0090 %       Function handle to a function that will be called at each iteration
0091 %       to provide the opportunity to specify additional stopping criteria.
0092 %       See the generic Manopt documentation about solvers for further
0093 %       information.
0094 %   verbosity (3)
0095 %       Integer number used to tune the amount of output the algorithm
0096 %       generates during execution (mostly as text in the command window).
0097 %       The higher, the more output. 0 means silent.
0098 %   storedepth (2)
0099 %       Maximum number of different points x of the manifold for which a
0100 %       store structure will be kept in memory in the storedb. If the
0101 %       caching features of Manopt are not used, this is irrelevant. As of
0102 %       version 5.0, this is not particularly important.
0103 %
0104 %
0105 % Please cite the Manopt paper as well as the research paper:
0106 % @article{agarwal2018arcfirst,
0107 %   author  = {Agarwal, N. and Boumal, N. and Bullins, B. and Cartis, C.},
0108 %   title   = {Adaptive regularization with cubics on manifolds with a first-order analysis},
0109 %   journal = {arXiv preprint arXiv:1806.00065},
0110 %   year    = {2018}
0111 % }
0112 %
0113 %
0114 % See also: trustregions conjugategradient manopt/examples arc_lanczos
0115 
0116 % This file is part of Manopt: www.manopt.org.
0117 % Original authors: May 1, 2018,
0118 %    Naman Agarwal, Brian Bullins, Nicolas Boumal and Coralia Cartis.
0119 % Contributors:
0120 % Change log:
0121 
0122     M = problem.M;
0123     
0124     % Verify that the problem description is sufficient for the solver.
0125     if ~canGetCost(problem)
0126         warning('manopt:getCost', ...
0127                 'No cost provided. The algorithm will likely abort.');
0128     end
0129     if ~canGetGradient(problem) && ~canGetApproxGradient(problem)
0130         % Note: we do not give a warning if an approximate gradient is
0131         % explicitly given in the problem description, as in that case the
0132         % user seems to be aware of the issue.
0133         warning('manopt:getGradient:approx', ['No gradient provided. ' ...
0134                 'Using an FD approximation instead (slow).\n' ...
0135                 'It may be necessary to increase options.tolgradnorm.\n'...
0136                 'To disable this warning: ' ...
0137                 'warning(''off'', ''manopt:getGradient:approx'')']);
0138         problem.approxgrad = approxgradientFD(problem);
0139     end
0140     if ~canGetHessian(problem) && ~canGetApproxHessian(problem)
0141         % Note: we do not give a warning if an approximate Hessian is
0142         % explicitly given in the problem description, as in that case the
0143         % user seems to be aware of the issue.
0144         warning('manopt:getHessian:approx', ['No Hessian provided. ' ...
0145                 'Using an FD approximation instead.\n' ...
0146                 'To disable this warning: ' ...
0147                 'warning(''off'', ''manopt:getHessian:approx'')']);
0148         problem.approxhess = approxhessianFD(problem);
0149     end
0150 
0151     % Set local defaults here
0152     localdefaults.tolgradnorm = 1e-6;
0153     localdefaults.maxiter = 1000;
0154     localdefaults.maxtime = inf;
0155     localdefaults.sigma_min = 1e-7;
0156     localdefaults.eta_1 = 0.1;
0157     localdefaults.eta_2 = 0.9;
0158     localdefaults.gamma_1 = 0.1;
0159     localdefaults.gamma_2 = 1.5;
0160     localdefaults.storedepth = 2;
0161     localdefaults.subproblemsolver = @arc_lanczos;
0162     localdefaults.rho_regularization = 1e3;
0163     
0164     % Merge global and local defaults, then merge w/ user options, if any.
0165     localdefaults = mergeOptions(getGlobalDefaults(), localdefaults);
0166     if ~exist('options', 'var') || isempty(options)
0167         options = struct();
0168     end
0169     options = mergeOptions(localdefaults, options);
0170     
0171     % Default initial sigma_0 is based on the initial Delta_bar of the
0172     % trustregions solver.
0173     if ~isfield(options, 'sigma_0')
0174         if isfield(M, 'typicaldist')
0175             options.sigma_0 = 100/M.typicaldist();
0176         else
0177             options.sigma_0 = 100/sqrt(M.dim());
0178         end 
0179     end
0180     
0181     
0182     timetic = tic();
0183     
0184     % If no initial point x is given by the user, generate one at random.
0185     if ~exist('x', 'var') || isempty(x)
0186         x = M.rand();
0187     end
0188 
0189     % Create a store database and get a key for the current x.
0190     storedb = StoreDB(options.storedepth);
0191     key = storedb.getNewKey();
0192 
0193     % Compute objective-related quantities for x.
0194     [cost, grad] = getCostGrad(problem, x, storedb, key);
0195     gradnorm = M.norm(x, grad);
0196     
0197     % Initialize regularization parameter.
0198     sigma = options.sigma_0;
0199 
0200     % Iteration counter.
0201     % At any point, iter is the number of fully executed iterations so far.
0202     iter = 0;
0203     
0204     % Save stats in a struct array info, and preallocate.
0205     stats = savestats(problem, x, storedb, key, options);
0206     info(1) = stats;
0207     info(min(10000, options.maxiter+1)).iter = [];
0208     
0209     if options.verbosity >= 2
0210         fprintf(' iter\t\t\t\t\tcost val\t\t grad norm       sigma   #Hess\n');
0211     end
0212     
0213     % Iterate until stopping criterion triggers.
0214     while true
0215 
0216         % Display iteration information.
0217         if options.verbosity >= 2
0218             fprintf('%5d\t %+.16e\t %.8e   %.2e', ...
0219                     iter, cost, gradnorm, sigma);
0220         end
0221         
0222         % Start timing this iteration.
0223         timetic = tic();
0224         
0225         % Run standard stopping criterion checks.
0226         [stop, reason] = stoppingcriterion(problem, x, options, ...
0227                                                              info, iter+1);
0228         
0229         if stop
0230             if options.verbosity >= 1
0231                 fprintf(['\n' reason '\n']);
0232             end
0233             break;
0234         end
0235 
0236         % Solve the ARC subproblem.
0237         % Outputs: eta is the tentative step (it is a tangent vector at x);
0238         % Heta is the result of applying the Hessian at x along eta (this
0239         % is often a natural by-product of the subproblem solver);
0240         % hesscalls is the number of Hessian calls issued in the solver;
0241         % stop_str is a string describing why the solver terminated; and
0242         % substats is some statistics about the solver's work to be logged.
0243         [eta, Heta, hesscalls, stop_str, substats] = ...
0244              options.subproblemsolver(problem, x, grad, gradnorm, ...
0245                                              sigma, options, storedb, key);
0246         
0247         etanorm = M.norm(x, eta);
0248 
0249         % Get a tentative next x by retracting the proposed step.
0250         newx = M.retr(x, eta);
0251         newkey = storedb.getNewKey();
0252 
0253         % Compute the new cost-related quantities for proposal x.
0254         % We could just compute the cost here, as the gradient is only
0255         % necessary if the step is accepted; but we expect most steps are
0256         % accepted, and sometimes the gradient can be computed faster if it
0257         % is computed in conjunction with the cost.
0258         [newcost, newgrad] = getCostGrad(problem, newx, storedb, newkey);
0259 
0260         % Compute a regularized ratio between actual and model improvement.
0261         rho_num = cost - newcost;
0262         vec_rho = M.lincomb(x, 1, grad, .5, Heta);
0263         rho_den = -M.inner(x, eta, vec_rho);
0264         rho_reg = options.rho_regularization*eps*max(1, abs(cost));
0265         rho = (rho_num+rho_reg) / (rho_den+rho_reg);
0266         
0267         % Determine if the tentative step should be accepted or not.
0268         if rho >= options.eta_1
0269             accept = true;
0270             arc_str = 'acc ';
0271             % We accepted this step: erase cache of the previous point.
0272             storedb.removefirstifdifferent(key, newkey);
0273             x = newx;
0274             key = newkey;
0275             cost = newcost;
0276             grad = newgrad;
0277             gradnorm = M.norm(x, grad);
0278         else
0279             accept = false;
0280             arc_str = 'REJ ';
0281             % We rejected this step: erase cache of the tentative point.
0282             storedb.removefirstifdifferent(newkey, key);
0283         end
0284         
0285         % Update the regularization parameter.
0286         if rho >= options.eta_2
0287             % Very successful step
0288             arc_str(4) = '-';
0289             sigma = max(options.sigma_min, options.gamma_1 * sigma);
0290         elseif rho >= options.eta_1
0291             % Successful step
0292             arc_str(4) = ' ';
0293         else
0294             % Unsuccessful step
0295             arc_str(4) = '+';
0296             sigma = options.gamma_2 * sigma;
0297         end
0298 
0299         % iter is the number of iterations we have completed.
0300         iter = iter + 1;
0301 
0302         % Make sure we don't use too much memory for the store database.
0303         storedb.purge();
0304         
0305         % Log statistics for freshly executed iteration.
0306         stats = savestats(problem, x, storedb, key, options);
0307         info(iter+1) = stats;
0308 
0309         if options.verbosity >= 2
0310             fprintf('   %5d  %s\n', hesscalls, [arc_str ' ' stop_str]);
0311         end
0312         
0313     end
0314     
0315     % Truncate the struct-array to the part we populated
0316     info = info(1:iter+1);
0317 
0318     if options.verbosity >= 1
0319         fprintf('Total time is %f [s] (excludes statsfun)\n', info(end).time);
0320     end
0321     
0322     
0323     % Routine in charge of collecting the current iteration statistics
0324     function stats = savestats(problem, x, storedb, key, options)
0325         
0326         stats.iter = iter;
0327         stats.cost = cost;
0328         stats.gradnorm = gradnorm;
0329         stats.sigma = sigma;
0330         
0331         if iter == 0
0332             stats.hessiancalls = 0;
0333             stats.stepsize = NaN;
0334             stats.time = toc(timetic);
0335             stats.rho = inf;
0336             stats.rhonum = NaN;
0337             stats.rhoden = NaN;
0338             stats.accepted = true;
0339             stats.subproblem = struct();
0340         else
0341             stats.hessiancalls = hesscalls;
0342             stats.stepsize = etanorm;
0343             stats.time = info(iter).time + toc(timetic);
0344             stats.rho = rho;
0345             stats.rhonum = rho_num;
0346             stats.rhoden = rho_den;
0347             stats.accepted = accept;
0348             stats.subproblem = substats;
0349         end
0350         
0351         % Similar to statsfun with trustregions: the x and store passed to
0352         % statsfun are that of the most recently accepted point after the
0353         % iteration fully executed.
0354         stats = applyStatsfun(problem, x, storedb, key, options, stats);
0355         
0356     end
0357     
0358 end

Generated on Mon 10-Sep-2018 11:48:06 by m2html © 2005