function [x, cost, info, options] = conjugategradient(problem, x, options)
% Conjugate gradient minimization algorithm for Manopt.
%
% function [x, cost, info, options] = conjugategradient(problem)
% function [x, cost, info, options] = conjugategradient(problem, x0)
% function [x, cost, info, options] = conjugategradient(problem, x0, options)
% function [x, cost, info, options] = conjugategradient(problem, [], options)
%
% Apply the conjugate gradient minimization algorithm to the problem
% defined in the problem structure, starting at x0 if it is provided
% (otherwise, at a random point on the manifold). To specify options whilst
% not specifying an initial guess, give x0 as [] (the empty matrix).
%
% The outputs x and cost are the best reached point on the manifold and its
% cost. The struct-array info contains information about the iterations:
%   iter : the iteration number (0 for the initial guess)
%   cost : cost value
%   time : elapsed time in seconds
%   gradnorm : Riemannian norm of the gradient
%   stepsize : norm of the last tangent vector retracted
%   beta : value of the beta parameter (see options.beta_type)
%   linesearch : information logged by options.linesearch
%   And possibly additional information logged by options.statsfun.
% For example, type [info.gradnorm] to obtain a vector of the successive
% gradient norms reached.
%
% The options structure is used to overwrite the default values. All
% options have a default value and are hence optional. To force an option
% value, pass an options structure with a field options.optionname, where
% optionname is one of the following and the default value is indicated
% between parentheses:
%
%   tolgradnorm (1e-6)
%     The algorithm terminates if the norm of the gradient drops below this.
%   maxiter (1000)
%     The algorithm terminates if maxiter iterations have been executed.
%   maxtime (Inf)
%     The algorithm terminates if maxtime seconds have elapsed.
%   minstepsize (1e-10)
%     The algorithm terminates if the linesearch returns a displacement
%     vector (to be retracted) smaller in norm than this value.
%   beta_type ('H-S')
%     Conjugate gradient beta rule used to construct the new search
%     direction, based on a linear combination of the previous search
%     direction and the new (preconditioned) gradient. Possible values
%     for this parameter are:
%       'S-D', 'steep' for beta = 0 (preconditioned steepest descent)
%       'F-R' for Fletcher-Reeves's rule
%       'P-R' for Polak-Ribiere's modified rule
%       'H-S' for Hestenes-Stiefel's modified rule
%       'H-Z' for Hager-Zhang's modified rule
%       'L-S' for Sato's Liu-Storey rule
%     See Hager and Zhang 2006, "A survey of nonlinear conjugate gradient
%     methods", for a description of these rules in the Euclidean case and
%     for an explanation of how to adapt them to the preconditioned case.
%     The adaptation to the Riemannian case is straightforward: see the
%     code for details. Modified rules take the max between 0 and the
%     computed beta value, which provides automatic restart, except for
%     H-Z and L-S, which use a different modification.
%     Sato's Liu-Storey rule is described in Sato 2021, "Riemannian
%     conjugate gradient methods: General framework and specific
%     algorithms with convergence analyses".
%   orth_value (Inf)
%     Following Powell's restart strategy (Math. prog. 1977), restart CG
%     (that is, make a -preconditioned- gradient step) if two successive
%     -preconditioned- gradients are "too" parallel. See for example
%     Hager and Zhang 2006, "A survey of nonlinear conjugate gradient
%     methods", page 12. An infinite value disables this strategy. See
%     the formula in the code for the specific criterion used.
%   linesearch (@linesearch_adaptive or @linesearch_hint)
%     Function handle to a line search function. The options structure is
%     passed to the line search too, so you can pass it parameters. See
%     each line search's documentation for info. Another available line
%     search in Manopt is @linesearch, in /manopt/linesearch/linesearch.m.
%     If the problem structure includes a line search hint, then the
%     default line search used is @linesearch_hint.
%   statsfun (none)
%     Function handle to a function that will be called after each
%     iteration to provide the opportunity to log additional statistics.
%     They will be returned in the info struct. See the generic Manopt
%     documentation about solvers for further information.
%   stopfun (none)
%     Function handle to a function that will be called at each iteration
%     to provide the opportunity to specify additional stopping criteria.
%     See the generic Manopt documentation about solvers for further
%     information.
%   verbosity (3)
%     Integer number used to tune the amount of output the algorithm
%     generates during execution (mostly as text in the command window).
%     The higher, the more output. 0 means silent.
%   storedepth (2)
%     Maximum number of different points x of the manifold for which a
%     store structure will be kept in memory in the storedb. If the
%     caching features of Manopt are not used, this is irrelevant. For
%     the CG algorithm, a store depth of 2 should always be sufficient.
%
%
% In most of the examples bundled with the toolbox (see link below), the
% solver can be replaced by the present one if need be.
%
% See also: steepestdescent trustregions manopt/solvers/linesearch manopt/examples
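%
% Example (a minimal sketch, not a prescribed recipe): this assumes the
% standard Manopt tools spherefactory and the automatic conversion of
% problem.egrad into a Riemannian gradient. It minimizes a Rayleigh
% quotient on the sphere and plots the gradient norm per iteration.
%
%     n = 100;
%     A = randn(n); A = .5*(A + A');
%     problem.M = spherefactory(n);
%     problem.cost  = @(x) -x'*(A*x);
%     problem.egrad = @(x) -2*(A*x);
%     options.beta_type = 'P-R';          % beta rule (default is 'H-S')
%     options.tolgradnorm = 1e-8;
%     [x, xcost, info] = conjugategradient(problem, [], options);
%     semilogy([info.iter], [info.gradnorm], '.-');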

% An explicit, general listing of this algorithm, with preconditioning,
% can be found in the following paper:
%     @Article{boumal2015lowrank,
%       Title   = {Low-rank matrix completion via preconditioned optimization on the {G}rassmann manifold},
%       Author  = {Boumal, N. and Absil, P.-A.},
%       Journal = {Linear Algebra and its Applications},
%       Year    = {2015},
%       Pages   = {200--239},
%       Volume  = {475},
%       Doi     = {10.1016/j.laa.2015.02.027},
%     }

% This file is part of Manopt: www.manopt.org.
% Original author: Bamdev Mishra, Dec. 30, 2012.
% Contributors: Nicolas Boumal, Nick Vannieuwenhoven
% Change log:
%
%   March 14, 2013, NB:
%     Added preconditioner support: see Section 8 in
%     https://www.math.lsu.edu/~hozhang/papers/cgsurvey.pdf
%
%   Sept. 13, 2013, NB:
%     Now logging beta parameter too.
%
%   Nov. 7, 2013, NB:
%     The search direction is no longer normalized before it is passed
%     to the linesearch. This way, it is up to the designers of the
%     linesearch to decide whether they want to use the norm of the
%     search direction in their algorithm or not. There are reasons
%     against it, but practical evidence that it may help too, so we
%     allow it. The default linesearch_adaptive used does exploit the
%     norm information. The base linesearch does not. You may select it
%     by setting options.linesearch = @linesearch;
%
%   Nov. 29, 2013, NB:
%     Documentation improved: options are now explicitly described.
%     Removed the Daniel rule for beta: it was not appropriate for
%     preconditioned CG and I could not find a proper reference for it.
%
%   April 3, 2015 (NB):
%     Works with the new StoreDB class system.
%
%   Aug. 2, 2018 (NB):
%     Now using storedb.remove() to keep the cache lean.
%
%   Feb. 7, 2022 (NV):
%     Added support for Liu-Storey rule (L-S).

    M = problem.M;

    % Verify that the problem description is sufficient for the solver.
    if ~canGetCost(problem)
        warning('manopt:getCost', ...
                'No cost provided. The algorithm will likely abort.');
    end
    if ~canGetGradient(problem) && ~canGetApproxGradient(problem)
        warning('manopt:getGradient:approx', ...
               ['No gradient provided. Using an FD approximation instead (slow).\n' ...
                'It may be necessary to increase options.tolgradnorm.\n' ...
                'To disable this warning: warning(''off'', ''manopt:getGradient:approx'')']);
        problem.approxgrad = approxgradientFD(problem);
    end

    % Set local defaults here
    localdefaults.minstepsize = 1e-10;
    localdefaults.maxiter = 1000;
    localdefaults.tolgradnorm = 1e-6;
    localdefaults.storedepth = 20;
    % Changed by NB: H-S has the "auto restart" property.
    % See Hager-Zhang 2005/2006 survey about CG methods.
    % The auto restart comes from the 'max(0, ...)', not so much from the
    % reason stated in Hager-Zhang I think. P-R also has auto restart.
    localdefaults.beta_type = 'H-S';
    localdefaults.orth_value = Inf; % by BM as suggested in Nocedal and Wright


    % Depending on whether the problem structure specifies a hint for
    % line-search algorithms, choose a default line-search that works on
    % its own (typical) or that uses the hint.
    if ~canGetLinesearch(problem)
        localdefaults.linesearch = @linesearch_adaptive;
    else
        localdefaults.linesearch = @linesearch_hint;
    end

    % Merge global and local defaults, then merge w/ user options, if any.
    localdefaults = mergeOptions(getGlobalDefaults(), localdefaults);
    if ~exist('options', 'var') || isempty(options)
        options = struct();
    end
    options = mergeOptions(localdefaults, options);

    timetic = tic();

    % If no initial point x is given by the user, generate one at random.
    if ~exist('x', 'var') || isempty(x)
        x = M.rand();
    end

    % Create a store database and generate a key for the current x
    storedb = StoreDB(options.storedepth);
    key = storedb.getNewKey();

    % Compute cost-related quantities for x
    [cost, grad] = getCostGrad(problem, x, storedb, key);
    gradnorm = M.norm(x, grad);
    Pgrad = getPrecon(problem, x, grad, storedb, key);
    gradPgrad = M.inner(x, grad, Pgrad);
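    % (When the problem structure specifies no preconditioner, getPrecon
    % returns the gradient unchanged, so Pgrad equals grad and gradPgrad
    % equals gradnorm^2.)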

    % Iteration counter (at any point, iter is the number of fully executed
    % iterations so far)
    iter = 0;

    % Save stats in a struct array info and preallocate.
    stats = savestats();
    info(1) = stats;
    info(min(10000, options.maxiter+1)).iter = [];


    if options.verbosity >= 2
        fprintf(' iter\t cost val\t grad. norm\n');
    end

    % Compute a first descent direction (not normalized)
    desc_dir = M.lincomb(x, -1, Pgrad);


    % Start iterating until stopping criterion triggers
    while true

        % Display iteration information
        if options.verbosity >= 2
            fprintf('%5d\t%+.16e\t%.8e\n', iter, cost, gradnorm);
        end

        % Start timing this iteration
        timetic = tic();

        % Run standard stopping criterion checks
        [stop, reason] = stoppingcriterion(problem, x, options, info, iter+1);

        % Run specific stopping criterion check
        if ~stop && abs(stats.stepsize) < options.minstepsize
            stop = true;
            reason = sprintf(['Last stepsize smaller than minimum ' ...
                              'allowed; options.minstepsize = %g.'], ...
                              options.minstepsize);
        end

        if stop
            if options.verbosity >= 1
                fprintf([reason '\n']);
            end
            break;
        end


        % The line search algorithms require the directional derivative of
        % the cost at the current point x along the search direction.
        df0 = M.inner(x, grad, desc_dir);

        % If we didn't get a descent direction: restart, i.e., switch to
        % the negative gradient. Equivalent to resetting the CG direction
        % to a steepest descent step, which discards the past information.
        if df0 >= 0

            % Or we switch to the negative gradient direction.
            if options.verbosity >= 3
                fprintf(['Conjugate gradient info: got an ascent direction '...
                         '(df0 = %2e), reset to the (preconditioned) '...
                         'steepest descent direction.\n'], df0);
            end
            % Reset to negative gradient: this discards the CG memory.
            desc_dir = M.lincomb(x, -1, Pgrad);
            df0 = -gradPgrad;

        end


        % Execute line search
        [stepsize, newx, newkey, lsstats] = options.linesearch( ...
                      problem, x, desc_dir, cost, df0, options, storedb, key);


        % Compute the new cost-related quantities for newx
        [newcost, newgrad] = getCostGrad(problem, newx, storedb, newkey);
        newgradnorm = M.norm(newx, newgrad);
        Pnewgrad = getPrecon(problem, newx, newgrad, storedb, newkey);
        newgradPnewgrad = M.inner(newx, newgrad, Pnewgrad);


        % Apply the CG scheme to compute the next search direction.
        %
        % This paper https://www.math.lsu.edu/~hozhang/papers/cgsurvey.pdf
        % by Hager and Zhang lists many known beta rules. The rules defined
        % here can be found in that paper (or are provided with additional
        % references), adapted to the Riemannian setting.
        %
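        % As a quick reference, with g+ the new gradient, g the previous
        % gradient and d the previous direction, the classical Euclidean,
        % unpreconditioned forms of the main rules implemented below read:
        %   F-R : beta = <g+, g+> / <g, g>
        %   P-R+: beta = max(0, <g+, g+ - g> / <g, g>)
        %   H-S+: beta = max(0, <g+, g+ - g> / <d, g+ - g>)
        % Below, the previous gradient and direction are first transported
        % to the tangent space at newx, and the gradients enter through
        % their preconditioned versions (Pgrad, Pnewgrad).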
        if strcmpi(options.beta_type, 'steep') || ...
           strcmpi(options.beta_type, 'S-D')              % Gradient Descent

            beta = 0;
            desc_dir = M.lincomb(newx, -1, Pnewgrad);

        else

            oldgrad = M.transp(x, newx, grad);
            orth_grads = M.inner(newx, oldgrad, Pnewgrad) / newgradPnewgrad;

            % Powell's restart strategy (see page 12 of Hager and Zhang's
            % survey on conjugate gradient methods, for example)
            if abs(orth_grads) >= options.orth_value
                beta = 0;
                desc_dir = M.lincomb(x, -1, Pnewgrad);

            else % Compute the CG modification

                desc_dir = M.transp(x, newx, desc_dir);

                switch upper(options.beta_type)

                    case 'F-R'  % Fletcher-Reeves
                        beta = newgradPnewgrad / gradPgrad;

                    case 'P-R'  % Polak-Ribiere+
                        % vector grad(new) - transported grad(current)
                        diff = M.lincomb(newx, 1, newgrad, -1, oldgrad);
                        ip_diff = M.inner(newx, Pnewgrad, diff);
                        beta = ip_diff / gradPgrad;
                        beta = max(0, beta);

                    case 'H-S'  % Hestenes-Stiefel+
                        diff = M.lincomb(newx, 1, newgrad, -1, oldgrad);
                        ip_diff = M.inner(newx, Pnewgrad, diff);
                        beta = ip_diff / M.inner(newx, diff, desc_dir);
                        beta = max(0, beta);

                    case 'H-Z'  % Hager-Zhang+
                        diff = M.lincomb(newx, 1, newgrad, -1, oldgrad);
                        Poldgrad = M.transp(x, newx, Pgrad);
                        Pdiff = M.lincomb(newx, 1, Pnewgrad, -1, Poldgrad);
                        deno = M.inner(newx, diff, desc_dir);
                        numo = M.inner(newx, diff, Pnewgrad);
                        numo = numo - 2*M.inner(newx, diff, Pdiff)* ...
                                       M.inner(newx, desc_dir, newgrad) / deno;
                        beta = numo / deno;

                        % Robustness (see Hager-Zhang paper mentioned above)
                        desc_dir_norm = M.norm(newx, desc_dir);
                        eta_HZ = -1 / ( desc_dir_norm * min(0.01, gradnorm) );
                        beta = max(beta, eta_HZ);

                    case 'L-S'  % Liu-Storey+ from Sato
                        diff = M.lincomb(newx, 1, newgrad, -1, oldgrad);
                        ip_diff = M.inner(newx, Pnewgrad, diff);
                        denom = -1*M.inner(x, grad, desc_dir);
                        betaLS = ip_diff / denom;
                        betaCD = newgradPnewgrad / denom;
                        beta = max(0, min(betaLS, betaCD));

                    otherwise
                        error(['Unknown options.beta_type. ' ...
                               'Should be steep, S-D, F-R, P-R, H-S, H-Z, or L-S.']);
                end

                desc_dir = M.lincomb(newx, -1, Pnewgrad, beta, desc_dir);

            end

        end

        % Transfer iterate info.
        storedb.removefirstifdifferent(key, newkey);
        x = newx;
        key = newkey;
        cost = newcost;
        grad = newgrad;
        Pgrad = Pnewgrad;
        gradnorm = newgradnorm;
        gradPgrad = newgradPnewgrad;

        % iter is the number of iterations we have accomplished.
        iter = iter + 1;

        % Make sure we don't use too much memory for the store database.
        storedb.purge();

        % Log statistics for freshly executed iteration.
        stats = savestats();
        info(iter+1) = stats;

    end


    info = info(1:iter+1);

    if options.verbosity >= 1
        fprintf('Total time is %f [s] (excludes statsfun)\n', info(end).time);
    end


    % Routine in charge of collecting the current iteration stats
    function stats = savestats()
        stats.iter = iter;
        stats.cost = cost;
        stats.gradnorm = gradnorm;
        if iter == 0
            stats.stepsize = nan;
            stats.time = toc(timetic);
            stats.linesearch = [];
            stats.beta = 0;
        else
            stats.stepsize = stepsize;
            stats.time = info(iter).time + toc(timetic);
            stats.linesearch = lsstats;
            stats.beta = beta;
        end
        stats = applyStatsfun(problem, x, storedb, key, options, stats);
    end

end