
conjugategradient

PURPOSE

Conjugate gradient minimization algorithm for Manopt.

SYNOPSIS

function [x, cost, info, options] = conjugategradient(problem, x, options)

DESCRIPTION

 Conjugate gradient minimization algorithm for Manopt.

 function [x, cost, info, options] = conjugategradient(problem)
 function [x, cost, info, options] = conjugategradient(problem, x0)
 function [x, cost, info, options] = conjugategradient(problem, x0, options)
 function [x, cost, info, options] = conjugategradient(problem, [], options)

 Apply the conjugate gradient minimization algorithm to the problem
 defined in the problem structure, starting at x0 if it is provided
 (otherwise, at a random point on the manifold). To specify options whilst
 not specifying an initial guess, give x0 as [] (the empty matrix).

 The outputs x and cost are the best point reached on the manifold and its
 cost. The struct-array info contains information about the iterations:
   iter : the iteration number (0 for the initial guess)
   cost : cost value
   time : elapsed time in seconds
   gradnorm : Riemannian norm of the gradient
   stepsize : norm of the last tangent vector retracted
   beta : value of the beta parameter (see options.beta_type)
   linesearch : information logged by options.linesearch
   And possibly additional information logged by options.statsfun.
 For example, type [info.gradnorm] to obtain a vector of the successive
 gradient norms reached.
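
 As a minimal usage sketch, assuming Manopt is on the path, the following
 minimizes the Rayleigh quotient x'*A*x over the unit sphere, using the
 spherefactory manifold and the egrad field to supply a Euclidean gradient:

    n = 100;
    A = randn(n); A = .5*(A + A');                  % random symmetric matrix
    problem.M = spherefactory(n);                   % unit sphere in R^n
    problem.cost  = @(x) x'*(A*x);                  % cost to minimize
    problem.egrad = @(x) 2*A*x;                     % Euclidean gradient of the cost
    [x, xcost, info] = conjugategradient(problem);  % random initial guess
    semilogy([info.iter], [info.gradnorm], '.-');   % successive gradient norms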

 The options structure is used to overwrite the default values. All
 options have a default value and are hence optional. To force an option
 value, pass an options structure with a field options.optionname, where
 optionname is one of the following and the default value is indicated
 between parentheses:

   tolgradnorm (1e-6)
       The algorithm terminates if the norm of the gradient drops below this.
   maxiter (1000)
       The algorithm terminates if maxiter iterations have been executed.
   maxtime (Inf)
       The algorithm terminates if maxtime seconds have elapsed.
   minstepsize (1e-10)
       The algorithm terminates if the linesearch returns a displacement
       vector (to be retracted) smaller in norm than this value.
   beta_type ('H-S')
       Conjugate gradient beta rule used to construct the new search
       direction, based on a linear combination of the previous search
       direction and the new (preconditioned) gradient. Possible values
       for this parameter are:
           'S-D', 'steep' for beta = 0 (preconditioned steepest descent)
           'F-R' for Fletcher-Reeves's rule
           'P-R' for Polak-Ribiere's modified rule
           'H-S' for Hestenes-Stiefel's modified rule
           'H-Z' for Hager-Zhang's modified rule
       See Hager and Zhang 2006, "A survey of nonlinear conjugate gradient
       methods" for a description of these rules in the Euclidean case and
       for an explanation of how to adapt them to the preconditioned case.
       The adaptation to the Riemannian case is straightforward: see the code
       for details. Modified rules take the max between 0 and the computed
       beta value, which provides automatic restart, except for H-Z which
       uses a different modification.
   orth_value (Inf)
       Following Powell's restart strategy (Math. prog. 1977), restart CG
       (that is, make a -preconditioned- gradient step) if two successive
       -preconditioned- gradients are "too" parallel. See for example
       Hager and Zhang 2006, "A survey of nonlinear conjugate gradient
       methods", page 12. An infinite value disables this strategy. See the
       formula in the code for the specific criterion used.
   linesearch (@linesearch_adaptive or @linesearch_hint)
       Function handle to a line search function. The options structure is
       passed to the line search too, so you can pass it parameters. See
       each line search's documentation for info. Another available line
       search in manopt is @linesearch, in /manopt/linesearch/linesearch.m.
       If the problem structure includes a line search hint, then the
       default line search used is @linesearch_hint.
   statsfun (none)
       Function handle to a function that will be called after each
       iteration to provide the opportunity to log additional statistics.
       They will be returned in the info struct. See the generic Manopt
       documentation about solvers for further information.
   stopfun (none)
       Function handle to a function that will be called at each iteration
       to provide the opportunity to specify additional stopping criteria.
       See the generic Manopt documentation about solvers for further
       information.
   verbosity (3)
       Integer number used to tune the amount of output the algorithm
       generates during execution (mostly as text in the command window).
       The higher, the more output. 0 means silent.
   storedepth (2)
       Maximum number of different points x of the manifold for which a
       store structure will be kept in memory in the storedb. If the
       caching features of Manopt are not used, this is irrelevant. For
       the CG algorithm, a store depth of 2 should always be sufficient.
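
 For example, to run at most 500 iterations with the Fletcher-Reeves rule, a
 tighter gradient tolerance and reduced output, one may pass options as
 follows (a sketch using only the option names documented above):

    clear options;
    options.maxiter = 500;
    options.beta_type = 'F-R';
    options.tolgradnorm = 1e-8;
    options.verbosity = 1;
    [x, xcost, info] = conjugategradient(problem, [], options);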


 In most of the examples bundled with the toolbox (see link below), the
 solver can be replaced by the present one if need be.
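
 For instance, where such an example calls

    [x, xcost, info] = steepestdescent(problem);

 the call can simply be replaced, with the same input and output arguments, by

    [x, xcost, info] = conjugategradient(problem);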

 See also: steepestdescent trustregions manopt/solvers/linesearch manopt/examples

CROSS-REFERENCE INFORMATION

This function calls:
This function is called by:

SUBFUNCTIONS

function stats = savestats()

SOURCE CODE

0001 function [x, cost, info, options] = conjugategradient(problem, x, options)
0002 % Conjugate gradient minimization algorithm for Manopt.
0003 %
0004 % function [x, cost, info, options] = conjugategradient(problem)
0005 % function [x, cost, info, options] = conjugategradient(problem, x0)
0006 % function [x, cost, info, options] = conjugategradient(problem, x0, options)
0007 % function [x, cost, info, options] = conjugategradient(problem, [], options)
0008 %
0009 % Apply the conjugate gradient minimization algorithm to the problem
0010 % defined in the problem structure, starting at x0 if it is provided
0011 % (otherwise, at a random point on the manifold). To specify options whilst
0012 % not specifying an initial guess, give x0 as [] (the empty matrix).
0013 %
0014 % The outputs x and cost are the best point reached on the manifold and its
0015 % cost. The struct-array info contains information about the iterations:
0016 %   iter : the iteration number (0 for the initial guess)
0017 %   cost : cost value
0018 %   time : elapsed time in seconds
0019 %   gradnorm : Riemannian norm of the gradient
0020 %   stepsize : norm of the last tangent vector retracted
0021 %   beta : value of the beta parameter (see options.beta_type)
0022 %   linesearch : information logged by options.linesearch
0023 %   And possibly additional information logged by options.statsfun.
0024 % For example, type [info.gradnorm] to obtain a vector of the successive
0025 % gradient norms reached.
0026 %
0027 % The options structure is used to overwrite the default values. All
0028 % options have a default value and are hence optional. To force an option
0029 % value, pass an options structure with a field options.optionname, where
0030 % optionname is one of the following and the default value is indicated
0031 % between parentheses:
0032 %
0033 %   tolgradnorm (1e-6)
0034 %       The algorithm terminates if the norm of the gradient drops below this.
0035 %   maxiter (1000)
0036 %       The algorithm terminates if maxiter iterations have been executed.
0037 %   maxtime (Inf)
0038 %       The algorithm terminates if maxtime seconds have elapsed.
0039 %   minstepsize (1e-10)
0040 %       The algorithm terminates if the linesearch returns a displacement
0041 %       vector (to be retracted) smaller in norm than this value.
0042 %   beta_type ('H-S')
0043 %       Conjugate gradient beta rule used to construct the new search
0044 %       direction, based on a linear combination of the previous search
0045 %       direction and the new (preconditioned) gradient. Possible values
0046 %       for this parameter are:
0047 %           'S-D', 'steep' for beta = 0 (preconditioned steepest descent)
0048 %           'F-R' for Fletcher-Reeves's rule
0049 %           'P-R' for Polak-Ribiere's modified rule
0050 %           'H-S' for Hestenes-Stiefel's modified rule
0051 %           'H-Z' for Hager-Zhang's modified rule
0052 %       See Hager and Zhang 2006, "A survey of nonlinear conjugate gradient
0053 %       methods" for a description of these rules in the Euclidean case and
0054 %       for an explanation of how to adapt them to the preconditioned case.
0055 %       The adaptation to the Riemannian case is straightforward: see the code
0056 %       for details. Modified rules take the max between 0 and the computed
0057 %       beta value, which provides automatic restart, except for H-Z which
0058 %       uses a different modification.
0059 %   orth_value (Inf)
0060 %       Following Powell's restart strategy (Math. prog. 1977), restart CG
0061 %       (that is, make a -preconditioned- gradient step) if two successive
0062 %       -preconditioned- gradients are "too" parallel. See for example
0063 %       Hager and Zhang 2006, "A survey of nonlinear conjugate gradient
0064 %       methods", page 12. An infinite value disables this strategy. See the
0065 %       formula in the code for the specific criterion used.
0066 %   linesearch (@linesearch_adaptive or @linesearch_hint)
0067 %       Function handle to a line search function. The options structure is
0068 %       passed to the line search too, so you can pass it parameters. See
0069 %       each line search's documentation for info. Another available line
0070 %       search in manopt is @linesearch, in /manopt/linesearch/linesearch.m.
0071 %       If the problem structure includes a line search hint, then the
0072 %       default line search used is @linesearch_hint.
0073 %   statsfun (none)
0074 %       Function handle to a function that will be called after each
0075 %       iteration to provide the opportunity to log additional statistics.
0076 %       They will be returned in the info struct. See the generic Manopt
0077 %       documentation about solvers for further information.
0078 %   stopfun (none)
0079 %       Function handle to a function that will be called at each iteration
0080 %       to provide the opportunity to specify additional stopping criteria.
0081 %       See the generic Manopt documentation about solvers for further
0082 %       information.
0083 %   verbosity (3)
0084 %       Integer number used to tune the amount of output the algorithm
0085 %       generates during execution (mostly as text in the command window).
0086 %       The higher, the more output. 0 means silent.
0087 %   storedepth (2)
0088 %       Maximum number of different points x of the manifold for which a
0089 %       store structure will be kept in memory in the storedb. If the
0090 %       caching features of Manopt are not used, this is irrelevant. For
0091 %       the CG algorithm, a store depth of 2 should always be sufficient.
0092 %
0093 %
0094 % In most of the examples bundled with the toolbox (see link below), the
0095 % solver can be replaced by the present one if need be.
0096 %
0097 % See also: steepestdescent trustregions manopt/solvers/linesearch manopt/examples
0098 
0099 % An explicit, general listing of this algorithm, with preconditioning,
0100 % can be found in the following paper:
0101 %     @Article{boumal2015lowrank,
0102 %       Title   = {Low-rank matrix completion via preconditioned optimization on the {G}rassmann manifold},
0103 %       Author  = {Boumal, N. and Absil, P.-A.},
0104 %       Journal = {Linear Algebra and its Applications},
0105 %       Year    = {2015},
0106 %       Pages   = {200--239},
0107 %       Volume  = {475},
0108 %       Doi     = {10.1016/j.laa.2015.02.027},
0109 %     }
0110 
0111 % This file is part of Manopt: www.manopt.org.
0112 % Original author: Bamdev Mishra, Dec. 30, 2012.
0113 % Contributors: Nicolas Boumal
0114 % Change log:
0115 %
0116 %   March 14, 2013, NB:
0117 %       Added preconditioner support : see Section 8 in
0118 %       https://www.math.lsu.edu/~hozhang/papers/cgsurvey.pdf
0119 %
0120 %   Sept. 13, 2013, NB:
0121 %       Now logging beta parameter too.
0122 %
0123 %    Nov. 7, 2013, NB:
0124 %       The search direction is no longer normalized before it is passed
0125 %       to the linesearch. This way, it is up to the designers of the
0126 %       linesearch to decide whether they want to use the norm of the
0127 %       search direction in their algorithm or not. There are reasons
0128 %       against it, but practical evidence that it may help too, so we
0129 %       allow it. The default linesearch_adaptive used does exploit the
0130 %       norm information. The base linesearch does not. You may select it
0131 %       by setting options.linesearch = @linesearch;
0132 %
0133 %    Nov. 29, 2013, NB:
0134 %       Documentation improved: options are now explicitly described.
0135 %       Removed the Daniel rule for beta: it was not appropriate for
0136 %       preconditioned CG and I could not find a proper reference for it.
0137 %
0138 %   April 3, 2015 (NB):
0139 %       Works with the new StoreDB class system.
0140 
0141 % Verify that the problem description is sufficient for the solver.
0142 if ~canGetCost(problem)
0143     warning('manopt:getCost', ...
0144         'No cost provided. The algorithm will likely abort.');
0145 end
0146 if ~canGetGradient(problem) && ~canGetApproxGradient(problem)
0147     warning('manopt:getGradient:approx', ...
0148            ['No gradient provided. Using an FD approximation instead (slow).\n' ...
0149             'It may be necessary to increase options.tolgradnorm.\n' ...
0150             'To disable this warning: warning(''off'', ''manopt:getGradient:approx'')']);
0151     problem.approxgrad = approxgradientFD(problem);
0152 end
0153 
0154 % Set local defaults here
0155 localdefaults.minstepsize = 1e-10;
0156 localdefaults.maxiter = 1000;
0157 localdefaults.tolgradnorm = 1e-6;
0158 localdefaults.storedepth = 20;
0159 % Changed by NB : H-S has the "auto restart" property.
0160 % See Hager-Zhang 2005/2006 survey about CG methods.
0161 % The auto restart comes from the 'max(0, ...)', not so much from the
0162 % reason stated in Hager-Zhang I think. P-R also has auto restart.
0163 localdefaults.beta_type = 'H-S';
0164 localdefaults.orth_value = Inf; % by BM as suggested in Nocedal and Wright
0165 
0166     
0167 % Depending on whether the problem structure specifies a hint for
0168 % line-search algorithms, choose a default line-search that works on
0169 % its own (typical) or that uses the hint.
0170 if ~canGetLinesearch(problem)
0171     localdefaults.linesearch = @linesearch_adaptive;
0172 else
0173     localdefaults.linesearch = @linesearch_hint;
0174 end
0175 
0176 % Merge global and local defaults, then merge w/ user options, if any.
0177 localdefaults = mergeOptions(getGlobalDefaults(), localdefaults);
0178 if ~exist('options', 'var') || isempty(options)
0179     options = struct();
0180 end
0181 options = mergeOptions(localdefaults, options);
0182 
0183 % For convenience
0184 inner = problem.M.inner;
0185 lincomb = problem.M.lincomb;
0186 
0187 timetic = tic();
0188 
0189 % If no initial point x is given by the user, generate one at random.
0190 if ~exist('x', 'var') || isempty(x)
0191     x = problem.M.rand();
0192 end
0193 
0194 % Create a store database and generate a key for the current x
0195 storedb = StoreDB(options.storedepth);
0196 key = storedb.getNewKey();
0197 
0198 % Compute cost-related quantities for x
0199 [cost, grad] = getCostGrad(problem, x, storedb, key);
0200 gradnorm = problem.M.norm(x, grad);
0201 Pgrad = getPrecon(problem, x, grad, storedb, key);
0202 gradPgrad = inner(x, grad, Pgrad);
0203 
0204 % Iteration counter (at any point, iter is the number of fully executed
0205 % iterations so far)
0206 iter = 0;
0207 
0208 % Save stats in a struct array info and preallocate.
0209 stats = savestats();
0210 info(1) = stats;
0211 info(min(10000, options.maxiter+1)).iter = [];
0212 
0213 
0214 if options.verbosity >= 2
0215     fprintf(' iter\t               cost val\t    grad. norm\n');
0216 end
0217 
0218 % Compute a first descent direction (not normalized)
0219 desc_dir = lincomb(x, -1, Pgrad);
0220 
0221 
0222 % Start iterating until stopping criterion triggers
0223 while true
0224     
0225     % Display iteration information
0226     if options.verbosity >= 2
0227         fprintf('%5d\t%+.16e\t%.8e\n', iter, cost, gradnorm);
0228     end
0229     
0230     % Start timing this iteration
0231     timetic = tic();
0232     
0233     % Run standard stopping criterion checks
0234     [stop, reason] = stoppingcriterion(problem, x, options, info, iter+1);
0235     
0236     % Run specific stopping criterion check
0237     if ~stop && abs(stats.stepsize) < options.minstepsize
0238         stop = true;
0239         reason = sprintf(['Last stepsize smaller than minimum '  ...
0240                           'allowed; options.minstepsize = %g.'], ...
0241                           options.minstepsize);
0242     end
0243     
0244     if stop
0245         if options.verbosity >= 1
0246             fprintf([reason '\n']);
0247         end
0248         break;
0249     end
0250     
0251     
0252     % The line search algorithms require the directional derivative of the
0253     % cost at the current point x along the search direction.
0254     df0 = inner(x, grad, desc_dir);
0255         
0256     % If we didn't get a descent direction: restart, i.e., switch to the
0257     % negative gradient. Equivalent to resetting the CG direction to a
0258     % steepest descent step, which discards the past information.
0259     if df0 >= 0
0260         
0261         % Inform the user (if verbose) that we reset to steepest descent.
0262         if options.verbosity >= 3
0263             fprintf(['Conjugate gradient info: got an ascent direction '...
0264                      '(df0 = %2e), reset to the (preconditioned) '...
0265                      'steepest descent direction.\n'], df0);
0266         end
0267         % Reset to negative gradient: this discards the CG memory.
0268         desc_dir = lincomb(x, -1, Pgrad);
0269         df0 = -gradPgrad;
0270         
0271     end
0272     
0273     
0274     % Execute line search
0275     [stepsize, newx, newkey, lsstats] = options.linesearch( ...
0276                    problem, x, desc_dir, cost, df0, options, storedb, key);
0277                
0278     
0279     % Compute the new cost-related quantities for newx
0280     [newcost, newgrad] = getCostGrad(problem, newx, storedb, newkey);
0281     newgradnorm = problem.M.norm(newx, newgrad);
0282     Pnewgrad = getPrecon(problem, newx, newgrad, storedb, newkey);
0283     newgradPnewgrad = inner(newx, newgrad, Pnewgrad);
0284     
0285     
0286     % Apply the CG scheme to compute the next search direction.
0287     %
0288     % This paper https://www.math.lsu.edu/~hozhang/papers/cgsurvey.pdf
0289     % by Hager and Zhang lists many known beta rules. The rules defined
0290     % here can be found in that paper (or are provided with additional
0291     % references), adapted to the Riemannian setting.
0292     %
0293     if strcmpi(options.beta_type, 'steep') || ...
0294        strcmpi(options.beta_type, 'S-D')              % Gradient Descent
0295         
0296         beta = 0;
0297         desc_dir = lincomb(x, -1, Pnewgrad);
0298         
0299     else
0300         
0301         oldgrad = problem.M.transp(x, newx, grad);
0302         orth_grads = inner(newx, oldgrad, Pnewgrad) / newgradPnewgrad;
0303         
0304         % Powell's restart strategy (see page 12 of Hager and Zhang's
0305         % survey on conjugate gradient methods, for example)
0306         if abs(orth_grads) >= options.orth_value
0307             beta = 0;
0308             desc_dir = lincomb(x, -1, Pnewgrad);
0309             
0310         else % Compute the CG modification
0311             
0312             desc_dir = problem.M.transp(x, newx, desc_dir);
0313             
0314             switch upper(options.beta_type)
0315             
0316                 case 'F-R'  % Fletcher-Reeves
0317                     beta = newgradPnewgrad / gradPgrad;
0318                 
0319                 case 'P-R'  % Polak-Ribiere+
0320                     % vector grad(new) - transported grad(current)
0321                     diff = lincomb(newx, 1, newgrad, -1, oldgrad);
0322                     ip_diff = inner(newx, Pnewgrad, diff);
0323                     beta = ip_diff / gradPgrad;
0324                     beta = max(0, beta);
0325                 
0326                 case 'H-S'  % Hestenes-Stiefel+
0327                     diff = lincomb(newx, 1, newgrad, -1, oldgrad);
0328                     ip_diff = inner(newx, Pnewgrad, diff);
0329                     beta = ip_diff / inner(newx, diff, desc_dir);
0330                     beta = max(0, beta);
0331 
0332                 case 'H-Z' % Hager-Zhang+
0333                     diff = lincomb(newx, 1, newgrad, -1, oldgrad);
0334                     Poldgrad = problem.M.transp(x, newx, Pgrad);
0335                     Pdiff = lincomb(newx, 1, Pnewgrad, -1, Poldgrad);
0336                     deno = inner(newx, diff, desc_dir);
0337                     numo = inner(newx, diff, Pnewgrad);
0338                     numo = numo - 2*inner(newx, diff, Pdiff)*...
0339                                      inner(newx, desc_dir, newgrad) / deno;
0340                     beta = numo / deno;
0341 
0342                     % Robustness (see Hager-Zhang paper mentioned above)
0343                     desc_dir_norm = problem.M.norm(newx, desc_dir);
0344                     eta_HZ = -1 / ( desc_dir_norm * min(0.01, gradnorm) );
0345                     beta = max(beta, eta_HZ);
0346 
0347                 otherwise
0348                     error(['Unknown options.beta_type. ' ...
0349                            'Should be steep, S-D, F-R, P-R, H-S or H-Z.']);
0350             end
0351             
0352             desc_dir = lincomb(newx, -1, Pnewgrad, beta, desc_dir);
0353         
0354         end
0355         
0356     end
0357     
0358     % Make sure we don't use too much memory for the store database
0359     storedb.purge();
0360     
0361     % Transfer iterate info
0362     x = newx;
0363     key = newkey;
0364     cost = newcost;
0365     grad = newgrad;
0366     Pgrad = Pnewgrad;
0367     gradnorm = newgradnorm;
0368     gradPgrad = newgradPnewgrad;
0369     
0370     % iter is the number of iterations we have accomplished.
0371     iter = iter + 1;
0372     
0373     % Log statistics for freshly executed iteration
0374     stats = savestats();
0375     info(iter+1) = stats; %#ok<AGROW>
0376     
0377 end
0378 
0379 
0380 info = info(1:iter+1);
0381 
0382 if options.verbosity >= 1
0383     fprintf('Total time is %f [s] (excludes statsfun)\n', info(end).time);
0384 end
0385 
0386 
0387 % Routine in charge of collecting the current iteration stats
0388 function stats = savestats()
0389     stats.iter = iter;
0390     stats.cost = cost;
0391     stats.gradnorm = gradnorm;
0392     if iter == 0
0393         stats.stepsize = nan;
0394         stats.time = toc(timetic);
0395         stats.linesearch = [];
0396         stats.beta = 0;
0397     else
0398         stats.stepsize = stepsize;
0399         stats.time = info(iter).time + toc(timetic);
0400         stats.linesearch = lsstats;
0401         stats.beta = beta;
0402     end
0403     stats = applyStatsfun(problem, x, storedb, key, options, stats);
0404 end
0405 
0406 end
0407 
0408

Generated on Sat 12-Nov-2016 14:11:22 by m2html © 2005