function [x, cost, info, options] = conjugategradient(problem, x, options)
% Conjugate gradient minimization algorithm for Manopt.
%
% function [x, cost, info, options] = conjugategradient(problem)
% function [x, cost, info, options] = conjugategradient(problem, x0)
% function [x, cost, info, options] = conjugategradient(problem, x0, options)
% function [x, cost, info, options] = conjugategradient(problem, [], options)
%
% Apply the conjugate gradient minimization algorithm to the problem
% defined in the problem structure, starting at x0 if it is provided
% (otherwise, at a random point on the manifold). To specify options whilst
% not specifying an initial guess, give x0 as [] (the empty matrix).
%
% The outputs x and cost are the best reached point on the manifold and its
% cost. The struct-array info contains information about the iterations:
%   iter : the iteration number (0 for the initial guess)
%   cost : cost value
%   time : elapsed time in seconds
%   gradnorm : Riemannian norm of the gradient
%   stepsize : norm of the last tangent vector retracted
%   beta : value of the beta parameter (see options.beta_type)
%   linesearch : information logged by options.linesearch
%   And possibly additional information logged by options.statsfun.
% For example, type [info.gradnorm] to obtain a vector of the successive
% gradient norms reached.
%
% The options structure is used to overwrite the default values. All
% options have a default value and are hence optional. To force an option
% value, pass an options structure with a field options.optionname, where
% optionname is one of the following and the default value is indicated
% between parentheses:
%
%   tolgradnorm (1e-6)
%     The algorithm terminates if the norm of the gradient drops below this.
%   maxiter (1000)
%     The algorithm terminates if maxiter iterations have been executed.
%   maxtime (Inf)
%     The algorithm terminates if maxtime seconds have elapsed.
%   minstepsize (1e-10)
%     The algorithm terminates if the linesearch returns a displacement
%     vector (to be retracted) smaller in norm than this value.
%   beta_type ('H-S')
%     Conjugate gradient beta rule used to construct the new search
%     direction, based on a linear combination of the previous search
%     direction and the new (preconditioned) gradient. Possible values
%     for this parameter are:
%       'S-D', 'steep' for beta = 0 (preconditioned steepest descent)
%       'F-R' for Fletcher-Reeves's rule
%       'P-R' for Polak-Ribiere's modified rule
%       'H-S' for Hestenes-Stiefel's modified rule
%       'H-Z' for Hager-Zhang's modified rule
%       'L-S' for Sato's Liu-Storey rule
%     See Hager and Zhang 2006, "A survey of nonlinear conjugate gradient
%     methods", for a description of these rules in the Euclidean case and
%     for an explanation of how to adapt them to the preconditioned case.
%     The adaptation to the Riemannian case is straightforward: see the
%     code for details. Modified rules take the max between 0 and the
%     computed beta value, which provides automatic restart, except for
%     H-Z and L-S, which use a different modification.
%     Sato's Liu-Storey rule is described in Sato 2021, "Riemannian
%     conjugate gradient methods: General framework and specific
%     algorithms with convergence analyses".
%   orth_value (Inf)
%     Following Powell's restart strategy (Math. prog. 1977), restart CG
%     (that is, make a -preconditioned- gradient step) if two successive
%     -preconditioned- gradients are "too" parallel. See for example
%     Hager and Zhang 2006, "A survey of nonlinear conjugate gradient
%     methods", page 12. An infinite value disables this strategy. See
%     the formula in the code for the specific criterion used.
%   linesearch (@linesearch_adaptive or @linesearch_hint)
%     Function handle to a line search function. The options structure is
%     passed to the line search too, so you can pass it parameters. See
%     each line search's documentation for info. Another available line
%     search in Manopt is @linesearch, in /manopt/linesearch/linesearch.m.
%     If the problem structure includes a line search hint, then the
%     default line search used is @linesearch_hint.
%   statsfun (none)
%     Function handle to a function that will be called after each
%     iteration to provide the opportunity to log additional statistics.
%     They will be returned in the info struct. See the generic Manopt
%     documentation about solvers for further information.
%   stopfun (none)
%     Function handle to a function that will be called at each iteration
%     to provide the opportunity to specify additional stopping criteria.
%     See the generic Manopt documentation about solvers for further
%     information.
%   verbosity (3)
%     Integer number used to tune the amount of output the algorithm
%     generates during execution (mostly as text in the command window).
%     The higher, the more output. 0 means silent.
%   storedepth (2)
%     Maximum number of different points x of the manifold for which a
%     store structure will be kept in memory in the storedb. If the
%     caching features of Manopt are not used, this is irrelevant. For
%     the CG algorithm, a store depth of 2 should always be sufficient.
%
%
% In most of the examples bundled with the toolbox (see link below), the
% solver can be replaced by the present one if need be.
%
% See also: steepestdescent trustregions manopt/solvers/linesearch manopt/examples
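%
% Example (a minimal sketch, not a prescribed recipe): this assumes the
% standard Manopt tools spherefactory and the automatic conversion of
% problem.egrad into a Riemannian gradient. It minimizes a Rayleigh
% quotient on the sphere and plots the gradient norm per iteration.
%
%     n = 100;
%     A = randn(n); A = .5*(A + A');
%     problem.M = spherefactory(n);
%     problem.cost  = @(x) -x'*(A*x);
%     problem.egrad = @(x) -2*(A*x);
%     options.beta_type = 'P-R';          % beta rule (default is 'H-S')
%     options.tolgradnorm = 1e-8;
%     [x, xcost, info] = conjugategradient(problem, [], options);
%     semilogy([info.iter], [info.gradnorm], '.-');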

% An explicit, general listing of this algorithm, with preconditioning,
% can be found in the following paper:
%     @Article{boumal2015lowrank,
%       Title   = {Low-rank matrix completion via preconditioned optimization on the {G}rassmann manifold},
%       Author  = {Boumal, N. and Absil, P.-A.},
%       Journal = {Linear Algebra and its Applications},
%       Year    = {2015},
%       Pages   = {200--239},
%       Volume  = {475},
%       Doi     = {10.1016/j.laa.2015.02.027},
%     }

% This file is part of Manopt: www.manopt.org.
% Original author: Bamdev Mishra, Dec. 30, 2012.
% Contributors: Nicolas Boumal, Nick Vannieuwenhoven
% Change log:
%
%   March 14, 2013, NB:
%     Added preconditioner support: see Section 8 in
%     https://www.math.lsu.edu/~hozhang/papers/cgsurvey.pdf
%
%   Sept. 13, 2013, NB:
%     Now logging beta parameter too.
%
%   Nov. 7, 2013, NB:
%     The search direction is no longer normalized before it is passed
%     to the linesearch. This way, it is up to the designers of the
%     linesearch to decide whether they want to use the norm of the
%     search direction in their algorithm or not. There are reasons
%     against it, but practical evidence that it may help too, so we
%     allow it. The default linesearch_adaptive used does exploit the
%     norm information. The base linesearch does not. You may select it
%     by setting options.linesearch = @linesearch;
%
%   Nov. 29, 2013, NB:
%     Documentation improved: options are now explicitly described.
%     Removed the Daniel rule for beta: it was not appropriate for
%     preconditioned CG and I could not find a proper reference for it.
%
%   April 3, 2015 (NB):
%     Works with the new StoreDB class system.
%
%   Aug. 2, 2018 (NB):
%     Now using storedb.remove() to keep the cache lean.
%
%   Feb. 7, 2022 (NV):
%     Added support for Liu-Storey rule (L-S).

    M = problem.M;

    % Verify that the problem description is sufficient for the solver.
    if ~canGetCost(problem)
        warning('manopt:getCost', ...
                'No cost provided. The algorithm will likely abort.');
    end
    if ~canGetGradient(problem) && ~canGetApproxGradient(problem)
        warning('manopt:getGradient:approx', ...
               ['No gradient provided. Using an FD approximation instead (slow).\n' ...
                'It may be necessary to increase options.tolgradnorm.\n' ...
                'To disable this warning: warning(''off'', ''manopt:getGradient:approx'')']);
        problem.approxgrad = approxgradientFD(problem);
    end

    % Set local defaults here
    localdefaults.minstepsize = 1e-10;
    localdefaults.maxiter = 1000;
    localdefaults.tolgradnorm = 1e-6;
    localdefaults.storedepth = 20;
    % Changed by NB: H-S has the "auto restart" property.
    % See Hager-Zhang 2005/2006 survey about CG methods.
    % The auto restart comes from the 'max(0, ...)', not so much from the
    % reason stated in Hager-Zhang I think. P-R also has auto restart.
    localdefaults.beta_type = 'H-S';
    localdefaults.orth_value = Inf; % by BM as suggested in Nocedal and Wright


    % Depending on whether the problem structure specifies a hint for
    % line-search algorithms, choose a default line-search that works on
    % its own (typical) or that uses the hint.
    if ~canGetLinesearch(problem)
        localdefaults.linesearch = @linesearch_adaptive;
    else
        localdefaults.linesearch = @linesearch_hint;
    end

    % Merge global and local defaults, then merge w/ user options, if any.
    localdefaults = mergeOptions(getGlobalDefaults(), localdefaults);
    if ~exist('options', 'var') || isempty(options)
        options = struct();
    end
    options = mergeOptions(localdefaults, options);

    timetic = tic();

    % If no initial point x is given by the user, generate one at random.
    if ~exist('x', 'var') || isempty(x)
        x = M.rand();
    end

    % Create a store database and generate a key for the current x
    storedb = StoreDB(options.storedepth);
    key = storedb.getNewKey();

    % Compute cost-related quantities for x
    [cost, grad] = getCostGrad(problem, x, storedb, key);
    gradnorm = M.norm(x, grad);
    Pgrad = getPrecon(problem, x, grad, storedb, key);
    gradPgrad = M.inner(x, grad, Pgrad);
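    % (When the problem structure specifies no preconditioner, getPrecon
    % returns the gradient unchanged, so Pgrad equals grad and gradPgrad
    % equals gradnorm^2.)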

    % Iteration counter (at any point, iter is the number of fully executed
    % iterations so far)
    iter = 0;

    % Save stats in a struct array info and preallocate.
    stats = savestats();
    info(1) = stats;
    info(min(10000, options.maxiter+1)).iter = [];


    if options.verbosity >= 2
        fprintf(' iter\t cost val\t grad. norm\n');
    end

    % Compute a first descent direction (not normalized)
    desc_dir = M.lincomb(x, -1, Pgrad);


    % Start iterating until stopping criterion triggers
    while true

        % Display iteration information
        if options.verbosity >= 2
            fprintf('%5d\t%+.16e\t%.8e\n', iter, cost, gradnorm);
        end

        % Start timing this iteration
        timetic = tic();

        % Run standard stopping criterion checks
        [stop, reason] = stoppingcriterion(problem, x, options, info, iter+1);

        % Run specific stopping criterion check
        if ~stop && abs(stats.stepsize) < options.minstepsize
            stop = true;
            reason = sprintf(['Last stepsize smaller than minimum ' ...
                              'allowed; options.minstepsize = %g.'], ...
                              options.minstepsize);
        end

        if stop
            if options.verbosity >= 1
                fprintf([reason '\n']);
            end
            break;
        end


        % The line search algorithms require the directional derivative of
        % the cost at the current point x along the search direction.
        df0 = M.inner(x, grad, desc_dir);

        % If we didn't get a descent direction: restart, i.e., switch to
        % the negative gradient. Equivalent to resetting the CG direction
        % to a steepest descent step, which discards the past information.
        if df0 >= 0

            % Or we switch to the negative gradient direction.
            if options.verbosity >= 3
                fprintf(['Conjugate gradient info: got an ascent direction '...
                         '(df0 = %2e), reset to the (preconditioned) '...
                         'steepest descent direction.\n'], df0);
            end
            % Reset to negative gradient: this discards the CG memory.
            desc_dir = M.lincomb(x, -1, Pgrad);
            df0 = -gradPgrad;

        end


        % Execute line search
        [stepsize, newx, newkey, lsstats] = options.linesearch( ...
                      problem, x, desc_dir, cost, df0, options, storedb, key);


        % Compute the new cost-related quantities for newx
        [newcost, newgrad] = getCostGrad(problem, newx, storedb, newkey);
        newgradnorm = M.norm(newx, newgrad);
        Pnewgrad = getPrecon(problem, newx, newgrad, storedb, newkey);
        newgradPnewgrad = M.inner(newx, newgrad, Pnewgrad);


        % Apply the CG scheme to compute the next search direction.
        %
        % This paper https://www.math.lsu.edu/~hozhang/papers/cgsurvey.pdf
        % by Hager and Zhang lists many known beta rules. The rules defined
        % here can be found in that paper (or are provided with additional
        % references), adapted to the Riemannian setting.
        %
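        % As a quick reference, with g+ the new gradient, g the previous
        % gradient and d the previous direction, the classical Euclidean,
        % unpreconditioned forms of the main rules implemented below read:
        %   F-R : beta = <g+, g+> / <g, g>
        %   P-R+: beta = max(0, <g+, g+ - g> / <g, g>)
        %   H-S+: beta = max(0, <g+, g+ - g> / <d, g+ - g>)
        % Below, the previous gradient and direction are first transported
        % to the tangent space at newx, and the gradients enter through
        % their preconditioned versions (Pgrad, Pnewgrad).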
        if strcmpi(options.beta_type, 'steep') || ...
           strcmpi(options.beta_type, 'S-D')              % Gradient Descent

            beta = 0;
            desc_dir = M.lincomb(newx, -1, Pnewgrad);

        else

            oldgrad = M.transp(x, newx, grad);
            orth_grads = M.inner(newx, oldgrad, Pnewgrad) / newgradPnewgrad;

            % Powell's restart strategy (see page 12 of Hager and Zhang's
            % survey on conjugate gradient methods, for example)
            if abs(orth_grads) >= options.orth_value
                beta = 0;
                desc_dir = M.lincomb(x, -1, Pnewgrad);

            else % Compute the CG modification

                desc_dir = M.transp(x, newx, desc_dir);

                switch upper(options.beta_type)

                    case 'F-R'  % Fletcher-Reeves
                        beta = newgradPnewgrad / gradPgrad;

                    case 'P-R'  % Polak-Ribiere+
                        % vector grad(new) - transported grad(current)
                        diff = M.lincomb(newx, 1, newgrad, -1, oldgrad);
                        ip_diff = M.inner(newx, Pnewgrad, diff);
                        beta = ip_diff / gradPgrad;
                        beta = max(0, beta);

                    case 'H-S'  % Hestenes-Stiefel+
                        diff = M.lincomb(newx, 1, newgrad, -1, oldgrad);
                        ip_diff = M.inner(newx, Pnewgrad, diff);
                        beta = ip_diff / M.inner(newx, diff, desc_dir);
                        beta = max(0, beta);

                    case 'H-Z'  % Hager-Zhang+
                        diff = M.lincomb(newx, 1, newgrad, -1, oldgrad);
                        Poldgrad = M.transp(x, newx, Pgrad);
                        Pdiff = M.lincomb(newx, 1, Pnewgrad, -1, Poldgrad);
                        deno = M.inner(newx, diff, desc_dir);
                        numo = M.inner(newx, diff, Pnewgrad);
                        numo = numo - 2*M.inner(newx, diff, Pdiff)* ...
                                       M.inner(newx, desc_dir, newgrad) / deno;
                        beta = numo / deno;

                        % Robustness (see Hager-Zhang paper mentioned above)
                        desc_dir_norm = M.norm(newx, desc_dir);
                        eta_HZ = -1 / ( desc_dir_norm * min(0.01, gradnorm) );
                        beta = max(beta, eta_HZ);

                    case 'L-S'  % Liu-Storey+ from Sato
                        diff = M.lincomb(newx, 1, newgrad, -1, oldgrad);
                        ip_diff = M.inner(newx, Pnewgrad, diff);
                        denom = -1*M.inner(x, grad, desc_dir);
                        betaLS = ip_diff / denom;
                        betaCD = newgradPnewgrad / denom;
                        beta = max(0, min(betaLS, betaCD));

                    otherwise
                        error(['Unknown options.beta_type. ' ...
                               'Should be steep, S-D, F-R, P-R, H-S, H-Z, or L-S.']);
                end

                desc_dir = M.lincomb(newx, -1, Pnewgrad, beta, desc_dir);

            end

        end

        % Transfer iterate info.
        storedb.removefirstifdifferent(key, newkey);
        x = newx;
        key = newkey;
        cost = newcost;
        grad = newgrad;
        Pgrad = Pnewgrad;
        gradnorm = newgradnorm;
        gradPgrad = newgradPnewgrad;

        % iter is the number of iterations we have accomplished.
        iter = iter + 1;

        % Make sure we don't use too much memory for the store database.
        storedb.purge();

        % Log statistics for freshly executed iteration.
        stats = savestats();
        info(iter+1) = stats;

    end


    info = info(1:iter+1);

    if options.verbosity >= 1
        fprintf('Total time is %f [s] (excludes statsfun)\n', info(end).time);
    end


    % Routine in charge of collecting the current iteration stats
    function stats = savestats()
        stats.iter = iter;
        stats.cost = cost;
        stats.gradnorm = gradnorm;
        if iter == 0
            stats.stepsize = nan;
            stats.time = toc(timetic);
            stats.linesearch = [];
            stats.beta = 0;
        else
            stats.stepsize = stepsize;
            stats.time = info(iter).time + toc(timetic);
            stats.linesearch = lsstats;
            stats.beta = beta;
        end
        stats = applyStatsfun(problem, x, storedb, key, options, stats);
    end

end