Riemannian trust-regions solver for optimization on manifolds.

function [x, cost, info, options] = trustregions(problem)
function [x, cost, info, options] = trustregions(problem, x0)
function [x, cost, info, options] = trustregions(problem, x0, options)
function [x, cost, info, options] = trustregions(problem, [], options)

This is the Riemannian Trust-Region solver (with tCG inner solve), named
RTR. This solver will attempt to minimize the cost function described in
the problem structure. It requires the availability of the cost function
and of its gradient. It will issue calls for the Hessian. If no Hessian
nor approximate Hessian is provided, a standard approximation of the
Hessian based on the gradient will be computed. If a preconditioner for
the Hessian is provided, it will be used.

If no gradient is provided, an approximation of the gradient is computed,
but this can be slow for manifolds of high dimension.

For a description of the algorithm and theorems offering convergence
guarantees, see the references below. Documentation for this solver is
available online at:

    http://www.manopt.org/solver_documentation_trustregions.html

The initial iterate is x0 if it is provided. Otherwise, a random point on
the manifold is picked. To specify options whilst not specifying an
initial iterate, give x0 as [] (the empty matrix).

The two outputs 'x' and 'cost' are the last reached point on the manifold
and its cost. Notice that x is not necessarily the best reached point,
because this solver is not forced to be a descent method. In particular,
very close to convergence, it is sometimes preferable to accept very
slight increases in the cost value (on the order of the machine epsilon)
in the process of reaching fine convergence. In practice, this is not a
limiting factor, as normally one does not need fine enough convergence
that this becomes an issue.

The output 'info' is a struct-array which contains information about the
iterations:

  iter (integer)
      The (outer) iteration number, or number of steps considered
      (whether accepted or rejected). The initial guess is 0.
  cost (double)
      The corresponding cost value.
  gradnorm (double)
      The (Riemannian) norm of the gradient.
  numinner (integer)
      The number of inner iterations executed to compute this iterate.
      Inner iterations are truncated-CG steps. Each one requires a
      Hessian (or approximate Hessian) evaluation.
  time (double)
      The total elapsed time in seconds to reach the corresponding cost.
  rho (double)
      The performance ratio for the iterate.
  rhonum, rhoden (double)
      Regularized numerator and denominator of the performance ratio:
      rho = rhonum/rhoden. See options.rho_regularization.
  accepted (boolean)
      Whether the proposed iterate was accepted or not.
  stepsize (double)
      The (Riemannian) norm of the vector returned by the inner solver
      tCG and which is retracted to obtain the proposed next iterate. If
      accepted = true for the corresponding iterate, this is the size of
      the step from the previous to the new iterate. If accepted is
      false, the step was not executed and this is the size of the
      rejected step.
  Delta (double)
      The trust-region radius at the outer iteration.
  cauchy (boolean)
      Whether the Cauchy point was used or not (if useRand is true).

And possibly additional information logged by options.statsfun.
For example, type [info.gradnorm] to obtain a vector of the successive
gradient norms reached at each (outer) iteration.
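For instance, the following sketch runs the solver and plots the gradient
norm against the outer iteration number; it assumes a problem structure
has already been set up, and the plotting calls are standard MATLAB:

    % Run the solver, then gather fields of the info struct-array
    % into plain vectors.
    [x, cost, info, options] = trustregions(problem);

    iters     = [info.iter];
    gradnorms = [info.gradnorm];

    % Convergence curve: gradient norm versus outer iteration number.
    semilogy(iters, gradnorms, '.-');
    xlabel('Outer iteration number');
    ylabel('Gradient norm');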
The options structure is used to overwrite the default values. All
options have a default value and are hence optional. To force an option
value, pass an options structure with a field options.optionname, where
optionname is one of the following and the default value is indicated
between parentheses:

  tolgradnorm (1e-6)
      The algorithm terminates if the norm of the gradient drops below
      this. For well-scaled problems, a rule of thumb is that you can
      expect to reduce the gradient norm by 8 orders of magnitude
      (sqrt(eps)) compared to the gradient norm at a "typical" point (a
      rough initial iterate for example). Further decrease is sometimes
      possible, but inexact floating point arithmetic will eventually
      limit the final accuracy. If tolgradnorm is set too low, the
      algorithm may end up iterating forever (or at least until another
      stopping criterion triggers).
  maxiter (1000)
      The algorithm terminates if maxiter (outer) iterations were
      executed.
  maxtime (Inf)
      The algorithm terminates if maxtime seconds elapsed.
  miniter (3)
      Minimum number of outer iterations (used only if useRand is true).
  mininner (1)
      Minimum number of inner iterations (for tCG).
  maxinner (problem.M.dim() : the manifold's dimension)
      Maximum number of inner iterations (for tCG).
  Delta_bar (problem.M.typicaldist() or sqrt(problem.M.dim()))
      Maximum trust-region radius. If you specify this parameter but not
      Delta0, then Delta0 will be set to 1/8 times this parameter.
  Delta0 (Delta_bar/8)
      Initial trust-region radius. If you observe a long plateau at the
      beginning of the convergence plot (gradient norm versus iteration),
      it may pay off to try to tune this parameter to shorten the
      plateau. You should not set this parameter without setting
      Delta_bar too (at a larger value).
  useRand (false)
      Set to true if the trust-region solve is to be initiated with a
      random tangent vector. If set to true, no preconditioner will be
      used. This option is set to true in some scenarios to escape saddle
      points, but is otherwise seldom activated.
  kappa (0.1)
      tCG inner kappa convergence tolerance.
      kappa > 0 is the linear convergence target rate: tCG will terminate
      early if the residual was reduced by a factor of kappa.
  theta (1.0)
      tCG inner theta convergence tolerance.
      1+theta (theta between 0 and 1) is the superlinear convergence
      target rate. tCG will terminate early if the residual was reduced
      by a power of 1+theta.
  rho_prime (0.1)
      Accept/reject threshold: if rho is at least rho_prime, the outer
      iteration is accepted; otherwise, it is rejected. If it is
      rejected, the trust-region radius will have been decreased. To
      ensure this, rho_prime (>= 0) must be strictly smaller than 1/4.
      If rho_prime is negative, the algorithm is not guaranteed to
      produce monotonically decreasing cost values. It is strongly
      recommended to set rho_prime > 0, to aid convergence.
  rho_regularization (1e3)
      Close to convergence, evaluating the performance ratio rho is
      numerically challenging. Meanwhile, close to convergence, the
      quadratic model should be a good fit and the steps should be
      accepted. Regularization lets rho go to 1 as the model decrease and
      the actual decrease go to zero. Set this option to zero to disable
      regularization (not recommended). See in-code comments for the
      specifics. When this is not zero, it may happen that the iterates
      produced are not monotonically improving the cost when very close
      to convergence. This is because the corrected cost improvement
      could change sign if it is negative but very small.
  statsfun (none)
      Function handle to a function that will be called after each
      iteration to provide the opportunity to log additional statistics.
      They will be returned in the info struct. See the generic Manopt
      documentation about solvers for further information. statsfun is
      called with the point x that was reached last, after the
      accept/reject decision. See the comment below.
  stopfun (none)
      Function handle to a function that will be called at each iteration
      to provide the opportunity to specify additional stopping criteria.
      See the generic Manopt documentation about solvers for further
      information.
  verbosity (2)
      Integer number used to tune the amount of output the algorithm
      generates during execution (mostly as text in the command window).
      The higher, the more output. 0 means silent. 3 and above includes a
      display of the options structure at the beginning of the execution.
  debug (false)
      Set to true to allow the algorithm to perform additional
      computations for debugging purposes. If a debugging test fails, you
      will be informed of it, usually via the command window. Be aware
      that these additional computations appear in the algorithm timings
      too, and may interfere with operations such as counting the number
      of cost evaluations, etc. (the debug calls get storedb too).
  storedepth (20)
      Maximum number of different points x of the manifold for which a
      store structure will be kept in memory in the storedb. If the
      caching features of Manopt are not used, this is irrelevant. If
      memory usage is an issue, you may try to lower this number.
      Profiling may then help to investigate if a performance hit was
      incurred as a result.
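As an illustration of how these options are passed, here is a sketch that
minimizes a Rayleigh quotient over the sphere (the matrix A, the manifold
choice and the specific option values are made up for the example and are
not part of this file):

    % Example problem: minimize -x'*A*x over the unit sphere in R^n,
    % i.e., compute a dominant eigenvector of the symmetric matrix A.
    n = 100;
    A = randn(n);
    A = .5*(A + A');

    problem.M = spherefactory(n);          % manifold structure
    problem.cost  = @(x) -x'*A*x;          % cost function
    problem.egrad = @(x) -2*A*x;           % Euclidean gradient
    problem.ehess = @(x, xdot) -2*A*xdot;  % Euclidean Hessian along xdot

    % Overwrite a few of the defaults documented above (all optional).
    options.tolgradnorm = 1e-8;
    options.maxiter     = 500;
    options.Delta_bar   = 4*sqrt(problem.M.dim());

    % Pass [] as x0 to let the solver pick a random initial iterate.
    [x, xcost, info] = trustregions(problem, [], options);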
Notice that statsfun is called with the point x that was reached last,
after the accept/reject decision. Hence: if the step was accepted, we get
that new x, with a store which only saw the call for the cost and for the
gradient. If the step was rejected, we get the same x as previously, with
the store structure containing everything that was computed at that point
(possibly including previous rejects at that same point). Hence, statsfun
should not be used in conjunction with the store to count operations, for
example. Instead, you should use storedb's shared memory for such
purposes (either via storedb.shared or via store.shared; see the online
documentation). It is however possible to use statsfun with the store to
compute, for example, other merit functions on the point x (other than
the actual cost function, that is).

Please cite the Manopt paper as well as the research paper:

    @Article{genrtr,
      Title   = {Trust-region methods on {Riemannian} manifolds},
      Author  = {Absil, P.-A. and Baker, C. G. and Gallivan, K. A.},
      Journal = {Foundations of Computational Mathematics},
      Year    = {2007},
      Number  = {3},
      Pages   = {303--330},
      Volume  = {7},
      Doi     = {10.1007/s10208-005-0179-9}
    }

See also: steepestdescent conjugategradient manopt/examples
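As an illustration of the statsfun and stopfun hooks mentioned above, the
following sketch logs the distance from each iterate to a reference point
xref and adds a custom stopping criterion; xref and the cost threshold
are made up for the example:

    % Log an extra statistic at each outer iteration: the distance to
    % xref. statsfun receives the problem, the current point x and the
    % stats structure for that iteration, and returns the augmented stats.
    options.statsfun = @(problem, x, stats) ...
        setfield(stats, 'dist_to_xref', problem.M.dist(x, xref));

    % Additional stopping criterion: stop as soon as the cost is below
    % -100. stopfun receives the problem, the current point and the info
    % struct-array up to entry 'last'; it returns true to request a stop.
    options.stopfun = @(problem, x, info, last) info(last).cost < -100;

    [x, xcost, info] = trustregions(problem, [], options);

    % The logged statistic is then available like the built-in ones:
    dists = [info.dist_to_xref];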

- StoreDB
- applyStatsfun Apply the statsfun function to a stats structure (for solvers).
- canGetApproxGradient Checks whether an approximate gradient can be computed for this problem.
- canGetApproxHessian Checks whether an approximate Hessian can be computed for this problem.
- canGetCost Checks whether the cost function can be computed for a problem structure.
- canGetGradient Checks whether the gradient can be computed for a problem structure.
- canGetHessian Checks whether the Hessian can be computed for a problem structure.
- getCost Computes the cost function at x.
- getCostGrad Computes the cost function and the gradient at x in one call if possible.
- getDirectionalDerivative Computes the directional derivative of the cost function at x along d.
- getGlobalDefaults Returns a structure with default option values for Manopt.
- getGradient Computes the gradient of the cost function at x.
- getHessian Computes the Hessian of the cost function at x along d.
- mergeOptions Merges two options structures with one having precedence over the other.
- stoppingcriterion Checks for standard stopping criteria, as a helper to solvers.
- approxgradientFD Gradient approx. fnctn handle based on finite differences of the cost.
- approxhessianFD Hessian approx. fnctn handle based on finite differences of the gradient.
- tCG Truncated (Steihaug-Toint) Conjugate-Gradient method.

- dominant_invariant_subspace Returns an orthonormal basis of the dominant invariant p-subspace of A.
- dominant_invariant_subspace_complex Returns a unitary basis of the dominant invariant p-subspace of A.
- elliptope_SDP Solver for semidefinite programs (SDP's) with unit diagonal constraints.
- elliptope_SDP_complex Solver for complex semidefinite programs (SDP's) with unit diagonal.
- essential_svd Sample solution of an optimization problem on the essential manifold.
- generalized_eigenvalue_computation Returns orthonormal basis of the dominant invariant p-subspace of B^-1 A.
- generalized_procrustes Rotationally align clouds of points (generalized Procrustes problem)
- low_rank_dist_completion Perform low-rank distance matrix completion w/ automatic rank detection.
- low_rank_matrix_completion Given partial observation of a low rank matrix, attempts to complete it.
- low_rank_tensor_completion Given partial observation of a low rank tensor, attempts to complete it.
- maxcut Algorithm to (try to) compute a maximum cut of a graph, via SDP approach.
- positive_definite_karcher_mean Computes a Karcher mean of a collection of positive definite matrices.
- radio_interferometric_calibration Returns the gain matrices of N stations with K receivers.
- robust_pca Computes a robust version of PCA (principal component analysis) on data.
- shapefit_smoothed ShapeFit formulation for sensor network localization from pair directions
- sparse_pca Sparse principal component analysis based on optimization over Stiefel.
- truncated_svd Returns an SVD decomposition of A truncated to rank p.
- centroid Attempts the computation of a centroid of a set of points on a manifold.
- preconhessiansolve Preconditioner based on the inverse Hessian, by solving linear systems.
- hessianextreme Compute an extreme eigenvector / eigenvalue of the Hessian of a problem.
- manoptsolve Gateway helper function to call a Manopt solver, chosen in the options.

function [x, cost, info, options] = trustregions(problem, x, options)
% Riemannian trust-regions solver for optimization on manifolds.

% An explicit, general listing of this algorithm, with preconditioning,
% can be found in the following paper:
% @Article{boumal2015lowrank,
%   Title   = {Low-rank matrix completion via preconditioned optimization on the {G}rassmann manifold},
%   Author  = {Boumal, N. and Absil, P.-A.},
%   Journal = {Linear Algebra and its Applications},
%   Year    = {2015},
%   Pages   = {200--239},
%   Volume  = {475},
%   Doi     = {10.1016/j.laa.2015.02.027},
% }

% When the Hessian is not specified, it is approximated with
% finite-differences of the gradient. The resulting method is called
% RTR-FD. Some convergence theory for it is available in this paper:
% @incollection{boumal2015rtrfd,
%   author    = {Boumal, N.},
%   title     = {Riemannian trust regions with finite-difference Hessian approximations are globally convergent},
%   year      = {2015},
%   booktitle = {Geometric Science of Information}
% }


% This file is part of Manopt: www.manopt.org.
% This code is an adaptation to Manopt of the original GenRTR code:
% RTR - Riemannian Trust-Region
% (c) 2004-2007, P.-A. Absil, C. G. Baker, K. A. Gallivan
% Florida State University
% School of Computational Science
% (http://www.math.fsu.edu/~cbaker/GenRTR/?page=download)
% See accompanying license file.
% The adaptation was executed by Nicolas Boumal.
%
%
% Change log:
%
%   NB April 3, 2013:
%       tCG now returns the Hessian along the returned direction eta, so
%       that we do not compute that Hessian redundantly: some savings at
%       each iteration. Similarly, if the useRand flag is on, we spare an
%       extra Hessian computation at each outer iteration too, owing to
%       some modifications in the Cauchy point section of the code
%       specific to useRand = true.
%
%   NB Aug. 22, 2013:
%       This function is now Octave compatible. The transition called for
%       two changes which would otherwise not be advisable. (1) tic/toc
%       is now used as is, as opposed to the safer way:
%       t = tic(); elapsed = toc(t);
%       And (2), the (formerly inner) function savestats was moved
%       outside the main function to not be nested anymore. This is
%       arguably less elegant, but Octave does not (and likely will not)
%       support nested functions.
%
%   NB Dec. 2, 2013:
%       The in-code documentation was largely revised and expanded.
%
%   NB Dec. 2, 2013:
%       The former heuristic which triggered when rhonum was very small
%       and forced rho = 1 has been replaced by a smoother heuristic
%       which consists in regularizing rhonum and rhoden before computing
%       their ratio. It is tunable via options.rho_regularization.
%       Furthermore, the solver now detects if tCG did not obtain a model
%       decrease (which is theoretically impossible but may happen
%       because of numerical errors and/or because of a
%       nonlinear/nonsymmetric Hessian operator, which is the case for
%       finite difference approximations). When such an anomaly is
%       detected, the step is rejected and the trust region radius is
%       decreased.
%       Feb. 18, 2015 note: this is less useful now, as tCG now
%       guarantees model decrease even for the finite difference
%       approximation of the Hessian. It is still useful in case of
%       numerical errors, but this is less stringent.
%
%   NB Dec. 3, 2013:
%       The stepsize is now registered at each iteration, at a small
%       additional cost. The defaults for Delta_bar and Delta0 are better
%       defined. Setting Delta_bar in the options will automatically set
%       Delta0 accordingly. In Manopt 1.0.4, the defaults for these
%       options were not treated appropriately because of an incorrect
%       use of the isfield() built-in function.
%
%   NB Feb. 18, 2015:
%       Added some comments. Also, Octave now supports safe tic/toc
%       usage, so we reverted the changes to use that again (see
%       Aug. 22, 2013 log entry).
%
%   NB April 3, 2015:
%       Works with the new StoreDB class system.
%
%   NB April 8, 2015:
%       No Hessian warning if approximate Hessian explicitly available.
%
%   NB Nov. 1, 2016:
%       Now uses approximate gradient via finite differences if need be.


    % Verify that the problem description is sufficient for the solver.
    if ~canGetCost(problem)
        warning('manopt:getCost', ...
                'No cost provided. The algorithm will likely abort.');
    end
    if ~canGetGradient(problem) && ~canGetApproxGradient(problem)
        % Note: we do not give a warning if an approximate gradient is
        % explicitly given in the problem description, as in that case
        % the user seems to be aware of the issue.
        warning('manopt:getGradient:approx', ...
               ['No gradient provided. Using an FD approximation instead (slow).\n' ...
                'It may be necessary to increase options.tolgradnorm.\n' ...
                'To disable this warning: warning(''off'', ''manopt:getGradient:approx'')']);
        problem.approxgrad = approxgradientFD(problem);
    end
    if ~canGetHessian(problem) && ~canGetApproxHessian(problem)
        % Note: we do not give a warning if an approximate Hessian is
        % explicitly given in the problem description, as in that case
        % the user seems to be aware of the issue.
        warning('manopt:getHessian:approx', ...
               ['No Hessian provided. Using an FD approximation instead.\n' ...
                'To disable this warning: warning(''off'', ''manopt:getHessian:approx'')']);
        problem.approxhess = approxhessianFD(problem);
    end

    % Define some strings for display
    tcg_stop_reason = {'negative curvature',...
                       'exceeded trust region',...
                       'reached target residual-kappa (linear)',...
                       'reached target residual-theta (superlinear)',...
                       'maximum inner iterations',...
                       'model increased'};

    % Set local defaults here
    localdefaults.verbosity = 2;
    localdefaults.maxtime = inf;
    localdefaults.miniter = 3;
    localdefaults.maxiter = 1000;
    localdefaults.mininner = 1;
    localdefaults.maxinner = problem.M.dim();
    localdefaults.tolgradnorm = 1e-6;
    localdefaults.kappa = 0.1;
    localdefaults.theta = 1.0;
    localdefaults.rho_prime = 0.1;
    localdefaults.useRand = false;
    localdefaults.rho_regularization = 1e3;

    % Merge global and local defaults, then merge w/ user options, if any.
    localdefaults = mergeOptions(getGlobalDefaults(), localdefaults);
    if ~exist('options', 'var') || isempty(options)
        options = struct();
    end
    options = mergeOptions(localdefaults, options);

    % Set default Delta_bar and Delta0 separately to deal with additional
    % logic: if Delta_bar is provided but not Delta0, let Delta0
    % automatically be some fraction of the provided Delta_bar.
    if ~isfield(options, 'Delta_bar')
        if isfield(problem.M, 'typicaldist')
            options.Delta_bar = problem.M.typicaldist();
        else
            options.Delta_bar = sqrt(problem.M.dim());
        end
    end
    if ~isfield(options, 'Delta0')
        options.Delta0 = options.Delta_bar / 8;
    end

    % Check some option values
    assert(options.rho_prime < 1/4, ...
           'options.rho_prime must be strictly smaller than 1/4.');
    assert(options.Delta_bar > 0, ...
           'options.Delta_bar must be positive.');
    assert(options.Delta0 > 0 && options.Delta0 < options.Delta_bar, ...
           'options.Delta0 must be positive and smaller than Delta_bar.');

    % It is sometimes useful to check what the actual option values are.
    if options.verbosity >= 3
        disp(options);
    end

    ticstart = tic();

    % If no initial point x is given by the user, generate one at random.
    if ~exist('x', 'var') || isempty(x)
        x = problem.M.rand();
    end

    % Create a store database and get a key for the current x
    storedb = StoreDB(options.storedepth);
    key = storedb.getNewKey();

    %% Initializations

    % k counts the outer (TR) iterations. The semantic is that k counts
    % the number of iterations fully executed so far.
    k = 0;

    % Initialize solution and companion measures: f(x), fgrad(x)
    [fx, fgradx] = getCostGrad(problem, x, storedb, key);
    norm_grad = problem.M.norm(x, fgradx);

    % Initialize trust-region radius
    Delta = options.Delta0;

    % Save stats in a struct array info, and preallocate.
    if ~exist('used_cauchy', 'var')
        used_cauchy = [];
    end
    stats = savestats(problem, x, storedb, key, options, k, fx, norm_grad, Delta, ticstart);
    info(1) = stats;
    info(min(10000, options.maxiter+1)).iter = [];

    % ** Display:
    if options.verbosity == 2
        fprintf(['%3s %3s %5s %5s ',...
                 'f: %+e |grad|: %e\n'],...
                 ' ', ' ', ' ', ' ', fx, norm_grad);
    elseif options.verbosity > 2
        fprintf('************************************************************************\n');
        fprintf('%3s %3s k: %5s num_inner: %5s %s\n',...
                '', '', '______', '______', '');
        fprintf(' f(x) : %+e |grad| : %e\n', fx, norm_grad);
        fprintf(' Delta : %f\n', Delta);
    end

    % To keep track of consecutive radius changes, so that we can warn the
    % user if it appears necessary.
    consecutive_TRplus = 0;
    consecutive_TRminus = 0;


    % **********************
    % ** Start of TR loop **
    % **********************
    while true

        % Start clock for this outer iteration
        ticstart = tic();

        % Run standard stopping criterion checks
        [stop, reason] = stoppingcriterion(problem, x, options, info, k+1);

        % If the stopping criterion that triggered is the tolerance on the
        % gradient norm but we are using randomization, make sure we make
        % at least miniter iterations to give randomization a chance at
        % escaping saddle points.
        if stop == 2 && options.useRand && k < options.miniter
            stop = 0;
        end

        if stop
            if options.verbosity >= 1
                fprintf([reason '\n']);
            end
            break;
        end

        if options.verbosity > 2 || options.debug > 0
            fprintf('************************************************************************\n');
        end

        % *************************
        % ** Begin TR Subproblem **
        % *************************

        % Determine eta0
        if ~options.useRand
            % Pick the zero vector
            eta = problem.M.zerovec(x);
        else
            % Random vector in T_x M (this has to be very small)
            eta = problem.M.lincomb(x, 1e-6, problem.M.randvec(x));
            % Must be inside trust-region
            while problem.M.norm(x, eta) > Delta
                eta = problem.M.lincomb(x, sqrt(sqrt(eps)), eta);
            end
        end

        % Solve TR subproblem approximately
        [eta, Heta, numit, stop_inner] = ...
                  tCG(problem, x, fgradx, eta, Delta, options, storedb, key);
        srstr = tcg_stop_reason{stop_inner};

        % If using randomized approach, compare result with the Cauchy
        % point. Convergence proofs assume that we achieve at least (a
        % fraction of) the reduction of the Cauchy point. After this
        % if-block, either all eta-related quantities have been changed
        % consistently, or none of them have changed.
        if options.useRand
            used_cauchy = false;
            % Check the curvature,
            Hg = getHessian(problem, x, fgradx, storedb, key);
            g_Hg = problem.M.inner(x, fgradx, Hg);
            if g_Hg <= 0
                tau_c = 1;
            else
                tau_c = min( norm_grad^3/(Delta*g_Hg) , 1);
            end
            % and generate the Cauchy point.
            eta_c  = problem.M.lincomb(x, -tau_c * Delta / norm_grad, fgradx);
            Heta_c = problem.M.lincomb(x, -tau_c * Delta / norm_grad, Hg);

            % Now that we have computed the Cauchy point in addition to
            % the returned eta, we might as well keep the best of them.
            mdle  = fx + problem.M.inner(x, fgradx, eta) ...
                       + .5*problem.M.inner(x, Heta, eta);
            mdlec = fx + problem.M.inner(x, fgradx, eta_c) ...
                       + .5*problem.M.inner(x, Heta_c, eta_c);
            if mdlec < mdle
                eta = eta_c;
                Heta = Heta_c; % added April 11, 2012
                used_cauchy = true;
            end
        end


        % This is only computed for logging purposes, because it may be
        % useful for some user-defined stopping criteria. If this is not
        % cheap for specific applications (compared to evaluating the
        % cost), we should reconsider this.
        norm_eta = problem.M.norm(x, eta);

        if options.debug > 0
            testangle = problem.M.inner(x, eta, fgradx) / (norm_eta*norm_grad);
        end


        % Compute the tentative next iterate (the proposal)
        x_prop = problem.M.retr(x, eta);
        key_prop = storedb.getNewKey();

        % Compute the function value of the proposal
        fx_prop = getCost(problem, x_prop, storedb, key_prop);

        % Will we accept the proposal or not?
        % Check the performance of the quadratic model against the actual
        % cost.
        rhonum = fx - fx_prop;
        rhoden = -problem.M.inner(x, fgradx, eta) ...
                 -.5*problem.M.inner(x, eta, Heta);
        % rhonum could be anything.
        % rhoden should be nonnegative, as guaranteed by tCG, barring
        % numerical errors.

        % Heuristic -- added Dec. 2, 2013 (NB) to replace the former
        % heuristic. This heuristic is documented in the book by Conn,
        % Gould and Toint on trust-region methods, section 17.4.2.
        % rhonum measures the difference between two numbers. Close to
        % convergence, these two numbers are very close to each other, so
        % that computing their difference is numerically challenging:
        % there may be a significant loss in accuracy. Since the
        % acceptance or rejection of the step is conditioned on the ratio
        % between rhonum and rhoden, large errors in rhonum result in a
        % very large error in rho, hence in erratic acceptance /
        % rejection. Meanwhile, close to convergence, steps are usually
        % trustworthy and we should transition to a Newton-like method,
        % with rho = 1 consistently. The heuristic thus shifts both
        % rhonum and rhoden by a small amount such that far from
        % convergence, the shift is irrelevant and close to convergence,
        % the ratio rho goes to 1, effectively promoting acceptance of
        % the step.
        % The rationale is that close to convergence, both rhonum and
        % rhoden are quadratic in the distance between x and x_prop.
        % Thus, when this distance is on the order of sqrt(eps), the
        % value of rhonum and rhoden is on the order of eps, which is
        % indistinguishable from the numerical error, resulting in badly
        % estimated rho's.
        % For abs(fx) < 1, this heuristic is invariant under offsets of f
        % but not under scaling of f. For abs(fx) > 1, the opposite
        % holds. This should not alarm us, as this heuristic only
        % triggers at the very last iterations if very fine convergence
        % is demanded.
        rho_reg = max(1, abs(fx)) * eps * options.rho_regularization;
        rhonum = rhonum + rho_reg;
        rhoden = rhoden + rho_reg;

        if options.debug > 0
            fprintf('DBG: rhonum : %e\n', rhonum);
            fprintf('DBG: rhoden : %e\n', rhoden);
        end

        % This is always true if a linear, symmetric operator is used for
        % the Hessian (approximation) and if we had infinite numerical
        % precision. In practice, nonlinear approximations of the Hessian
        % such as the built-in finite difference approximation and finite
        % numerical accuracy can cause the model to increase. In such
        % scenarios, we decide to force a rejection of the step and a
        % reduction of the trust-region radius. We test the sign of the
        % regularized rhoden since the regularization is supposed to
        % capture the accuracy to which rhoden is computed: if rhoden
        % were negative before regularization but not after, that should
        % not be (and is not) detected as a failure.
        %
        % Note (Feb. 17, 2015, NB): the most recent version of tCG
        % already includes a mechanism to ensure model decrease if the
        % Cauchy step attained a decrease (which is theoretically the
        % case under very lax assumptions). This being said, it is always
        % possible that numerical errors will prevent this, so that it is
        % good to keep a safeguard.
        %
        % The current strategy is that, if this should happen, then we
        % reject the step and reduce the trust region radius. This also
        % ensures that the actual cost values are monotonically
        % decreasing.
        model_decreased = (rhoden >= 0);

        if ~model_decreased
            srstr = [srstr ', model did not decrease']; %#ok<AGROW>
        end

        rho = rhonum / rhoden;

        % Added June 30, 2015 following observation by BM.
        % With this modification, it is guaranteed that a step rejection
        % is always accompanied by a TR reduction. This prevents
        % stagnation in this "corner case" (NaN's really aren't supposed
        % to occur, but it's nice if we can handle them nonetheless).
        if isnan(rho)
            fprintf('rho is NaN! Forcing a radius decrease. This should not happen.\n');
            if isnan(fx_prop)
                fprintf('The cost function returned NaN (perhaps the retraction returned a bad point?)\n');
            else
                fprintf('The cost function did not return a NaN value.');
            end
        end

        if options.debug > 0
            m = @(x, eta) ...
                getCost(problem, x, storedb, key) + ...
                getDirectionalDerivative(problem, x, eta, storedb, key) + ...
                .5*problem.M.inner(x, getHessian(problem, x, eta, storedb, key), eta);
            zerovec = problem.M.zerovec(x);
            actrho = (fx - fx_prop) / (m(x, zerovec) - m(x, eta));
            fprintf('DBG: new f(x) : %+e\n', fx_prop);
            fprintf('DBG: actual rho : %e\n', actrho);
            fprintf('DBG: used rho : %e\n', rho);
        end

        % Choose the new TR radius based on the model performance
        trstr = ' ';
        % If the actual decrease is smaller than 1/4 of the predicted
        % decrease, then reduce the TR radius.
        if rho < 1/4 || ~model_decreased || isnan(rho)
            trstr = 'TR-';
            Delta = Delta/4;
            consecutive_TRplus = 0;
            consecutive_TRminus = consecutive_TRminus + 1;
            if consecutive_TRminus >= 5 && options.verbosity >= 1
                consecutive_TRminus = -inf;
                fprintf(' +++ Detected many consecutive TR- (radius decreases).\n');
                fprintf(' +++ Consider decreasing options.Delta_bar by an order of magnitude.\n');
                fprintf(' +++ Current values: options.Delta_bar = %g and options.Delta0 = %g.\n', options.Delta_bar, options.Delta0);
            end
        % If the actual decrease is at least 3/4 of the predicted
        % decrease and the tCG (inner solve) hit the TR boundary,
        % increase the TR radius. We also keep track of the number of
        % consecutive trust-region radius increases. If there are many,
        % this may indicate the need to adapt the initial and maximum
        % radii.
        elseif rho > 3/4 && (stop_inner == 1 || stop_inner == 2)
            trstr = 'TR+';
            Delta = min(2*Delta, options.Delta_bar);
            consecutive_TRminus = 0;
            consecutive_TRplus = consecutive_TRplus + 1;
            if consecutive_TRplus >= 5 && options.verbosity >= 1
                consecutive_TRplus = -inf;
                fprintf(' +++ Detected many consecutive TR+ (radius increases).\n');
                fprintf(' +++ Consider increasing options.Delta_bar by an order of magnitude.\n');
                fprintf(' +++ Current values: options.Delta_bar = %g and options.Delta0 = %g.\n', options.Delta_bar, options.Delta0);
            end
        else
            % Otherwise, keep the TR radius constant.
            consecutive_TRplus = 0;
            consecutive_TRminus = 0;
        end

        % Choose to accept or reject the proposed step based on the model
        % performance. Note the strict inequality.
        if model_decreased && rho > options.rho_prime
            accept = true;
            accstr = 'acc';
            x = x_prop;
            key = key_prop;
            fx = fx_prop;
            fgradx = getGradient(problem, x, storedb, key);
            norm_grad = problem.M.norm(x, fgradx);
        else
            accept = false;
            accstr = 'REJ';
        end


        % Make sure we don't use too much memory for the store database
        storedb.purge();

        % k is the number of iterations we have accomplished.
        k = k + 1;

        % Log statistics for freshly executed iteration.
        % Everything after this in the loop is not accounted for in the
        % timing.
        stats = savestats(problem, x, storedb, key, options, k, fx, ...
                          norm_grad, Delta, ticstart, info, rho, rhonum, ...
                          rhoden, accept, numit, norm_eta, used_cauchy);
        info(k+1) = stats; %#ok<AGROW>


        % ** Display:
        if options.verbosity == 2,
            fprintf(['%3s %3s k: %5d num_inner: %5d ', ...
                     'f: %+e |grad|: %e %s\n'], ...
                    accstr, trstr, k, numit, fx, norm_grad, srstr);
        elseif options.verbosity > 2,
            if options.useRand && used_cauchy,
                fprintf('USED CAUCHY POINT\n');
            end
            fprintf('%3s %3s k: %5d num_inner: %5d %s\n', ...
                    accstr, trstr, k, numit, srstr);
            fprintf(' f(x) : %+e |grad| : %e\n', fx, norm_grad);
            if options.debug > 0
                fprintf(' Delta : %f |eta| : %e\n', Delta, norm_eta);
            end
            fprintf(' rho : %e\n', rho);
        end
        if options.debug > 0,
            fprintf('DBG: cos ang(eta,gradf): %d\n', testangle);
            if rho == 0
                fprintf('DBG: rho = 0, this will likely hinder further convergence.\n');
            end
        end

    end  % of TR loop (counter: k)

    % Restrict info struct-array to useful part
    info = info(1:k+1);


    if (options.verbosity > 2) || (options.debug > 0),
        fprintf('************************************************************************\n');
    end
    if (options.verbosity > 0) || (options.debug > 0)
        fprintf('Total time is %f [s] (excludes statsfun)\n', info(end).time);
    end

    % Return the best cost reached
    cost = fx;

end




% Routine in charge of collecting the current iteration stats
function stats = savestats(problem, x, storedb, key, options, k, fx, ...
                           norm_grad, Delta, ticstart, info, rho, rhonum, ...
                           rhoden, accept, numit, norm_eta, used_cauchy)
    stats.iter = k;
    stats.cost = fx;
    stats.gradnorm = norm_grad;
    stats.Delta = Delta;
    if k == 0
        stats.time = toc(ticstart);
        stats.rho = inf;
        stats.rhonum = NaN;
        stats.rhoden = NaN;
        stats.accepted = true;
        stats.numinner = NaN;
        stats.stepsize = NaN;
        if options.useRand
            stats.cauchy = false;
        end
    else
        stats.time = info(k).time + toc(ticstart);
        stats.rho = rho;
        stats.rhonum = rhonum;
        stats.rhoden = rhoden;
        stats.accepted = accept;
        stats.numinner = numit;
        stats.stepsize = norm_eta;
        if options.useRand,
            stats.cauchy = used_cauchy;
        end
    end

    % See comment about statsfun above: the x and store passed to
    % statsfun are that of the most recently accepted point after the
    % iteration fully executed.
    stats = applyStatsfun(problem, x, storedb, key, options, stats);

end
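As mentioned in the help text above, the tCG inner solver uses a
preconditioner for the Hessian if one is supplied in the problem
structure (and useRand is false). The sketch below is hypothetical:
precon_op stands for a problem-specific, symmetric positive definite
approximation of the inverse Hessian acting on tangent vectors:

    % Supply a preconditioner: precon(x, xdot) must return a tangent
    % vector at x and approximate the inverse of the Hessian at x.
    problem.precon = @(x, xdot) precon_op(x, xdot);

    % If instead no Hessian is available and the finite-difference
    % approximation is acceptable, the corresponding warning issued by
    % this solver can be silenced:
    warning('off', 'manopt:getHessian:approx');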
