
# trustregions

## PURPOSE

Riemannian trust-regions solver for optimization on manifolds.

## SYNOPSIS

function [x, cost, info, options] = trustregions(problem, x, options)
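
A minimal usage sketch (not part of the generated page; it assumes Manopt is on the MATLAB path and uses the standard `spherefactory` manifold to minimize the Rayleigh quotient x'*A*x over the unit sphere):

```matlab
% Minimize f(x) = x'*A*x over the unit sphere in R^n with RTR.
n = 100;
A = randn(n); A = (A + A')/2;     % random symmetric matrix

problem.M = spherefactory(n);     % manifold structure: unit sphere in R^n
problem.cost  = @(x) x'*A*x;      % cost function
problem.egrad = @(x) 2*A*x;       % Euclidean gradient (Manopt converts it)
problem.ehess = @(x, u) 2*A*u;    % Euclidean Hessian along direction u

x0 = problem.M.rand();            % random initial iterate
options.tolgradnorm = 1e-8;       % tighten the gradient-norm tolerance

[x, cost, info, options] = trustregions(problem, x0, options);
```

At a minimizer, x approximates a unit eigenvector of A associated with its smallest eigenvalue, and cost approximates that eigenvalue.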

## DESCRIPTION

```
Riemannian trust-regions solver for optimization on manifolds.

function [x, cost, info, options] = trustregions(problem)
function [x, cost, info, options] = trustregions(problem, x0)
function [x, cost, info, options] = trustregions(problem, x0, options)
function [x, cost, info, options] = trustregions(problem, [], options)

This is the Riemannian Trust-Region solver (with tCG inner solve), named
RTR. This solver will attempt to minimize the cost function described in
the problem structure. It requires the availability of the cost function
and of its gradient. It will issue calls for the Hessian. If no Hessian
nor approximate Hessian is provided, a standard approximation of the
Hessian based on the gradient will be computed. If a preconditioner for
the Hessian is provided, it will be used.

If no gradient is provided, an approximation of the gradient is computed,
but this can be slow for manifolds of high dimension.

For a description of the algorithm and theorems offering convergence
guarantees, see the references below. Documentation for this solver is
available online at:

http://www.manopt.org/solver_documentation_trustregions.html

The initial iterate is x0 if it is provided. Otherwise, a random point on
the manifold is picked. To specify options whilst not specifying an
initial iterate, give x0 as [] (the empty matrix).

The two outputs 'x' and 'cost' are the last reached point on the manifold
and its cost. Notice that x is not necessarily the best reached point,
because this solver is not forced to be a descent method. In particular,
very close to convergence, it is sometimes preferable to accept very
slight increases in the cost value (on the order of the machine epsilon)
in the process of reaching fine convergence. In practice, this is not a
limiting factor, as normally one does not need fine enough convergence
that this becomes an issue.

The output 'info' is a struct-array which contains information about the
iterations:
   iter (integer)
       The (outer) iteration number, or number of steps considered
       (whether accepted or rejected). The initial guess is 0.
   cost (double)
       The corresponding cost value.
   gradnorm (double)
       The (Riemannian) norm of the gradient.
   numinner (integer)
       The number of inner iterations executed to compute this iterate.
       Inner iterations are truncated-CG steps. Each one requires a
       Hessian (or approximate Hessian) evaluation.
   time (double)
       The total elapsed time in seconds to reach the corresponding cost.
   rho (double)
       The performance ratio for the iterate.
   rhonum, rhoden (double)
       Regularized numerator and denominator of the performance ratio:
       rho = rhonum/rhoden. See options.rho_regularization.
   accepted (boolean)
       Whether the proposed iterate was accepted or not.
   stepsize (double)
       The (Riemannian) norm of the vector returned by the inner solver
       tCG and which is retracted to obtain the proposed next iterate. If
       accepted = true for the corresponding iterate, this is the size of
       the step from the previous to the new iterate. If accepted is
       false, the step was not executed and this is the size of the
       rejected step.
   Delta (double)
       The trust-region radius at the outer iteration.
   cauchy (boolean)
       Whether the Cauchy point was used or not (if useRand is true).
   And possibly additional information logged by options.statsfun.
For example, type [info.gradnorm] to obtain a vector of the successive
gradient norms reached at each (outer) iteration.

The options structure is used to overwrite the default values. All
options have a default value and are hence optional. To force an option
value, pass an options structure with a field options.optionname, where
optionname is one of the following and the default value is indicated
between parentheses:

   tolgradnorm (1e-6)
       The algorithm terminates if the norm of the gradient drops below
       this. For well-scaled problems, a rule of thumb is that you can
       expect to reduce the gradient norm by 8 orders of magnitude
       (sqrt(eps)) compared to the gradient norm at a "typical" point (a
       rough initial iterate for example). Further decrease is sometimes
       possible, but inexact floating point arithmetic will eventually
       limit the final accuracy. If tolgradnorm is set too low, the
       algorithm may end up iterating forever (or at least until another
       stopping criterion triggers).
   maxiter (1000)
       The algorithm terminates if maxiter (outer) iterations were executed.
   maxtime (Inf)
       The algorithm terminates if maxtime seconds elapsed.
   miniter (3)
       Minimum number of outer iterations (used only if useRand is true).
   mininner (1)
       Minimum number of inner iterations (for tCG).
   maxinner (problem.M.dim() : the manifold's dimension)
       Maximum number of inner iterations (for tCG).
   Delta_bar (problem.M.typicaldist() or sqrt(problem.M.dim()))
       Maximum trust-region radius. If you specify this parameter but not
       Delta0, then Delta0 will be set to 1/8 times this parameter.
   Delta0 (Delta_bar/8)
       Initial trust-region radius. If you observe a long plateau at the
       beginning of the convergence plot (gradient norm VS iteration), it
       may pay off to try to tune this parameter to shorten the plateau.
       You should not set this parameter without setting Delta_bar too (at
       a larger value).
   useRand (false)
       Set to true if the trust-region solve is to be initiated with a
       random tangent vector. If set to true, no preconditioner will be
       used. This option is set to true in some scenarios to escape saddle
       points, but is otherwise seldom activated.
   kappa (0.1)
       tCG inner kappa convergence tolerance.
       kappa > 0 is the linear convergence target rate: tCG will terminate
       early if the residual was reduced by a factor of kappa.
   theta (1.0)
       tCG inner theta convergence tolerance.
       1+theta (theta between 0 and 1) is the superlinear convergence
       target rate. tCG will terminate early if the residual was reduced
       by a power of 1+theta.
   rho_prime (0.1)
       Accept/reject threshold : if rho is at least rho_prime, the outer
       iteration is accepted. Otherwise, it is rejected. In case it is
       rejected, the trust-region radius will have been decreased.
       To ensure this, rho_prime >= 0 must be strictly smaller than 1/4.
       If rho_prime is negative, the algorithm is not guaranteed to
       produce monotonically decreasing cost values. It is strongly
       recommended to set rho_prime > 0, to aid convergence.
   rho_regularization (1e3)
       Close to convergence, evaluating the performance ratio rho is
       numerically challenging. Meanwhile, close to convergence, the
       quadratic model should be a good fit and the steps should be
       accepted. Regularization lets rho go to 1 as the model decrease and
       the actual decrease go to zero. Set this option to zero to disable
       regularization (not recommended). See in-code for the specifics.
       When this is not zero, it may happen that the iterates produced are
       not monotonically improving the cost when very close to
       convergence. This is because the corrected cost improvement could
       change sign if it is negative but very small.
   statsfun (none)
       Function handle to a function that will be called after each
       iteration to provide the opportunity to log additional statistics.
       They will be returned in the info struct. See the generic Manopt
       documentation about solvers for further information. statsfun is
       called with the point x that was reached last, after the
       accept/reject decision. See comment below.
   stopfun (none)
       Function handle to a function that will be called at each iteration
       to provide the opportunity to specify additional stopping criteria.
       See the generic Manopt documentation about solvers for further
       information.
   verbosity (2)
       Integer number used to tune the amount of output the algorithm
       generates during execution (mostly as text in the command window).
       The higher, the more output. 0 means silent. 3 and above includes a
       display of the options structure at the beginning of the execution.
   debug (false)
       Set to true to allow the algorithm to perform additional
       computations for debugging purposes. If a debugging test fails, you
       will be informed of it, usually via the command window. Be aware
       that these additional computations appear in the algorithm timings
       too, and may interfere with operations such as counting the number
       of cost evaluations, etc. (the debug calls get storedb too).
   storedepth (20)
       Maximum number of different points x of the manifold for which a
       store structure will be kept in memory in the storedb. If the
       caching features of Manopt are not used, this is irrelevant. If
       memory usage is an issue, you may try to lower this number.
       Profiling may then help to investigate if a performance hit was
       incurred as a result.

Notice that statsfun is called with the point x that was reached last,
after the accept/reject decision. Hence: if the step was accepted, we get
that new x, with a store which only saw the call for the cost and for the
gradient. If the step was rejected, we get the same x as previously, with
the store structure containing everything that was computed at that point
(possibly including previous rejects at that same point). Hence, statsfun
should not be used in conjunction with the store to count operations for
example. Instead, you should use storedb's shared memory for such
purposes (either via storedb.shared, or via store.shared, see
online documentation). It is however possible to use statsfun with the
store to compute, for example, other merit functions on the point x
(other than the actual cost function, that is).

Please cite the Manopt paper as well as the research paper:
@Article{genrtr,
Title    = {Trust-region methods on {Riemannian} manifolds},
Author   = {Absil, P.-A. and Baker, C. G. and Gallivan, K. A.},
Journal  = {Foundations of Computational Mathematics},
Year     = {2007},
Number   = {3},
Pages    = {303--330},
Volume   = {7},
Doi      = {10.1007/s10208-005-0179-9}
}
```

## CROSS-REFERENCE INFORMATION

This function calls:
• StoreDB
• applyStatsfun Apply the statsfun function to a stats structure (for solvers).
• canGetApproxGradient Checks whether an approximate gradient can be computed for this problem.
• canGetApproxHessian Checks whether an approximate Hessian can be computed for this problem.
• canGetCost Checks whether the cost function can be computed for a problem structure.
• canGetGradient Checks whether the gradient can be computed for a problem structure.
• canGetHessian Checks whether the Hessian can be computed for a problem structure.
• getCost Computes the cost function at x.
• getCostGrad Computes the cost function and the gradient at x in one call if possible.
• getDirectionalDerivative Computes the directional derivative of the cost function at x along d.
• getGlobalDefaults Returns a structure with default option values for Manopt.
• getHessian Computes the Hessian of the cost function at x along d.
• mergeOptions Merges two options structures with one having precedence over the other.
• stoppingcriterion Checks for standard stopping criteria, as a helper to solvers.
• approxgradientFD Gradient approx. fnctn handle based on finite differences of the cost.
• approxhessianFD Hessian approx. fnctn handle based on finite differences of the gradient.
• tCG Truncated (Steihaug-Toint) Conjugate-Gradient method.
This function is called by:
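
As described above, the info struct-array and the statsfun option can be used to monitor a run. A short sketch (the reference point `xref` and the field name `dist_to_xref` are hypothetical, chosen for illustration; statsfun must return the stats structure it receives):

```matlab
% Log an extra statistic at every outer iteration: the Riemannian
% distance from the current iterate x to a reference point xref.
xref = problem.M.rand();                 % hypothetical reference point
options.statsfun = @(problem, x, stats) ...
    setfield(stats, 'dist_to_xref', problem.M.dist(x, xref));

[x, cost, info] = trustregions(problem, [], options);

% Inspect the run: gradient norm per outer iteration (log scale),
% plus the statistic logged by statsfun.
semilogy([info.iter], [info.gradnorm], '.-');
extra = [info.dist_to_xref];
```

Per the note above, statsfun should not count operations via the store; the storedb shared memory serves that purpose.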

## SOURCE CODE

```
0001 function [x, cost, info, options] = trustregions(problem, x, options)
0002 % Riemannian trust-regions solver for optimization on manifolds.
0003 %
0004 % function [x, cost, info, options] = trustregions(problem)
0005 % function [x, cost, info, options] = trustregions(problem, x0)
0006 % function [x, cost, info, options] = trustregions(problem, x0, options)
0007 % function [x, cost, info, options] = trustregions(problem, [], options)
0008 %
0009 % This is the Riemannian Trust-Region solver (with tCG inner solve), named
0010 % RTR. This solver will attempt to minimize the cost function described in
0011 % the problem structure. It requires the availability of the cost function
0012 % and of its gradient. It will issue calls for the Hessian. If no Hessian
0013 % nor approximate Hessian is provided, a standard approximation of the
0014 % Hessian based on the gradient will be computed. If a preconditioner for
0015 % the Hessian is provided, it will be used.
0016 %
0017 % If no gradient is provided, an approximation of the gradient is computed,
0018 % but this can be slow for manifolds of high dimension.
0019 %
0020 % For a description of the algorithm and theorems offering convergence
0021 % guarantees, see the references below. Documentation for this solver is
0022 % available online at:
0023 %
0024 % http://www.manopt.org/solver_documentation_trustregions.html
0025 %
0026 %
0027 % The initial iterate is x0 if it is provided. Otherwise, a random point on
0028 % the manifold is picked. To specify options whilst not specifying an
0029 % initial iterate, give x0 as [] (the empty matrix).
0030 %
0031 % The two outputs 'x' and 'cost' are the last reached point on the manifold
0032 % and its cost. Notice that x is not necessarily the best reached point,
0033 % because this solver is not forced to be a descent method. In particular,
0034 % very close to convergence, it is sometimes preferable to accept very
0035 % slight increases in the cost value (on the order of the machine epsilon)
0036 % in the process of reaching fine convergence. In practice, this is not a
0037 % limiting factor, as normally one does not need fine enough convergence
0038 % that this becomes an issue.
0039 %
0040 % The output 'info' is a struct-array which contains information about the
0041 % iterations:
0042 %   iter (integer)
0043 %       The (outer) iteration number, or number of steps considered
0044 %       (whether accepted or rejected). The initial guess is 0.
0045 %    cost (double)
0046 %       The corresponding cost value.
0047 %    gradnorm (double)
0048 %       The (Riemannian) norm of the gradient.
0049 %    numinner (integer)
0050 %       The number of inner iterations executed to compute this iterate.
0051 %       Inner iterations are truncated-CG steps. Each one requires a
0052 %       Hessian (or approximate Hessian) evaluation.
0053 %    time (double)
0054 %       The total elapsed time in seconds to reach the corresponding cost.
0055 %    rho (double)
0056 %       The performance ratio for the iterate.
0057 %    rhonum, rhoden (double)
0058 %       Regularized numerator and denominator of the performance ratio:
0059 %       rho = rhonum/rhoden. See options.rho_regularization.
0060 %    accepted (boolean)
0061 %       Whether the proposed iterate was accepted or not.
0062 %    stepsize (double)
0063 %       The (Riemannian) norm of the vector returned by the inner solver
0064 %       tCG and which is retracted to obtain the proposed next iterate. If
0065 %       accepted = true for the corresponding iterate, this is the size of
0066 %       the step from the previous to the new iterate. If accepted is
0067 %       false, the step was not executed and this is the size of the
0068 %       rejected step.
0069 %    Delta (double)
0070 %       The trust-region radius at the outer iteration.
0071 %    cauchy (boolean)
0072 %       Whether the Cauchy point was used or not (if useRand is true).
0073 %   And possibly additional information logged by options.statsfun.
0074 % For example, type [info.gradnorm] to obtain a vector of the successive
0075 % gradient norms reached at each (outer) iteration.
0076 %
0077 % The options structure is used to overwrite the default values. All
0078 % options have a default value and are hence optional. To force an option
0079 % value, pass an options structure with a field options.optionname, where
0080 % optionname is one of the following and the default value is indicated
0081 % between parentheses:
0082 %
0083 %   tolgradnorm (1e-6)
0084 %       The algorithm terminates if the norm of the gradient drops below
0085 %       this. For well-scaled problems, a rule of thumb is that you can
0086 %       expect to reduce the gradient norm by 8 orders of magnitude
0087 %       (sqrt(eps)) compared to the gradient norm at a "typical" point (a
0088 %       rough initial iterate for example). Further decrease is sometimes
0089 %       possible, but inexact floating point arithmetic will eventually
0090 %       limit the final accuracy. If tolgradnorm is set too low, the
0091 %       algorithm may end up iterating forever (or at least until another
0092 %       stopping criterion triggers).
0093 %   maxiter (1000)
0094 %       The algorithm terminates if maxiter (outer) iterations were executed.
0095 %   maxtime (Inf)
0096 %       The algorithm terminates if maxtime seconds elapsed.
0097 %    miniter (3)
0098 %       Minimum number of outer iterations (used only if useRand is true).
0099 %    mininner (1)
0100 %       Minimum number of inner iterations (for tCG).
0101 %    maxinner (problem.M.dim() : the manifold's dimension)
0102 %       Maximum number of inner iterations (for tCG).
0103 %    Delta_bar (problem.M.typicaldist() or sqrt(problem.M.dim()))
0104 %       Maximum trust-region radius. If you specify this parameter but not
0105 %       Delta0, then Delta0 will be set to 1/8 times this parameter.
0106 %   Delta0 (Delta_bar/8)
0107 %       Initial trust-region radius. If you observe a long plateau at the
0108 %       beginning of the convergence plot (gradient norm VS iteration), it
0109 %       may pay off to try to tune this parameter to shorten the plateau.
0110 %       You should not set this parameter without setting Delta_bar too (at
0111 %       a larger value).
0112 %    useRand (false)
0113 %       Set to true if the trust-region solve is to be initiated with a
0114 %       random tangent vector. If set to true, no preconditioner will be
0115 %       used. This option is set to true in some scenarios to escape saddle
0116 %       points, but is otherwise seldom activated.
0117 %    kappa (0.1)
0118 %       tCG inner kappa convergence tolerance.
0119 %       kappa > 0 is the linear convergence target rate: tCG will terminate
0120 %       early if the residual was reduced by a factor of kappa.
0121 %    theta (1.0)
0122 %       tCG inner theta convergence tolerance.
0123 %       1+theta (theta between 0 and 1) is the superlinear convergence
0124 %       target rate. tCG will terminate early if the residual was reduced
0125 %       by a power of 1+theta.
0126 %    rho_prime (0.1)
0127 %       Accept/reject threshold : if rho is at least rho_prime, the outer
0128 %       iteration is accepted. Otherwise, it is rejected. In case it is
0129 %       rejected, the trust-region radius will have been decreased.
0130 %       To ensure this, rho_prime >= 0 must be strictly smaller than 1/4.
0131 %       If rho_prime is negative, the algorithm is not guaranteed to
0132 %       produce monotonically decreasing cost values. It is strongly
0133 %       recommended to set rho_prime > 0, to aid convergence.
0134 %   rho_regularization (1e3)
0135 %       Close to convergence, evaluating the performance ratio rho is
0136 %       numerically challenging. Meanwhile, close to convergence, the
0137 %       quadratic model should be a good fit and the steps should be
0138 %       accepted. Regularization lets rho go to 1 as the model decrease and
0139 %       the actual decrease go to zero. Set this option to zero to disable
0140 %       regularization (not recommended). See in-code for the specifics.
0141 %       When this is not zero, it may happen that the iterates produced are
0142 %       not monotonically improving the cost when very close to
0143 %       convergence. This is because the corrected cost improvement could
0144 %       change sign if it is negative but very small.
0145 %   statsfun (none)
0146 %       Function handle to a function that will be called after each
0147 %       iteration to provide the opportunity to log additional statistics.
0148 %       They will be returned in the info struct. See the generic Manopt
0149 %       documentation about solvers for further information. statsfun is
0150 %       called with the point x that was reached last, after the
0151 %       accept/reject decision. See comment below.
0152 %   stopfun (none)
0153 %       Function handle to a function that will be called at each iteration
0154 %       to provide the opportunity to specify additional stopping criteria.
0155 %       See the generic Manopt documentation about solvers for further
0156 %       information.
0157 %   verbosity (2)
0158 %       Integer number used to tune the amount of output the algorithm
0159 %       generates during execution (mostly as text in the command window).
0160 %       The higher, the more output. 0 means silent. 3 and above includes a
0161 %       display of the options structure at the beginning of the execution.
0162 %   debug (false)
0163 %       Set to true to allow the algorithm to perform additional
0164 %       computations for debugging purposes. If a debugging test fails, you
0165 %       will be informed of it, usually via the command window. Be aware
0166 %       that these additional computations appear in the algorithm timings
0167 %       too, and may interfere with operations such as counting the number
0168 %       of cost evaluations, etc. (the debug calls get storedb too).
0169 %   storedepth (20)
0170 %       Maximum number of different points x of the manifold for which a
0171 %       store structure will be kept in memory in the storedb. If the
0172 %       caching features of Manopt are not used, this is irrelevant. If
0173 %       memory usage is an issue, you may try to lower this number.
0174 %       Profiling may then help to investigate if a performance hit was
0175 %       incurred as a result.
0176 %
0177 % Notice that statsfun is called with the point x that was reached last,
0178 % after the accept/reject decision. Hence: if the step was accepted, we get
0179 % that new x, with a store which only saw the call for the cost and for the
0180 % gradient. If the step was rejected, we get the same x as previously, with
0181 % the store structure containing everything that was computed at that point
0182 % (possibly including previous rejects at that same point). Hence, statsfun
0183 % should not be used in conjunction with the store to count operations for
0184 % example. Instead, you should use storedb's shared memory for such
0185 % purposes (either via storedb.shared, or via store.shared, see
0186 % online documentation). It is however possible to use statsfun with the
0187 % store to compute, for example, other merit functions on the point x
0188 % (other than the actual cost function, that is).
0189 %
0190 %
0191 % Please cite the Manopt paper as well as the research paper:
0192 %     @Article{genrtr,
0193 %       Title    = {Trust-region methods on {Riemannian} manifolds},
0194 %       Author   = {Absil, P.-A. and Baker, C. G. and Gallivan, K. A.},
0195 %       Journal  = {Foundations of Computational Mathematics},
0196 %       Year     = {2007},
0197 %       Number   = {3},
0198 %       Pages    = {303--330},
0199 %       Volume   = {7},
0200 %       Doi      = {10.1007/s10208-005-0179-9}
0201 %     }
0202 %
0204
0205 % An explicit, general listing of this algorithm, with preconditioning,
0206 % can be found in the following paper:
0207 %     @Article{boumal2015lowrank,
0208 %       Title   = {Low-rank matrix completion via preconditioned optimization on the {G}rassmann manifold},
0209 %       Author  = {Boumal, N. and Absil, P.-A.},
0210 %       Journal = {Linear Algebra and its Applications},
0211 %       Year    = {2015},
0212 %       Pages   = {200--239},
0213 %       Volume  = {475},
0214 %       Doi     = {10.1016/j.laa.2015.02.027},
0215 %     }
0216
0217 % When the Hessian is not specified, it is approximated with
0218 % finite-differences of the gradient. The resulting method is called
0219 % RTR-FD. Some convergence theory for it is available in this paper:
0220 % @incollection{boumal2015rtrfd
0221 %     author={Boumal, N.},
0222 %     title={Riemannian trust regions with finite-difference Hessian approximations are globally convergent},
0223 %     year={2015},
0224 %     booktitle={Geometric Science of Information}
0225 % }
0226
0227
0228 % This file is part of Manopt: www.manopt.org.
0229 % This code is an adaptation to Manopt of the original GenRTR code:
0230 % RTR - Riemannian Trust-Region
0231 % (c) 2004-2007, P.-A. Absil, C. G. Baker, K. A. Gallivan
0232 % Florida State University
0233 % School of Computational Science
0235 % See accompanying license file.
0236 % The adaptation was executed by Nicolas Boumal.
0237 %
0238 %
0239 % Change log:
0240 %
0241 %   NB April 3, 2013:
0242 %       tCG now returns the Hessian along the returned direction eta, so
0243 %       that we do not compute that Hessian redundantly: some savings at
0244 %       each iteration. Similarly, if the useRand flag is on, we spare an
0245 %       extra Hessian computation at each outer iteration too, owing to
0246 %       some modifications in the Cauchy point section of the code specific
0247 %       to useRand = true.
0248 %
0249 %   NB Aug. 22, 2013:
0250 %       This function is now Octave compatible. The transition called for
0251 %       two changes which would otherwise not be advisable. (1) tic/toc is
0252 %       now used as is, as opposed to the safer way:
0253 %       t = tic(); elapsed = toc(t);
0254 %       And (2), the (formerly inner) function savestats was moved outside
0255 %       the main function to not be nested anymore. This is arguably less
0256 %       elegant, but Octave does not (and likely will not) support nested
0257 %       functions.
0258 %
0259 %   NB Dec. 2, 2013:
0260 %       The in-code documentation was largely revised and expanded.
0261 %
0262 %   NB Dec. 2, 2013:
0263 %       The former heuristic which triggered when rhonum was very small and
0264 %       forced rho = 1 has been replaced by a smoother heuristic which
0265 %       consists in regularizing rhonum and rhoden before computing their
0266 %       ratio. It is tunable via options.rho_regularization. Furthermore,
0267 %       the solver now detects if tCG did not obtain a model decrease
0268 %       (which is theoretically impossible but may happen because of
0269 %       numerical errors and/or because of a nonlinear/nonsymmetric Hessian
0270 %       operator, which is the case for finite difference approximations).
0271 %       When such an anomaly is detected, the step is rejected and the
0272 %       trust region radius is decreased.
0273 %       Feb. 18, 2015 note: this is less useful now, as tCG now guarantees
0274 %       model decrease even for the finite difference approximation of the
0275 %       Hessian. It is still useful in case of numerical errors, but this
0276 %       is less stringent.
0277 %
0278 %   NB Dec. 3, 2013:
0279 %       The stepsize is now registered at each iteration, at a small
0280 %       additional cost. The defaults for Delta_bar and Delta0 are better
0281 %       defined. Setting Delta_bar in the options will automatically set
0282 %       Delta0 accordingly. In Manopt 1.0.4, the defaults for these options
0283 %       were not treated appropriately because of an incorrect use of the
0284 %       isfield() built-in function.
0285 %
0286 %   NB Feb. 18, 2015:
0287 %       Added some comments. Also, Octave now supports safe tic/toc usage,
0288 %       so we reverted the changes to use that again (see Aug. 22, 2013 log
0289 %       entry).
0290 %
0291 %   NB April 3, 2015:
0292 %       Works with the new StoreDB class system.
0293 %
0294 %   NB April 8, 2015:
0295 %       No Hessian warning if approximate Hessian explicitly available.
0296 %
0297 %   NB Nov. 1, 2016:
0298 %       Now uses approximate gradient via finite differences if need be.
0299
0300
0301 % Verify that the problem description is sufficient for the solver.
0302 if ~canGetCost(problem)
0303     warning('manopt:getCost', ...
0304             'No cost provided. The algorithm will likely abort.');
0305 end
0306 if ~canGetGradient(problem) && ~canGetApproxGradient(problem)
0307     % Note: we do not give a warning if an approximate gradient is
0308     % explicitly given in the problem description, as in that case the user
0309     % seems to be aware of the issue.
0310     warning('manopt:getGradient:approx', ...
0311            ['No gradient provided. Using an FD approximation instead (slow).\n' ...
0312             'It may be necessary to increase options.tolgradnorm.\n' ...
0313             'To disable this warning: warning(''off'', ''manopt:getGradient:approx'')']);
0314     problem.approxgrad = approxgradientFD(problem);
0315 end
0316 if ~canGetHessian(problem) && ~canGetApproxHessian(problem)
0317     % Note: we do not give a warning if an approximate Hessian is
0318     % explicitly given in the problem description, as in that case the user
0319     % seems to be aware of the issue.
0320     warning('manopt:getHessian:approx', ...
0321            ['No Hessian provided. Using an FD approximation instead.\n' ...
0322             'To disable this warning: warning(''off'', ''manopt:getHessian:approx'')']);
0323     problem.approxhess = approxhessianFD(problem);
0324 end
0325
0326 % Define some strings for display
0327 tcg_stop_reason = {'negative curvature',...
0328                    'exceeded trust region',...
0329                    'reached target residual-kappa (linear)',...
0330                    'reached target residual-theta (superlinear)',...
0331                    'maximum inner iterations',...
0332                    'model increased'};
0333
0334 % Set local defaults here
0335 localdefaults.verbosity = 2;
0336 localdefaults.maxtime = inf;
0337 localdefaults.miniter = 3;
0338 localdefaults.maxiter = 1000;
0339 localdefaults.mininner = 1;
0340 localdefaults.maxinner = problem.M.dim();
0341 localdefaults.tolgradnorm = 1e-6;
0342 localdefaults.kappa = 0.1;
0343 localdefaults.theta = 1.0;
0344 localdefaults.rho_prime = 0.1;
0345 localdefaults.useRand = false;
0346 localdefaults.rho_regularization = 1e3;
0347
0348 % Merge global and local defaults, then merge w/ user options, if any.
0349 localdefaults = mergeOptions(getGlobalDefaults(), localdefaults);
0350 if ~exist('options', 'var') || isempty(options)
0351     options = struct();
0352 end
0353 options = mergeOptions(localdefaults, options);
0354
0355 % Set default Delta_bar and Delta0 separately to deal with additional
0356 % logic: if Delta_bar is provided but not Delta0, let Delta0 automatically
0357 % be some fraction of the provided Delta_bar.
0358 if ~isfield(options, 'Delta_bar')
0359     if isfield(problem.M, 'typicaldist')
0360         options.Delta_bar = problem.M.typicaldist();
0361     else
0362         options.Delta_bar = sqrt(problem.M.dim());
0363     end
0364 end
0365 if ~isfield(options,'Delta0')
0366     options.Delta0 = options.Delta_bar / 8;
0367 end
0368
0369 % Check some option values
0370 assert(options.rho_prime < 1/4, ...
0371         'options.rho_prime must be strictly smaller than 1/4.');
0372 assert(options.Delta_bar > 0, ...
0373         'options.Delta_bar must be positive.');
0374 assert(options.Delta0 > 0 && options.Delta0 < options.Delta_bar, ...
0375         'options.Delta0 must be positive and smaller than Delta_bar.');
0376
0377 % It is sometimes useful to check what the actual option values are.
0378 if options.verbosity >= 3
0379     disp(options);
0380 end
0381
0382 ticstart = tic();
0383
0384 % If no initial point x is given by the user, generate one at random.
0385 if ~exist('x', 'var') || isempty(x)
0386     x = problem.M.rand();
0387 end
0388
0389 % Create a store database and get a key for the current x
0390 storedb = StoreDB(options.storedepth);
0391 key = storedb.getNewKey();
0392
0393 %% Initializations
0394
0395 % k counts the outer (TR) iterations. The semantic is that k counts the
0396 % number of iterations fully executed so far.
0397 k = 0;
0398
0399 % Initialize solution and companion measures: f(x), fgrad(x)
0400 [fx, fgradx] = getCostGrad(problem, x, storedb, key);
0401 norm_grad = problem.M.norm(x, fgradx);
0402
0403 % Initialize the trust-region radius.
0404 Delta = options.Delta0;
0405
0406 % Save stats in a struct array info, and preallocate.
0407 if ~exist('used_cauchy', 'var')
0408     used_cauchy = [];
0409 end
0410 stats = savestats(problem, x, storedb, key, options, k, fx, norm_grad, Delta, ticstart);
0411 info(1) = stats;
0412 info(min(10000, options.maxiter+1)).iter = [];
0413
0414 % ** Display:
0415 if options.verbosity == 2
0416    fprintf(['%3s %3s      %5s                %5s     ',...
0417            'f: %+e   |grad|: %e\n'], ...
0418            '   ','   ','     ','     ', fx, norm_grad);
0419 elseif options.verbosity > 2
0420    fprintf('************************************************************************\n');
0421    fprintf('%3s %3s    k: %5s     num_inner: %5s     %s\n',...
0422            '','','______','______','');
0423    fprintf('       f(x) : %+e       |grad| : %e\n', fx, norm_grad);
0424    fprintf('      Delta : %f\n', Delta);
0425 end
0426
0427 % To keep track of consecutive radius changes, so that we can warn the
0428 % user if it appears necessary.
0429 consecutive_TRplus = 0;
0430 consecutive_TRminus = 0;
0431
0432
0433 % **********************
0434 % ** Start of TR loop **
0435 % **********************
0436 while true
0437
0438     % Start clock for this outer iteration
0439     ticstart = tic();
0440
0441     % Run standard stopping criterion checks
0442     [stop, reason] = stoppingcriterion(problem, x, options, info, k+1);
0443
0444     % If the stopping criterion that triggered is the tolerance on the
0445     % gradient norm but we are using randomization, make sure we make at
0446     % least miniter iterations to give randomization a chance at escaping
0447     % saddle points.
0448     if stop == 2 && options.useRand && k < options.miniter
0449         stop = 0;
0450     end
0451
0452     if stop
0453         if options.verbosity >= 1
0454             fprintf([reason '\n']);
0455         end
0456         break;
0457     end
0458
0459     if options.verbosity > 2 || options.debug > 0
0460         fprintf('************************************************************************\n');
0461     end
0462
0463     % *************************
0464     % ** Begin TR Subproblem **
0465     % *************************
0466
0467     % Determine eta0
0468     if ~options.useRand
0469         % Pick the zero vector
0470         eta = problem.M.zerovec(x);
0471     else
0472         % Random vector in T_x M (this has to be very small)
0473         eta = problem.M.lincomb(x, 1e-6, problem.M.randvec(x));
0474         % Must be inside trust-region
0475         while problem.M.norm(x, eta) > Delta
0476             eta = problem.M.lincomb(x, sqrt(sqrt(eps)), eta);
0477         end
0478     end
0479
0480     % Solve TR subproblem approximately
0481     [eta, Heta, numit, stop_inner] = ...
0482                 tCG(problem, x, fgradx, eta, Delta, options, storedb, key);
0483     srstr = tcg_stop_reason{stop_inner};
0484
0485     % If using randomized approach, compare result with the Cauchy point.
0486     % Convergence proofs assume that we achieve at least (a fraction of)
0487     % the reduction of the Cauchy point. After this if-block, either all
0488     % eta-related quantities have been changed consistently, or none of
0489     % them have changed.
0490     if options.useRand
0491         used_cauchy = false;
0492         % Check the curvature,
0493         Hg = getHessian(problem, x, fgradx, storedb, key);
0494         g_Hg = problem.M.inner(x, fgradx, Hg);
0495         if g_Hg <= 0
0496             tau_c = 1;
0497         else
0498             tau_c = min( norm_grad^3/(Delta*g_Hg) , 1);
0499         end
0500         % and generate the Cauchy point.
0501         eta_c  = problem.M.lincomb(x, -tau_c * Delta / norm_grad, fgradx);
0502         Heta_c = problem.M.lincomb(x, -tau_c * Delta / norm_grad, Hg);
0503
0504         % Now that we have computed the Cauchy point in addition to the
0505         % returned eta, we might as well keep the best of them.
0506         mdle  = fx + problem.M.inner(x, fgradx, eta) ...
0507                    + .5*problem.M.inner(x, Heta,   eta);
0508         mdlec = fx + problem.M.inner(x, fgradx, eta_c) ...
0509                    + .5*problem.M.inner(x, Heta_c, eta_c);
0510         if mdlec < mdle
0511             eta = eta_c;
0512             Heta = Heta_c; % added April 11, 2012
0513             used_cauchy = true;
0514         end
0515     end
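    % Editor's illustration (hedged; not part of the original source):
    % suppose norm_grad = 2, Delta = 1 and g_Hg = <g, H g> = 10 > 0. Then
    %     tau_c = min(norm_grad^3 / (Delta*g_Hg), 1) = min(8/10, 1) = 0.8,
    % so eta_c = -(tau_c*Delta/norm_grad)*fgradx = -0.4*fgradx: a step
    % along the negative gradient with |eta_c| = 0.4*2 = 0.8 <= Delta,
    % i.e., inside the trust region.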
0516
0517
0518     % This is only computed for logging purposes, because it may be useful
0519     % for some user-defined stopping criteria. If this is not cheap for
0520     % specific applications (compared to evaluating the cost), we should
0521     % reconsider this.
0522     norm_eta = problem.M.norm(x, eta);
0523
0524     if options.debug > 0
0525         testangle = problem.M.inner(x, eta, fgradx) / (norm_eta*norm_grad);
0526     end
0527
0528
0529     % Compute the tentative next iterate (the proposal)
0530     x_prop  = problem.M.retr(x, eta);
0531     key_prop = storedb.getNewKey();
0532
0533     % Compute the function value of the proposal
0534     fx_prop = getCost(problem, x_prop, storedb, key_prop);
0535
0536     % Will we accept the proposal or not?
0537     % Check the performance of the quadratic model against the actual cost.
0538     rhonum = fx - fx_prop;
0539     rhoden = -problem.M.inner(x, fgradx, eta) ...
0540              -.5*problem.M.inner(x, eta, Heta);
0541     % rhonum could be anything.
0542     % rhoden should be nonnegative, as guaranteed by tCG, barring numerical
0543     % errors.
0544
0545     % Heuristic -- added Dec. 2, 2013 (NB) to replace the former heuristic.
0546     % This heuristic is documented in the book by Conn Gould and Toint on
0547     % trust-region methods, section 17.4.2.
0548     % rhonum measures the difference between two numbers. Close to
0549     % convergence, these two numbers are very close to each other, so
0550     % that computing their difference is numerically challenging: there may
0551     % be a significant loss in accuracy. Since the acceptance or rejection
0552     % of the step is conditioned on the ratio between rhonum and rhoden,
0553     % large errors in rhonum result in a very large error in rho, hence in
0554     % erratic acceptance / rejection. Meanwhile, close to convergence,
0555     % steps are usually trustworthy and we should transition to a Newton-
0556     % like method, with rho=1 consistently. The heuristic thus shifts both
0557     % rhonum and rhoden by a small amount such that far from convergence,
0558     % the shift is irrelevant and close to convergence, the ratio rho goes
0559     % to 1, effectively promoting acceptance of the step.
0560     % The rationale is that close to convergence, both rhonum and rhoden
0561     % are quadratic in the distance between x and x_prop. Thus, when this
0562     % distance is on the order of sqrt(eps), the value of rhonum and rhoden
0563     % is on the order of eps, which is indistinguishable from the numerical
0564     % error, resulting in badly estimated rho's.
0565     % For abs(fx) < 1, this heuristic is invariant under offsets of f but
0566     % not under scaling of f. For abs(fx) > 1, the opposite holds. This
0567     % should not alarm us, as this heuristic only triggers at the very last
0568     % iterations if very fine convergence is demanded.
0569     rho_reg = max(1, abs(fx)) * eps * options.rho_regularization;
0570     rhonum = rhonum + rho_reg;
0571     rhoden = rhoden + rho_reg;
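    % Editor's illustration (hedged; not part of the original source):
    % near convergence, take fx = 0.5, rhonum = 1e-16, rhoden = 2e-16 and
    % options.rho_regularization = 1e3. Then rho_reg = 1*eps*1e3 ~ 2.2e-13
    % dominates both terms, so rho = (1e-16+2.2e-13)/(2e-16+2.2e-13) ~ 1
    % and the step is accepted. Far from convergence, with rhonum and
    % rhoden of order 1, the same shift is negligible.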
0572
0573     if options.debug > 0
0574         fprintf('DBG:     rhonum : %e\n', rhonum);
0575         fprintf('DBG:     rhoden : %e\n', rhoden);
0576     end
0577
0578     % This is always true if a linear, symmetric operator is used for the
0579     % Hessian (approximation) and if we had infinite numerical precision.
0580     % In practice, nonlinear approximations of the Hessian such as the
0581     % built-in finite difference approximation and finite numerical
0582     % accuracy can cause the model to increase. In such scenarios, we
0583     % decide to force a rejection of the step and a reduction of the
0584     % trust-region radius. We test the sign of the regularized rhoden since
0585     % the regularization is supposed to capture the accuracy to which
0586     % rhoden is computed: if rhoden were negative before regularization but
0587     % not after, that should not be (and is not) detected as a failure.
0588     %
0589     % Note (Feb. 17, 2015, NB): the most recent version of tCG already
0590     % includes a mechanism to ensure model decrease if the Cauchy step
0591     % attained a decrease (which is theoretically the case under very lax
0592     % assumptions). This being said, it is always possible that numerical
0593     % errors will prevent this, so that it is good to keep a safeguard.
0594     %
0595     % The current strategy is that, if this should happen, then we reject
0596     % the step and reduce the trust region radius. This also ensures that
0597     % the actual cost values are monotonically decreasing.
0598     model_decreased = (rhoden >= 0);
0599
0600     if ~model_decreased
0601         srstr = [srstr ', model did not decrease']; %#ok<AGROW>
0602     end
0603
0604     rho = rhonum / rhoden;
0605
0606     % Added June 30, 2015 following observation by BM.
0607     % With this modification, it is guaranteed that a step rejection is
0608     % always accompanied by a TR reduction. This prevents stagnation in
0609     % this "corner case" (NaN's really aren't supposed to occur, but it's
0610     % nice if we can handle them nonetheless).
0611     if isnan(rho)
0612         fprintf('rho is NaN! Forcing a radius decrease. This should not happen.\n');
0613         if isnan(fx_prop)
0614             fprintf('The cost function returned NaN (perhaps the retraction returned a bad point?)\n');
0615         else
0616             fprintf('The cost function did not return a NaN value.\n');
0617         end
0618     end
0619
0620     if options.debug > 0
0621         m = @(x, eta) ...
0622           getCost(problem, x, storedb, key) + ...
0623           getDirectionalDerivative(problem, x, eta, storedb, key) + ...
0624           .5*problem.M.inner(x, getHessian(problem, x, eta, storedb, key), eta);
0625         zerovec = problem.M.zerovec(x);
0626         actrho = (fx - fx_prop) / (m(x, zerovec) - m(x, eta));
0627         fprintf('DBG:   new f(x) : %+e\n', fx_prop);
0628         fprintf('DBG: actual rho : %e\n', actrho);
0629         fprintf('DBG:   used rho : %e\n', rho);
0630     end
0631
0632     % Choose the new TR radius based on the model performance
0633     trstr = '   ';
0634     % If the actual decrease is smaller than 1/4 of the predicted decrease,
0635     % then reduce the TR radius.
0636     if rho < 1/4 || ~model_decreased || isnan(rho)
0637         trstr = 'TR-';
0638         Delta = Delta/4;
0639         consecutive_TRplus = 0;
0640         consecutive_TRminus = consecutive_TRminus + 1;
0641         if consecutive_TRminus >= 5 && options.verbosity >= 1
0642             consecutive_TRminus = -inf;
0643             fprintf(' +++ Detected many consecutive TR- (radius decreases).\n');
0644             fprintf(' +++ Consider decreasing options.Delta_bar by an order of magnitude.\n');
0645             fprintf(' +++ Current values: options.Delta_bar = %g and options.Delta0 = %g.\n', options.Delta_bar, options.Delta0);
0646         end
0647     % If the actual decrease is at least 3/4 of the predicted decrease and
0648     % the tCG (inner solve) hit the TR boundary, increase the TR radius.
0649     % We also keep track of the number of consecutive trust-region radius
0650     % increases. If there are many, this may indicate the need to adapt the
0651     % initial and maximum radii.
0652     elseif rho > 3/4 && (stop_inner == 1 || stop_inner == 2)
0653         trstr = 'TR+';
0654         Delta = min(2*Delta, options.Delta_bar);
0655         consecutive_TRminus = 0;
0656         consecutive_TRplus = consecutive_TRplus + 1;
0657         if consecutive_TRplus >= 5 && options.verbosity >= 1
0658             consecutive_TRplus = -inf;
0659             fprintf(' +++ Detected many consecutive TR+ (radius increases).\n');
0660             fprintf(' +++ Consider increasing options.Delta_bar by an order of magnitude.\n');
0661             fprintf(' +++ Current values: options.Delta_bar = %g and options.Delta0 = %g.\n', options.Delta_bar, options.Delta0);
0662         end
0663     else
0664         % Otherwise, keep the TR radius constant.
0665         consecutive_TRplus = 0;
0666         consecutive_TRminus = 0;
0667     end
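    % Editor's illustration (hedged; not part of the original source):
    % with Delta = 1 and Delta_bar = 8, the update above gives
    %     rho = 0.1                    -> Delta = 1/4 = 0.25 (TR-);
    %     rho = 0.9, tCG hit boundary  -> Delta = min(2, 8) = 2 (TR+);
    %     rho = 0.5                    -> Delta unchanged.
    % TR+ triggers only when the inner solve was limited by the boundary
    % (stop_inner is 1 or 2): if tCG converged strictly inside the region,
    % a larger radius would not have improved the step.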
0668
0669     % Choose to accept or reject the proposed step based on the model
0670     % performance. Note the strict inequality.
0671     if model_decreased && rho > options.rho_prime
0672         accept = true;
0673         accstr = 'acc';
0674         x = x_prop;
0675         key = key_prop;
0676         fx = fx_prop;
0677         fgradx = getGradient(problem, x, storedb, key);
0678         norm_grad = problem.M.norm(x, fgradx);
0679     else
0680         accept = false;
0681         accstr = 'REJ';
0682     end
0683
0684
0685     % Make sure we don't use too much memory for the store database
0686     storedb.purge();
0687
0688     % k is the number of iterations we have accomplished.
0689     k = k + 1;
0690
0691     % Log statistics for freshly executed iteration.
0692     % Everything after this in the loop is not accounted for in the timing.
0693     stats = savestats(problem, x, storedb, key, options, k, fx, ...
0694                       norm_grad, Delta, ticstart, info, rho, rhonum, ...
0695                       rhoden, accept, numit, norm_eta, used_cauchy);
0696     info(k+1) = stats; %#ok<AGROW>
0697
0698
0699     % ** Display:
0700     if options.verbosity == 2,
0701         fprintf(['%3s %3s   k: %5d     num_inner: %5d     ', ...
0702         'f: %+e   |grad|: %e   %s\n'], ...
0703         accstr, trstr, k, numit, fx, norm_grad, srstr);
0704     elseif options.verbosity > 2,
0705         if options.useRand && used_cauchy,
0706             fprintf('USED CAUCHY POINT\n');
0707         end
0708         fprintf('%3s %3s    k: %5d     num_inner: %5d     %s\n', ...
0709                 accstr, trstr, k, numit, srstr);
0710         fprintf('       f(x) : %+e     |grad| : %e\n', fx, norm_grad);
0711         if options.debug > 0
0712             fprintf('      Delta : %f          |eta| : %e\n',Delta,norm_eta);
0713         end
0714         fprintf('        rho : %e\n',rho);
0715     end
0716     if options.debug > 0,
0717         fprintf('DBG: cos angle(eta, gradf): %d\n', problem.M.inner(x, eta, fgradx)/(norm_eta*norm_grad));
0718         if rho == 0
0719             fprintf('DBG: rho = 0, this will likely hinder further convergence.\n');
0720         end
0721     end
0722
0723 end  % of TR loop (counter: k)
0724
0725 % Restrict info struct-array to useful part
0726 info = info(1:k+1);
0727
0728
0729 if (options.verbosity > 2) || (options.debug > 0),
0730    fprintf('************************************************************************\n');
0731 end
0732 if (options.verbosity > 0) || (options.debug > 0)
0733     fprintf('Total time is %f [s] (excludes statsfun)\n', info(end).time);
0734 end
0735
0736 % Return the cost at the final iterate (not necessarily the best reached).
0737 cost = fx;
0738
0739 end
0740
0741
0742
0743
0744
0745 % Routine in charge of collecting the current iteration stats
0746 function stats = savestats(problem, x, storedb, key, options, k, fx, ...
0747                            norm_grad, Delta, ticstart, info, rho, rhonum, ...
0748                            rhoden, accept, numit, norm_eta, used_cauchy)
0749     stats.iter = k;
0750     stats.cost = fx;
0751     stats.gradnorm = norm_grad;
0752     stats.Delta = Delta;
0753     if k == 0
0754         stats.time = toc(ticstart);
0755         stats.rho = inf;
0756         stats.rhonum = NaN;
0757         stats.rhoden = NaN;
0758         stats.accepted = true;
0759         stats.numinner = NaN;
0760         stats.stepsize = NaN;
0761         if options.useRand
0762             stats.cauchy = false;
0763         end
0764     else
0765         stats.time = info(k).time + toc(ticstart);
0766         stats.rho = rho;
0767         stats.rhonum = rhonum;
0768         stats.rhoden = rhoden;
0769         stats.accepted = accept;
0770         stats.numinner = numit;
0771         stats.stepsize = norm_eta;
0772         if options.useRand,
0773           stats.cauchy = used_cauchy;
0774         end
0775     end
0776
0777     % See comment about statsfun above: the x and store passed to statsfun
0778     % are that of the most recently accepted point after the iteration
0779     % fully executed.
0780     stats = applyStatsfun(problem, x, storedb, key, options, stats);
0781
0782 end```

Generated on Sat 12-Nov-2016 14:11:22 by m2html © 2005