In September 2022 this solver was improved in several ways that are not reflected in this documentation.
Everything is backwards compatible though. For the latest info, type help trustregions
at the
Matlab prompt.
Trust-region methods form a widely used class of nonlinear optimization algorithms. They were generalized to the realm of optimization on Riemannian manifolds in [ABG07], see also [AMS08], specifically chapter 7. See [ABG07] for a listing and explanation of the algorithm.
The standard convergence guarantees from the Euclidean setting essentially carry over to the Riemannian setting: the method is globally convergent, that is, regardless of the initial guess, it will converge toward critical points. Furthermore, when the true Hessian is used, we get quadratic local convergence; otherwise, we still get linear local convergence (superlinear if the approximate Hessian is "good enough"). See [ABG07] for precise theorem statements and the assumptions that need to be verified (in particular, sufficient smoothness of both the manifold and the cost).
This trust-region method uses a truncated conjugate-gradient (tCG) method to solve the inner minimization problems. This inner solve can be preconditioned: simply provide a preconditioner in the cost function description.
The implementation in Manopt is an adapted and improved version of GenRTR, the original code by the authors of [ABG07]. By default, if no Hessian and no approximate Hessian are provided, Manopt will approximate the Hessian using finite differences of the gradient. Empirically, this is seen to perform excellently. The global convergence proof of the Riemannian trust-region method can be extended to guarantee global convergence with this nonlinear approximation of the Hessian too, see [Bou15], but there is currently no proof of superlinear convergence for this algorithm. You may disable this by providing an approximate Hessian (for example, the identity operator), or the Hessian of course. If you want more control over the approximation algorithm, you may specify an approximate Hessian in the cost description and perhaps start from the base code in manopt/solvers/hessianapproximations/approxhessianFD.m.
What should the problem structure specify about the cost function $f$?
How to call this solver?
The solver trustregions
is defined in the file manopt/solvers/trustregions/trustregions.m.
It may be called as follows:
[x_opt cost_opt info] = trustregions(problem)
[x_opt cost_opt info] = trustregions(problem, x0)
[x_opt cost_opt info] = trustregions(problem, [], options)
[x_opt cost_opt info] = trustregions(problem, x0, options)
The first input is a Manopt problem structure, describing the manifold and the cost. The second input (optional) is a point on the manifold from where the algorithm will start (initial guess). The third input (optional) is a structure specifying option values (see below).
For an in-depth description of the algorithm, with convergence theory, see [ABG07] or [AMS08].
All options have default values, so that you seldom have to tweak
them. Nevertheless, from time to time, good tuning may bring about
nice performance boosts or better control. The tutorial
page documents generic options. Among those, the present
solver supports options."..."
:
verbosity
, debug
,
statsfun
maxiter
, maxtime
,
tolcost
,
tolgradnorm
, stopfun
storedepth
options
structure:
Field name (options."..." ) |
Value type | Description |
Solver specific | ||
Delta0 |
double | Initial radius $\Delta_0$ of the trust-region. You should
not set this parameter without setting Delta_bar .
|
Delta_bar |
double | Maximum radius $\bar\Delta$ the trust-region can have. If
you specify this parameter but not Delta0 , then
Delta0 will be set to 1/8 times this parameter.
|
useRand |
Boolean | Use randomization to try to escape saddle points (false by default; you should have a good reason to activate this). If activated, the preconditioner is not used. |
kappa |
double | Parameter $\kappa$ in the stopping criterion for the inner iterations (truncated CG). See [ABG07]. |
theta |
double | Parameter $0 < \theta \leq 1$ in the stopping criterion for the inner iterations (truncated CG). Set to 1.0 (default) to reach for a quadratic convergence (if the Hessian is provided). See [ABG07]. |
rho_prime |
double | Steps (at the outer iteration level) are only accepted if $\rho$ (see the Outputs section) is at least $\rho' \in [0, 1/4)$. Defaults to 0.1. Make it smaller to accept steps more often, and larger to reject steps more often. See [ABG07]. |
rho_regularization |
double | Close to convergence, evaluating the performance ratio $\rho$ is numerically challenging. Meanwhile, close to convergence, the quadratic model should be a good fit and the steps should be accepted. Regularization lets $\rho$ go to 1 as the model decrease and actual decrease go to zero. Set this option to zero to disable regularization (not recommended). See in-code for the specifics. This heuristic is documented in the book by Conn Gould and Toint on trust-region methods, section 17.4.2. |
debug |
Integer | If set to 1 or more, will display the value of all options used and extra information at each iteration (the larger the number, the more information is displayed). |
Stopping criteria | ||
maxiter |
integer | Stop after this number of outer iterations. |
miniter |
integer | If useRand is true, do not stop before at least this many outer iterations were executed. |
maxinner |
integer | Maximum number of inner iterations (usually, the number of Hessian evaluations) per outer iteration. This option may be quite critical for the overall performance of the algorithm. Setting it too low can jeopardize the quadratic convergence of the algorithm, but setting too high can sometimes slow down the algorithm at unimportant stages of the optimization. |
mininner |
integer | Minimum number of inner iterations per outer iteration, unless negative curvature is detected or the trust-region is exceeded during the inner iterations. |
options.statsfun
is called with the point x
that was reached last,
after the accept/reject decision. Hence: if the step was accepted,
we get that new x
, with a store
which
only saw the call for the cost and for the gradient. If the step was
rejected, we get the same x
as previously, with the store
structure containing everything that was computed at that point
(possibly including previous rejects at that same point). Hence, statsfun
should not be used in conjunction with the store
to
count operations for example. Instead, you could use a global
variable and increment that variable directly from the cost related
functions. It is however possible to use statsfun
with
the store
to compute, for example, alternate merit
functions on the point x
.The first output is the best point $x$ found on the manifold. The
second output is the value of the cost function at that point. The
third output is a structure array containing information about the
iterations performed by the solver. The tutorial
page documents generic fields that may appear in the info
structure array. Among those, the present solver will populate:
iter
, time
, cost
, gradnorm
,options.statsfun
logged, if it was
provided.In addition,you will find this information:
Field name ([info."..."] ) |
Value type | Description |
rho |
double | Performance ratio $\rho_k$ of the iterate $k$: ratio between the actual cost improvement (as indicated by the cost function) and the expected cost improvement (as indicated by the quadratic model). |
rhonum , rhoden |
double | Regularized numerator and denominator of the performance ratio $\rho$, such that $\rho = \rho_{num}/\rho_{den}$. The numerator corresponds to the actual decrease and the denominator corresponds to the model decrease. |
accepted |
Boolean | Whether the iterate was accepted or rejected. |
stepsize |
double | The (Riemannian) norm of the vector returned by the inner
solver tCG and which is retracted to obtain the proposed
next iterate. If accepted is true for the
corresponding iterate, this is the size of the step from the
previous to the new iterate. If accepted is
false, the step was not executed and this is the size of the
rejected step. |
numinner |
integer | The number of inner iterations used to compute the next iterate. |
Delta |
double | The trust-region radius $\Delta_k$ at the outer iteration $k$. |
cauchy |
Boolean | (if useRand is true) indicates whether the Cauchy point was used at this iterate or not. |
Recall the structure array syntax in Matlab: [info.time]
will return a vector of all the times logged (mind the brackets).
rhoden
is theoretically always positive,
but finite numerical accuracy or a nonlinear / non-symmetric
approximate Hessian operator (or its approximation) may jeopardize
that. When this happens, the step is rejected and the trust-region
radius is reduced. If this happens too often, this is the sign
that something is wrong with the Hessian or its approximation. You
may detect this either by inspecting the text output (look for the
message "model did not
decrease") or by looking for true values in [info.rhoden]
< 0
, and of course using the checkhessian
tool. If you did not provide a Hessian or approximation, then the
generic finite-difference approximation of it is in use, getHessianFD
.
It is nonlinear and may at times lead to negative rhoden
.
If this happens, you may either try to provide an alternative
Hessian operator, or try the conjugategradient
solver for example. if you decide to copy getHessianFD
to a new file to modify it, tuning the epsilon
parameter is the first thing to try.See the examples folder. Most examples use (or could use) this solver.