Solvers
Available optimization algorithms and how to use them
General description
Solvers (that is, optimization algorithms) are functions in Manopt. Built-in solvers are located in `/manopt/solvers`. In principle, all solvers accept the following basic call format:
x = mysolver(problem)
The returned value `x` is a point on the manifold `problem.M`. Depending on the properties of your problem and on the guarantees of the solver, `x` is more or less close to a good minimizer of the cost function described in the `problem` structure.
Bear in mind that we are usually dealing with nonconvex, and possibly nonsmooth or derivative-free, optimization problems, so it is in general not guaranteed that `x` is a global minimizer of the cost. For smooth problems with gradient information though, most decent algorithms guarantee that `x` is (approximately) a critical point. Typically, we even expect an approximate local minimizer, but even that is not guaranteed in all cases: this is a fundamental limitation of continuous optimization.
All provided solvers are minimization algorithms. If you want to maximize your objective function, multiply it by -1 (and likewise for the derivatives of the objective function), as we did in the first example.
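For example, to maximize a smooth function by minimizing its negative (here `f` and `gradf` are placeholder handles for your objective and its Euclidean gradient):

problem.cost  = @(x) -f(x);      % minimize -f to maximize f
problem.egrad = @(x) -gradf(x);  % negate the Euclidean gradient accordingly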
All solvers also admit a more complete call format:
[x, xcost, info, options] = mysolver(problem, x0, options)
Inputs:

- The `problem` structure specifies the manifold and describes the cost function.
- Optionally, `x0` is an initial guess, or initial iterate, for the solver. It is typically a point on the manifold `problem.M`, but may be something else depending on the solver. It can be omitted by passing the empty matrix `[]` instead.
- Optionally, the `options` structure is used to fine-tune the behavior of the optimization algorithm. On top of hosting the algorithmic parameters, it manages the stopping criteria as well as what information is displayed and/or logged during execution.
Outputs:

- `x` is the returned point on the manifold.
- `xcost` is the value of the cost function at `x`.
- The `info` struct-array is described below; it contains information collected at each iteration of the solver's progress.
- The `options` structure is returned too, so you can see what default values the solver used on top of the options you (possibly) specified.
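For instance, assuming the `problem` structure is already set up, a complete call to the trust-regions solver might look like this (the initial point and option value are only illustrative):

x0 = problem.M.rand();   % random initial point on the manifold
options.maxiter = 500;   % allow at most 500 iterations
[x, xcost, info, options] = trustregions(problem, x0, options);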
Available solvers
The toolbox comes with a handful of solvers. The most trustworthy is the trust-regions algorithm, which originated as a modification of the code of GenRTR. The toolbox was designed to accommodate many more solvers though: we have since added BFGS-style solvers, stochastic gradient descent and more. In particular, we look forward to proposing algorithms for nonsmooth cost functions (which notably arise when L1 penalties are at play). You can also propose your own solvers.
In the table below, ★ marks information about the cost function that the solver uses and ☆ marks optional information; \(P\) stands for a preconditioner. The ★’ for SGD indicates that only gradients of randomly sampled terms of the cost sum are needed.

Name | \(f\) | \(\nabla f\) | \(\nabla^2 f\) | \(P\) | Comment |
---|---|---|---|---|---|
Trust-regions (RTR) | ★ | ★ | ★ | ☆ | #1 choice for smooth optimization. Still good without true Hessian. |
Adaptive regularization by cubics (ARC) | ★ | ★ | ★ | | Alternative to RTR. Still good without true Hessian. |
Steepest descent | ★ | ★ | | | Gradient descent (GD) with line-search (default is backtracking). |
Conjugate gradient | ★ | ★ | | ☆ | Often better than steepest descent. Requires `M.transp`. |
Barzilai-Borwein | ★ | ★ | | | Gradient descent with BB step-size heuristic. Requires `M.transp`. |
BFGS | ★ | ★ | | | Limited-memory version of BFGS. Requires `M.transp`. |
SGD | | ★’ | | | Stochastic gradient algorithm for optimization of large sums. |
Particle swarm (PSO) | ★ | | | | Derivative-free optimizer (DFO) based on a population of points. |
Nelder-Mead | ★ | | | | DFO based on a simplex; requires `M.pairmean`. |
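Each solver is a regular Matlab function following the call formats described above, so switching algorithms is a one-line change. For example:

x = steepestdescent(problem);
x = conjugategradient(problem);
x = trustregions(problem);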
The options structure
In Manopt, all options are optional. Standard options are assigned a default value at the toolbox level in `/manopt/core/getGlobalDefaults.m` (it's a core tool, best not to edit it). Solvers then overwrite and complement these options with solver-specific fields. These options are in turn overwritten by the user-specified options, if any.

The subsections below list commonly used options. See each solver's documentation for specific information (e.g., type `help trustregions`).
Generic options
The following table lists options to control text output during solver execution, whether additional debugging checks are run, and a (deprecated) option to control memory usage from caching.
Field name (`options."..."`) | Value type | Description |
---|---|---|
`verbosity` | integer | Controls how much information a solver outputs during execution; 0: no output; 1: output at init and at exit; 2: light output at each iteration; 3+: all you can read. |
`debug` | integer | If larger than 0, the solver may perform additional computations for debugging purposes. |
`storedepth` | integer | Maximum number of `store` structures that may be kept in memory (see the cost description page). As of Manopt 5.0, this is mostly irrelevant because the main solvers do a much better job of discarding stale information on the go. |
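For example, to get light output at each iteration without extra debugging computations:

options.verbosity = 2;  % print light output at each iteration
options.debug = 0;      % skip additional debugging computations
x = mysolver(problem, [], options);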
Options for stopping criterion
Solvers stop running when one of several stopping criteria triggers. The following table lists options for some standard criteria.
Field name (`options."..."`) | Value type | Description |
---|---|---|
`maxiter` | integer | Limits the number of iterations of the solver. |
`maxtime` | double | Limits the execution time of the solver, in seconds. |
`tolcost` | double | Stop as soon as the cost drops below this tolerance. |
`tolgradnorm` | double | Stop as soon as the norm of the gradient drops below this tolerance. |
`stopfun` | function handle | Custom stopping criterion: see below. |
Some solvers offer additional stopping criteria: type `help solvername` to learn more.
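For example, to combine several of the standard criteria (the solver stops when the first one triggers):

options.maxiter = 1000;       % at most 1000 iterations
options.maxtime = 60;         % at most one minute
options.tolgradnorm = 1e-6;   % stop early if the gradient norm drops below 1e-6
x = mysolver(problem, [], options);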
Users can also define custom stopping criteria with `options.stopfun`. The general usage pattern is as follows:
% define your problem structure, then:
options.stopfun = @mystopfun;
x = mysolver(problem, [], options);
function [stopnow, reason] = mystopfun(problem, x, info, last)
stopnow = false;
reason = '';
if xyz % decide if solver should stop based on inputs
stopnow = true;
reason = 'This optional message explains why we stopped.';
end
end
The function handle `options.stopfun` is called after each iteration completes. As input, it receives:

- the `problem` structure,
- the current point `x`,
- the whole `info` struct-array built so far, and
- an index `last` such that `info(last)` is the structure pertaining to the current iteration (this is because `info` is pre-allocated, so that `info(end)` typically does not refer to the current iteration).
The outputs are:

- a Boolean `stopnow`: the solver terminates if this is true;
- optionally, a string `reason` which explains why the criterion triggered. This is displayed by the solver if `options.verbosity > 0`.
Consider the following example:
options.stopfun = @mystopfun;
function stopnow = mystopfun(problem, x, info, last)
stopnow = (last >= 3 && info(last-2).cost - info(last).cost < 1e-3);
end
This tells the solver to exit as soon as two successive iterations combined have decreased the cost by less than \(10^{-3}\).
See also the tools page for a list of built-in ways to use `stopfun` to good effect. For example:

- Stop a solver on demand by closing a figure or by deleting a file. These allow you to gracefully interrupt a solver interactively without breaking program execution. The latter is especially useful when running code remotely without a GUI (see the sketch after this list).
- Stop a solver using Manopt counters. This makes it easy to stop after a certain budget of function calls, matrix-vector products, etc. has been exceeded.
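As a sketch of the file-based interruption (the tool name `stopifdeletedfile` is taken from the tools page; check its help for exact usage):

% The tool creates a marker file and returns a stopfun handle;
% deleting that file gracefully stops the solver.
options.stopfun = stopifdeletedfile();
x = mysolver(problem, [], options);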
Statsfun option for recording info at each iteration
The function handle `options.statsfun` is called after each iteration completes. It provides an opportunity to process information about each iteration. The main purpose is to log information, but this option can also be used to print, save, plot, etc.
Here is an example:
% define your problem structure, then:
options.statsfun = @mystatsfun;
[x, fx, info] = mysolver(problem, [], options);
function stats = mystatsfun(problem, x, stats, store)
stats.current_point = x;
end
This logs all the points visited during the optimization process in the `info` struct-array returned by the solver. One could also write `x` to disk during this call, or update a plot (if that is useful).
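After the run, the logged points can be gathered from `info`; for example:

visited = {info.current_point};  % cell array with one entry per logged iteration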
As input, `statsfun` receives:

- the `problem` structure,
- the current point `x`,
- the `stats` structure for the current iteration, as recorded in the `info` struct-array, and
- (optionally) the `store` structure with the cache for `x` (and the common cache in `store.shared`).
The output is the `stats` structure. This function gives you a chance to modify the `stats` structure, hence to add fields to it if you want to.

Bear in mind that structures in a struct-array must all have the same fields: if `statsfun` adds a certain field to a `stats` structure, it must do so at every iteration. (In some instances, it is even important that the fields be created in the same order.)
Time spent in `statsfun` is discounted from the execution time recorded in `[info.time]`, as this feature is typically only used for prototyping, debugging, plotting, etc.
If you include `store` as an input of your `statsfun`, you get access to the data cached for that particular iterate. You also get access to the cache in `store.shared`, which is common to all points `x` visited by the solver so far. Note that `statsfun` can read from `store` but cannot write to it (modifications are lost).
An alternative to writing a `statsfun` by hand is to use the `statsfunhelper` tool. This is sometimes simpler (and allows passing inline functions). The example above simplifies to:
options.statsfun = statsfunhelper('current_point', @(x) x);
The helper can also be used to log more than one metric, by passing it a structure. In the example below, `x_reference` is a certain point on the manifold `problem.M`. The stats structures will include the fields `current_point` and `dist_to_ref`. Notice how the function handles can take different inputs. See the help of that tool for more info.
metrics.current_point = @(x) x;
metrics.dist_to_ref = @(problem, x) problem.M.dist(x, x_reference);
options.statsfun = statsfunhelper(metrics);
See also how to use Manopt counters to keep track of things such as the number of calls to cost, gradient and Hessian, or other special operations such as matrix-vector products. These counters are registered at every iteration and available in the returned stats structure. They can also be used in a stopping criterion.
`statsfun` is called after each iteration completes
Upon completion of an iteration, `options.statsfun` is called with the point `x` that was reached last, even if that `x` did not change compared to the previous iteration. For example, with `trustregions`, this triggers after the accept/reject decision. If the step was accepted, we get the new `x`, optionally with its `store`. If the step was rejected, we get the same `x` as previously, likewise with its `store`. The nuance is this: there is a `stats` structure for each iteration, but there is a `store` structure for each point.
Option to specify a line-search
Some solvers, such as `steepestdescent` and `conjugategradient`, need to solve a line-search problem at each iteration. That is, they need to (approximately) solve the one-dimensional optimization problem:
\[
\min_{t\geq 0} \phi(t) = f(\Retr_x(td)),
\]
where \(x\) is the current point on the manifold, \(d\) is a tangent vector at \(x\) (the search direction), \(\Retr\) is the retraction on the manifold and \(f\) is the cost function. Assuming \(d\) is a descent direction, there exists \(t > 0\) such that \(\phi(t) < \phi(0) = f(x)\). The purpose of a line-search algorithm is to find such a real number \(t\).
Manopt includes a few general-purpose line-search algorithms. To force the use of one of them, or of your own, specify it in the options structure (not in the problem structure), for example as follows:
options.linesearch = @linesearch_adaptive;
Each line-search algorithm accepts its own options, which can be added to this same `options` structure passed to the main solver. See each line-search's help for details.
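As a sketch (the `ls_`-prefixed option name below is an assumption based on the built-in backtracking line-search; run each line-search's help to confirm which options it supports):

options.linesearch = @linesearch;  % force the default backtracking line-search
options.ls_max_steps = 25;         % assumed option name: cap on backtracking steps
x = steepestdescent(problem, [], options);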
For certain problems, you may want to implement your own line-search, typically in order to exploit structure specific to the problem at hand. To this end, it is best to start from an existing line-search function and to adapt it. Alternatively (and perhaps more easily), you may specify a `linesearch` function in the `problem` structure (see the cost description page) and use a line-search algorithm that exploits it, to incorporate the additional information you supply there. Do not hesitate to ask for help on the forum if you have questions about this feature.
The info struct-array
The various solvers log information at each iteration about their progress. This information is returned in the output `info`.

This is a struct-array, that is, an array of structures. Read this MathWorks blog post for more on this data container in Matlab.

For example, to extract a vector containing the cost value at each iteration, call `[info.cost]`, with the brackets. To plot the cost function value against computation time, call:
plot([info.time], [info.cost]);
The following table lists some indicators that may be present in the `info` output.

Field name (`[info."..."]`) | Value type | Description |
---|---|---|
`iter` | integer | Iteration number (0 corresponds to the initial guess). |
`time` | double | Elapsed execution time until completion of the iterate, in seconds. |
`cost` | double | Attained value of the cost function. |
`gradnorm` | double | Attained value of the norm of the Riemannian gradient. |
A specific solver may not populate all of these fields and may provide additional fields, which would then be described in the solver’s documentation.
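For example, since `iter` and `gradnorm` are logged by most smooth solvers, convergence can be monitored with a semilog plot:

semilogy([info.iter], [info.gradnorm], '.-');
xlabel('Iteration number');
ylabel('Norm of the Riemannian gradient');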
Include a `statsfun` in your `options` structure: read more here. Also read about Manopt counters to keep track of things such as function calls, Hessian calls, matrix products, etc.
info.time
The execution time is logged without incorporating time spent in `statsfun`, as it usually performs computations that are not needed to solve the optimization problem. If, however, your `statsfun` influences the final output of the solver (e.g., because it logs information that you then use in your `stopfun` criterion), then you should add the relevant computation time to the `stats.time` field.
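Here is a minimal sketch of how to do that, where `expensive_metric` is a hypothetical, problem-specific helper whose output a `stopfun` might consult:

function stats = mystatsfun(problem, x, stats)
    t = tic();
    stats.metric = expensive_metric(problem, x);  % hypothetical helper
    stats.time = stats.time + toc(t);  % charge this work to the recorded time
end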