The sphere $\mathbb{S}^{nm-1}$ (the set of unit Frobenius norm matrices of size nxm) is endowed with a Riemannian manifold structure by considering it as a Riemannian submanifold of the embedding Euclidean space $\mathbb{R}^{n\times m}$ endowed with the usual inner product $\langle H_1, H_2 \rangle = \operatorname{trace}(H_1^T H_2)$.
Factory call: M = spherefactory(n, m)
. By default, m
equals 1. If you need to work with several vectors of unit norm, have a look at the oblique manifold.
Name | Formula | Numerical representation |
Set | $\mathbb{S}^{nm-1} = \{ X \in \mathbb{R}^{n\times m} : \|X\| = 1 \}$ | $X$ is represented as a matrix X of size nxm with unit Frobenius norm, i.e., X(:).'*X(:) = 1 . |
Tangent space at $X$ | $T_X \mathbb{S}^{nm-1} = \{ U \in \mathbb{R}^{n\times m} : \operatorname{trace}(X^T U) = 0 \}$ | A tangent vector $U$ at $X$ is represented as a matrix U of size nxm such that $\operatorname{trace}(X^T U) = 0$, i.e., X(:).'*U(:) = 0 . |
Ambient space | $\mathbb{R}^{n\times m}$ | Points and vectors in the ambient space are, naturally, represented as matrices of size nxm. |
The following table shows some of the nontrivial available functions in the structure M
. The norm $\|\cdot\|$ refers to the norm in the ambient space, which is the Frobenius norm. The tutorial page gives more details about the functionality implemented by each function.
Name | Field usage | Formula |
Dimension | M.dim() |
$\operatorname{dim}\mathcal{M} = nm-1$ |
Metric | M.inner(X, U, V) |
$\langle U, V\rangle_X = \operatorname{trace}(U^T V)$ |
Norm | M.norm(X, U) |
$\|U\|_X = \sqrt{\langle U, U \rangle_X}$ |
Distance | M.dist(X, Y) |
$\operatorname{dist}(X, Y) = \arccos(\operatorname{trace}(X^T Y))$ |
Typical distance | M.typicaldist() |
$\pi$ |
Tangent space projector | M.proj(X, H) |
$P_X(H) = H - \operatorname{trace}(X^T H) X$, where H represents a vector in the ambient space. |
Euclidean to Riemannian gradient | M.egrad2rgrad(X, egrad) |
$\operatorname{grad} f(X) = P_X(\nabla f(X))$, where egrad represents the Euclidean gradient $\nabla f(X)$, which is a vector in the ambient space. |
Euclidean to Riemannian Hessian | M.ehess2rhess(X, egrad, ehess, U) |
$\operatorname{Hess} f(X)[U] = P_X(\nabla^2 f(X)[U]) - \operatorname{trace}(X^T \nabla f(X)) U$, where egrad represents the Euclidean gradient $\nabla f(X)$ and ehess represents the Euclidean Hessian $\nabla^2 f(X)[U]$, both being vectors in the ambient space. |
Exponential map | M.exp(X, U, t) |
$\operatorname{Exp}_X(tU) = \cos(\|tU\|) X + \frac{\sin(\|tU\|)}{\|U\|} U$ |
Retraction | M.retr(X, U, t) |
$\operatorname{Retr}_X(tU) = \frac{X+tU}{\|X+tU\|}$ (Frobenius norm in the ambient space) |
Logarithmic map | M.log(X, Y) |
$\operatorname{Log}_X(Y) = \frac{\operatorname{dist}(X, Y)}{\|P_X(Y-X)\|} P_X(Y-X)$ |
Random point | M.rand() |
Returns a point uniformly at random w.r.t. the natural measure as follows: generate $Y$ with i.i.d. normal entries; return $Y/\|Y\|$. |
Random vector | M.randvec(X) |
Returns a unit-norm tangent vector at $X$ with uniformly random direction, obtained as follows: generate $H$ with i.i.d. normal entries; return $P_X(H)/\|P_X(H)\|$. |
Vector transport | M.transp(X, Y, U) |
$\operatorname{Transp}_{Y\leftarrow X}(U) = P_Y(U)$, where $U$ is a tangent vector at $X$ that is transported to the tangent space at $Y$. |
Pair mean | M.pairmean(X, Y) |
$\operatorname{mean}(X, Y) = \frac{X+Y}{\|X+Y\|}$ |
This example continues the example from the tutorial page. Let $A\in\mathbb{R}^{n\times n}$ be a symmetric matrix. We minimize the following cost function:
$$f(x) = -x^T Ax,$$
such that $x^T x = 1$, that is, such that $x$ belongs to the sphere $\mathbb{S}^{n-1}$. Compute the Euclidean gradient and Hessian of $f$:
$$\nabla f(x) = -2Ax,$$
$$\nabla^2 f(x)[u] = -2Au.$$
The Riemannian gradient and Hessian are obtained by applying the M.egrad2rgrad
and M.ehess2rhess
operators:
$$\operatorname{grad} f(x) = (I-xx^T) \nabla f(x),$$
$$\operatorname{Hess} f(x)[u] = (I-xx^T)\nabla^2 f(x)[u] - (x^T \nabla f(x)) u.$$
Notice that there is no need to compute these explicitly: it suffices to write code for the Euclidean quantities and to apply the conversion tools on them to obtain the Riemannian quantities, as in the following code:
% Generate the problem data. n = 1000; A = randn(n); A = .5*(A+A'); % Create the problem structure. manifold = spherefactory(n); problem.M = manifold; % Define the problem cost function and its derivatives.
problem.cost = @(x) -x'*(A*x);
egrad = @(x) -2*A*x;
ehess = @(x, u) -2*A*u; problem.grad = @(x) manifold.egrad2rgrad(x, egrad(x));
problem.hess = @(x, u) manifold.ehess2rhess(x, egrad(x), ehess(x, u), u);
% Numerically check the differentials.
checkgradient(problem); pause;
checkhessian(problem); pause;
% Solve
x = trustregions(problem);
Of course, this is not efficient since the computation of $Ax$ should be reused when computing $f(x)$ and $\nabla f(x)$, just like$\nabla f(x)$ should be reused to compute $\nabla^2 f(x)[u]$. See the various ways of describing the cost function for better alternatives.
For theory on Riemannian submanifolds, see [AMS08], section 3.6.1 (first-order derivatives) and section 5.3.3 (second-order derivatives, i.e., connections).
For content specifically about the sphere, see [AMS08] and look up examples 3.5.1 (tangent space), 3.6.1 (metric, projector), 4.1.1 (retraction), 5.3.1 (connection), 5.4.1 (exponential map, i.e., geodesics) and 8.1.1 (parallel translation, not implemented), 8.1.4 and 8.1.7 (vector transport).
[AMS08] P.-A. Absil, R. Mahony and R. Sepulchre, Optimization Algorithms on Matrix Manifolds, Princeton University Press, 2008.
The first example on the tutorial page is an example on the sphere. Section 6.4.1 in [AMS08] also gives an example of second-order optimization on the sphere.
We are still working on building a collection of examples for Manopt. The relevant ones will be referenced here when the time comes.