The special orthogonal group $\mathrm{SO}(n)$ (the set of rotations in $\mathbb{R}^n$) is endowed with a Riemannian manifold structure by considering it as a Riemannian submanifold of the embedding Euclidean space $\mathbb{R}^{n\times n}$ equipped with the usual inner product $\langle H_1, H_2 \rangle = \operatorname{trace}(H_1^T H_2)$. Proper rotation matrices are orthogonal matrices with determinant +1 (as opposed to -1), that is, reflections are not allowed. The present factory makes it possible to work with several rotation matrices simultaneously and efficiently, using 3D arrays of size nxnxk in Matlab. The geometry of $\mathrm{SO}(n)^k$ is obtained from the geometry of $\mathrm{SO}(n)$ in the obvious way, by element-wise extension.
Factory call: M = rotationsfactory(n, k). By default, k equals 1.
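For example, here is a minimal sketch (assuming Manopt is on the path) for working with 5 rotations in $\mathbb{R}^3$ simultaneously:
% Minimal sketch (assuming Manopt is on the path).
M = rotationsfactory(3, 5);   % the manifold SO(3)^5
X = M.rand();                 % a random point on the manifold
size(X)                       % returns [3 3 5]: one 3x3 rotation per slice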
Name | Formula and numerical representation |
Set | $\mathrm{SO}(n)^k = \{ X = (X_1, \ldots, X_k) \in (\mathbb{R}^{n\times n})^k : X_i^T X_i = I_n \textrm{ and } \det(X_i) = +1, i = 1:k \}$ $X$ is represented as a 3D array X of size nxnxk whose slices are proper rotation matrices, i.e., X(:, :, i).'*X(:, :, i) = eye(n) and det(X(:, :, i)) = 1 for i = 1:k. |
Tangent space at $X$ | $T_X \mathrm{SO}(n)^k = \{ U = (X_1S_1, \ldots, X_kS_k) \in (\mathbb{R}^{n\times n})^k : S_i + S_i^T = 0, i = 1:k \}$ A tangent vector $U$ at $X$ is represented as an array S of size nxnxk such that each slice of S is a skew-symmetric matrix, i.e., S(:, :, i) + S(:, :, i).' = 0 for i = 1:k. Note that S numerically contains $(S_1, \ldots, S_k)$, not $U$: this is important to keep in mind. |
Ambient space | $(\mathbb{R}^{n\times n})^k$ Points and vectors in the ambient space are, naturally, represented as 3D arrays of size nxnxk. |
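For instance, here is a minimal sketch (assuming Manopt is on the path) checking these representations on randomly generated data:
% Minimal sketch (assuming Manopt is on the path): slices of a random
% point are proper rotations; slices of a random tangent vector
% representation are skew-symmetric.
n = 3; k = 4;
M = rotationsfactory(n, k);
X = M.rand();
S = M.randvec(X);
i = 2;                                    % any slice index in 1:k
norm(X(:, :, i).'*X(:, :, i) - eye(n))    % should be ~0
det(X(:, :, i))                           % should be ~1
norm(S(:, :, i) + S(:, :, i).')           % should be ~0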
Functions in the structure M that take tangent vectors as input or output work with skew-symmetric matrices. This skew-symmetric part is often written $\Omega$; here, we call it $S$ to have a direct correspondence between the math notation $S$ and the Matlab matrix S. In practice, in v = problem.ehess(x, u) for example, u is a representation of a tangent vector: it is skew-symmetric (or an array of skew-symmetric matrices). Use w = M.tangent2ambient(x, u) to convert u to the actual vector in the ambient space (obtained simply as x*u in the case $k = 1$). The output v, on the other hand, should be returned simply as a vector in the ambient space (there is no conversion to be done there).
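As a concrete check, here is a minimal sketch (assuming Manopt is on the path; the variable names are illustrative) verifying that, for $k = 1$, tangent2ambient indeed returns x*u:
% Minimal sketch (assuming Manopt is on the path): for k = 1, the ambient
% representation of a tangent vector u (stored as a skew-symmetric matrix)
% is simply x*u.
M = rotationsfactory(4);          % k defaults to 1
x = M.rand();                     % random rotation in SO(4)
u = M.randvec(x);                 % skew-symmetric representation of a tangent vector
w = M.tangent2ambient(x, u);      % actual tangent vector in the ambient space
norm(w - x*u, 'fro')              % should be zero up to machine precision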
The following table shows some of the nontrivial functions available in the structure M. The norm $\|\cdot\|$ refers to the norm in the ambient space, which is the Frobenius norm. The tutorial page gives more details about the functionality implemented by each function.
Name | Field usage | Formula |
Dimension | M.dim() | $\operatorname{dim}\mathcal{M} = k\frac{n(n-1)}{2}$ |
Metric | M.inner(X, S, T) | $\langle U, V\rangle_X = \sum_{i=1}^k \operatorname{trace}(S_i^T T_i)$, where S and T are representations of the tangent vectors $U = (X_1S_1, \ldots, X_kS_k)$ and $V = (X_1T_1, \ldots, X_kT_k)$. |
Norm | M.norm(X, S) | $\|U\|_X = \sqrt{\langle U, U \rangle_X}$ |
Distance | M.dist(X, Y) | $\operatorname{dist}(X, Y) = \sqrt{ \sum_{i=1}^k \|\log(X_i^TY_i)\|^2 }$, where $\log$ denotes the matrix logarithm logm. |
Typical distance | M.typicaldist() | $\pi\sqrt{nk}$ |
Tangent space projector | S = M.proj(X, H) | $(P_X(H))_i = X_i S_i$ with $S_i = \operatorname{skew}(X_i^T H_i)$, for $i = 1\ldots k$, where H represents a vector $H = (H_1, \ldots, H_k)$ in the ambient space with slices H(:, :, i) corresponding to the $H_i$'s, and $\operatorname{skew}$ extracts the skew-symmetric part of a matrix: $\operatorname{skew}(A) = \frac{A-A^T}{2}$. Notice that only $S_i$ is computed, since tangent vectors $XS$ are represented by their skew-symmetric part $S$ only. As a result, contrary to intuition, applying proj twice is not equivalent to applying proj once: this is merely a question of numerical representation. See also the function below. |
Tangent space to ambient space | H = M.tangent2ambient(X, S) | Returns an ambient vector $H$ representing the tangent vector represented by S, with H(:, :, i) = X(:, :, i)*S(:, :, i) for $i = 1\ldots k$. This function is necessary because the proj operator takes an ambient vector as input and returns a tangent vector: to apply proj again to the result (which should change nothing), the tangent vector obtained must first be represented as an ambient vector. |
Euclidean to Riemannian gradient | M.egrad2rgrad(X, egrad) | $\operatorname{grad} f(X) = P_X(\nabla f(X))$, where egrad is the Euclidean gradient $\nabla f(X)$, which is a vector in the ambient space. |
Euclidean to Riemannian Hessian | M.ehess2rhess(X, egrad, ehess, S) | $(\operatorname{Hess} f(X)[U])_i = P_{X_i}\left( (\nabla^2 f(X)[U])_i - U_i \operatorname{sym}\left( X_i^T (\nabla f(X))_i \right) \right)$, for $i = 1\ldots k$, where S represents the tangent vector $U = (X_1S_1, \ldots, X_kS_k)$, egrad is the Euclidean gradient $\nabla f(X)$, ehess is the Euclidean Hessian $\nabla^2 f(X)[U]$ (the latter two being vectors in the ambient space), and $\operatorname{sym}$ extracts the symmetric part of a matrix: $\operatorname{sym}(A) = \frac{A+A^T}{2}$. |
Exponential map | M.exp(X, S, t) | $(\operatorname{Exp}_X(tU))_i = X_i\operatorname{exp}(tS_i)$, where S represents the tangent vector $U = (X_1S_1, \ldots, X_kS_k)$ and $\operatorname{exp}$ denotes the matrix exponential expm. |
Retraction | M.retr(X, S, t) | $\left( \operatorname{Retr}_X(tU) \right)_i = \operatorname{qfactor}(X_i+tU_i)$, where $\operatorname{qfactor}$ extracts the Q-factor of the QR decomposition with positive elements on the diagonal of R. This is guaranteed to always return a proper rotation matrix, i.e., with determinant +1. |
Logarithmic map | S = M.log(X, Y) | $(\operatorname{Log}_X(Y))_i = X_iS_i$, with $S_i = \operatorname{log}(X_i^T Y_i)$, where $\operatorname{log}$ denotes the matrix logarithm logm. |
Random point | M.rand() | Returns a point uniformly at random w.r.t. the natural measure, following the algorithm described in [Mez07]. Essentially: compute the Q-factor of a QR decomposition of a matrix with i.i.d. normal entries, then make sure the determinant is +1. |
Random vector | M.randvec(X) | Returns a unit-norm tangent vector at $X$ with uniformly random direction. |
Vector transport | M.transp(X, Y, S) | $\left(\operatorname{Transp}_{Y\leftarrow X}(U) \right)_i = Y_iS_i$, where S represents a tangent vector $U = (X_1S_1, \ldots, X_kS_k)$ at $X$ that is transported to the tangent space at $Y$. Since tangent vectors are represented in the Lie algebra, there is nothing to do: S is returned untouched. Notice that this transport is an isometry between tangent spaces: it preserves inner products. |
Pair mean | M.pairmean(X, Y) | $\left( \operatorname{mean}(X, Y) \right)_i = X_i \operatorname{exp}\left( \frac{\operatorname{log}(X_i^T Y_i)}{2} \right)$ |
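As an illustration of how these tools fit together, here is a minimal sketch (assuming Manopt is on the path): the exponential, the logarithm and the distance are mutually consistent, and since tangent vectors are stored as plain 3D arrays of skew-symmetric slices, they can be compared entry-wise.
% Minimal sketch (assuming Manopt is on the path): exp, log and dist are
% consistent with one another.
n = 3; k = 2;
M = rotationsfactory(n, k);
X = M.rand();
S = M.randvec(X);            % unit-norm tangent vector (skew slices)
Y = M.exp(X, S, 0.5);        % walk along the geodesic for a length of 0.5
M.dist(X, Y)                 % should be 0.5
S2 = M.log(X, Y);            % should recover 0.5*S ...
norm(S2(:) - 0.5*S(:))       % ... so this should be zero up to round-off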
Let $A, B\in\mathbb{R}^{n\times m}$ be two matrices representing clouds of $m$ points in $\mathbb{R}^n$. We search for the rotation matrix $X\in\mathrm{SO}(n)$ such that rotating all points of $B$ by $X$ brings them as close as possible to the points of $A$, in the Frobenius norm. This is the orthogonal Procrustes problem, whose solution is known in closed form; we treat it merely for the sake of example. We minimize the cost $\frac{1}{2} \|XB - A\|^2 = \frac{1}{2} \|A\|^2 + \frac{1}{2} \|B\|^2 - \operatorname{Trace}(B^T X^T A)$, which is equivalent to minimizing the following cost:
$$f(X) = -\operatorname{Trace}(X^T \, AB^T),$$
such that $X \in \mathrm{SO}(n)$. Notice that we really are just looking for the rotation matrix closest to the matrix $AB^T$. We compute the Euclidean gradient and Hessian of $f$:
$$\nabla f(X) = -AB^T,$$
$$\nabla^2 f(X)[XS] = 0.$$
Internally, the Riemannian gradient and Hessian are obtained by
applying the M.egrad2rgrad
and M.ehess2rhess
operators to these Euclidean quantities. This happens under the
hood, as in the following code:
% Generate the problem data.
n = 3;
m = 10;
A = randn(n, m);
B = randn(n, m);
ABt = A*B.';
% Create the problem structure.
manifold = rotationsfactory(n, 1);
problem.M = manifold;
% Define the problem cost function and its Euclidean derivatives.
problem.cost = @(X) -X(:).'*ABt(:);
problem.egrad = @(X) -ABt;
% Notice that, even though the Euclidean Hessian vanishes, the Riemannian Hessian does not.
% Indeed, plugging the Euclidean gradient and Hessian explicitly in the ehess2rhess
% formula gives a nontrivial operator.
% In all generality, the Euclidean Hessian would not be zero: it would be a function of S.
% You would then need to transform S (which is a representation of the tangent vector)
% into an actual ambient vector corresponding to this tangent vector. In this case,
% it is simply X*S. More generally in Manopt, this operation is done via a call to
% manifold.tangent2ambient(X, S).
problem.ehess = @(X, S) manifold.zerovec(X);
% Numerically check the differentials.
checkgradient(problem); pause;
checkhessian(problem); pause;
% Solve the problem
X = trustregions(problem);
% Compare with the known optimal solution
[U, S, V] = svd(ABt);
UVt = U*V.';
% The determinant of UVt is either 1 or -1, in theory
if abs(1 - det(UVt)) < 1e-10
Xopt = UVt;
elseif abs(-1 - det(UVt)) < 1e-10
% UVt is in O(n) but not SO(n). This is easily corrected for:
J = diag([ones(n-1,1);-1]);
Xopt = U*J*V.';
else
error('Should never ever happen ...');
end
fprintf('This should be small: %g\n', norm(Xopt - X, 'fro'));
Notice that if you work with the cost $f(X) = \frac{1}{2} \|XB - A\|^2$ directly, your Euclidean gradient becomes $\nabla f(X) = (XB-A)B^T = XBB^T - AB^T$ and there seems to be an extra term in there. But this term, $XBB^T$, is of the form "$X$ times something symmetric", so that the orthogonal projection to the tangent space at $X$ will get rid of it.
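To see this numerically, here is a minimal sketch (reusing manifold, A and B from the example above; problem2 is an illustrative name): the gradient check passes even with the extra term present.
% Minimal sketch (reusing manifold, A and B from the example above;
% problem2 is an illustrative name): the gradient check passes even
% though the Euclidean gradient contains the extra term X*B*B.'.
problem2.M = manifold;
problem2.cost = @(X) .5*norm(X*B - A, 'fro')^2;
problem2.egrad = @(X) (X*B - A)*B.';
checkgradient(problem2); pause;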
More interestingly, one might try to solve the orthogonal Procrustes problem with more than two matrices, as in [TB77]. This is linked to the algorithms proposed in [BSA13], for which Manopt code is available.
For theory on Riemannian submanifolds, see [AMS08], section 3.6.1 (first-order derivatives) and section 5.3.3 (second-order derivatives, i.e., connections).
For content specifically about the Lie group of rotations, [Bum04] is one of many resources.
For an application to the problem of interpolation and regression of rotation matrices, see [BA11].
For algorithms that generate uniformly random rotations, see [Mez07].
[AMS08] P.-A. Absil, R. Mahony and R. Sepulchre, Optimization Algorithms on Matrix Manifolds, Princeton University Press, 2008.
[BA11] N. Boumal and P.-A. Absil, A discrete regression method on manifolds and its application to data on SO(n), 18th IFAC World Congress, 2011.
[BSA13] N. Boumal, A. Singer and P.-A. Absil, Robust estimation of rotations from relative measurements by maximum likelihood, CDC 2013.
[Bum04] D. Bump, Lie Groups, vol. 225 of Graduate Texts in Mathematics, Springer, 2004.
[Mez07] F. Mezzadri, How to generate random matrices from the classical compact groups, Notices of the AMS, 54(5), 592–604, 2007.
[TB77] J.M.F. Ten Berge, Orthogonal Procrustes rotation for two or more matrices, Psychometrika, 42(2), 267–276, 1977.