The special orthogonal group $\mathrm{SO}(n)$ (the set of rotations in $\mathbb{R}^n$) is endowed with a Riemannian manifold structure by considering it as a Riemannian submanifold of the embedding Euclidean space $\mathbb{R}^{n\times n}$ equipped with the usual inner product $\langle H_1, H_2 \rangle = \operatorname{trace}(H_1^T H_2)$. Proper rotation matrices are orthogonal matrices with determinant +1 (as opposed to -1), that is, reflections are not allowed. The present factory makes it possible to work with several rotation matrices simultaneously and efficiently, using 3D arrays of size nxnxk in Matlab. The geometry of $\mathrm{SO}(n)^k$ is obtained from the geometry of $\mathrm{SO}(n)$ in the obvious way, by element-wise extension.
Factory call: M = rotationsfactory(n, k). By default, k equals 1.
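For example, here is a minimal sketch (assuming Manopt is on the path) for working with 5 rotations in $\mathbb{R}^3$ simultaneously:
% Minimal sketch (assuming Manopt is on the path).
M = rotationsfactory(3, 5);   % the manifold SO(3)^5
X = M.rand();                 % a random point on the manifold
size(X)                       % returns [3 3 5]: one 3x3 rotation per slice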
Name | Formula and numerical representation |
Set | $\mathrm{SO}(n)^k = \{ X = (X_1, \ldots, X_k) \in (\mathbb{R}^{n\times n})^k : X_i^T X_i = I_n \textrm{ and } \det(X_i) = +1, i = 1:k \}$ $X$ is represented as a 3D array X of size nxnxk whose slices are proper rotation matrices, i.e., X(:, :, i).'*X(:, :, i) = eye(n) and det(X(:, :, i)) = 1 for i = 1:k. |
Tangent space at $X$ | $T_X \mathrm{SO}(n)^k = \{ U = (X_1S_1, \ldots, X_kS_k) \in (\mathbb{R}^{n\times n})^k : S_i + S_i^T = 0, i = 1:k \}$ A tangent vector $U$ at $X$ is represented as an array S of size nxnxk such that each slice of S is a skew-symmetric matrix, i.e., S(:, :, i) + S(:, :, i).' = 0 for i = 1:k. Note that S numerically contains $(S_1, \ldots, S_k)$, not $U$: this is important to keep in mind. |
Ambient space | $(\mathbb{R}^{n\times n})^k$ Points and vectors in the ambient space are, naturally, represented as 3D arrays of size nxnxk. |
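For instance, here is a minimal sketch (assuming Manopt is on the path) checking these representations on randomly generated data:
% Minimal sketch (assuming Manopt is on the path): slices of a random
% point are proper rotations; slices of a random tangent vector
% representation are skew-symmetric.
n = 3; k = 4;
M = rotationsfactory(n, k);
X = M.rand();
S = M.randvec(X);
i = 2;                                    % any slice index in 1:k
norm(X(:, :, i).'*X(:, :, i) - eye(n))    % should be ~0
det(X(:, :, i))                           % should be ~1
norm(S(:, :, i) + S(:, :, i).')           % should be ~0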
Functions in the structure M that take tangent vectors as input or output work with skew-symmetric matrices. This skew-symmetric part is often written $\Omega$; here, we call it $S$ to have a direct correspondence between the math notation $S$ and the Matlab matrix S. In practice, in v = problem.ehess(x, u) for example, u is a representation of a tangent vector: it is skew-symmetric (or an array of skew-symmetric matrices). Use w = M.tangent2ambient(x, u) to convert u to the actual vector in the ambient space (obtained simply as x*u in the case $k = 1$). The output v, on the other hand, should be returned simply as a vector in the ambient space (there is no conversion to be done there).
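As a concrete check, here is a minimal sketch (assuming Manopt is on the path; the variable names are illustrative) verifying that, for $k = 1$, tangent2ambient indeed returns x*u:
% Minimal sketch (assuming Manopt is on the path): for k = 1, the ambient
% representation of a tangent vector u (stored as a skew-symmetric matrix)
% is simply x*u.
M = rotationsfactory(4);          % k defaults to 1
x = M.rand();                     % random rotation in SO(4)
u = M.randvec(x);                 % skew-symmetric representation of a tangent vector
w = M.tangent2ambient(x, u);      % actual tangent vector in the ambient space
norm(w - x*u, 'fro')              % should be zero up to machine precision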
The following table shows some of the nontrivial functions available in the structure M. The norm $\|\cdot\|$ refers to the norm in the ambient space, which is the Frobenius norm. The tutorial page gives more details about the functionality implemented by each function.
Name | Field usage | Formula |
Dimension | M.dim() | $\operatorname{dim}\mathcal{M} = k\frac{n(n-1)}{2}$ |
Metric | M.inner(X, S, T) | $\langle U, V\rangle_X = \sum_{i=1}^k \operatorname{trace}(S_i^T T_i)$, where S and T are representations of the tangent vectors $U = (X_1S_1, \ldots, X_kS_k)$ and $V = (X_1T_1, \ldots, X_kT_k)$. |
Norm | M.norm(X, S) | $\|U\|_X = \sqrt{\langle U, U \rangle_X}$ |
Distance | M.dist(X, Y) | $\operatorname{dist}(X, Y) = \sqrt{ \sum_{i=1}^k \|\log(X_i^TY_i)\|^2 }$, where $\log$ denotes the matrix logarithm logm. |
Typical distance | M.typicaldist() | $\pi\sqrt{nk}$ |
Tangent space projector | S = M.proj(X, H) | $(P_X(H))_i = X_i S_i$ with $S_i = \operatorname{skew}(X_i^T H_i)$, for $i = 1\ldots k$, where H represents a vector $H = (H_1, \ldots, H_k)$ in the ambient space with slices H(:, :, i) corresponding to the $H_i$'s, and $\operatorname{skew}$ extracts the skew-symmetric part of a matrix: $\operatorname{skew}(A) = \frac{A-A^T}{2}$. Notice that only $S_i$ is computed, since tangent vectors $XS$ are represented by their skew-symmetric part $S$ only. As a result, contrary to intuition, applying proj twice is not equivalent to applying proj once: this is merely a question of numerical representation. See also the function below. |
Tangent space to ambient space | H = M.tangent2ambient(X, S) | Returns an ambient vector $H$ representing the tangent vector represented by S, with H(:, :, i) = X(:, :, i)*S(:, :, i) for $i = 1\ldots k$. This function is necessary because the proj operator takes an ambient vector as input and returns a tangent vector: to apply proj again to the result (which should change nothing), the tangent vector obtained must first be represented as an ambient vector. |
Euclidean to Riemannian gradient | M.egrad2rgrad(X, egrad) | $\operatorname{grad} f(X) = P_X(\nabla f(X))$, where egrad is the Euclidean gradient $\nabla f(X)$, which is a vector in the ambient space. |
Euclidean to Riemannian Hessian | M.ehess2rhess(X, egrad, ehess, S) | $(\operatorname{Hess} f(X)[U])_i = P_{X_i}\left( (\nabla^2 f(X)[U])_i - U_i \operatorname{sym}\left( X_i^T (\nabla f(X))_i \right) \right)$, for $i = 1\ldots k$, where S represents the tangent vector $U = (X_1S_1, \ldots, X_kS_k)$, egrad is the Euclidean gradient $\nabla f(X)$, ehess is the Euclidean Hessian $\nabla^2 f(X)[U]$ (the latter two being vectors in the ambient space), and $\operatorname{sym}$ extracts the symmetric part of a matrix: $\operatorname{sym}(A) = \frac{A+A^T}{2}$. |
Exponential map | M.exp(X, S, t) | $(\operatorname{Exp}_X(tU))_i = X_i\operatorname{exp}(tS_i)$, where S represents the tangent vector $U = (X_1S_1, \ldots, X_kS_k)$ and $\operatorname{exp}$ denotes the matrix exponential expm. |
Retraction | M.retr(X, S, t) | $\left( \operatorname{Retr}_X(tU) \right)_i = \operatorname{qfactor}(X_i+tU_i)$, where $\operatorname{qfactor}$ extracts the Q-factor of the QR decomposition with positive elements on the diagonal of R. This is guaranteed to always return a proper rotation matrix, i.e., with determinant +1. |
Logarithmic map | S = M.log(X, Y) | $(\operatorname{Log}_X(Y))_i = X_iS_i$, with $S_i = \operatorname{log}(X_i^T Y_i)$, where $\operatorname{log}$ denotes the matrix logarithm logm. |
Random point | M.rand() | Returns a point uniformly at random w.r.t. the natural measure, following the algorithm described in [Mez07]. Essentially: compute the Q-factor of a QR decomposition of a matrix with i.i.d. normal entries, then make sure the determinant is +1. |
Random vector | M.randvec(X) | Returns a unit-norm tangent vector at $X$ with uniformly random direction. |
Vector transport | M.transp(X, Y, S) | $\left(\operatorname{Transp}_{Y\leftarrow X}(U) \right)_i = Y_iS_i$, where S represents a tangent vector $U = (X_1S_1, \ldots, X_kS_k)$ at $X$ that is transported to the tangent space at $Y$. Since tangent vectors are represented in the Lie algebra, there is nothing to do: S is returned untouched. Notice that this transport is an isometry between tangent spaces: it preserves inner products. |
Pair mean | M.pairmean(X, Y) | $\left( \operatorname{mean}(X, Y) \right)_i = X_i \operatorname{exp}\left( \frac{\operatorname{log}(X_i^T Y_i)}{2} \right)$ |
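As an illustration of how these tools fit together, here is a minimal sketch (assuming Manopt is on the path): the exponential, the logarithm and the distance are mutually consistent, and since tangent vectors are stored as plain 3D arrays of skew-symmetric slices, they can be compared entry-wise.
% Minimal sketch (assuming Manopt is on the path): exp, log and dist are
% consistent with one another.
n = 3; k = 2;
M = rotationsfactory(n, k);
X = M.rand();
S = M.randvec(X);            % unit-norm tangent vector (skew slices)
Y = M.exp(X, S, 0.5);        % walk along the geodesic for a length of 0.5
M.dist(X, Y)                 % should be 0.5
S2 = M.log(X, Y);            % should recover 0.5*S ...
norm(S2(:) - 0.5*S(:))       % ... so this should be zero up to round-off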
Let $A, B\in\mathbb{R}^{n\times m}$ be two matrices representing clouds of $m$ points in $\mathbb{R}^n$. We search for the rotation matrix $X\in\mathrm{SO}(n)$ such that rotating all points of $B$ by $X$ brings them as close as possible to the points of $A$, in the Frobenius norm. This is the orthogonal Procrustes problem, whose solution is known in closed form; we treat it merely for the sake of example. We minimize the cost $\frac{1}{2} \|XB - A\|^2 = \frac{1}{2} \|A\|^2 + \frac{1}{2} \|B\|^2 - \operatorname{Trace}(B^T X^T A)$, which is equivalent to minimizing the following cost:
$$f(X) = -\operatorname{Trace}(X^T \, AB^T),$$
such that $X \in \mathrm{SO}(n)$. Notice that we really are just looking for the rotation matrix closest to the matrix $AB^T$. We compute the Euclidean gradient and Hessian of $f$:
$$\nabla f(X) = -AB^T,$$
$$\nabla^2 f(X)[XS] = 0.$$
Internally, the Riemannian gradient and Hessian are obtained by
applying the M.egrad2rgrad
and M.ehess2rhess
operators to these Euclidean quantities. This happens under the
hood, as in the following code:
% Generate the problem data.
n = 3;
m = 10;
A = randn(n, m);
B = randn(n, m);
ABt = A*B.';
% Create the problem structure.
manifold = rotationsfactory(n, 1);
problem.M = manifold;
% Define the problem cost function and its Euclidean derivatives.
problem.cost = @(X) -X(:).'*ABt(:);
problem.egrad = @(X) -ABt;
% Notice that, even though the Euclidean Hessian vanishes, the Riemannian Hessian does not.
% Indeed, plugging the Euclidean gradient and Hessian explicitly in the ehess2rhess
% formula gives a nontrivial operator.
% In all generality, the Euclidean Hessian would not be zero: it would be a function of S.
% You would then need to transform S (which is a representation of the tangent vector)
% into an actual ambient vector corresponding to this tangent vector. In this case,
% it is simply X*S. More generally in Manopt, this operation is done via a call to
% manifold.tangent2ambient(X, S).
problem.ehess = @(X, S) manifold.zerovec(X);
% Numerically check the differentials.
checkgradient(problem); pause;
checkhessian(problem); pause;
% Solve the problem
X = trustregions(problem);
% Compare with the known optimal solution
[U, S, V] = svd(ABt);
UVt = U*V.';
% The determinant of UVt is either 1 or -1, in theory
if abs(1 - det(UVt)) < 1e-10
Xopt = UVt;
elseif abs(-1 - det(UVt)) < 1e-10
% UVt is in O(n) but not SO(n). This is easily corrected for:
J = diag([ones(n-1,1);-1]);
Xopt = U*J*V.';
else
error('Should never ever happen ...');
end
fprintf('This should be small: %g\n', norm(Xopt - X, 'fro'));
Notice that if you work with the cost $f(X) = \frac{1}{2} \|XB - A\|^2$ directly, your Euclidean gradient becomes $\nabla f(X) = (XB-A)B^T = XBB^T - AB^T$ and there seems to be an extra term in there. But this term, $XBB^T$, is of the form "$X$ times something symmetric", so that the orthogonal projection to the tangent space at $X$ will get rid of it.
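To see this numerically, here is a minimal sketch (reusing manifold, A and B from the example above; problem2 is an illustrative name): the gradient check passes even with the extra term present.
% Minimal sketch (reusing manifold, A and B from the example above;
% problem2 is an illustrative name): the gradient check passes even
% though the Euclidean gradient contains the extra term X*B*B.'.
problem2.M = manifold;
problem2.cost = @(X) .5*norm(X*B - A, 'fro')^2;
problem2.egrad = @(X) (X*B - A)*B.';
checkgradient(problem2); pause;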
More interestingly, one might try to solve the orthogonal Procrustes problem with more than two matrices, as in [TB77]. This is linked to the algorithms proposed in [BSA13], for which Manopt code is available.
For theory on Riemannian submanifolds, see [AMS08], section 3.6.1 (first-order derivatives) and section 5.3.3 (second-order derivatives, i.e., connections).
For content specifically about the Lie group of rotations, [Bum04] is one of many resources.
For an application to the problem of interpolation and regression of rotation matrices, see [BA11].
For algorithms that generate uniformly random rotations, see [Mez07].
[AMS08] P.-A. Absil, R. Mahony and R. Sepulchre, Optimization Algorithms on Matrix Manifolds, Princeton University Press, 2008.
[BA11] N. Boumal and P.-A. Absil, A discrete regression method on manifolds and its application to data on SO(n), 18th IFAC World Congress, 2011.
[BSA13] N. Boumal, A. Singer and P.-A. Absil, Robust estimation of rotations from relative measurements by maximum likelihood, CDC 2013.
[Bum04] D. Bump, Lie Groups, vol. 225 of Graduate Texts in Mathematics, Springer, 2004.
[Mez07] F. Mezzadri, How to generate random matrices from the classical compact groups, Notices of the AMS, 54(5), 592–604, 2007.
[TB77] J.M.F. Ten Berge, Orthogonal Procrustes rotation for two or more matrices, Psychometrika, 42(2), 267–276, 1977.