Rotation matrices: the special orthogonal group

Comment about the finer points of how to use rotationsfactory in Manopt

The group of rotations in \(\Rn\) is the special orthogonal group \[ \SOn = \{ X \in \Rnn : X\transpose X = I_n \textrm{ and } \det(X) = 1 \}. \] This is one of the two connected components of the orthogonal group (matrices in the other component satisfy \(\det(X) = -1\)).

Differentiating the constraint \(X\transpose X = I_n\) along \(\dot X\), we obtain the defining equation for the tangent space at \(X\). Explicitly, \[ \dot X\transpose X + X\transpose \dot X = 0. \] Thus, a tangent vector \(\dot X\) is a matrix in \(\Rnn\) such that \(X\transpose \dot X\) is skew-symmetric. Equivalently, \(\dot X\) is of the form \(X\Omega\) where \(\Omega\) is a skew-symmetric matrix in \(\Rnn\). Accordingly, the tangent space at \(X\) is the following subspace of \(\Rnn\): \[ \T_X \SOn = \{ X\Omega : \Omega + \Omega\transpose = 0 \}. \] Since \(X\) is fixed, we see that in order to identify a tangent vector \(\dot X\) it is equivalent to specify either \(\dot X\) itself (as a matrix of size \(n \times n\)) or \(\Omega\) (a skew-symmetric matrix of size \(n \times n\), with the understanding that \(\Omega\) represents \(\dot X = X\Omega\)).

In practice, it is common that the first thing we need to compute with a tangent vector is the product \(X\transpose \dot X\), which happens to be \(\Omega\). For this reason, in Manopt’s rotationsfactory, tangent vectors are stored as \(\Omega\), not as \(\dot X\).

In constrast, we still see \(\SOn\) as an embedded submanifold of \(\Rnn\), and all objects in \(\Rnn\) are represented as themselves (matrices of size \(n \times n\)).

This is important to keep in mind whenever we work with a matrix that comes from or that we provide to rotationsfactory. We must ask: is this a tangent vector at \(X\) (represented as \(\Omega\)), or is this in the embedding space (represented as itself)? Explicitly:

The manifold structure M = rotationsfactory(n) provides convenient tools to work with these representations:

As an example, consider the following optimization problem over rotation matrices. Let \(\bar f \colon \Rnn \to \reals\) be defined by \[ \bar f(X) = \frac{1}{2}\sqfrobnorm{AXB - C}, \] where \(A, B, C\) are given matrices in \(\Rnn\). We wish to minimize \(f \colon \SOn \to \reals\), where \(f = \bar f|_{\SOn}\) is simply the restriction of \(\bar f\) to \(\SOn\).

The Euclidean gradient and Hessian of \(\bar f\) are as follows: \[ \grad \bar f(X) = A\transpose ( AXB - C ) B\transpose \] and, for all \(Z \in \Rnn\), \[ \Hess \bar f(X)[Z] = A\transpose ( AZB ) B\transpose. \] These can be converted to the Riemannian gradient and Hessian of \(f\) in the usual way: see for example Section 7.4 in this book. Manopt does it automatically though: we just need to provide \(\grad \bar f\) and \(\Hess \bar f\) in problem.egrad and problem.ehess.

Here is how we would implement this in Manopt using rotationsfactory to handle \(\SOn\). The tabs below provide two things:

  1. The code to define the problem with cost, egrad and two versions of ehess:
    • An incorrect one, which we might write if unaware of the subtleties discussed above; then
    • A correct one, where we transform Omg from tangent space to embedding space, that is, from Omg (skew symmetric) to X*Omg.
  2. The messages printed by checkhessian for the incorrect Hessian.
n = 8;
A = randn(n, n);
B = randn(n, n);
C = randn(n, n);

SOn = rotationsfactory(n);
problem.M = SOn;
problem.cost = @(X) .5*norm(A*X*B - C, 'fro')^2; % f(X)
problem.egrad = @(X) A'*(A*X*B - C)*B';          % grad f(X) in R^(nxn)

% Wrong.
% This code treats the second input of ehess as if it were in
% the embedding space. Optimization algorithms won't be amused.
problem.ehess = @(X, Xdot) A'*(A*Xdot*B)*B';

% Correct.
% This code notes that the second input to ehess is a representation of
% a tangent vector. It transforms Omg to X*Omg, which is the direction
% that is actually encoded by Omg, and which we often call Xdot.
% Equivalently, we can call SOn.tangent2ambient(X, Omg) instead of X*Omg.
problem.ehess = @(X, Omg) A'*(A*X*Omg*B)*B';

figure(1); clf;
checkgradient(problem);

figure(2); clf;
checkhessian(problem);

X = trustregions(problem);

This is the text output for checkhessian when using the wrong code for problem.ehess. Notice that this wrong ehess fails both the slope test and the symmetry test.

# Hessian check
The slope should be 3. It appears to be: 2.00001.
If it is far from 3, then the gradient or the Hessian might be erroneous.
Hess f(x)[d] must be a tangent vector at x.
If so, the following number is zero up to machine precision: 0.
If it is far from 0, the Hessian returns non-tangent vectors.
The Hessian at x must be linear on the tangent space at x.
If so, ||a*H[d1] + b*H[d2] - H[a*d1+b*d2]|| is zero up to machine precision.
    Value: 3.97354e-14 (norm of H[a*d1+b*d2]: 163.589)
If it is far from 0, then the Hessian is not linear.
The Hessian at x must be symmetric on the tangent space at x.
If so, <d1, H[d2]> - <H[d1], d2> is zero up to machine precision.
    Value: 21.2102 - 10.2558 = 10.9544.
If it is far from 0, then the Hessian is not symmetric.


!! Special message for this manifold

If all above looks good, great. If not, read on.

For this manifold, tangent vectors are represented
differently from their ambient space representation.
In practice, this means that when defining
v = problem.ehess(x, u), one may need to call
u = problem.M.tangent2ambient(x, u) first, so as to
transform u into an ambient vector, if this is more
convenient. The output of ehess should be an ambient
vector (it will be transformed to a tangent vector
automatically).