Rotation matrices: the special orthogonal group
A comment about the finer points of how to use rotationsfactory in Manopt
The group of rotations in \(\Rn\) is the special orthogonal group \[ \SOn = \{ X \in \Rnn : X\transpose X = I_n \textrm{ and } \det(X) = 1 \}. \] This is one of the two connected components of the orthogonal group (matrices in the other component satisfy \(\det(X) = -1\)).
Differentiating the constraint \(X\transpose X = I_n\) along \(\dot X\), we obtain the defining equation for the tangent space at \(X\). Explicitly, \[ \dot X\transpose X + X\transpose \dot X = 0. \] Thus, a tangent vector \(\dot X\) is a matrix in \(\Rnn\) such that \(X\transpose \dot X\) is skew-symmetric. Equivalently, \(\dot X\) is of the form \(X\Omega\) where \(\Omega\) is a skew-symmetric matrix in \(\Rnn\). Accordingly, the tangent space at \(X\) is the following subspace of \(\Rnn\): \[ \T_X \SOn = \{ X\Omega : \Omega + \Omega\transpose = 0 \}. \] Since \(X\) is fixed, we see that in order to identify a tangent vector \(\dot X\) it is equivalent to specify either \(\dot X\) itself (as a matrix of size \(n \times n\)) or \(\Omega\) (a skew-symmetric matrix of size \(n \times n\), with the understanding that \(\Omega\) represents \(\dot X = X\Omega\)).
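As a quick sanity check of these definitions, here is a plain MATLAB sketch (no Manopt needed yet) that builds a rotation matrix and a tangent vector at it, then verifies the defining equations numerically; the QR-based construction is just one convenient way to sample a rotation.
n = 5;
% Sample an orthogonal matrix as the Q factor of a QR decomposition,
% flipping one column if necessary so that det(X) = +1.
[Q, ~] = qr(randn(n));
if det(Q) < 0
    Q(:, 1) = -Q(:, 1);
end
X = Q;
norm(X'*X - eye(n), 'fro')       % ~ 0: X'X = I_n
det(X)                           % = +1 (up to rounding): X is a rotation
% A tangent vector at X has the form Xdot = X*Omega with Omega skew-symmetric.
Omega = randn(n);
Omega = (Omega - Omega')/2;      % skew-symmetric part
Xdot  = X*Omega;
norm(X'*Xdot + Xdot'*X, 'fro')   % ~ 0: Xdot satisfies the defining equation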
In practice, it is common that the first thing we need to compute with a tangent vector is the product \(X\transpose \dot X\), which happens to be \(\Omega\). For this reason, in Manopt’s rotationsfactory, tangent vectors are stored as \(\Omega\), not as \(\dot X\).
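For example, the random tangent vectors generated by rotationsfactory are returned in this representation, as the following sketch illustrates (M.rand and M.randvec are the factory's random point and random tangent vector generators):
n = 5;
M = rotationsfactory(n);
X   = M.rand();              % a random rotation matrix, stored as an n-by-n matrix
Omg = M.randvec(X);          % a random tangent vector at X, stored as Omega
norm(Omg + Omg', 'fro')      % ~ 0: the stored matrix is skew-symmetric
Xdot = X*Omg;                % the matrix in R^(nxn) that Omg actually represents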
In contrast, we still see \(\SOn\) as an embedded submanifold of \(\Rnn\), and all objects in \(\Rnn\) are represented as themselves (matrices of size \(n \times n\)).
This is important to keep in mind whenever we work with a matrix that comes from, or that we provide to, rotationsfactory. We must ask: is this a tangent vector at \(X\) (represented as \(\Omega\)), or is this in the embedding space (represented as itself)? Explicitly:
- problem.grad(X) is the Riemannian gradient at \(X\); this is a tangent vector at \(X\), so the output is just the skew-symmetric factor.
- problem.egrad(X) is the Euclidean gradient at \(X\); this is in the embedding space, so the output is the matrix itself.
- problem.hess(X, Omg) is the Riemannian Hessian at \(X\) along \(\dot X = X\Omega\); the input \(\Omega\) represents the tangent vector \(\dot X\): we only receive the skew-symmetric factor as input. The output is a tangent vector at \(X\), say \(X\Xi\), and we only output the skew-symmetric factor, so, \(\Xi\).
- problem.ehess(X, Omg) is the Euclidean Hessian at \(X\) along \(\dot X = X\Omega\); the input \(\Omega\) is still just the skew-symmetric representation of the tangent vector \(\dot X\); however, the output is now in the embedding space, so it should be the matrix itself.
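To make this concrete, here is a small sketch with a made-up cost \(g(X) = \langle A, X \rangle\), whose Euclidean gradient is simply \(A\); getGradient is the Manopt tool that queries the Riemannian gradient (converted automatically from egrad). The cost and variable names here are ours, for illustration only.
n = 5;
A = randn(n);
M = rotationsfactory(n);
clear problem;
problem.M = M;
problem.cost  = @(X) A(:)'*X(:);     % g(X) = trace(A'*X)
problem.egrad = @(X) A;              % Euclidean gradient: a plain n-by-n matrix
X = M.rand();
G   = problem.egrad(X);              % embedding space object: the matrix itself
Omg = getGradient(problem, X);       % Riemannian gradient: only the skew-symmetric factor
norm(Omg + Omg', 'fro')              % ~ 0
Xdot = M.tangent2ambient(X, Omg);    % the tangent vector X*Omg that Omg represents
norm(Xdot - X*M.proj(X, G), 'fro')   % ~ 0: the gradient is the projection of G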
The manifold structure M = rotationsfactory(n) provides convenient tools to work with these representations:
- Omg = M.proj(X, Z) takes a full matrix \(Z \in \Rnn\) (in the embedding space) and projects it to \(\T_X\SOn\). The output is in the tangent space at \(X\), so it is just the skew-symmetric factor \(\Omega\) that represents \(\dot X = X\Omega\). The actual projection of \(Z\) to \(\T_X\SOn\) is \(\dot X\).
- Omg = M.tangent(X, Omg) takes a tangent vector \(\dot X\) (represented as a skew-symmetric matrix \(\Omega\)) and “tangentializes” it; in effect, it just ensures that the matrix indeed encodes a tangent vector up to machine precision. It does so by extracting the skew-symmetric part of the stored matrix: \(\Omega \leftarrow \frac{1}{2}(\Omega - \Omega^\top)\). If \(\Omega\) is already skew-symmetric, this has no effect. The main use of this tool is to make up for numerical errors.
- Xdot = M.tangent2ambient(X, Omg) takes the representation \(\Omega\) of a tangent vector at \(X\) and outputs the matrix \(\dot X = X\Omega\), that is, the matrix in the embedding space that corresponds to (that is represented by) the input \((X, \Omega)\).
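The following sketch shows how these tools fit together numerically (the variable names are ours):
n = 5;
M = rotationsfactory(n);
X = M.rand();                        % a random rotation matrix
Z = randn(n);                        % an arbitrary matrix in the embedding space
Omg  = M.proj(X, Z);                 % skew-symmetric factor representing the projection
Omg  = M.tangent(X, Omg);            % clean up numerical errors (a no-op here)
Xdot = M.tangent2ambient(X, Omg);    % the actual projected vector in R^(nxn)
norm(Omg + Omg', 'fro')              % ~ 0: Omg is skew-symmetric
norm(Xdot - X*Omg, 'fro')            % ~ 0: tangent2ambient computes X*Omg
norm(X'*Xdot + Xdot'*X, 'fro')       % ~ 0: Xdot is indeed tangent at X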
As an example, consider the following optimization problem over rotation matrices. Let \(\bar f \colon \Rnn \to \reals\) be defined by \[ \bar f(X) = \frac{1}{2}\sqfrobnorm{AXB - C}, \] where \(A, B, C\) are given matrices in \(\Rnn\). We wish to minimize \(f \colon \SOn \to \reals\), where \(f = \bar f|_{\SOn}\) is simply the restriction of \(\bar f\) to \(\SOn\).
The Euclidean gradient and Hessian of \(\bar f\) are as follows: \[
\grad \bar f(X) = A\transpose ( AXB - C ) B\transpose
\] and, for all \(Z \in \Rnn\), \[
\Hess \bar f(X)[Z] = A\transpose ( AZB ) B\transpose.
\] These can be converted to the Riemannian gradient and Hessian of \(f\) in the usual way: see for example Section 7.4 in this book. Manopt does it automatically though: we just need to provide \(\grad \bar f\) and \(\Hess \bar f\) in problem.egrad and problem.ehess.
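For the record, here is where these formulas come from and how the conversion works for the gradient. With the Frobenius inner product \(\langle U, V \rangle = \mathrm{trace}(U\transpose V)\), differentiating \(\bar f\) gives, for all \(Z \in \Rnn\), \[ \mathrm{D}\bar f(X)[Z] = \langle AZB, \, AXB - C \rangle = \langle Z, \, A\transpose (AXB - C) B\transpose \rangle, \] which identifies \(\grad \bar f(X)\); differentiating that map once more along \(Z\) gives \(\Hess \bar f(X)[Z]\). Then, since the metric on \(\SOn\) is inherited from \(\Rnn\), the Riemannian gradient is the orthogonal projection of the Euclidean gradient to the tangent space: \[ \grad f(X) = \mathrm{Proj}_X\!\big(\grad \bar f(X)\big) = X \, \mathrm{skew}\!\big(X\transpose \grad \bar f(X)\big), \qquad \mathrm{skew}(W) = \tfrac{1}{2}\big(W - W\transpose\big), \] so the skew-symmetric factor stored by rotationsfactory is \(\Omega = \mathrm{skew}\big(X\transpose \grad \bar f(X)\big)\). The Riemannian Hessian involves an additional correction term on top of projecting the Euclidean Hessian, which is one more reason to let Manopt handle the conversion.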
Here is how we would implement this in Manopt using rotationsfactory to handle \(\SOn\). The tabs below provide two things:
- The code to define the problem with cost, egrad and two versions of ehess:
  - An incorrect one, which we might write if unaware of the subtleties discussed above; then
  - A correct one, where we transform Omg from tangent space to embedding space, that is, from Omg (skew-symmetric) to X*Omg.
- The messages printed by checkhessian for the incorrect Hessian.
n = 8;
A = randn(n, n);
B = randn(n, n);
C = randn(n, n);
SOn = rotationsfactory(n);
problem.M = SOn;
problem.cost = @(X) .5*norm(A*X*B - C, 'fro')^2; % f(X)
problem.egrad = @(X) A'*(A*X*B - C)*B'; % grad f(X) in R^(nxn)
% Wrong.
% This code treats the second input of ehess as if it were in
% the embedding space. Optimization algorithms won't be amused.
problem.ehess = @(X, Xdot) A'*(A*Xdot*B)*B';
% Correct.
% This code notes that the second input to ehess is a representation of
% a tangent vector. It transforms Omg to X*Omg, which is the direction
% that is actually encoded by Omg, and which we often call Xdot.
% Equivalently, we can call SOn.tangent2ambient(X, Omg) instead of X*Omg.
problem.ehess = @(X, Omg) A'*(A*X*Omg*B)*B';
figure(1); clf;
checkgradient(problem);
figure(2); clf;
checkhessian(problem);
X = trustregions(problem);
This is the text output for checkhessian when using the wrong code for problem.ehess. Notice that this wrong ehess fails both the slope test and the symmetry test.
# Hessian check
The slope should be 3. It appears to be: 2.00001.
If it is far from 3, then the gradient or the Hessian might be erroneous.
Hess f(x)[d] must be a tangent vector at x.
If so, the following number is zero up to machine precision: 0.
If it is far from 0, the Hessian returns non-tangent vectors.
The Hessian at x must be linear on the tangent space at x.
If so, ||a*H[d1] + b*H[d2] - H[a*d1+b*d2]|| is zero up to machine precision.
Value: 3.97354e-14 (norm of H[a*d1+b*d2]: 163.589)
If it is far from 0, then the Hessian is not linear.
The Hessian at x must be symmetric on the tangent space at x.
If so, <d1, H[d2]> - <H[d1], d2> is zero up to machine precision.
Value: 21.2102 - 10.2558 = 10.9544.
If it is far from 0, then the Hessian is not symmetric.
!! Special message for this manifold
If all above looks good, great. If not, read on.
For this manifold, tangent vectors are represented
differently from their ambient space representation.
In practice, this means that when defining
v = problem.ehess(x, u), one may need to call
u = problem.M.tangent2ambient(x, u) first, so as to
transform u into an ambient vector, if this is more
convenient. The output of ehess should be an ambient
vector (it will be transformed to a tangent vector
automatically).