The set $S_+(n, k)$ of rank-$k$ symmetric positive semidefinite matrices $X$ of size $n\times n$ is parametrized as $X = YY^T$, where $Y \in \mathbb{R}^{n\times k}_*$ is a full column rank matrix of size $n\times k$. Notice that for all orthogonal matrices $O$ of size $k$, the transformation $Y \mapsto YO$ leaves $X$ unchanged. As a result, we may consider all matrices of the form $YO$ as being equivalent for our purposes. Quotienting this equivalence relation out results in a quotient space, $\mathbb{R}^{n \times k}_* / {\rm O}(k)$, with ${\rm O}(k)$ the group of orthogonal matrices. A point on the quotient space is (conceptually) an equivalence class $[Y] = \{YO : O \in {\rm O}(k)\}$. Numerically, we represent that equivalence class with (any) matrix $Y$ in $[Y]$.
A function $\phi : S_+(n,k) \to \mathbb{R}: X \mapsto \phi(X)$ induces a function $f: \mathbb{R}^{n \times k}_* \rightarrow \mathbb{R}: Y \mapsto f(Y)$ defined as $f(Y) = \phi(YY^T)$. Of course, $f$ has the appropriate invariance $f(Y) = f(YO)$ for all orthogonal $O$. Thus, $f$ is defined on the total space $\mathbb{R}^{n\times k}_*$ and descends to a well-defined function on the quotient space $\mathbb{R}^{n \times k}_* / {\rm O}(k)$. Endowing the total space with the usual Riemannian structure of a Euclidean space (the inner product $\langle H_1, H_2 \rangle = \operatorname{trace}(H_1^T H_2)$), a Riemannian structure for the quotient space follows.
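This invariance is easy to check numerically. The following NumPy sketch (an illustration only; Manopt itself is MATLAB-based) uses the arbitrary example cost $\phi(X) = \|X\|_F^2$ and a random orthogonal $O$ obtained from a QR factorization:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 6, 2

# A full column rank factor Y and a random orthogonal matrix O (via QR).
Y = rng.standard_normal((n, k))
O, _ = np.linalg.qr(rng.standard_normal((k, k)))

# Any function of X = Y Y^T is invariant under Y -> Y O,
# since (Y O)(Y O)^T = Y O O^T Y^T = Y Y^T.
def phi(X):
    return np.sum(X**2)  # example cost: squared Frobenius norm

f = lambda Y: phi(Y @ Y.T)
print(abs(f(Y) - f(Y @ O)))  # zero up to round-off
```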
See the paper [JBAS10] referenced below for details on this geometry, and [AMS08] for general background.
Factory call: M = symfixedrankYYfactory(n, k).
Name | Formula | Numerical representation
--- | --- | ---
Set | $S_+(n,k) = \{X \in \mathbb{R}^{n\times n} : X = X^T \succeq 0,\ {\rm rank}(X) = k\} = \{YY^T : Y \in \mathbb{R}^{n\times k},\ {\rm rank}(Y) = k\}$ | $X$ is represented by a matrix Y of size $n\times k$ such that $X = YY^T$.
Vertical space at $Y$ | $\{Y\Omega : \Omega \in \mathbb{R}^{k\times k},\ \Omega + \Omega^T = 0\}$ | A vertical vector $U$ at $Y$ is represented as a matrix U of size $n\times k$: it is tangent to the equivalence class of $Y$.
Horizontal space at $Y$ | $\{U \in \mathbb{R}^{n\times k} : U^T Y = Y^T U\}$ | A horizontal vector $U$ at $Y$ is represented as a matrix U of size $n\times k$: it is orthogonal to all vertical vectors at $Y$.
Ambient space | $\mathbb{R}^{n\times k}$ | Points and vectors in the ambient space are, naturally, represented as matrices of size $n\times k$.
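Vertical vectors are tangent to the equivalence classes in the following concrete sense: for skew-symmetric $\Omega$, the curve $t \mapsto Y\exp(t\Omega)$ stays inside $[Y]$ (since $\exp(t\Omega)$ is orthogonal), and its velocity at $t = 0$ is the vertical vector $Y\Omega$. A small NumPy/SciPy sketch of this fact (an illustration independent of Manopt's MATLAB implementation):

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)
n, k = 5, 2
Y = rng.standard_normal((n, k))

# A skew-symmetric Omega generates a curve inside the equivalence class [Y].
B = rng.standard_normal((k, k))
Omega = B - B.T

t = 0.3
Q = expm(t * Omega)   # orthogonal, because Omega is skew-symmetric
Yt = Y @ Q            # a different representative of the same class [Y]

# The represented matrix X = Y Y^T does not move along the curve:
print(np.linalg.norm(Yt @ Yt.T - Y @ Y.T))  # zero up to round-off
# The velocity of the curve at t = 0 is the vertical vector Y @ Omega.
```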
The following table shows some of the nontrivial functions available in the structure M. The tutorial page gives more details about the functionality implemented by each function.
Name | Field usage | Formula
--- | --- | ---
Dimension | M.dim() | $\operatorname{dim} \mathcal{M} = kn - k(k-1)/2$
Metric | M.inner(Y, U, V) | $\langle U, V\rangle_Y = \operatorname{trace}(U^T V)$
Norm | M.norm(Y, U) | $\|U\|_Y = \sqrt{\langle U, U \rangle_Y}$
Horizontal space projector | M.proj(Y, H) | $P_Y(H) = H - Y\Omega$, where H represents a vector in the ambient space and $\Omega$ is the skew-symmetric matrix obtained as the unique solution of the Sylvester equation $\Omega (Y^T Y) + (Y^T Y)\Omega = Y^T H - H^T Y$.
Euclidean to Riemannian gradient | M.egrad2rgrad(Y, egrad) | $\operatorname{grad} f(Y) = \nabla f(Y)$, where egrad represents the Euclidean gradient $\nabla f(Y)$, a vector in the ambient space. There is indeed nothing to do: the Euclidean gradient of $f$ is already a horizontal vector, thanks to the invariance $f(Y) = f(YO)$ for all orthogonal $O$.
Euclidean to Riemannian Hessian | M.ehess2rhess(Y, egrad, ehess, U) | $\operatorname{Hess} f(Y)[U] = P_Y(\nabla^2 f(Y)[U])$, where egrad represents the Euclidean gradient $\nabla f(Y)$ (which happens to not be necessary in this case) and ehess represents the Euclidean Hessian $\nabla^2 f(Y)[U]$, both being vectors in the ambient space.
Retraction | M.retr(Y, U, t) | $\operatorname{Retr}_Y(tU) = Y + tU$
Random point | M.rand() | Returns a point at random: generate $Y$ with i.i.d. normal entries and return it. With high probability, $Y$ is indeed full rank.
Random vector | M.randvec(Y) | Returns a horizontal vector at $Y$ with uniformly random direction: generate $H$ with i.i.d. normal entries and return $P_Y(H)/\|P_Y(H)\|_Y$.
Vector transport | M.transp(Y, Z, U) | $\operatorname{Transp}_{Z\leftarrow Y}(U) = P_Z(U)$, where $U$ is a horizontal vector at $Y$ that is transported to the horizontal space at $Z$.
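The horizontal projector is straightforward to reproduce outside of Manopt. The NumPy/SciPy sketch below (an illustration only; in Manopt this is M.proj) solves the Sylvester equation with scipy.linalg.solve_sylvester, then checks that $\Omega$ is skew-symmetric, that $P_Y(H)$ is horizontal, and that the removed part $Y\Omega$ is orthogonal to it:

```python
import numpy as np
from scipy.linalg import solve_sylvester

rng = np.random.default_rng(2)
n, k = 7, 3
Y = rng.standard_normal((n, k))
H = rng.standard_normal((n, k))  # arbitrary ambient vector

# Solve Omega*(Y'Y) + (Y'Y)*Omega = Y'H - H'Y for Omega.
YtY = Y.T @ Y
Omega = solve_sylvester(YtY, YtY, Y.T @ H - H.T @ Y)

# Horizontal projection P_Y(H) = H - Y*Omega.
PH = H - Y @ Omega

print(np.linalg.norm(Omega + Omega.T))      # Omega is skew-symmetric: ~0
print(np.linalg.norm(PH.T @ Y - Y.T @ PH))  # PH is horizontal: ~0
print(np.trace((Y @ Omega).T @ PH))         # vertical part orthogonal to PH: ~0
```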
This is an example of low-rank matrix approximation. Let $A\in\mathbb{R}^{n\times n}$ be a symmetric matrix. We minimize the following cost function:
$$\phi(X) =\| X - A \|_F ^2,$$
subject to $X \in S_+(n, k)$. $X$ is factorized as $YY^T$ with $Y \in \mathbb{R}^{n \times k}_*$. The function $f:\mathbb{R}^{n \times k}_* \rightarrow \mathbb{R}$ is defined as $f(Y) = \| YY^T - A \|_F ^2$. The Euclidean gradient and Hessian of $f$ are:
$$\nabla f(Y) = 4(YY^T - A)Y,$$
$$\nabla^2 f(Y)[U] = 4(YU^T + UY^T)Y + 4(YY^T - A)U.$$
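These two formulas can be verified against central finite differences. Below is a NumPy sketch of such a check (an illustration only; within Manopt, checkgradient and checkhessian serve this purpose):

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 8, 2
Y0 = rng.standard_normal((n, k))
A0 = rng.standard_normal((n, n))
A = A0 + A0.T  # symmetric data matrix

f     = lambda Y: np.linalg.norm(Y @ Y.T - A, 'fro')**2
egrad = lambda Y: 4 * (Y @ Y.T - A) @ Y
ehess = lambda Y, U: 4 * (Y @ U.T + U @ Y.T) @ Y + 4 * (Y @ Y.T - A) @ U

U = rng.standard_normal((n, k))
t = 1e-6

# Directional derivative of f vs. <egrad, U>.
fd_grad = (f(Y0 + t*U) - f(Y0 - t*U)) / (2*t)
err_grad = abs(fd_grad - np.sum(egrad(Y0) * U))

# Directional derivative of egrad vs. ehess.
fd_hess = (egrad(Y0 + t*U) - egrad(Y0 - t*U)) / (2*t)
err_hess = np.linalg.norm(fd_hess - ehess(Y0, U))

print(err_grad, err_hess)  # both small (limited by finite-difference accuracy)
```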
The Riemannian gradient and Hessian are obtained by applying the M.egrad2rgrad and M.ehess2rhess operators:
$$\operatorname{grad} f(Y) = 4(YY^T - A)Y,$$
$$\operatorname{Hess} f(Y)[U] = \nabla^2 f(Y)[U] - Y\Omega,$$ where $\Omega$ is the skew-symmetric matrix obtained by solving the Sylvester equation $$ \Omega (Y^T Y) + (Y^T Y )\Omega = Y^T \nabla^2 f(Y)[U] - \left(\nabla^2 f(Y)[U]\right)^T Y. $$
The Sylvester equation can be solved efficiently with MATLAB's lyap function. Notice that there is no need to compute these expressions explicitly: it suffices to write code for the Euclidean quantities and to apply the conversion tools to them to obtain the Riemannian quantities, as in the following code:
% Generate the problem data.
n = 1000;
k = 5;
Y_org = randn(n, k);
A = Y_org*Y_org';

% Create the problem structure.
manifold = symfixedrankYYfactory(n, k);
problem.M = manifold;

% Define the problem cost function and its derivatives.
problem.cost = @(Y) norm(Y*Y' - A, 'fro')^2;
egrad = @(Y) 4*(Y*Y' - A)*Y;
ehess = @(Y, U) 4*(Y*U' + U*Y')*Y + 4*(Y*Y' - A)*U;
problem.grad = @(Y) manifold.egrad2rgrad(Y, egrad(Y));
problem.hess = @(Y, U) manifold.ehess2rhess(Y, egrad(Y), ehess(Y, U), U);

% Numerically check the differentials.
checkgradient(problem); pause;
checkhessian(problem); pause;

% Execute the optimization.
Y = trustregions(problem);
We are still working on building a collection of examples for Manopt. The relevant ones will be referenced here when the time comes.