Manopt, doubly stochastic multinomial manifold

The doubly stochastic multinomial manifold $\mathcal{DP}_n$ (the set of entry-wise positive matrices of size nxn with columns and rows summing to 1) is endowed with a Riemannian manifold structure by considering it as a Riemannian submanifold of the embedding Euclidean space $\mathbb{R}^{n\times n}$ endowed with the Fisher information as inner product $\langle\xi_\mathbf{X},\eta_\mathbf{X}\rangle_\mathbf{X} = \sum\limits_{i=1}^n \sum\limits_{j=1}^n \cfrac{(\xi_\mathbf{X})_{ij} (\eta_\mathbf{X})_{ij}}{\mathbf{X}_{ij}}$. The geometry of the doubly stochastic multinomial manifold is available in the research paper [DH18].

Factory call: M = multinomialdoublystochasticfactory(n).

There is also a symmetric version of this factory: see multinomialsymmetricfactory.

Name	Formula	Numerical representation
Set	$\mathcal{DP}_n = \left\{ \mathbf{X} \in \mathbb{R}^{n \times n} \big\| \mathbf{X}_{ij} > 0,\ \mathbf{X}\mathbf{1}=\mathbf{1},\ \mathbf{X}^T\mathbf{1}=\mathbf{1} \right\}$	$X$ is represented as a matrix `X` of size nxn whose columns and rows sum to 1, i.e., `sum(X(:, i))= sum(X(i,:)) = 1` for `i = 1:n` .
Tangent space at $X$	$\mathcal{T}_{\mathbf{X}}\mathcal{D}\mathcal{P}_{n} = \left\{\mathbf{Z} \in \mathbb{R}^{n \times n} \big\| \mathbf{Z}\mathbf{1}=\mathbf{0},\ \mathbf{Z}^T\mathbf{1}=\mathbf{0} \right\}$	A tangent vector $Z$ at $X$ is represented as a matrix `Z` of size nxn such that each column and row of `Z` sum to 0, i.e., `sum(Z(:, i))= sum(Z(i,:)) = 0` for `i = 1:n`.
Ambient space	$\mathbb{R}^{n\times n}$	Points and vectors in the ambient space are, naturally, represented as matrices of size nxn.

The following table shows some of the nontrivial available functions in the structure M. The norm $\|\cdot\|$ refers to the norm in the ambient space, which is the Frobenius norm. The tutorial page gives more details about the functionality implemented by each function.

Name	Field usage	Formula
Dimension	`M.dim()`	$\operatorname{dim}\mathcal{M} = (n-1)^2$
Metric	`M.inner(X, U, V)`	$\langle U, V\rangle_X = \operatorname{trace}(U^T (V \oslash X)) = \sum\limits_{i,j=1}^n \cfrac{U_{ij} V_{ij}}{X_{ij}}$
Norm	`M.norm(X, U)`	$\\|U\\|_X = \sqrt{\langle U, U \rangle_X}$
Distance	`M.dist(X, Y)`	Not implemented
Typical distance	`M.typicaldist()`	$n$
Tangent space projector	`M.proj(X, H)`	$P_X(H)$ represents the orthogonal projection, in the Fisher information inner product sens, of the ambient point H onto the tangent space $\mathcal{T}_{\mathbf{X}}\mathcal{D}\mathcal{P}_{n}$.
Euclidean to Riemannian gradient	`M.egrad2rgrad(X, egrad)`	$\operatorname{grad} f(X) = P_X(\nabla f(X) \odot X)$, where `egrad` represents the Euclidean gradient $\nabla f(X)$, which is a vector in the ambient space.
Euclidean to Riemannian Hessian	`M.ehess2rhess(X, egrad, ehess, U)`	$\operatorname{Hess} f(X)[U]$ represents the Riemannian gradient, where `egrad` represents the Euclidean gradient $\nabla f(X)$ and `ehess` represents the Euclidean Hessian $\nabla^2 f(X)[U]$, both being vectors in the ambient space.
Retraction	`M.retr(X, U, t)`	$\operatorname{Retr}_X(tU)$ retracts the tangent vector U to the manifold, i.e., U is a doubly stochastic matrix.
Random point	`M.rand()`	Returns a point uniformly at random from the set of doubly stochastic matrices.
Random vector	`M.randvec(X)`	Returns a unit-norm tangent vector at $X$ with uniformly random direction, obtained as follows: generate $H$ with i.i.d. normal entries; return: $U = P_X(H) / \\|P_X(H)\\|$.
Vector transport	`M.transp(X, Y, U)`	$\operatorname{Transp}_{Y\leftarrow X}(U) = P_Y(U)$, where $U$ is a tangent vector at $X$ that is transported to the tangent space at $Y$.

Let $A\in\mathbb{R}^{n\times n}$ be a noisy version of a doubly stochastic matrix. We search for the doubly stochastic matrix that is closest to $A$ according to the Frobenius norm. We minimize the following cost function:

$$f(X) = \frac{1}{2} \|X-A\|^2,$$

such that $X \in \mathcal{DP}_n$. Compute the Euclidean gradient and Hessian of $f$:

$$\nabla f(X) = X-A,$$

$$\nabla^2 f(X)[U] = U.$$

The Riemannian gradient and Hessian are computed automatically from these.

% Generate the problem data.

n = 100 ; % Size of the matrix
sigma = 1/n^2 ; % Noise standard deviation at each entry

% Generate a doubly stochastic matrix using the Sinkhorn algorithm
B = doubly_stochastic(abs(randn(n, n))) ; 
% Adding noise to the matrix
A = max(B + sigma*randn(n, n),0.01) ;

% Denoising cost function and derivatives
cost = @(X) 0.5*norm(A-X, 'fro')^2;
egrad = @(X) X - A;  % Euclidean Gradient
ehess = @(X, U) U; % Euclidean Hessian

% Manifold initialization
manifold = multinomialdoublystochasticfactory(n) ;
problem.M = manifold;
problem.cost = cost;
problem.egrad = egrad;
problem.ehess = ehess;

% Numerically check the differentials.
checkgradient(problem); pause;
checkhessian(problem); pause; % Since the retraction is only first-order, the slope test is valid only at critical points.

X = trustregions(problem); % Solve the problem with a local optimization algorithm

For theory on Riemannian submanifolds, see [AMS08], section 3.6.1 (first-order derivatives) and section 5.3.3 (second-order derivatives, i.e., connections).

For content specifically about the doubly stochastic multinomial manifold with applications, see [DH18].

[AMS08] P.-A. Absil, R. Mahony and R. Sepulchre, Optimization Algorithms on Matrix Manifolds, Princeton University Press, 2008.
[DH18] A. Douik and B. Hassibi, Manifold Optimization Over the Set of Doubly Stochastic Matrices: A Second-Order Geometry, 2018.

A Riemannian geometry for the doubly stochastic multinomial manifold

Toolset

Example

References