The doubly stochastic multinomial manifold $\mathcal{DP}_n$ (the set of entry-wise positive matrices of size nxn with columns and rows summing to 1) is endowed with a Riemannian manifold structure by considering it as a Riemannian submanifold of the embedding Euclidean space $\mathbb{R}^{n\times n}$ endowed with the Fisher information as inner product $\langle\xi_\mathbf{X},\eta_\mathbf{X}\rangle_\mathbf{X} = \sum\limits_{i=1}^n \sum\limits_{j=1}^n \cfrac{(\xi_\mathbf{X})_{ij} (\eta_\mathbf{X})_{ij}}{\mathbf{X}_{ij}}$. The geometry of the doubly stochastic multinomial manifold is available in the research paper [DH18].
Factory call: M = multinomialdoublystochasticfactory(n)
.
There is also a symmetric version of this factory: see multinomialsymmetricfactory
.
Name | Formula | Numerical representation |
Set | $\mathcal{DP}_n = \left\{ \mathbf{X} \in \mathbb{R}^{n \times n} \big| \mathbf{X}_{ij} > 0,\ \mathbf{X}\mathbf{1}=\mathbf{1},\ \mathbf{X}^T\mathbf{1}=\mathbf{1} \right\}$ | $X$ is represented as a matrix X of size nxn whose columns
and rows sum to 1, i.e., sum(X(:, i))= sum(X(i,:)) =
1 for i = 1:n . |
Tangent space at $X$ | $\mathcal{T}_{\mathbf{X}}\mathcal{D}\mathcal{P}_{n} = \left\{\mathbf{Z} \in \mathbb{R}^{n \times n} \big| \mathbf{Z}\mathbf{1}=\mathbf{0},\ \mathbf{Z}^T\mathbf{1}=\mathbf{0} \right\}$ | A tangent vector $Z$ at $X$ is represented as a matrix Z
of size nxn
such that each column and row of Z sum to 0,
i.e., sum(Z(:, i))= sum(Z(i,:)) = 0 for i
= 1:n . |
Ambient space | $\mathbb{R}^{n\times n}$ | Points and vectors in the ambient space are, naturally, represented as matrices of size nxn. |
The following table shows some of the nontrivial available
functions in the structure M
. The norm $\|\cdot\|$
refers to the norm in the ambient space, which is the Frobenius
norm. The tutorial page
gives more details about the functionality implemented by each
function.
Name | Field usage | Formula |
Dimension | M.dim() |
$\operatorname{dim}\mathcal{M} = (n-1)^2$ |
Metric | M.inner(X, U, V) |
$\langle U, V\rangle_X = \operatorname{trace}(U^T (V \oslash X)) = \sum\limits_{i,j=1}^n \cfrac{U_{ij} V_{ij}}{X_{ij}}$ |
Norm | M.norm(X, U) |
$\|U\|_X = \sqrt{\langle U, U \rangle_X}$ |
Distance | M.dist(X, Y) |
Not implemented |
Typical distance | M.typicaldist() |
$n$ |
Tangent space projector | M.proj(X, H) |
$P_X(H)$ represents the orthogonal projection, in the Fisher information inner product sens, of the ambient point H onto the tangent space $\mathcal{T}_{\mathbf{X}}\mathcal{D}\mathcal{P}_{n}$. |
Euclidean to Riemannian gradient | M.egrad2rgrad(X, egrad) |
$\operatorname{grad} f(X) = P_X(\nabla f(X) \odot X)$,
where egrad represents the Euclidean gradient
$\nabla f(X)$, which is a vector in the ambient space. |
Euclidean to Riemannian Hessian | M.ehess2rhess(X, egrad, ehess, U) |
$\operatorname{Hess} f(X)[U]$ represents the Riemannian
gradient, where egrad represents the Euclidean
gradient $\nabla f(X)$ and ehess represents
the Euclidean Hessian $\nabla^2 f(X)[U]$, both being vectors
in the ambient space. |
Retraction | M.retr(X, U, t) |
$\operatorname{Retr}_X(tU)$ retracts the tangent vector U to the manifold, i.e., U is a doubly stochastic matrix. |
Random point | M.rand() |
Returns a point uniformly at random from the set of doubly stochastic matrices. |
Random vector | M.randvec(X) |
Returns a unit-norm tangent vector at $X$ with uniformly random direction, obtained as follows: generate $H$ with i.i.d. normal entries; return: $U = P_X(H) / \|P_X(H)\|$. |
Vector transport | M.transp(X, Y, U) |
$\operatorname{Transp}_{Y\leftarrow X}(U) = P_Y(U)$, where $U$ is a tangent vector at $X$ that is transported to the tangent space at $Y$. |
Let $A\in\mathbb{R}^{n\times n}$ be a noisy version of a doubly stochastic matrix. We search for the doubly stochastic matrix that is closest to $A$ according to the Frobenius norm. We minimize the following cost function:
$$f(X) = \frac{1}{2} \|X-A\|^2,$$
such that $X \in \mathcal{DP}_n$. Compute the Euclidean gradient and Hessian of $f$:
$$\nabla f(X) = X-A,$$
$$\nabla^2 f(X)[U] = U.$$
The Riemannian gradient and Hessian are computed automatically from these.
% Generate the problem data. n = 100 ; % Size of the matrix sigma = 1/n^2 ; % Noise standard deviation at each entry % Generate a doubly stochastic matrix using the Sinkhorn algorithm B = doubly_stochastic(abs(randn(n, n))) ; % Adding noise to the matrix A = max(B + sigma*randn(n, n),0.01) ; % Denoising cost function and derivatives cost = @(X) 0.5*norm(A-X, 'fro')^2; egrad = @(X) X - A; % Euclidean Gradient ehess = @(X, U) U; % Euclidean Hessian % Manifold initialization manifold = multinomialdoublystochasticfactory(n) ; problem.M = manifold; problem.cost = cost; problem.egrad = egrad; problem.ehess = ehess; % Numerically check the differentials. checkgradient(problem); pause; checkhessian(problem); pause; % Since the retraction is only first-order, the slope test is valid only at critical points.
X = trustregions(problem); % Solve the problem with a local optimization algorithm
For theory on Riemannian submanifolds, see [AMS08], section 3.6.1 (first-order derivatives) and section 5.3.3 (second-order derivatives, i.e., connections).
For content specifically about the doubly stochastic multinomial manifold with applications, see [DH18].
[AMS08] P.-A. Absil, R. Mahony and R. Sepulchre, Optimization Algorithms
on Matrix Manifolds, Princeton University Press, 2008.
[DH18] A. Douik and B. Hassibi,
Manifold Optimization Over the Set of Doubly Stochastic
Matrices: A Second-Order Geometry, 2018.