Manifold struct to optimize fixed-rank matrices w/ an embedded geometry. function M = fixedrankembeddedfactory(m, n, k) Manifold of m-by-n real matrices of fixed rank k. This follows the embedded geometry described in Bart Vandereycken's 2013 paper: "Low-rank matrix completion by Riemannian optimization". Paper link: http://arxiv.org/pdf/1209.3834.pdf A point X on the manifold is represented as a structure with three fields: U, S and V. The matrices U (mxk) and V (nxk) are orthonormal, while the matrix S (kxk) is any /diagonal/ full-rank matrix. Following the SVD formalism, X = U*S*V'. Note that the diagonal entries of S are not constrained to be nonnegative. Tangent vectors are represented as a structure with three fields: Up, M and Vp. The matrices Up (mxk) and Vp (mxk) obey Up'*U = 0 and Vp'*V = 0. The matrix M (kxk) is arbitrary. Such a structure corresponds to the following tangent vector in the ambient space of mxn matrices: Z = U*M*V' + Up*V' + U*Vp' where (U, S, V) is the current point and (Up, M, Vp) is the tangent vector at that point. Vectors in the ambient space are best represented as mxn matrices. If these are low-rank, they may also be represented as structures with U, S, V fields, such that Z = U*S*V'. There are no restrictions on what U, S and V are, as long as their product as indicated yields a real, mxn matrix. The chosen geometry yields a Riemannian submanifold of the embedding space R^(mxn) equipped with the usual trace (Frobenius) inner product. The tools X_triplet = M.matrix2triplet(X_matrix) and X_matrix = M.triplet2matrix(X_triplet) can be used to easily convert between full matrix representation (as a matrix of size mxn) and triplet representation as a structure with fields U, S, V. The tool matrix2triplet also accepts an optional second input r to choose the rank of the triplet representation. By default, r = k. If the input matrix X_matrix has rank more than r, the triplet represents its best rank-r approximation in the Frobenius norm (truncated SVD). Note that these conversions are computationally expensive for large m and n: this is mostly useful for small matrices and for prototyping. Please cite the Manopt paper as well as the research paper: @Article{vandereycken2013lowrank, Title = {Low-rank matrix completion by {Riemannian} optimization}, Author = {Vandereycken, B.}, Journal = {SIAM Journal on Optimization}, Year = {2013}, Number = {2}, Pages = {1214--1236}, Volume = {23}, Doi = {10.1137/110845768} } See also: fixedrankfactory_2factors fixedrankfactory_3factors fixedranktensorembeddedfactory
0001 function M = fixedrankembeddedfactory(m, n, k) 0002 % Manifold struct to optimize fixed-rank matrices w/ an embedded geometry. 0003 % 0004 % function M = fixedrankembeddedfactory(m, n, k) 0005 % 0006 % Manifold of m-by-n real matrices of fixed rank k. This follows the 0007 % embedded geometry described in Bart Vandereycken's 2013 paper: 0008 % "Low-rank matrix completion by Riemannian optimization". 0009 % 0010 % Paper link: http://arxiv.org/pdf/1209.3834.pdf 0011 % 0012 % A point X on the manifold is represented as a structure with three 0013 % fields: U, S and V. The matrices U (mxk) and V (nxk) are orthonormal, 0014 % while the matrix S (kxk) is any /diagonal/ full-rank matrix. 0015 % Following the SVD formalism, X = U*S*V'. Note that the diagonal entries 0016 % of S are not constrained to be nonnegative. 0017 % 0018 % Tangent vectors are represented as a structure with three fields: Up, M 0019 % and Vp. The matrices Up (mxk) and Vp (mxk) obey Up'*U = 0 and Vp'*V = 0. 0020 % The matrix M (kxk) is arbitrary. Such a structure corresponds to the 0021 % following tangent vector in the ambient space of mxn matrices: 0022 % Z = U*M*V' + Up*V' + U*Vp' 0023 % where (U, S, V) is the current point and (Up, M, Vp) is the tangent 0024 % vector at that point. 0025 % 0026 % Vectors in the ambient space are best represented as mxn matrices. If 0027 % these are low-rank, they may also be represented as structures with 0028 % U, S, V fields, such that Z = U*S*V'. There are no restrictions on what 0029 % U, S and V are, as long as their product as indicated yields a real, mxn 0030 % matrix. 0031 % 0032 % The chosen geometry yields a Riemannian submanifold of the embedding 0033 % space R^(mxn) equipped with the usual trace (Frobenius) inner product. 0034 % 0035 % The tools 0036 % X_triplet = M.matrix2triplet(X_matrix) and 0037 % X_matrix = M.triplet2matrix(X_triplet) 0038 % can be used to easily convert between full matrix representation (as a 0039 % matrix of size mxn) and triplet representation as a structure with fields 0040 % U, S, V. The tool matrix2triplet also accepts an optional second input r 0041 % to choose the rank of the triplet representation. By default, r = k. If 0042 % the input matrix X_matrix has rank more than r, the triplet represents 0043 % its best rank-r approximation in the Frobenius norm (truncated SVD). 0044 % Note that these conversions are computationally expensive for large m 0045 % and n: this is mostly useful for small matrices and for prototyping. 0046 % 0047 % 0048 % Please cite the Manopt paper as well as the research paper: 0049 % @Article{vandereycken2013lowrank, 0050 % Title = {Low-rank matrix completion by {Riemannian} optimization}, 0051 % Author = {Vandereycken, B.}, 0052 % Journal = {SIAM Journal on Optimization}, 0053 % Year = {2013}, 0054 % Number = {2}, 0055 % Pages = {1214--1236}, 0056 % Volume = {23}, 0057 % Doi = {10.1137/110845768} 0058 % } 0059 % 0060 % See also: fixedrankfactory_2factors fixedrankfactory_3factors fixedranktensorembeddedfactory 0061 0062 % This file is part of Manopt: www.manopt.org. 0063 % Original author: Nicolas Boumal, Dec. 30, 2012. 0064 % Contributors: Bart Vandereycken, Eitan Levin 0065 % Change log: 0066 % 0067 % Feb. 20, 2014 (NB): 0068 % Added function tangent to work with checkgradient. 0069 % 0070 % June 24, 2014 (NB): 0071 % A couple modifications following Bart's feedback: 0072 % - The checksum (hash) was replaced for a faster alternative: it's a 0073 % bit less "safe" in that collisions could arise with higher 0074 % probability, but they're still very unlikely. 0075 % - The vector transport was changed. 0076 % The typical distance was also modified, hopefully giving the 0077 % trustregions method a better initial guess for the trust-region 0078 % radius, but that should be tested for different cost functions too. 0079 % 0080 % July 11, 2014 (NB): 0081 % Added ehess2rhess and tangent2ambient, supplied by Bart. 0082 % 0083 % July 14, 2014 (NB): 0084 % Added vec, mat and vecmatareisometries so that hessianspectrum now 0085 % works with this geometry. Implemented the tangent function. 0086 % Made it clearer in the code and in the documentation in what format 0087 % ambient vectors may be supplied, and generalized some functions so 0088 % that they should now work with both accepted formats. 0089 % It is now clearly stated that for a point X represented as a 0090 % triplet (U, S, V), the matrix S needs to be diagonal. 0091 % 0092 % Sep. 6, 2018 (NB): 0093 % Removed M.exp() as it was not implemented. 0094 % 0095 % March 20, 2019 (NB): 0096 % Added M.matrix2triplet and M.triplet2matrix to allow easy 0097 % conversion between matrix representations either as full matrices 0098 % or as triplets (U, S, V). 0099 % 0100 % Dec. 14, 2019 (EL): 0101 % The original retraction code was repaced with a somewhat slower but 0102 % numerically more stable version. With the original code, trouble 0103 % could arise when the matrices Up, Vp defining the tangent vector 0104 % being retracted were ill-conditioned. 0105 % 0106 % Jan. 28, 2020 (NB): 0107 % In retraction code, moved parameter t around to highlight the fact 0108 % that it comes up in only one computation. 0109 % Replaced vec/mat codes: they are still isometries, but they produce 0110 % representations of length k*(m+n+k) instead of m*n, which is much 0111 % more efficient: it only exceeds the true dimension by 2k^2. Also, 0112 % mat does not attempt to project to the tangent space (which it did 0113 % before but shouldn't): compose mat with tangent if that is the 0114 % desired effect. 0115 0116 M.name = @() sprintf('Manifold of %dx%d matrices of rank %d', m, n, k); 0117 0118 M.dim = @() (m+n-k)*k; 0119 0120 M.inner = @(x, d1, d2) d1.M(:).'*d2.M(:) + d1.Up(:).'*d2.Up(:) ... 0121 + d1.Vp(:).'*d2.Vp(:); 0122 0123 M.norm = @(x, d) sqrt(norm(d.M, 'fro')^2 + norm(d.Up, 'fro')^2 ... 0124 + norm(d.Vp, 'fro')^2); 0125 0126 M.dist = @(x, y) error('fixedrankembeddedfactory.dist not implemented yet.'); 0127 0128 M.typicaldist = @() M.dim(); 0129 0130 % Given Z in tangent vector format, projects the components Up and Vp 0131 % such that they satisfy the tangent space constraints up to numerical 0132 % errors. If Z was indeed a tangent vector at X, this should barely 0133 % affect Z (it would not at all if we had infinite numerical accuracy). 0134 M.tangent = @tangent; 0135 function Z = tangent(X, Z) 0136 Z.Up = Z.Up - X.U*(X.U'*Z.Up); 0137 Z.Vp = Z.Vp - X.V*(X.V'*Z.Vp); 0138 end 0139 0140 % For a given ambient vector Z, applies it to a matrix W. If Z is given 0141 % as a matrix, this is straightforward. If Z is given as a structure 0142 % with fields U, S, V such that Z = U*S*V', the product is executed 0143 % efficiently. 0144 function ZW = apply_ambient(Z, W) 0145 if ~isstruct(Z) 0146 ZW = Z*W; 0147 else 0148 ZW = Z.U*(Z.S*(Z.V'*W)); 0149 end 0150 end 0151 0152 % Same as apply_ambient, but applies Z' to W. 0153 function ZtW = apply_ambient_transpose(Z, W) 0154 if ~isstruct(Z) 0155 ZtW = Z'*W; 0156 else 0157 ZtW = Z.V*(Z.S'*(Z.U'*W)); 0158 end 0159 end 0160 0161 % Orthogonal projection of an ambient vector Z represented as an mxn 0162 % matrix or as a structure with fields U, S, V to the tangent space at 0163 % X, in a tangent vector structure format. 0164 M.proj = @projection; 0165 function Zproj = projection(X, Z) 0166 0167 ZV = apply_ambient(Z, X.V); 0168 UtZV = X.U'*ZV; 0169 ZtU = apply_ambient_transpose(Z, X.U); 0170 0171 Zproj.M = UtZV; 0172 Zproj.Up = ZV - X.U*UtZV; 0173 Zproj.Vp = ZtU - X.V*UtZV'; 0174 0175 end 0176 0177 M.egrad2rgrad = @projection; 0178 0179 % Code supplied by Bart. 0180 % Given the Euclidean gradient at X and the Euclidean Hessian at X 0181 % along H, where egrad and ehess are vectors in the ambient space and H 0182 % is a tangent vector at X, returns the Riemannian Hessian at X along 0183 % H, which is a tangent vector. 0184 M.ehess2rhess = @ehess2rhess; 0185 function rhess = ehess2rhess(X, egrad, ehess, H) 0186 0187 % Euclidean part 0188 rhess = projection(X, ehess); 0189 0190 % Curvature part 0191 T = apply_ambient(egrad, H.Vp)/X.S; 0192 rhess.Up = rhess.Up + (T - X.U*(X.U'*T)); 0193 T = apply_ambient_transpose(egrad, H.Up)/X.S; 0194 rhess.Vp = rhess.Vp + (T - X.V*(X.V'*T)); 0195 0196 end 0197 0198 % Transforms a tangent vector Z represented as a structure (Up, M, Vp) 0199 % into a structure with fields (U, S, V) that represents that same 0200 % tangent vector in the ambient space of mxn matrices, as U*S*V'. 0201 % This matrix is equal to X.U*Z.M*X.V' + Z.Up*X.V' + X.U*Z.Vp'. The 0202 % latter is an mxn matrix, which could be too large to build 0203 % explicitly, and this is why we return a low-rank representation 0204 % instead. Note that there are no guarantees on U, S and V other than 0205 % that USV' is the desired matrix. In particular, U and V are not (in 0206 % general) orthonormal and S is not (in general) diagonal. 0207 % (In this implementation, S is identity, but this might change.) 0208 M.tangent2ambient_is_identity = false; 0209 M.tangent2ambient = @tangent2ambient; 0210 function Zambient = tangent2ambient(X, Z) 0211 Zambient.U = [X.U*Z.M + Z.Up, X.U]; 0212 Zambient.S = eye(2*k); 0213 Zambient.V = [X.V, Z.Vp]; 0214 end 0215 0216 % This retraction is second order, following general results from 0217 % Absil, Malick, "Projection-like retractions on matrix manifolds", 0218 % SIAM J. Optim., 22 (2012), pp. 135-158. 0219 % 0220 % Notice that this retraction is only locally smooth. Indeed, the 0221 % following code exhibits a discontinuity when retracting from 0222 % X = [1 0 ; 0 0] along V = [0 0 ; 0 1]: 0223 % 0224 % M = fixedrankembeddedfactory(2, 2, 1); 0225 % X = struct('U', [1;0], 'V', [1;0], 'S', 1); 0226 % V = struct('Up', [0;1], 'Vp', [0;1], 'M', 1); 0227 % entry = @(M) M(1, 1); 0228 % mat = @(X) X.U*X.S*X.V'; 0229 % g = @(t) entry(mat(M.retr(X, V, t))); 0230 % ezplot(g, [-2, 2]); 0231 % 0232 M.retr = @retraction; 0233 function Y = retraction(X, Z, t) 0234 if nargin < 3 0235 t = 1.0; 0236 end 0237 0238 % Mathematically, Z.Up is orthogonal to X.U, and likewise for 0239 % Z.Vp compared to X.V. Thus, in principle, we could call QR 0240 % on Z.Up and Z.Vp alone, which should be about 4 times faster 0241 % than the calls here where we orthonormalize twice as many 0242 % vectors. However, when Z.Up, Z.Vp are poorly conditioned, 0243 % orthonormalizing them can lead to loss of orthogonality 0244 % against X.U, X.V. 0245 [Qu, Ru] = qr([X.U, Z.Up], 0); 0246 [Qv, Rv] = qr([X.V, Z.Vp], 0); 0247 0248 % Calling svds or svd should yield the same result, but BV 0249 % advocated svd is more robust, and it doesn't change the 0250 % asymptotic complexity to call svd then trim rather than call 0251 % svds. Also, apparently Matlab calls ARPACK in a suboptimal way 0252 % for svds in this scenario. 0253 % Notice that the parameter t appears only here. Thus, in princple, 0254 % we could make some savings for line-search procedures where we 0255 % retract the same vector multiple times, only with different 0256 % values of t. The asymptotic complexity remains the same though 0257 % (up to a constant factor) because of the matrix-matrix products 0258 % below which cost essentially the same as the QR factorizations. 0259 [U, S, V] = svd(Ru*[X.S + t*Z.M, t*eye(k); t*eye(k), zeros(k)]*Rv'); 0260 0261 Y.U = Qu*U(:, 1:k); 0262 Y.V = Qv*V(:, 1:k); 0263 Y.S = S(1:k, 1:k); 0264 0265 % Equivalent but very slow code 0266 % [U, S, V] = svds(X.U*X.S*X.V' + t*(X.U*Z.M*X.V' + Z.Up*X.V' + X.U*Z.Vp'), k); 0267 % Y.U = U; Y.V = V; Y.S = S; 0268 end 0269 0270 0271 % Orthographic retraction provided by Teng Zhang. One interest of the 0272 % orthographic retraction is that if matrices are represented in full 0273 % size, it can be computed without any SVDs. If for an application it 0274 % makes sense to represent the matrices in full size, this may be a 0275 % good idea, but it won't shine in the present implementation of the 0276 % manifold. 0277 M.retr_ortho = @retraction_orthographic; 0278 function Y = retraction_orthographic(X, Z, t) 0279 if nargin < 3 0280 t = 1.0; 0281 end 0282 0283 % First, write Y (the output) as U1*S0*V1', where U1 and V1 are 0284 % orthogonal matrices and S0 is of size r by r. 0285 [U1, ~] = qr(t*(X.U*Z.M + Z.Up) + X.U*X.S, 0); 0286 [V1, ~] = qr(t*(X.V*Z.M' + Z.Vp) + X.V*X.S, 0); 0287 S0 = (U1'*X.U)*(X.S + t*Z.M)*(X.V'*V1) ... 0288 + t*((U1'*Z.Up)*(X.V'*V1) + (U1'*X.U)*(Z.Vp'*V1)); 0289 0290 % Then, obtain the singular value decomposition of Y. 0291 [U2, S2, V2] = svd(S0); 0292 Y.U = U1*U2; 0293 Y.S = S2; 0294 Y.V = V1*V2; 0295 0296 end 0297 0298 0299 % Less safe but much faster checksum, June 24, 2014. 0300 % Older version right below. 0301 M.hash = @(X) ['z' hashmd5([sum(X.U(:)) ; sum(X.S(:)); sum(X.V(:)) ])]; 0302 %M.hash = @(X) ['z' hashmd5([X.U(:) ; X.S(:) ; X.V(:)])]; 0303 0304 M.rand = @random; 0305 % Factors U, V live on Stiefel manifolds: reuse their random generator. 0306 stiefelm = stiefelfactory(m, k); 0307 stiefeln = stiefelfactory(n, k); 0308 function X = random() 0309 X.U = stiefelm.rand(); 0310 X.V = stiefeln.rand(); 0311 X.S = diag(sort(rand(k, 1), 1, 'descend')); 0312 end 0313 0314 % Generate a random tangent vector at X. 0315 % Note: this may not be the uniform distribution over the set of 0316 % unit-norm tangent vectors. 0317 M.randvec = @randomvec; 0318 function Z = randomvec(X) 0319 Z.M = randn(k); 0320 Z.Up = randn(m, k); 0321 Z.Vp = randn(n, k); 0322 Z = tangent(X, Z); 0323 nrm = M.norm(X, Z); 0324 Z.M = Z.M / nrm; 0325 Z.Up = Z.Up / nrm; 0326 Z.Vp = Z.Vp / nrm; 0327 end 0328 0329 M.lincomb = @lincomb; 0330 0331 M.zerovec = @(X) struct('M', zeros(k, k), 'Up', zeros(m, k), ... 0332 'Vp', zeros(n, k)); 0333 0334 % New vector transport on June 24, 2014 (as indicated by Bart) 0335 % Reference: Absil, Mahony, Sepulchre 2008 section 8.1.3: 0336 % For Riemannian submanifolds of a Euclidean space, it is acceptable to 0337 % transport simply by orthogonal projection of the tangent vector 0338 % translated in the ambient space. 0339 M.transp = @project_tangent; 0340 function Z2 = project_tangent(X1, X2, Z1) 0341 Z2 = projection(X2, tangent2ambient(X1, Z1)); 0342 end 0343 0344 % The function 'vec' is isometric from the tangent space at X to real 0345 % vectors of length k(m+n+k). The function 'mat' is the left-inverse 0346 % of 'vec'. It is sometimes useful to apply 'tangent' to the output of 0347 % 'mat'. 0348 M.vec = @vec; 0349 function Zvec = vec(X, Z) %#ok<INUSL> 0350 A = Z.M; 0351 B = Z.Up; 0352 C = Z.Vp; 0353 Zvec = [A(:) ; B(:) ; C(:)]; 0354 end 0355 rangeM = 1:(k^2); 0356 rangeUp = (k^2)+(1:(m*k)); 0357 rangeVp = (k^2+m*k)+(1:(n*k)); 0358 M.mat = @(X, Zvec) struct('M', reshape(Zvec(rangeM), [k, k]), ... 0359 'Up', reshape(Zvec(rangeUp), [m, k]), ... 0360 'Vp', reshape(Zvec(rangeVp), [n, k])); 0361 M.vecmatareisometries = @() true; 0362 0363 0364 % It is sometimes useful to switch between representation of matrices 0365 % as triplets or as full matrices of size m x n. The function to 0366 % convert a matrix to a triplet, matrix2triplet, allows to specify the 0367 % rank of the representation. By default, it is equal to k. Omit the 0368 % second input (or set to inf) to get a full SVD triplet (in economy 0369 % format). If so, the resulting triplet does not represent a point on 0370 % the manifold. 0371 M.matrix2triplet = @matrix2triplet; 0372 function X_triplet = matrix2triplet(X_matrix, r) 0373 if ~exist('r', 'var') || isempty(r) || r <= 0 0374 r = k; 0375 end 0376 if r < min(m, n) 0377 [U, S, V] = svds(X_matrix, r); 0378 else 0379 [U, S, V] = svd(X_matrix, 'econ'); 0380 end 0381 X_triplet.U = U; 0382 X_triplet.S = S; 0383 X_triplet.V = V; 0384 end 0385 M.triplet2matrix = @triplet2matrix; 0386 function X_matrix = triplet2matrix(X_triplet) 0387 U = X_triplet.U; 0388 S = X_triplet.S; 0389 V = X_triplet.V; 0390 X_matrix = U*S*V'; 0391 end 0392 0393 end 0394 0395 % Linear combination of tangent vectors 0396 function d = lincomb(x, a1, d1, a2, d2) %#ok<INUSL> 0397 0398 if nargin == 3 0399 d.Up = a1*d1.Up; 0400 d.Vp = a1*d1.Vp; 0401 d.M = a1*d1.M; 0402 elseif nargin == 5 0403 d.Up = a1*d1.Up + a2*d2.Up; 0404 d.Vp = a1*d1.Vp + a2*d2.Vp; 0405 d.M = a1*d1.M + a2*d2.M; 0406 else 0407 error('fixedrank.lincomb takes either 3 or 5 inputs.'); 0408 end 0409 0410 end