Example of a stochastic gradient algorithm in Manopt on a PCA problem.

PCA (principal component analysis) on a dataset A of size nxd consists in solving

    minimize_X f(X) = -.5*norm(A*X, 'fro')^2 / n,

where X is a matrix of dimension dxk with orthonormal columns. This is equivalent to finding k dominant singular vectors of A, or the k top eigenvectors of A'*A.

If n is large, this computation can be expensive. Stochastic gradient algorithms therefore take the point of view that f(X) is a sum of many (n) terms: each term involves only one of the n rows of A. To make progress, it may be sufficient to optimize with respect to a subset of the terms at each iteration. This way, each individual iteration can be very cheap. In particular, individual operations have a cost independent of n, because f and its gradient never need to be evaluated completely (and f need not be evaluated at all).

Stochastic gradient algorithms (this implementation in particular) are sensitive to proper parameter tuning. See the comments in the code.
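To make the sum structure concrete, here is a small standalone check, in plain MATLAB, that the per-row Euclidean gradients add up to the full Euclidean gradient. This is a minimal sketch, separate from the example file below; the variable names and sizes are illustrative.

    % Each term f_i(X) = -norm(A(i,:)*X)^2 / (2*n) contributes the
    % Euclidean gradient -A(i,:)'*(A(i,:)*X) / n. Summed over all rows,
    % these recover the full Euclidean gradient -A'*(A*X) / n.
    n = 50; d = 8; k = 2;                  % illustrative sizes
    A = randn(n, d);
    [X, ~] = qr(randn(d, k), 0);           % orthonormal columns: a Stiefel point

    Gfull = -A'*(A*X) / n;                 % full Euclidean gradient
    Gsum = zeros(d, k);
    for i = 1 : n
        ai = A(i, :);
        Gsum = Gsum + (-ai'*(ai*X) / n);   % gradient of the i-th term
    end
    fprintf('Mismatch: %g\n', norm(Gfull - Gsum, 'fro'));  % ~ machine precision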
function [X, A] = PCA_stochastic(A, k)
% Example of a stochastic gradient algorithm in Manopt on a PCA problem.
%
% PCA (principal component analysis) on a dataset A of size nxd consists
% in solving
%
%   minimize_X f(X) = -.5*norm(A*X, 'fro')^2 / n,
%
% where X is a matrix of dimension dxk with orthonormal columns. This
% is equivalent to finding k dominant singular vectors of A, or the k top
% eigenvectors of A'*A.
%
% If n is large, this computation can be expensive. Stochastic gradient
% algorithms therefore take the point of view that f(X) is a sum of many
% (n) terms: each term involves only one of the n rows of A.
%
% To make progress, it may be sufficient to optimize with respect to a
% subset of the terms at each iteration. This way, each individual
% iteration can be very cheap. In particular, individual operations have
% a cost independent of n, because f and its gradient never need to be
% evaluated completely (and f need not be evaluated at all).
%
% Stochastic gradient algorithms (this implementation in particular) are
% sensitive to proper parameter tuning. See the comments in the code.

% This file is part of Manopt and is copyrighted. See the license file.
%
% Main authors: Bamdev Mishra and Nicolas Boumal, Sept. 6, 2017
% Contributors:
%
% Change log:


    % If no dataset is given, generate a random one: n samples in R^d.
    if ~exist('A', 'var') || isempty(A)
        d = 1000;
        n = 100000;
        fprintf('Generating data...');
        A = randn(n, d)*diag([[15 10 5], ones(1, d-3)]);
        fprintf(' done (size: %d x %d).\n', size(A));
    else
        [n, d] = size(A);
    end

    % Pick a number of components to compute.
    if ~exist('k', 'var') || isempty(k)
        k = 3;
    end

    % We are looking for k orthonormal vectors in R^d: Stiefel manifold.
    problem.M = stiefelfactory(d, k);

    % The cost function to minimize is a sum of n terms. This parameter
    % must be set for stochastic algorithms.
    problem.ncostterms = n;

    % We do not need to specify how to compute the value of the cost
    % function (stochastic algorithms never use it). All we need is to
    % specify how to compute the gradient of the cost function, where the
    % sum is restricted to a subset of the terms (a sample). Notice that we
    % specify a partial Euclidean gradient (hence the 'e' in partialegrad).
    % This way, Manopt automatically converts the Euclidean vector into
    % a proper Riemannian partial gradient, in the tangent space at X.
    % In particular, if sample = 1:n, then the partial gradient corresponds
    % to the actual (complete) gradient.
    problem.partialegrad = @partialegrad;
    function G = partialegrad(X, sample)

        % X is an orthonormal matrix of size dxk.
        % sample is a vector of indices between 1 and n: a subset.
        % Extract the corresponding subset of the dataset.
        Asample = A(sample, :);

        % Compute the gradient of f restricted to that sample.
        G = -Asample'*(Asample*X);
        G = G / n;

    end

    % If one wants to use checkgradient to verify one's work, then it is
    % necessary to specify the cost function as well, as below.
    % problem.cost = @(X) -.5*norm(A*X, 'fro')^2 / n;
    % checkgradient(problem); pause;

    % To have the solver record statistics every x iterations, set
    % options.checkperiod to x. This records simple quantities which are
    % almost free to compute (namely, elapsed time and step size of the
    % last step). To record more sophisticated quantities, use
    % options.statsfun as usual. Time spent computing these statistics is
    % not counted in the times reported in the info structure returned by
    % the solver.
    options.checkperiod = 10;
    options.statsfun = statsfunhelper('metric', @(X) norm(A*X, 'fro'));

    % Set the parameters for the solver: stochastic gradient algorithms
    % tend to be quite sensitive to proper tuning, especially regarding
    % step size selection. See the solver's documentation for details.
    options.maxiter = 200;
    options.batchsize = 10;
    % options.stepsize_type = 'decay';
    options.stepsize_init = 1e2;
    options.stepsize_lambda = 1e-3;
    options.verbosity = 2;

    % Run the solver.
    [X, info] = stochasticgradient(problem, [], options);


    % Plot the special metric recorded by options.statsfun.
    plot([info.iter], [info.metric], '.-');
    xlabel('Iteration #');
    ylabel('Frobenius norm of A*X');
    title('Convergence of stochasticgradient on stiefelfactory for PCA');

    % Add a reference to that plot: the globally optimal value, attained
    % when the true dominant singular vectors are computed.
    fprintf('Running svds... ');
    t = tic();
    [V, ~] = svds(A', k);
    fprintf('done: %g [s] (note: svd may be faster)\n', toc(t));
    hold all;
    bound = norm(A*V, 'fro');
    plot([info.iter], bound*ones(size([info.iter])), '--');
    hold off;

    legend('Algorithm', 'SVD bound', 'Location', 'SouthEast');

end
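As a usage sketch (not part of the file above): assuming Manopt is installed and on the MATLAB path, the example can be called with generated data or with a user-supplied dataset. The sizes below are illustrative.

    % Run on randomly generated data (defaults: d = 1000, n = 100000, k = 3):
    [X, A] = PCA_stochastic();

    % Or supply your own dataset and number of components:
    A = randn(5000, 200);            % 5000 samples in R^200 (illustrative)
    k = 5;
    X = PCA_stochastic(A, k);        % X is 200 x 5 with orthonormal columns

    % Compare against the optimal value given by the top singular vectors:
    [V, ~] = svds(A', k);
    fprintf('Achieved %.4g; SVD bound is %.4g.\n', ...
            norm(A*X, 'fro'), norm(A*V, 'fro'));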