
PCA_stochastic

PURPOSE

Example of stochastic gradient algorithm in Manopt on a PCA problem.

SYNOPSIS

function [X, A] = PCA_stochastic(A, k)

DESCRIPTION

 Example of stochastic gradient algorithm in Manopt on a PCA problem.
 
 PCA (principal component analysis) on a dataset A of size nxd consists
 in solving
 
   minimize_X  f(X) = -.5*norm(A*X, 'fro')^2 / n,
 
 where X is a matrix of dimension dxk with orthonormal columns. This
 is equivalent to finding k dominant singular vectors of A, or k top
 eigenvectors of A'*A.
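 As a quick sanity check of that equivalence (sketched in NumPy rather than MATLAB, with made-up sizes), the cost f at the top-k right singular vectors of A equals -0.5*(s_1^2 + ... + s_k^2)/n, and any other orthonormal X gives a higher (less negative) value:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 500, 20, 3
# Spiked data, in the spirit of the example: three dominant directions.
A = rng.standard_normal((n, d)) @ np.diag(np.r_[15, 10, 5, np.ones(d - 3)])

def f(X):
    # The cost from the example: -.5*norm(A*X, 'fro')^2 / n.
    return -0.5 * np.linalg.norm(A @ X, 'fro')**2 / n

# Top-k right singular vectors of A (columns of V).
_, s, Vt = np.linalg.svd(A, full_matrices=False)
V = Vt[:k].T

# A random orthonormal d-by-k matrix for comparison.
Q, _ = np.linalg.qr(rng.standard_normal((d, k)))

print(f"f at singular vectors: {f(V):.3f}, at a random X: {f(Q):.3f}")
```

 The value f(V) matches -0.5 times the sum of the k largest squared singular values, divided by n, which is the global minimum.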
 
 If n is large, this computation can be expensive. Thus, stochastic
 gradient algorithms take the point of view that f(X) is a sum of many (n)
 terms: each term involves only one of the n rows of A.

 To make progress, it may be sufficient to optimize with respect to a
 subset of the terms at each iteration. This way, each individual
 iteration can be very cheap. In particular, individual operations have
 cost independent of n, because f or its gradient need never be evaluated
 completely (or at all, in the case of f).

 Stochastic gradient algorithms (this implementation in particular) are
 sensitive to parameter tuning. See the comments in the code.
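
 The idea can be sketched outside of Manopt. The NumPy loop below (an illustration, not Manopt's actual stochasticgradient internals; the batch size, step-size schedule, and scaling convention are choices made here for the sketch) takes minibatch Riemannian gradient steps on the Stiefel manifold: sample a few rows of A, form the Euclidean gradient of the sampled terms, project it onto the tangent space at X, take a decaying step, and retract with a QR factorization. Each iteration touches only `batch` rows, so its cost does not depend on n:

```python
import numpy as np

rng = np.random.default_rng(42)
n, d, k = 2000, 20, 3
# Spiked data: three dominant directions, as in the example.
A = rng.standard_normal((n, d)) @ np.diag(np.r_[15, 10, 5, np.ones(d - 3)])

X, _ = np.linalg.qr(rng.standard_normal((d, k)))  # random point on St(d, k)
batch = 10

for t in range(500):
    sample = rng.choice(n, size=batch, replace=False)
    As = A[sample]                         # only `batch` rows: cost independent of n
    egrad = -(As.T @ (As @ X)) / batch     # Euclidean gradient of the sampled terms
    # Project onto the tangent space of the Stiefel manifold at X.
    sym = (X.T @ egrad + egrad.T @ X) / 2
    rgrad = egrad - X @ sym
    step = 0.1 / (1 + 0.01 * t)            # decaying step size (tuning matters)
    X, _ = np.linalg.qr(X - step * rgrad)  # QR retraction back to the manifold

s = np.linalg.svd(A, compute_uv=False)
bound = np.sqrt(np.sum(s[:k]**2))          # best possible norm(A @ X, 'fro')
print(f"norm(A@X) / SVD bound: {np.linalg.norm(A @ X) / bound:.3f}")
```

 After a few hundred such cheap iterations, the ratio approaches 1, i.e., X nearly spans the dominant k-dimensional subspace.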

CROSS-REFERENCE INFORMATION

This function calls:
This function is called by:

SUBFUNCTIONS

SOURCE CODE

function [X, A] = PCA_stochastic(A, k)
% Example of stochastic gradient algorithm in Manopt on a PCA problem.
%
% PCA (principal component analysis) on a dataset A of size nxd consists
% in solving
%
%   minimize_X  f(X) = -.5*norm(A*X, 'fro')^2 / n,
%
% where X is a matrix of dimension dxk with orthonormal columns. This
% is equivalent to finding k dominant singular vectors of A, or k top
% eigenvectors of A'*A.
%
% If n is large, this computation can be expensive. Thus, stochastic
% gradient algorithms take the point of view that f(X) is a sum of many (n)
% terms: each term involves only one of the n rows of A.
%
% To make progress, it may be sufficient to optimize with respect to a
% subset of the terms at each iteration. This way, each individual
% iteration can be very cheap. In particular, individual operations have
% cost independent of n, because f or its gradient need never be evaluated
% completely (or at all, in the case of f).
%
% Stochastic gradient algorithms (this implementation in particular) are
% sensitive to parameter tuning. See the comments in the code.

% This file is part of Manopt and is copyrighted. See the license file.
%
% Main authors: Bamdev Mishra and Nicolas Boumal, Sept. 6, 2017
% Contributors:
%
% Change log:
%


    % If none is given, generate a random data set: n samples in R^d.
    if ~exist('A', 'var') || isempty(A)
        d = 1000;
        n = 100000;
        fprintf('Generating data...');
        A = randn(n, d)*diag([[15 10 5], ones(1, d-3)]);
        fprintf(' done (size: %d x %d).\n', size(A));
    else
        [n, d] = size(A);
    end

    % Pick a number of components to compute.
    if ~exist('k', 'var') || isempty(k)
        k = 3;
    end

    % We are looking for k orthonormal vectors in R^d: Stiefel manifold.
    problem.M = stiefelfactory(d, k);

    % The cost function to minimize is a sum of n terms. This parameter
    % must be set for stochastic algorithms.
    problem.ncostterms = n;

    % We do not need to specify how to compute the value of the cost
    % function (stochastic algorithms never use this). All we need is to
    % specify how to compute the gradient of the cost function, where the
    % sum is restricted to a subset of the terms (a sample). Notice that we
    % specify a partial Euclidean gradient (hence the 'e' in partialegrad).
    % This way, Manopt will automatically convert the Euclidean vector into
    % a proper Riemannian partial gradient, in the tangent space at X.
    % In particular, if sample = 1:n, then the partial gradient corresponds
    % to the actual (complete) gradient.
    problem.partialegrad = @partialegrad;
    function G = partialegrad(X, sample)

        % X is an orthonormal matrix of size dxk.
        % sample is a vector of indices between 1 and n: a subset.
        % Extract the corresponding subset of the dataset.
        Asample = A(sample, :);

        % Compute the gradient of f restricted to that sample.
        G = -Asample'*(Asample*X);
        G = G / n;

    end

    % If one wants to use checkgradient to verify one's work, then it is
    % necessary to specify the cost function as well, as below.
    % problem.cost = @(X) -.5*norm(A*X, 'fro')^2 / n;
    % checkgradient(problem); pause;

    % To have the solver record statistics every x iterations, set
    % options.checkperiod to x. This will record simple quantities which
    % are almost free to compute (namely, elapsed time and step size of the
    % last step). To record more sophisticated quantities, you can use
    % options.statsfun as usual. Time spent computing these statistics is
    % not counted in the times reported in the info structure returned by
    % the solver.
    options.checkperiod = 10;
    options.statsfun = statsfunhelper('metric', @(X) norm(A*X, 'fro'));

    % Set the parameters for the solver: stochastic gradient algorithms
    % tend to be quite sensitive to proper tuning, especially regarding
    % step size selection. See the solver's documentation for details.
    options.maxiter = 200;
    options.batchsize = 10;
    % options.stepsize_type = 'decay';
    options.stepsize_init = 1e2;
    options.stepsize_lambda = 1e-3;
    options.verbosity = 2;

    % Run the solver.
    [X, info] = stochasticgradient(problem, [], options);


    % Plot the special metric recorded by options.statsfun.
    plot([info.iter], [info.metric], '.-');
    xlabel('Iteration #');
    ylabel('Frobenius norm of A*X');
    title('Convergence of stochasticgradient on stiefelfactory for PCA');

    % Add to that plot a reference: the globally optimal value attained if
    % the true dominant singular vectors are computed.
    fprintf('Running svds... ');
    t = tic();
    [V, ~] = svds(A', k);
    fprintf('done: %g [s] (note: svd may be faster)\n', toc(t));
    hold all;
    bound = norm(A*V, 'fro');
    plot([info.iter], bound*ones(size([info.iter])), '--');
    hold off;

    legend('Algorithm', 'SVD bound', 'Location', 'SouthEast');

end

Generated on Mon 10-Sep-2018 11:48:06 by m2html © 2005