function [T_hub, T_lsq, T_cvx] = shapefit_smoothed(V, J)
% ShapeFit formulation for sensor network localization from pair directions
%
% function [T_hub, T_lsq, T_cvx] = shapefit_smoothed(V, J)
%
% This example is based on the paper http://arxiv.org/abs/1506.01437:
% ShapeFit: Exact location recovery from corrupted pairwise directions, 2015
% by Paul Hand, Choongbum Lee and Vladislav Voroninski.
%
% The problem is the following: there are n points t_1, ..., t_n in R^d
% which need to be estimated (localized). To this end, we are given
% measurements of some of the pairwise directions,
% v_ij = (t_i - t_j) / norm(t_i - t_j) + noise.
% Assume there are m such pairwise measurements, defining a graph with m
% edges over n nodes. J is the signed incidence matrix of this graph (see
% in code). To build J from lists I, J in R^m of nodes, use:
% J = sparse([I ; J], [(1:m)' ; (1:m)'], [ones(m, 1) ; -ones(m, 1)], n, m, 2*m);
%
% The measurements are arranged in the matrix V of size d x m. From V, we
% attempt to estimate t_1, ..., t_n, arranged in T, a matrix of size d x n.
% The estimation can only be done up to translation and scaling. The
% returned T's are centered: the columns sum to zero.
%
% ShapeFit is a formulation of this estimation problem which is robust to
% outliers. It is a nonsmooth, convex optimization problem over an affine
% space, i.e., a linear manifold. We smooth the cost using the pseudo-Huber
% loss and solve the problem using Manopt. This requires two ingredients:
% (1) a factory to describe the affine space, see shapefitfactory;
% (2) defining the cost and its derivative, and minimizing it while
% progressively tightening the smooth approximation (with warm-start).
%
% Simply run the example to see the results on random data. It compares the
% smoothed ShapeFit formulation against a least-squares formulation, when
% the measurements include outliers. See in code to vary the noise
% parameters, dimension d, number of nodes n, number of measurements m, ...
%
% Note: since the problem is convex, this returns the global optimum.
% This example also illustrates the use of Manopt for optimization under
% linear constraints: admittedly a simple subcase of optimization on
% manifolds.
%
% See also: shapefitfactory

% This file is part of Manopt: www.manopt.org.
% Original author: Nicolas Boumal, June 18, 2015.
% Contributors:
% Change log:
%
%   Jan.  4, 2021 (NB):
%       Changes for compatibility with Octave 6.1.0.
%
%   Aug. 23, 2021 (XJ):
%       Added AD to compute the egrad and the ehess

    % Generic useful functions
    center_cols = @(A) bsxfun(@minus, A, mean(A, 2));
    normalize_cols = @(A) bsxfun(@times, A, 1./sqrt(sum(A.^2, 1)));
    sqnorm_cols = @(A) sum(A.^2, 1);


    % DATA GENERATION
    %
    % If no inputs are specified, generate some random data for
    % illustration purposes.
    if nargin == 0

        % We estimate n points in R^d
        d = 2;
        n = 500;

        % Those points are the columns of T: they are what we need to
        % estimate, up to scaling and translation. We center T for
        % convenience.
        T_tru = center_cols(rand(d, n));
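        % Here, rand samples the points uniformly at random in the unit
        % cube [0, 1]^d. Any other point cloud would do just as well,
        % since the formulations below can only recover the points up to
        % the (unknown) global translation and scale anyway.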
        % We get a measurement of some pairs of relative directions.
        % Which pairs is encoded in this graph, with J being the (signed,
        % transposed) incidence matrix. J is n x m, sparse.
        % There are roughly edge_fraction * n * (n-1) / 2 measurements.
        edge_fraction = 0.10;
        % [ii, jj] = erdosrenyi(n, edge_fraction);
        [ii, jj] = randomgraph(n, edge_fraction*nchoosek(n, 2));
        m = length(ii);
        J = sparse([ii ; jj], [(1:m)' ; (1:m)'], ...
                   [ones(m, 1) ; -ones(m, 1)], n, m, 2*m);

        % The measurements give the directions from one point to another.
        % That is: we get the position difference, normalized. Here, with
        % Gaussian noise. Least-squares will be well-suited for this.
        sigma = .0;
        V = normalize_cols(T_tru*J + sigma*randn(d, m)); % d x m

        % Outliers: we replace some of the direction measurements by
        % uniformly random unit-norm vectors.
        outlier_fraction = .3;
        outliers = rand(1, m) < outlier_fraction;
        V(:, outliers) = normalize_cols(randn(d, sum(outliers)));

    end % done generating random data


    [d, m] = size(V);
    n = size(J, 1);
    assert(size(J, 2) == m, 'J must be n x m, with V of size d x m.');

    VJt = full(V*J');

    % This "manifold" describes the Euclidean space of matrices T of size
    % d x n such that <VJt, T> = 1 and T has centered columns: T1 = 0.
    problem.M = shapefitfactory(VJt);

    % This linear operator computes the orthogonal projection of each
    % difference ti - tj on the orthogonal space to v_ij.
    % If the alignment is compatible with the data, then this is zero.
    % A(T) is a d x m matrix.
    A = @(T) Afun(T, V, J);
    function AT = Afun(T, V, J)
        TJ = T*J;
        AT = TJ - bsxfun(@times, V, sum(V .* TJ, 1));
    end

    % Need the adjoint of A, too. Input is d x m, output is d x n.
    Astar = @(W) (W - bsxfun(@times, V, sum(V.*W, 1)))*J';


    % LEAST-SQUARES
    %
    % First, work with a least-squares formulation of the problem.
    % That is, we minimize a (very nice) convex cost over an affine space.
    % Since the smooth solvers in Manopt converge to critical points, this
    % means they converge to global optimizers.
    problem.cost  = @(T) 0.5*norm(A(T), 'fro')^2;
    problem.egrad = @(T) Astar(A(T));
    problem.ehess = @(T, Tdot) Astar(A(Tdot));
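    % Indeed, for f(T) = 0.5*norm(A(T), 'fro')^2 with A a linear operator,
    % the chain rule gives the Euclidean gradient Astar(A(T)); and since
    % that gradient is itself linear in T, the Euclidean Hessian is the
    % constant linear map Tdot -> Astar(A(Tdot)).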
    % An alternative way to compute the egrad and the ehess is to use
    % automatic differentiation, as provided by the Deep Learning Toolbox
    % (slower). Note that the function norm is not supported for AD so
    % far: replace norm(..., 'fro')^2 with cnormsqfro, described in the
    % file manoptADhelp.m. Operations between sparse matrices and
    % dlarrays are not supported so far either, so transform V, J into
    % full matrices. AD does not support bsxfun either: translate it into
    % an expression with repmat and .*. All of this makes the computation
    % much slower than specifying the egrad and the ehess by hand.
    % V_full = full(V);
    % J_full = full(J);
    % problem.cost = @(T) 0.5*cnormsqfro(A_AD(T));
    % function AT = A_AD(T)
    %     sum1 = sum(V_full .* (T*J_full), 1);
    %     repsum1 = repmat(sum1, size(T, 1), 1);
    %     AT = T*J_full - V_full.*repsum1;
    % end
    % Call manoptAD to prepare AD for the problem structure:
    % problem = manoptAD(problem);

    T_lsq = trustregions(problem);


    % PSEUDO-HUBER SMOOTHED SHAPEFIT
    %
    % Now solve the same, but with a pseudo-Huber loss instead of
    % least-squares.
    % We iteratively sharpen the Huber function, i.e., reduce delta.
    % It is important to warm start in such a fashion: trying to optimize
    % with a random initial guess and a very small delta is typically slow.
    % How fast one should decrease delta, and how accurately one should
    % optimize at each intermediate stage, is open for research.
    delta = 1;
    T_hub = []; % We could use T_lsq as initial guess, too.
    problem = rmfield(problem, 'ehess');
    warning('off', 'manopt:getHessian:approx');
    for iter = 1 : 12

        delta = delta / 2;

        % Pseudo-Huber loss, as a function of the squared norm x2.
        h = @(x2) sqrt(x2 + delta^2) - delta;

        % The cost sums h over the squared column norms of A(T). Since
        % h'(x2) = 1/(2*sqrt(x2 + delta^2)), the chain rule yields the
        % gradient below: each column of A(T) gets divided by
        % sqrt(x2 + delta^2) before applying Astar.
        problem.cost  = @(T) sum(h(sqnorm_cols(A(T))));
        problem.egrad = @(T) Astar(bsxfun(@times, A(T), ...
                                   1./sqrt(sqnorm_cols(A(T)) + delta^2)));

        % AD version
        % problem = rmfield(problem, 'egrad');
        % problem = rmfield(problem, 'autogradfunc');
        % problem = manoptAD(problem, 'egrad');

        % Solve, using the previous solution as initial guess.
        T_hub = trustregions(problem, T_hub);

    end


    % CVX SHAPEFIT
    %
    % Actual ShapeFit cost (nonsmooth), with CVX.
    % You can get CVX from http://cvxr.com/.
    use_cvx_if_available = false;
    if use_cvx_if_available && exist('cvx_version', 'file')
        T_cvx = shapefit_cvx(V, J);
    else
        T_cvx = NaN(d, n);
    end


    % VISUALIZATION
    %
    % If T_tru is available, for display, we scale the estimators to match
    % the norm of the target. The scaling factor is obtained by minimizing
    % the norm of the discrepancy: norm(T_tru - scale*T_xxx, 'fro').
    % A plot is produced if d is 2 or 3.
    if exist('T_tru', 'var') && (d == 2 || d == 3)

        T_lsq = T_lsq * trace(T_tru'*T_lsq) / norm(T_lsq, 'fro')^2;
        T_hub = T_hub * trace(T_tru'*T_hub) / norm(T_hub, 'fro')^2;
        T_cvx = T_cvx * trace(T_tru'*T_cvx) / norm(T_cvx, 'fro')^2;

        switch d
            case 2
                plot(T_tru(1, :), T_tru(2, :), 'bo', ...
                     T_lsq(1, :), T_lsq(2, :), 'rx', ...
                     T_hub(1, :), T_hub(2, :), 'k.', ...
                     T_cvx(1, :), T_cvx(2, :), 'g.');
            case 3
                plot3(T_tru(1, :), T_tru(2, :), T_tru(3, :), 'bo', ...
                      T_lsq(1, :), T_lsq(2, :), T_lsq(3, :), 'rx', ...
                      T_hub(1, :), T_hub(2, :), T_hub(3, :), 'k.', ...
                      T_cvx(1, :), T_cvx(2, :), T_cvx(3, :), 'g.');
        end

        legend('ground truth', 'least squares', ...
               sprintf('pseudo-huber, \\delta = %.1e', delta), ...
               'CVX ShapeFit');

        title(sprintf(['ShapeFit problem : d = %d, n = %d, edge ' ...
                       'fraction = %.2g, sigma = %.2g, outlier ' ...
                       'fraction = %.2g'], d, n, edge_fraction, sigma, ...
                       outlier_fraction));
        axis equal;

    end

end
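% For reference, the (nonsmooth) ShapeFit program that shapefit_cvx below
% solves exactly is:
%
%   minimize    sum_k norm( (I - v_k*v_k') * (T*J)(:, k) )   over T (d x n)
%   subject to  sum(T, 2) == 0   and   <V*J', T> == 1,
%
% where v_k is the k-th column of V: a (possibly corrupted) unit-norm
% direction measurement.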
% If CVX is available, it can be used to solve the nonsmooth problem
% directly, very elegantly.
function T_cvx = shapefit_cvx(V, J)
    d = size(V, 1);
    n = size(J, 1); %#ok<NASGU>
    VJt = full(V*J');
    cvx_begin
        variable T_cvx(d, n)
        % We want to minimize this:
        % minimize sum( norms( A(T_cvx), 2, 1 ) )
        % But unfortunately, CVX doesn't handle bsxfun. Instead, we use
        % repmat, which is slower, and hence biases the comparison
        % against CVX.
        minimize sum( norms( T_cvx*J - V .* repmat(sum(V .* (T_cvx*J), 1), [d, 1]) , 2, 1 ) )
        sum(T_cvx, 2) == zeros(d, 1); %#ok<NODEF,EQEFF>
        VJt(:).' * T_cvx(:) == 1; %#ok<EQEFF>
    cvx_end
end


function [I, J, A] = erdosrenyi(n, p) %#ok<DEFNU>
% Generate a random Erdos-Renyi graph with n nodes and edge probability p.
%
% [I, J, A] = erdosrenyi(n, p)
%
% Returns a list of edges (I(k), J(k)) for a random, undirected Erdos-Renyi
% graph with n nodes and edge probability p. A is the adjacency matrix.
%
% I(k) < J(k) for all k, i.e., all(I < J) is true.
%
% The memory requirements for this simple implementation scale as O(n^2).

    X = rand(n);
    mask = X <= p;
    X( mask) = 1;
    X(~mask) = 0;
    X = triu(X, 1);

    % A is the adjacency matrix
    A = X + X';

    [I, J] = find(X);

end


function [I, J, A] = randomgraph(n, m)
% Generates a random graph over n nodes with at most m edges.
%
% function [I, J, A] = randomgraph(n, m)
%
% Selects m (undirected) edges from a graph over n nodes, uniformly at
% random, with replacement. Self edges and repeated edges are then
% discarded. The remaining number of edges is at most m, and should be
% close to m if m is much smaller than nchoosek(n, 2).
%
% The output satisfies all(I < J). A is the corresponding adjacency matrix.
%
% Uses O(m) memory (not O(n^2)), making it fit for large, sparse graphs.

    % Generate m edges at random, with replacement, and remove repetitions.
    IJ = unique(sort(randi(n, m, 2), 2), 'rows');

    % Remove self-edges if any.
    IJ = IJ(IJ(:, 1) ~= IJ(:, 2), :);

    % Actual number of selected edges
    k = size(IJ, 1);

    I = IJ(:, 1);
    J = IJ(:, 2);

    % Build the adjacency matrix of the graph.
    A = sparse([I ; J], [J ; I], ones(2*k, 1), n, n, 2*k);

end
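
% Optional sanity check, not used above: a minimal numerical test that
% Astar, as defined in the main function, is indeed the adjoint of A, by
% comparing the inner products <A(T), W> and <T, Astar(W)> on random
% inputs. The relative error should be on the order of machine precision.
function adjoint_check(V, J) %#ok<DEFNU>

    [d, m] = size(V);
    n = size(J, 1);
    T = randn(d, n);
    W = randn(d, m);

    % Same formulas as A and Astar in the main function above.
    AT  = T*J - bsxfun(@times, V, sum(V .* (T*J), 1));
    AsW = (W - bsxfun(@times, V, sum(V .* W, 1)))*J';

    relerr = abs(AT(:).'*W(:) - T(:).'*AsW(:)) / ...
             (norm(AT, 'fro') * norm(W, 'fro'));
    fprintf('Adjoint check, relative error: %g\n', relerr);

end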