tCG

PURPOSE

tCG - Truncated (Steihaug-Toint) Conjugate-Gradient method

SYNOPSIS

function [eta, Heta, inner_it, stop_tCG] = tCG(problem, x, grad, eta, Delta, options, storedb, key)

DESCRIPTION

 tCG - Truncated (Steihaug-Toint) Conjugate-Gradient method
 minimize <eta,grad> + .5*<eta,Hess(eta)>
 subject to <eta,eta>_[inverse precon] <= Delta^2

 See also: trustregions
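
For illustration, here is a minimal sketch of how tCG might be called
directly. This is not part of the Manopt sources: trustregions normally
performs all of this setup internally, and the option values shown are
illustrative assumptions rather than guaranteed defaults. It presumes the
standard Manopt tools (spherefactory, getGradient, StoreDB) are on the path.

    % Hypothetical direct call to tCG (illustrative sketch only).
    n = 100;
    problem.M = spherefactory(n);                 % unit sphere in R^n
    A = randn(n); A = .5*(A + A.');               % symmetric matrix
    problem.cost  = @(x) x.'*A*x;
    problem.egrad = @(x) 2*A*x;                   % Euclidean gradient
    % With no Hessian provided, getHessian falls back to an approximation.
    x = problem.M.rand();                         % current iterate
    storedb = StoreDB();
    key = storedb.getNewKey();
    grad = getGradient(problem, x, storedb, key); % Riemannian gradient at x
    options = struct('theta', 1.0, 'kappa', 0.1, 'useRand', false, ...
                     'maxinner', problem.M.dim(), 'mininner', 1, 'debug', 0);
    eta0  = problem.M.zerovec(x);                 % initial eta (zero here)
    Delta = 1;                                    % trust-region radius
    [eta, Heta, inner_it, stop_tCG] = ...
        tCG(problem, x, grad, eta0, Delta, options, storedb, key);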

CROSS-REFERENCE INFORMATION

This function calls: getHessian, getPrecon
This function is called by: trustregions

SOURCE CODE

function [eta, Heta, inner_it, stop_tCG] ...
                 = tCG(problem, x, grad, eta, Delta, options, storedb, key)
% tCG - Truncated (Steihaug-Toint) Conjugate-Gradient method
% minimize <eta,grad> + .5*<eta,Hess(eta)>
% subject to <eta,eta>_[inverse precon] <= Delta^2
%
% See also: trustregions

% This file is part of Manopt: www.manopt.org.
% This code is an adaptation to Manopt of the original GenRTR code:
% RTR - Riemannian Trust-Region
% (c) 2004-2007, P.-A. Absil, C. G. Baker, K. A. Gallivan
% Florida State University
% School of Computational Science
% (http://www.math.fsu.edu/~cbaker/GenRTR/?page=download)
% See accompanying license file.
% The adaptation was executed by Nicolas Boumal.
%
% Change log:
%
%   NB Feb. 12, 2013:
%       We no longer project r back to the tangent space: it was not
%       necessary, and as of Manopt 1.0.1, the proj operator no longer
%       coincides with this notion.
%
%   NB April 3, 2013:
%       tCG now also returns Heta, the Hessian at x along eta. Additional
%       aesthetic modifications.
%
%   NB Dec. 2, 2013:
%       If options.useRand is activated, we now make sure the
%       preconditioner is not used, as was originally intended in GenRTR.
%       In time, we may want to investigate whether useRand can be
%       modified to work well with preconditioning too.
%
%   NB Jan. 9, 2014:
%       Now checking explicitly for model decrease at each iteration. The
%       first iterate is a Cauchy point, which necessarily realizes a
%       decrease of the model cost. If a model increase is witnessed
%       (which is theoretically impossible if a linear operator is used
%       for the Hessian approximation), then we return the previous eta.
%       This ensures we always achieve at least the Cauchy decrease,
%       which should be sufficient for convergence.
%
%   NB Feb. 17, 2015:
%       The previous update was in effect verifying that the current eta
%       performed at least as well as the first eta (the Cauchy step)
%       with respect to the model cost. While this is an acceptable
%       strategy, the documentation (and the original intent) was to
%       ensure a monotonic decrease of the model cost at each new eta.
%       This is now the case, with the added line:
%       "model_value = new_model_value;".
%
%   NB April 3, 2015:
%       Works with the new StoreDB class system.
%
%   NB April 17, 2018:
%       Two changes:
%        (a) Instead of updating delta and Hdelta, we now update -delta
%            and -Hdelta, named mdelta and Hmdelta. This spares one call
%            to lincomb(x, -1, z).
%        (b) We re-project mdelta to the tangent space at every
%            iteration, to avoid drifting away from it. The choice to
%            project mdelta specifically is motivated by the fact that
%            this is the vector passed to getHessian, hence this is where
%            accurate tangency may matter most. (All other operations are
%            linear combinations of tangent vectors, which should be
%            fairly safe.)

% All terms involving the trust-region radius use an inner product
% w.r.t. the preconditioner; this is because the iterates grow in
% length w.r.t. the preconditioner, guaranteeing that we do not
% re-enter the trust-region.
%
% The following recurrences for Prec-based norms and inner
% products come from [CGT2000], pg. 205, first edition.
% Below, P is the preconditioner.
%
% <eta_k,P*delta_k> =
%          beta_k-1 * ( <eta_k-1,P*delta_k-1> + alpha_k-1 |delta_k-1|^2_P )
% |delta_k|^2_P = <r_k,z_k> + beta_k-1^2 |delta_k-1|^2_P
%
% Therefore, we need to keep track of:
% 1)   |delta_k|^2_P
% 2)   <eta_k,P*delta_k> = <eta_k,delta_k>_P
% 3)   |eta_k|^2_P
%
% Initial values are given by
%    |delta_0|^2_P = <r,z>
%    |eta_0|^2_P   = 0
%    <eta_0,delta_0>_P = 0
% because we take eta_0 = 0 (if useRand = false).
%
% [CGT2000] Conn, Gould and Toint: Trust-region methods, 2000.

inner   = @(u, v) problem.M.inner(x, u, v);
lincomb = @(a, u, b, v) problem.M.lincomb(x, a, u, b, v);
tangent = @(u) problem.M.tangent(x, u);

theta = options.theta;
kappa = options.kappa;

if ~options.useRand % and therefore, eta == 0
    Heta = problem.M.zerovec(x);
    r = grad;
    e_Pe = 0;
else % and therefore, no preconditioner
    % eta (presumably) ~= 0 was provided by the caller.
    Heta = getHessian(problem, x, eta, storedb, key);
    r = lincomb(1, grad, 1, Heta);
    e_Pe = inner(eta, eta);
end
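
% At this point, r is the gradient of the model at eta: r equals grad
% when eta == 0, and grad + Hess[eta] otherwise.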
r_r = inner(r, r);
norm_r = sqrt(r_r);
norm_r0 = norm_r;

% Precondition the residual.
if ~options.useRand
    z = getPrecon(problem, x, r, storedb, key);
else
    z = r;
end

% Compute z'*r.
z_r = inner(z, r);
d_Pd = z_r;
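% Since delta_0 = z (the preconditioned residual), this also
% initializes |delta_0|^2_P = <r,z>, as noted above.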

% Initial search direction (we maintain -delta in memory, called mdelta,
% to avoid a change of sign of the tangent vector).
mdelta = z;
if ~options.useRand % and therefore, eta == 0
    e_Pd = 0;
else % and therefore, no preconditioner
    e_Pd = -inner(eta, mdelta);
end

% If the Hessian or a linear Hessian approximation is in use, it is
% theoretically guaranteed that the model value decreases strictly with
% each iteration of tCG. Hence, there is no need to monitor the model
% value. But, when a nonlinear Hessian approximation is used (such as the
% built-in finite-difference approximation), the model may increase. It
% is then important to terminate the tCG iterations and return the
% previous (best-so-far) iterate. The variable below will hold the model
% value.
model_fun = @(eta, Heta) inner(eta, grad) + .5*inner(eta, Heta);
if ~options.useRand
    model_value = 0;
else
    model_value = model_fun(eta, Heta);
end

% Pre-assume termination because j == end (maximum inner iterations).
stop_tCG = 5;

% Begin inner/tCG loop.
j = 0;
for j = 1 : options.maxinner
    
    % This call is the computationally expensive step.
    Hmdelta = getHessian(problem, x, mdelta, storedb, key);
    
    % Compute the curvature (often called kappa in the CG literature;
    % not to be confused with options.kappa).
    d_Hd = inner(mdelta, Hmdelta);
    
    % Note that if d_Hd == 0, we will exit at the next "if" anyway.
    alpha = z_r/d_Hd;
    % <neweta,neweta>_P =
    % <eta,eta>_P + 2*alpha*<eta,delta>_P + alpha*alpha*<delta,delta>_P
    e_Pe_new = e_Pe + 2.0*alpha*e_Pd + alpha*alpha*d_Pd;
    
    if options.debug > 2
        fprintf('DBG:   (r,r)  : %e\n', r_r);
        fprintf('DBG:   (d,Hd) : %e\n', d_Hd);
        fprintf('DBG:   alpha  : %e\n', alpha);
    end
    
    % Check against negative curvature and trust-region radius violation.
    % If either condition triggers, we bail out.
    if d_Hd <= 0 || e_Pe_new >= Delta^2
        % tau is the positive root of the quadratic
        %    dd*tau^2 + 2*ed*tau + (ee - Delta^2) = 0,
        % where
        %    ee = <eta,eta>_prec,x
        %    ed = <eta,delta>_prec,x
        %    dd = <delta,delta>_prec,x
        % so that eta + tau*delta lies exactly on the trust-region
        % boundary.
        tau = (-e_Pd + sqrt(e_Pd*e_Pd + d_Pd*(Delta^2-e_Pe))) / d_Pd;
        if options.debug > 2
            fprintf('DBG:     tau  : %e\n', tau);
        end
        eta  = lincomb(1,  eta, -tau,  mdelta);
        
        % If only a nonlinear Hessian approximation is available, this is
        % only approximately correct, but saves an additional Hessian call.
        Heta = lincomb(1, Heta, -tau, Hmdelta);
        
        % Technically, we may want to verify that this new eta is indeed
        % better than the previous eta before returning it (this is always
        % the case if the Hessian approximation is linear, but it is
        % unclear whether it also holds for nonlinear approximations).
        % At any rate, the impact should be limited, so in the interest of
        % code conciseness, we omit this check.
        
        if d_Hd <= 0
            stop_tCG = 1;     % negative curvature
        else
            stop_tCG = 2;     % exceeded trust region
        end
        break;
    end
    
    % No negative curvature and eta_prop inside TR: accept it.
    e_Pe = e_Pe_new;
    new_eta  = lincomb(1,  eta, -alpha,  mdelta);
    
    % If only a nonlinear Hessian approximation is available, this is
    % only approximately correct, but saves an additional Hessian call.
    new_Heta = lincomb(1, Heta, -alpha, Hmdelta);
    
    % Verify that the model cost decreased in going from eta to new_eta. If
    % it did not (which can only occur if the Hessian approximation is
    % nonlinear or because of numerical errors), then we return the
    % previous eta (which necessarily is the best reached so far, according
    % to the model cost). Otherwise, we accept the new eta and go on.
    new_model_value = model_fun(new_eta, new_Heta);
    if new_model_value >= model_value
        stop_tCG = 6;
        break;
    end
    
    eta = new_eta;
    Heta = new_Heta;
    model_value = new_model_value; %% added Feb. 17, 2015
    
    % Update the residual.
    r = lincomb(1, r, -alpha, Hmdelta);
    
    % Compute the new norm of r.
    r_r = inner(r, r);
    norm_r = sqrt(r_r);
    
    % Check the kappa/theta stopping criterion.
    % Note that it is somewhat arbitrary whether to check this stopping
    % criterion on the r's (the gradients) or on the z's (the
    % preconditioned gradients). [CGT2000], page 206, mentions both as
    % acceptable criteria.
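    % Concretely, the test requires norm_r <= norm_r0*min(norm_r0^theta,
    % kappa): the kappa clause enforces a linear convergence rate, while
    % the theta clause enforces superlinear convergence of order 1+theta.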
    if j >= options.mininner && norm_r <= norm_r0*min(norm_r0^theta, kappa)
        % Residual is small enough to quit.
        if kappa < norm_r0^theta
            stop_tCG = 3;  % linear convergence
        else
            stop_tCG = 4;  % superlinear convergence
        end
        break;
    end
    
    % Precondition the residual.
    if ~options.useRand
        z = getPrecon(problem, x, r, storedb, key);
    else
        z = r;
    end
    
    % Save the old z'*r.
    zold_rold = z_r;
    % Compute the new z'*r.
    z_r = inner(z, r);
    
    % Compute the new search direction.
    beta = z_r/zold_rold;
    mdelta = lincomb(1, z, beta, mdelta);
    
    % Since mdelta is passed to getHessian, which is the part of the code
    % we have least control over from here, we want to make sure mdelta is
    % a tangent vector up to numerical errors that should remain small.
    % For this reason, we re-project mdelta to the tangent space.
    % In limited tests, it was observed that it is a good idea to project
    % at every iteration rather than only every k iterations: loss of
    % tangency can lead to more inner iterations being run, which leads
    % to an overall higher computational cost.
    mdelta = tangent(mdelta);
    
    % Update the P-norms and P-dots [CGT2000, eq. 7.5.6 & 7.5.7].
    e_Pd = beta*(e_Pd + alpha*d_Pd);
    d_Pd = z_r + beta*beta*d_Pd;
    
end  % of tCG loop
inner_it = j;

end
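
The stop_tCG codes assigned above (1 through 6) are consumed by
trustregions to report why the inner iteration stopped. As a usage note,
the hypothetical helper below (not part of Manopt) maps each code to the
reason named in the comments of the source above.

    function reason = tcg_stop_reason_sketch(stop_tCG)
    % Hypothetical helper: translate a stop_tCG code returned by tCG into
    % the human-readable reason named in the comments of the source above.
        reasons = {'negative curvature', ...                 % stop_tCG == 1
                   'exceeded trust region', ...              % stop_tCG == 2
                   'linear convergence (kappa test)', ...    % stop_tCG == 3
                   'superlinear convergence (theta test)', ...  % == 4
                   'maximum inner iterations reached', ...   % stop_tCG == 5
                   'model value did not decrease'};          % stop_tCG == 6
        reason = reasons{stop_tCG};
    end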
