tCG

PURPOSE

tCG - Truncated (Steihaug-Toint) Conjugate-Gradient method

SYNOPSIS

function [eta, Heta, inner_it, stop_tCG] = tCG(problem, x, grad, eta, Delta, options, storedb, key)

DESCRIPTION

 tCG - Truncated (Steihaug-Toint) Conjugate-Gradient method
 minimize <eta,grad> + .5*<eta,Hess(eta)>
 subject to <eta,eta>_[inverse precon] <= Delta^2

 See also: trustregions
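tCG is not meant to be called directly: trustregions invokes it at each outer iteration to (approximately) minimize the quadratic model above, starting from eta = 0 (or from a random tangent vector when options.useRand is on). As a rough sketch of how it is reached in practice, one might minimize a quadratic over the sphere (the matrix A and cost below are only illustrative):

 n = 1000;
 A = randn(n); A = .5*(A + A');
 problem.M = spherefactory(n);        % manifold: unit sphere in R^n
 problem.cost  = @(x) x'*(A*x);       % Rayleigh quotient on the sphere
 problem.egrad = @(x) 2*A*x;          % Euclidean gradient
 x = trustregions(problem);           % each subproblem is solved by tCG

Since no Hessian is supplied here, trustregions falls back to a finite-difference approximation of it based on the gradient; that is exactly the "nonlinear Hessian approximation" scenario for which the model-decrease safeguard in the code below exists.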

CROSS-REFERENCE INFORMATION

This function calls: getHessian, getPrecon
This function is called by: trustregions

SOURCE CODE

function [eta, Heta, inner_it, stop_tCG] ...
                 = tCG(problem, x, grad, eta, Delta, options, storedb, key)
% tCG - Truncated (Steihaug-Toint) Conjugate-Gradient method
% minimize <eta,grad> + .5*<eta,Hess(eta)>
% subject to <eta,eta>_[inverse precon] <= Delta^2
%
% See also: trustregions

% This file is part of Manopt: www.manopt.org.
% This code is an adaptation to Manopt of the original GenRTR code:
% RTR - Riemannian Trust-Region
% (c) 2004-2007, P.-A. Absil, C. G. Baker, K. A. Gallivan
% Florida State University
% School of Computational Science
% (http://www.math.fsu.edu/~cbaker/GenRTR/?page=download)
% See accompanying license file.
% The adaptation was carried out by Nicolas Boumal.
%
% Change log:
%
%   NB Feb. 12, 2013:
%       We no longer project r back to the tangent space: it was not
%       necessary, and as of Manopt 1.0.1, the proj operator no longer
%       coincides with this notion.
%
%   NB April 3, 2013:
%       tCG now also returns Heta, the Hessian at x along eta. Additional
%       aesthetic modifications.
%
%   NB Dec. 2, 2013:
%       If options.useRand is activated, we now make sure the preconditio-
%       ner is not used, as was originally intended in GenRTR. In time, we
%       may want to investigate whether useRand can be modified to work
%       well with preconditioning too.
%
%   NB Jan. 9, 2014:
%       Now checking explicitly for model decrease at each iteration. The
%       first iteration is a Cauchy point, which necessarily realizes a
%       decrease of the model cost. If a model increase is witnessed
%       (which is theoretically impossible if a linear operator is used for
%       the Hessian approximation), then we return the previous eta. This
%       ensures we always achieve at least the Cauchy decrease, which
%       should be sufficient for convergence.
%
%   NB Feb. 17, 2015:
%       The previous update was in effect verifying that the current eta
%       performed at least as well as the first eta (the Cauchy step) with
%       respect to the model cost. While this is an acceptable strategy,
%       the documentation (and the original intent) was to ensure a
%       monotonic decrease of the model cost at each new eta. This is now
%       the case, with the added line: "model_value = new_model_value;".
%
%   NB April 3, 2015:
%       Works with the new StoreDB class system.

% All terms involving the trust-region radius will use an inner product
% w.r.t. the preconditioner; this is because the iterates grow in
% length w.r.t. the preconditioner, guaranteeing that we will not
% re-enter the trust-region.
%
% The following recurrences for Prec-based norms and inner
% products come from [CGT2000], pg. 205, first edition.
% Below, P is the preconditioner.
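% Strictly, the subscript P in what follows denotes the inner product
% induced by the inverse of the preconditioner, <u,v>_P = <u, Prec^{-1}(v)>,
% consistently with the constraint <eta,eta>_[inverse precon] <= Delta^2
% stated at the top of this file.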
%
% <eta_k,P*delta_k> =
%          beta_k-1 * ( <eta_k-1,P*delta_k-1> + alpha_k-1 |delta_k-1|^2_P )
% |delta_k|^2_P = <r_k,z_k> + beta_k-1^2 |delta_k-1|^2_P
%
% Therefore, we need to keep track of:
% 1)   |delta_k|^2_P
% 2)   <eta_k,P*delta_k> = <eta_k,delta_k>_P
% 3)   |eta_k|^2_P
%
% Initial values are given by:
%    |delta_0|^2_P = <r,z>
%    |eta_0|^2_P   = 0
%    <eta_0,delta_0>_P = 0
% because we take eta_0 = 0 (if useRand = false).
%
% [CGT2000] Conn, Gould and Toint: Trust-region methods, 2000.
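%
% (A small, standalone Euclidean script appended after this listing checks
% these recurrences numerically.)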

inner = problem.M.inner;
lincomb = problem.M.lincomb;

theta = options.theta;
kappa = options.kappa;

if ~options.useRand % and therefore, eta == 0
    Heta = problem.M.zerovec(x);
    r = grad;
    e_Pe = 0;
else % and therefore, no preconditioner
    % eta (presumably) ~= 0 was provided by the caller.
    Heta = getHessian(problem, x, eta, storedb, key);
    r = lincomb(x, 1, grad, 1, Heta);
    e_Pe = inner(x, eta, eta);
end
r_r = inner(x, r, r);
norm_r = sqrt(r_r);
norm_r0 = norm_r;

% Precondition the residual.
if ~options.useRand
    z = getPrecon(problem, x, r, storedb, key);
else
    z = r;
end

% Compute z'*r.
z_r = inner(x, z, r);
d_Pd = z_r;
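% Note: delta_0 = -z and Prec^{-1}(z) = r, so this z_r equals
% |delta_0|^2_P, matching the initial values announced in the header.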

% Initial search direction.
delta = lincomb(x, -1, z);
if ~options.useRand % and therefore, eta == 0
    e_Pd = 0;
else % and therefore, no preconditioner
    e_Pd = inner(x, eta, delta);
end

% If the Hessian or a linear Hessian approximation is in use, it is
% theoretically guaranteed that the model value decreases strictly
% with each iteration of tCG. Hence, there is no need to monitor the model
% value. But, when a nonlinear Hessian approximation is used (such as the
% built-in finite-difference approximation, for example), the model may
% increase. It is then important to terminate the tCG iterations and return
% the previous (best-so-far) iterate. The variable below holds the
% model value.
model_fun = @(eta, Heta) inner(x, eta, grad) + .5*inner(x, eta, Heta);
if ~options.useRand
    model_value = 0;
else
    model_value = model_fun(eta, Heta);
end
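% stop_tCG encodes the reason why the inner iteration stopped. The codes
% used below are:
%   1: negative curvature was detected;
%   2: the step reached the trust-region boundary;
%   3: the residual tolerance was met with kappa active (linear convergence);
%   4: the residual tolerance was met with theta active (superlinear
%      convergence);
%   5: the maximum number of inner iterations was reached;
%   6: the model cost failed to decrease (possible only with a nonlinear
%      Hessian approximation or because of numerical errors).
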
% Pre-assume termination because j == end (i.e., maxinner was reached).
stop_tCG = 5;

% Begin inner/tCG loop.
j = 0;  % set here so that inner_it is well defined even if maxinner is 0
for j = 1 : options.maxinner
    
    % This call is the computationally expensive step.
    Hdelta = getHessian(problem, x, delta, storedb, key);
    
    % Compute curvature (often called kappa; not to be confused with
    % options.kappa).
    d_Hd = inner(x, delta, Hdelta);
    
    % Note that if d_Hd == 0, we will exit at the next "if" anyway.
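    % When d_Hd > 0, alpha is the classical CG step length: it is the exact
    % minimizer of the model along the search direction delta.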
    alpha = z_r/d_Hd;
    % <neweta,neweta>_P =
    % <eta,eta>_P + 2*alpha*<eta,delta>_P + alpha*alpha*<delta,delta>_P
    e_Pe_new = e_Pe + 2.0*alpha*e_Pd + alpha*alpha*d_Pd;
    
    if options.debug > 2
        fprintf('DBG:   (r,r)  : %e\n', r_r);
        fprintf('DBG:   (d,Hd) : %e\n', d_Hd);
        fprintf('DBG:   alpha  : %e\n', alpha);
    end
    
    % Check against negative curvature and trust-region radius violation.
    % If either condition triggers, we bail out.
    if d_Hd <= 0 || e_Pe_new >= Delta^2
        % We seek tau >= 0 such that |eta + tau*delta|_P = Delta, where
        %  e_Pe = <eta,eta>_P
        %  e_Pd = <eta,delta>_P
        %  d_Pd = <delta,delta>_P
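        % tau is then the positive root of the quadratic
        %   d_Pd*tau^2 + 2*e_Pd*tau + (e_Pe - Delta^2) = 0,
        % that is, the step length that lands exactly on the boundary.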
        tau = (-e_Pd + sqrt(e_Pd*e_Pd + d_Pd*(Delta^2-e_Pe))) / d_Pd;
        if options.debug > 2
            fprintf('DBG:     tau  : %e\n', tau);
        end
        eta  = lincomb(x, 1,  eta, tau,  delta);
        
        % If only a nonlinear Hessian approximation is available, this is
        % only approximately correct, but saves an additional Hessian call.
        Heta = lincomb(x, 1, Heta, tau, Hdelta);
        
        % Technically, we may want to verify that this new eta is indeed
        % better than the previous eta before returning it (this is always
        % the case if the Hessian approximation is linear, but it is
        % unclear whether it also holds for nonlinear approximations).
        % At any rate, the impact should be limited, so in the interest of
        % code conciseness, we omit this check.
        
        if d_Hd <= 0
            stop_tCG = 1;     % negative curvature
        else
            stop_tCG = 2;     % exceeded trust region
        end
        break;
    end
    
    % No negative curvature and the proposed eta is inside the TR: accept it.
    e_Pe = e_Pe_new;
    new_eta  = lincomb(x, 1,  eta, alpha,  delta);
    
    % If only a nonlinear Hessian approximation is available, this is
    % only approximately correct, but saves an additional Hessian call.
    new_Heta = lincomb(x, 1, Heta, alpha, Hdelta);
    
    % Verify that the model cost decreased in going from eta to new_eta. If
    % it did not (which can only occur if the Hessian approximation is
    % nonlinear or because of numerical errors), then we return the
    % previous eta (which necessarily is the best reached so far, according
    % to the model cost). Otherwise, we accept the new eta and go on.
    new_model_value = model_fun(new_eta, new_Heta);
    if new_model_value >= model_value
        stop_tCG = 6;
        break;
    end
    
    eta = new_eta;
    Heta = new_Heta;
    model_value = new_model_value; %% added Feb. 17, 2015
    
    % Update the residual.
    r = lincomb(x, 1, r, alpha, Hdelta);
    
    % Compute the new norm of r.
    r_r = inner(x, r, r);
    norm_r = sqrt(r_r);
    
    % Check the kappa/theta stopping criterion.
    % Note that it is somewhat arbitrary whether to check this stopping
    % criterion on the r's (the gradients) or on the z's (the
    % preconditioned gradients). [CGT2000], page 206, mentions both as
    % acceptable criteria.
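    % When kappa is the active bound, the residual is reduced by the fixed
    % factor kappa (linear behavior); when norm_r0^theta is active, the
    % residual is reduced to norm_r0^(1+theta) (superlinear behavior).
    % This is what the two stop codes below record.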
    if j >= options.mininner && norm_r <= norm_r0*min(norm_r0^theta, kappa)
        % The residual is small enough to quit.
        if kappa < norm_r0^theta
            stop_tCG = 3;  % linear convergence
        else
            stop_tCG = 4;  % superlinear convergence
        end
        break;
    end
    
    % Precondition the residual.
    if ~options.useRand
        z = getPrecon(problem, x, r, storedb, key);
    else
        z = r;
    end
    
    % Save the old z'*r.
    zold_rold = z_r;
    % Compute the new z'*r.
    z_r = inner(x, z, r);
    
    % Compute the new search direction (standard preconditioned CG update).
    beta = z_r/zold_rold;
    delta = lincomb(x, -1, z, beta, delta);
    
    % Update the P-norms and P-dots [CGT2000, eq. 7.5.6 & 7.5.7].
    e_Pd = beta*(e_Pd + alpha*d_Pd);
    d_Pd = z_r + beta*beta*d_Pd;
    
end  % of tCG loop
inner_it = j;

end
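
The preconditioned norms and inner products maintained by the recurrences above can be checked numerically. Here is a minimal, self-contained Euclidean sketch (plain MATLAB, independent of Manopt; H and M below are hypothetical stand-ins for the Hessian and the preconditioner) comparing the recurrences against direct evaluation of the inverse-preconditioner inner products:

% Euclidean check of the P-norm recurrences used in tCG, where
% <u,v>_P = u'*(M\v) and M is a symmetric positive definite preconditioner.
n = 10;
H = randn(n); H = H'*H + eye(n);   % symmetric positive definite "Hessian"
M = diag(1./diag(H));              % simple Jacobi preconditioner
r = randn(n, 1);                   % initial residual (the gradient)

z = M*r;
z_r = z'*r;
delta = -z;
eta = zeros(n, 1);
e_Pd = 0;                          % <eta,delta>_P
d_Pd = z_r;                        % |delta|^2_P

for j = 1 : 5
    Hdelta = H*delta;
    alpha = z_r / (delta'*Hdelta);
    eta = eta + alpha*delta;
    r = r + alpha*Hdelta;
    zold_rold = z_r;
    z = M*r;
    z_r = z'*r;
    beta = z_r / zold_rold;
    delta = -z + beta*delta;
    % Recurrences, exactly as in the code above:
    e_Pd = beta*(e_Pd + alpha*d_Pd);
    d_Pd = z_r + beta*beta*d_Pd;
    % Compare against direct computation of the P-inner products:
    fprintf('it %d: err(e_Pd) = %.1e, err(d_Pd) = %.1e\n', j, ...
            abs(e_Pd - eta'*(M\delta)), abs(d_Pd - delta'*(M\delta)));
end

The printed errors should stay near machine precision for the first iterations; they may grow slowly afterwards as CG orthogonality degrades numerically.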
