# RE: Linear VS LTBS

From: Bob Leary <bleary>
Date: Tue, 25 Aug 2009 18:28:24 -0400

Sorry to back up a few days on this thread, but I could not
resist piling yet another comment on the traditional covariance
success/convergence debate.

From a purely algorithmic perspective, the relationship between =
convergence behavior
and success of the covariance step is extremely tenuous.

NONMEM uses a version of a BFGS quasi Newton method to drive the ELS =
obective function
optimization. At any given stage, the descent direction is of the form
-(H_BFGS)**-1 * g, where g is the gradient of the objective function
and H_BFGS is a positive definite matrix
that captures accumulated curvature information recovered from the =
entire sequence
of previously evaluated gradients at various points. Note that the =
'true'
Newton direction is -H**-1 * g, where H is the Hessian evaluated at the =
current point.
H also captures localized (to the current point) curvature information,
but is difficult and expensive to evaluate, so BFGS methods use the much =
more easily
computed quasi-Newton direction. With accurate gradients, the =
quasi-Newton direction
is necessarily a descent direction (since H_BFGS is guaranteed to be =
positive definite),
while the true Newton direction need not be. Note that H and H_BFGS
do not bear any necessary relation to each other, although H_BFGS is =
often thought
of as a surrogate or approximant for H.

At convergence (usually recognized by the algorithm as the gradient =
becoming
'sufficiently' small in magnitude), the standard errors in principle are =

computed from the square roots of the diagonal elements of H**(-1), =
assuming H
is positive definite (which is almost always the case if at least a
local minimum has been found - there are some degenerate exceptions, =
called
non-Morse points where the Hessian is only positive semidefinite at
the local optimum, but these are rarely encountered).

But in practice the true Hessian H is hard to compute,
and NM uses a numerical approximation to the Hessian
(I believe forward difference, meaning for example in one dimension
the second derivative
at x is approximated by [f(x+2*eps) - 2f(x+eps) + f(x )}/eps^2. Even =
if an optimal
step size eps is used and f can be computed to
full 15 digit double precision accuracy (which is way too optimistic if =
the model
is defined by ODEs which are solved numerically), the best you can do =
numerically
with forward differences is about 5 significant
digits of accuracy of agreement of the numerical hessian with the
true Hessian . Central differences, which are much more expensive, do a =
little bit better.

But for the hessian inverse, if the numerical hessian or the true =
Hessian has a condition
number greater than 10^5 (a condition number of 10^5 is actually quite =
benign for most
matrix computations), then there
are no siginificant digits of agreement of the std errors computed from
the numerical hessian and the standard errors computed from the true
hessian. Moreover, in this case a perfectly respectable
positive definite matrix H can easily become indefinite when =
approximated numerically,
which will cause the covariance step to fail.

Thus failure of the covariance step means only that the numerical =
Hessian at
the converged point is not positive definite, but very little can be =
concluded
from this regarding the true H due to the inherent numerical =
difficulties
and imprecison in numerical
Hessian computations. Moreover, the numerical Hessian plays no role =
whatever
in the convergence of the algorithm. Conversely, success of the =
covariance step
simply means the numerical Hessian is positive definite.

What is well known is that most gradient based (such as simple gradient =
descent,
newton, quasi-Newton, and conjugate gradient) unconstrained
numerical optimization methods typically work best
(speed ,accuracy, and reliability) when the true Hessian H in the =
vicinity of the optimum
has a fairly narrow range of eigenvalues - e.g. when the condition =
number
max eigenvalue/ min eigenvalue is relatively small. (In the case of =
methods, it is possible to obtain theoretical bounds on rates of =
convergence
in terms of condition number with the rates being inversely related to
the condition number).

Of course, a good condition number near the optimum
but not near the starting point does not
preclude convergence problems occuring away from the neighborhood
of the optimum.

But all other things being equal, well conditioned true Hessians (small =
condition numbers) in
the neighborhood of the optimum are generally favorable for convergence
behavior. In terms of parameterization,
each additional parameter usually increases the condition number at the =
optimum
(this can be shown to be strictly true in the case of ordinary linear
regression). So in this sense, overparameterization 'typically'
adversely affects convergence behavior. But unfortunately it is not =
clear how
to turn this into a practical numerical overparametization criterion, =
particularly
given the inherent unreliability of information derived from numerical =
Hessians.

Robert H. Leary, PhD
Fellow

Pharsight - A Certara(tm) Company
5625 Dillard Dr., Suite 205
Cary, NC 27511

Phone/Voice Mail: (919) 852-4625, Fax: (919) 859-6871
Email: bleary

> This email message (including any attachments) is for the sole use of =
the intended recipient and may contain confidential and proprietary =
information. Any disclosure or distribution to third parties that is =
not specifically authorized by the sender is prohibited. If you are not =
destroy all copies of the original message.

-----Original Message-----
From: owner-nmusers
[mailto:owner-nmusers
Sent: Friday, August 21, 2009 16:44 PM
To: nmusers
Subject: Re: [NMusers] Linear VS LTBS

Leonid,

You are once again ignoring the actual evidence that NONMEM VI will fail =

to converge or not complete the covariance step more or less at random.
If you bootstrap simulated data in which the model is known and not
overparameterised it has been shown repeatedly that NONMEM VI will
sometimes converge and do the covariance step and sometimes fail to
converge.

Of course, I agree that overparameterisation could be a cause of
convergence problems but I would not agree that this is often the =
reason.

Bob Bauer has made efforts in NONMEM 7 to try to fix the random
termination behaviour and covariance step problems by providing
additional control over numerical tolerances. It remains to be seen by
direct experiment if NONMEM 7 is indeed less random than NONMEM VI.

BTW in this discussion about LTBS I think it is important to point out
that the only systematic study I know of comparing LTBS with
untransformed models was the one you reported at the 2008 PAGE meeting
(www.page-meeting.org/?abstract=1268). My understanding of your =
results
was that there was no clear advantage of LTBS if INTER was used with
non-transformed data:
"Models with exponential residual error presented in the log-transformed =

variables
performed similar to the ones fitted in original variables with INTER
option. For problems with
residual variability exceeding 40%, use of INTER option or
log-transformation was necessary to
obtain unbiased estimates of inter- and intra-subject variability."

Do you know of any other systematic studies comparing LTBS with no
transformation?

Nick

Leonid Gibiansky wrote:
> Neil
> Large RSE, inability to converge, failure of the covariance step are
> often caused by the over-parametrization of the model. If you already
> have bootstrap, look at the scatter-plot matrix of parameters versus
> parameters (THATA1 vs THETA2, .., THETA1 vs OMEGA1, ...), these are
> very informative plots. If you have over-parametrization on the
> population level, it will be seen in these plots as strong
> correlations of the parameter estimates.
>
> Also, look on plots of ETAs vs ETAs. If you see strong correlation
> (close to 1) there, it may indicate over-parametrization on the
> individual level (too many ETAs in the model).
>
> For random effect with a very large RSE on the variance, I would try
> to remove it and see what happens with the model: often, this (high
> RSE) is the indication that the error effect is not needed.
>
> Also, try combined error model (on log-transformed variables):
>
> W1=SQRT(THETA(...)/IPRED**2+THETA(...))
> Y = LOG(IPRED) + W1*EPS(1)
>
>
> \$SIGMA
> 1 FIXED
>
>
> Why concentrations were on LOQ? Was it because BQLs were inserted as
> LOQ? Then this is not a good idea.
> Thanks
> Leonid
>
>
> --------------------------------------
> Leonid Gibiansky, Ph.D.
> President, QuantPharm LLC
> web: www.quantpharm.com
> e-mail: LGibiansky at quantpharm.com
> tel: (301) 767 5566
>
>
>
>
> Indranil Bhattacharya wrote:
>>
>> When using LTBS I had used a different error model and the error
>> block is shown below
>> \$ERROR
>> IPRED = -5
>> IF (F.GT.0) IPRED = LOG(F) ;log transforming predicition
>> IRES=DV-IPRED
>> W=1
>> IWRES=IRES/W ;Uniform Weighting
>> Y = IPRED + ERR(1)
>>
>> I also performed bootsrap on both LTBS and non-LTBS models and the
>> non-LTBS CI were much more tighter and the precision was greater than =

>> non-LTBS.
>> I think the problem plausibly is with the fact that when fitting the
>> non-transformed data I have used the proportional + additive model
>> while using LTBS the exponential model (which converts to additional
>> model due to LTBS) was used. The extra additive component also may be =

>> more important in the non-LTBS model as for some subjects the
>> concentrations were right on LOQ.
>>
>> I tried the dual error model for LTBS but does not provide a CV%. So
>> I am currently running a bootstrap to get the CI when using the dual
>> error model with LTBS.
>>
>> Neil
>>
>> On Fri, Aug 21, 2009 at 3:01 AM, Grevel, Joachim
>> <Joachim.Grevel
>> <mailto:Joachim.Grevel
>>
>> Hi Neil,
>> 1. When data are log-transformed the \$ERROR block has to
>> change:
>> additive error becomes true exponential error which cannot be
>> achieved without log-transformation (Nick, correct me if I am
>> wrong).
>> 2. Error cannot "go away". You claim your structural model
>> (THs)
>> remained unchanged. Therefore the "amount" of error will remain =
the
>> same as well. If you reduce BSV you may have to "pay" for it with
>> increased residual variability.
>> 3. Confidence intervals of ETAs based on standard errors
>> produced
>> during the covariance step are unreliable (many threads in =
NMusers).
>> Do bootstrap to obtain more reliable C.I..
>> These are my five cents worth of thought in the early =
morning,
>> Good luck,
>> Joachim
>>
>>
>> =
------------------------------------------------------------------------
>>
>> AstraZeneca UK Limited is a company incorporated in England and
>> Wales with registered number: 03674842 and a registered office at =
15
>> Stanhope Gate, London W1K 1LN.
>>
>> *Confidentiality Notice: *This message is private and may contain
>> confidential, proprietary and legally privileged information. If =
you
>> have received this message in error, please notify us and remove =
it
>> from your system and note that you must not copy, distribute or =
take
>> any action in reliance on it. Any unauthorised use or disclosure =
of
>> the contents of this message is not permitted and may be =
unlawful.
>>
>> *Disclaimer:* Email messages may be subject to delays, =
interception,
>> non-delivery and unauthorised alterations. Therefore, information
>> expressed in this message is not given or endorsed by AstraZeneca =
UK
>> Limited unless otherwise notified by an authorised representative
>> independent of this message. No contractual relationship is =
created
>> by this message by any person unless specifically indicated by
>> agreement in writing other than email.
>>
>> *Monitoring: *AstraZeneca UK Limited may monitor email traffic =
data
>> and content for the purposes of the prevention and detection of
>> crime, ensuring the security of our computer systems and checking
>> compliance with our Code of Conduct and policies.
>>
>> -----Original Message-----
>>
>>
>> *From:* owner-nmusers
>> <mailto:owner-nmusers
>> [mailto:owner-nmusers
>> <mailto:owner-nmusers
>> Bhattacharya
>> *Sent:* 20 August 2009 17:07
>> *To:* nmusers
>> *Subject:* [NMusers] Linear VS LTBS
>>
>> Hi, while data fitting using NONMEM on a regular PK data set
>> and its log transformed version I made the following
>> observations
>> - PK parameters (thetas) were generally similar
>> between
>> regular and when using LTBS.
>> -ETA on CL was similar
>> -ETA on Vc was different between the two runs.
>> - Sigma was higher in LTBS (51%) than linear (33%)
>> Now using LTBS, I would have expected to see the
>> ETAs unchanged
>> or actually decrease and accordingly I observed that the eta
>> values decreased showing less BSV. However the %RSE for ETA =
on
>> VC changed from 40% (linear) to 350% (LTBS) and further the
>> lower 95% CI bound has a negative number for ETA on Vc =
(-0.087).
>> What would be the explanation behind the above
>> observations
>> regarding increased %RSE using LTBS and a negative lower =
bound
>> for ETA on Vc? Can a negative lower bound in ETA be =
considered
>> as zero?
>> Also why would the residual vriability increase when using =
LTBS?
>> Please note that the PK is multiexponential (may be
>> this is
>> responsible).
>> Thanks.
>> Neil
>>
>> -- Indranil Bhattacharya
>>
>>
>>
>>
>> --
>> Indranil Bhattacharya
>>

--
Nick Holford, Professor Clinical Pharmacology
Dept Pharmacology & Clinical Pharmacology
University of Auckland, 85 Park Rd, Private Bag 92019, Auckland, New =
Zealand
n.holford
mobile: +64 21 46 23 53
http://www.fmhs.auckland.ac.nz/sms/pharmacology/holford
Received on Tue Aug 25 2009 - 18:28:24 EDT

The NONMEM Users Network is maintained by ICON plc. Requests to subscribe to the network should be sent to: nmusers-request@iconplc.com.

Once subscribed, you may contribute to the discussion by emailing: nmusers@globomaxnm.com.