From: Bob Leary <*bleary*>

Date: Tue, 25 Aug 2009 18:28:24 -0400

Sorry to back up a few days on this thread, but I could not

resist piling yet another comment on the traditional covariance

success/convergence debate.

From a purely algorithmic perspective, the relationship between =

convergence behavior

and success of the covariance step is extremely tenuous.

NONMEM uses a version of a BFGS quasi Newton method to drive the ELS =

obective function

optimization. At any given stage, the descent direction is of the form

-(H_BFGS)**-1 * g, where g is the gradient of the objective function

and H_BFGS is a positive definite matrix

that captures accumulated curvature information recovered from the =

entire sequence

of previously evaluated gradients at various points. Note that the =

'true'

Newton direction is -H**-1 * g, where H is the Hessian evaluated at the =

current point.

H also captures localized (to the current point) curvature information,

but is difficult and expensive to evaluate, so BFGS methods use the much =

more easily

computed quasi-Newton direction. With accurate gradients, the =

quasi-Newton direction

is necessarily a descent direction (since H_BFGS is guaranteed to be =

positive definite),

while the true Newton direction need not be. Note that H and H_BFGS

do not bear any necessary relation to each other, although H_BFGS is =

often thought

of as a surrogate or approximant for H.

At convergence (usually recognized by the algorithm as the gradient =

becoming

'sufficiently' small in magnitude), the standard errors in principle are =

computed from the square roots of the diagonal elements of H**(-1), =

assuming H

is positive definite (which is almost always the case if at least a

local minimum has been found - there are some degenerate exceptions, =

called

non-Morse points where the Hessian is only positive semidefinite at

the local optimum, but these are rarely encountered).

But in practice the true Hessian H is hard to compute,

and NM uses a numerical approximation to the Hessian

(I believe forward difference, meaning for example in one dimension

the second derivative

at x is approximated by [f(x+2*eps) - 2f(x+eps) + f(x )}/eps^2. Even =

if an optimal

step size eps is used and f can be computed to

full 15 digit double precision accuracy (which is way too optimistic if =

the model

is defined by ODEs which are solved numerically), the best you can do =

numerically

with forward differences is about 5 significant

digits of accuracy of agreement of the numerical hessian with the

true Hessian . Central differences, which are much more expensive, do a =

little bit better.

But for the hessian inverse, if the numerical hessian or the true =

Hessian has a condition

number greater than 10^5 (a condition number of 10^5 is actually quite =

benign for most

matrix computations), then there

are no siginificant digits of agreement of the std errors computed from

the numerical hessian and the standard errors computed from the true

hessian. Moreover, in this case a perfectly respectable

positive definite matrix H can easily become indefinite when =

approximated numerically,

which will cause the covariance step to fail.

Thus failure of the covariance step means only that the numerical =

Hessian at

the converged point is not positive definite, but very little can be =

concluded

from this regarding the true H due to the inherent numerical =

difficulties

and imprecison in numerical

Hessian computations. Moreover, the numerical Hessian plays no role =

whatever

in the convergence of the algorithm. Conversely, success of the =

covariance step

simply means the numerical Hessian is positive definite.

What is well known is that most gradient based (such as simple gradient =

descent,

newton, quasi-Newton, and conjugate gradient) unconstrained

numerical optimization methods typically work best

(speed ,accuracy, and reliability) when the true Hessian H in the =

vicinity of the optimum

has a fairly narrow range of eigenvalues - e.g. when the condition =

number

max eigenvalue/ min eigenvalue is relatively small. (In the case of =

conjugate gradient

methods, it is possible to obtain theoretical bounds on rates of =

convergence

in terms of condition number with the rates being inversely related to

the condition number).

Of course, a good condition number near the optimum

but not near the starting point does not

preclude convergence problems occuring away from the neighborhood

of the optimum.

But all other things being equal, well conditioned true Hessians (small =

condition numbers) in

the neighborhood of the optimum are generally favorable for convergence

behavior. In terms of parameterization,

each additional parameter usually increases the condition number at the =

optimum

(this can be shown to be strictly true in the case of ordinary linear

regression). So in this sense, overparameterization 'typically'

adversely affects convergence behavior. But unfortunately it is not =

clear how

to turn this into a practical numerical overparametization criterion, =

particularly

given the inherent unreliability of information derived from numerical =

Hessians.

Robert H. Leary, PhD

Fellow

Pharsight - A Certara(tm) Company

5625 Dillard Dr., Suite 205

Cary, NC 27511

Phone/Voice Mail: (919) 852-4625, Fax: (919) 859-6871

Email: bleary

*> This email message (including any attachments) is for the sole use of =
*

the intended recipient and may contain confidential and proprietary =

information. Any disclosure or distribution to third parties that is =

not specifically authorized by the sender is prohibited. If you are not =

the intended recipient, please contact the sender by reply email and =

destroy all copies of the original message.

-----Original Message-----

From: owner-nmusers

[mailto:owner-nmusers

Sent: Friday, August 21, 2009 16:44 PM

To: nmusers

Subject: Re: [NMusers] Linear VS LTBS

Leonid,

You are once again ignoring the actual evidence that NONMEM VI will fail =

to converge or not complete the covariance step more or less at random.

If you bootstrap simulated data in which the model is known and not

overparameterised it has been shown repeatedly that NONMEM VI will

sometimes converge and do the covariance step and sometimes fail to

converge.

Of course, I agree that overparameterisation could be a cause of

convergence problems but I would not agree that this is often the =

reason.

Bob Bauer has made efforts in NONMEM 7 to try to fix the random

termination behaviour and covariance step problems by providing

additional control over numerical tolerances. It remains to be seen by

direct experiment if NONMEM 7 is indeed less random than NONMEM VI.

BTW in this discussion about LTBS I think it is important to point out

that the only systematic study I know of comparing LTBS with

untransformed models was the one you reported at the 2008 PAGE meeting

(www.page-meeting.org/?abstract=1268). My understanding of your =

results

was that there was no clear advantage of LTBS if INTER was used with

non-transformed data:

"Models with exponential residual error presented in the log-transformed =

variables

performed similar to the ones fitted in original variables with INTER

option. For problems with

residual variability exceeding 40%, use of INTER option or

log-transformation was necessary to

obtain unbiased estimates of inter- and intra-subject variability."

Do you know of any other systematic studies comparing LTBS with no

transformation?

Nick

Leonid Gibiansky wrote:

*> Neil
*

*> Large RSE, inability to converge, failure of the covariance step are
*

*> often caused by the over-parametrization of the model. If you already
*

*> have bootstrap, look at the scatter-plot matrix of parameters versus
*

*> parameters (THATA1 vs THETA2, .., THETA1 vs OMEGA1, ...), these are
*

*> very informative plots. If you have over-parametrization on the
*

*> population level, it will be seen in these plots as strong
*

*> correlations of the parameter estimates.
*

*>
*

*> Also, look on plots of ETAs vs ETAs. If you see strong correlation
*

*> (close to 1) there, it may indicate over-parametrization on the
*

*> individual level (too many ETAs in the model).
*

*>
*

*> For random effect with a very large RSE on the variance, I would try
*

*> to remove it and see what happens with the model: often, this (high
*

*> RSE) is the indication that the error effect is not needed.
*

*>
*

*> Also, try combined error model (on log-transformed variables):
*

*>
*

*> W1=SQRT(THETA(...)/IPRED**2+THETA(...))
*

*> Y = LOG(IPRED) + W1*EPS(1)
*

*>
*

*>
*

*> $SIGMA
*

*> 1 FIXED
*

*>
*

*>
*

*> Why concentrations were on LOQ? Was it because BQLs were inserted as
*

*> LOQ? Then this is not a good idea.
*

*> Thanks
*

*> Leonid
*

*>
*

*>
*

*> --------------------------------------
*

*> Leonid Gibiansky, Ph.D.
*

*> President, QuantPharm LLC
*

*> web: www.quantpharm.com
*

*> e-mail: LGibiansky at quantpharm.com
*

*> tel: (301) 767 5566
*

*>
*

*>
*

*>
*

*>
*

*> Indranil Bhattacharya wrote:
*

*>> Hi Joachim, thanks for your suggestions/comments.
*

*>>
*

*>> When using LTBS I had used a different error model and the error
*

*>> block is shown below
*

*>> $ERROR
*

*>> IPRED = -5
*

*>> IF (F.GT.0) IPRED = LOG(F) ;log transforming predicition
*

*>> IRES=DV-IPRED
*

*>> W=1
*

*>> IWRES=IRES/W ;Uniform Weighting
*

*>> Y = IPRED + ERR(1)
*

*>>
*

*>> I also performed bootsrap on both LTBS and non-LTBS models and the
*

*>> non-LTBS CI were much more tighter and the precision was greater than =
*

*>> non-LTBS.
*

*>> I think the problem plausibly is with the fact that when fitting the
*

*>> non-transformed data I have used the proportional + additive model
*

*>> while using LTBS the exponential model (which converts to additional
*

*>> model due to LTBS) was used. The extra additive component also may be =
*

*>> more important in the non-LTBS model as for some subjects the
*

*>> concentrations were right on LOQ.
*

*>>
*

*>> I tried the dual error model for LTBS but does not provide a CV%. So
*

*>> I am currently running a bootstrap to get the CI when using the dual
*

*>> error model with LTBS.
*

*>>
*

*>> Neil
*

*>>
*

*>> On Fri, Aug 21, 2009 at 3:01 AM, Grevel, Joachim
*

*>> <Joachim.Grevel *

*>> <mailto:Joachim.Grevel *

*>>
*

*>> Hi Neil,
*

*>> 1. When data are log-transformed the $ERROR block has to
*

*>> change:
*

*>> additive error becomes true exponential error which cannot be
*

*>> achieved without log-transformation (Nick, correct me if I am
*

*>> wrong).
*

*>> 2. Error cannot "go away". You claim your structural model
*

*>> (THs)
*

*>> remained unchanged. Therefore the "amount" of error will remain =
*

the

*>> same as well. If you reduce BSV you may have to "pay" for it with
*

*>> increased residual variability.
*

*>> 3. Confidence intervals of ETAs based on standard errors
*

*>> produced
*

*>> during the covariance step are unreliable (many threads in =
*

NMusers).

*>> Do bootstrap to obtain more reliable C.I..
*

*>> These are my five cents worth of thought in the early =
*

morning,

*>> Good luck,
*

*>> Joachim
*

*>>
*

*>>
*

*>> =
*

------------------------------------------------------------------------

*>>
*

*>> AstraZeneca UK Limited is a company incorporated in England and
*

*>> Wales with registered number: 03674842 and a registered office at =
*

15

*>> Stanhope Gate, London W1K 1LN.
*

*>>
*

*>> *Confidentiality Notice: *This message is private and may contain
*

*>> confidential, proprietary and legally privileged information. If =
*

you

*>> have received this message in error, please notify us and remove =
*

it

*>> from your system and note that you must not copy, distribute or =
*

take

*>> any action in reliance on it. Any unauthorised use or disclosure =
*

of

*>> the contents of this message is not permitted and may be =
*

unlawful.

*>>
*

*>> *Disclaimer:* Email messages may be subject to delays, =
*

interception,

*>> non-delivery and unauthorised alterations. Therefore, information
*

*>> expressed in this message is not given or endorsed by AstraZeneca =
*

UK

*>> Limited unless otherwise notified by an authorised representative
*

*>> independent of this message. No contractual relationship is =
*

created

*>> by this message by any person unless specifically indicated by
*

*>> agreement in writing other than email.
*

*>>
*

*>> *Monitoring: *AstraZeneca UK Limited may monitor email traffic =
*

data

*>> and content for the purposes of the prevention and detection of
*

*>> crime, ensuring the security of our computer systems and checking
*

*>> compliance with our Code of Conduct and policies.
*

*>>
*

*>> -----Original Message-----
*

*>>
*

*>>
*

*>> *From:* owner-nmusers *

*>> <mailto:owner-nmusers *

*>> [mailto:owner-nmusers *

*>> <mailto:owner-nmusers *

*>> Bhattacharya
*

*>> *Sent:* 20 August 2009 17:07
*

*>> *To:* nmusers *

*>> *Subject:* [NMusers] Linear VS LTBS
*

*>>
*

*>> Hi, while data fitting using NONMEM on a regular PK data set
*

*>> and its log transformed version I made the following
*

*>> observations
*

*>> - PK parameters (thetas) were generally similar
*

*>> between
*

*>> regular and when using LTBS.
*

*>> -ETA on CL was similar
*

*>> -ETA on Vc was different between the two runs.
*

*>> - Sigma was higher in LTBS (51%) than linear (33%)
*

*>> Now using LTBS, I would have expected to see the
*

*>> ETAs unchanged
*

*>> or actually decrease and accordingly I observed that the eta
*

*>> values decreased showing less BSV. However the %RSE for ETA =
*

on

*>> VC changed from 40% (linear) to 350% (LTBS) and further the
*

*>> lower 95% CI bound has a negative number for ETA on Vc =
*

(-0.087).

*>> What would be the explanation behind the above
*

*>> observations
*

*>> regarding increased %RSE using LTBS and a negative lower =
*

bound

*>> for ETA on Vc? Can a negative lower bound in ETA be =
*

considered

*>> as zero?
*

*>> Also why would the residual vriability increase when using =
*

LTBS?

*>> Please note that the PK is multiexponential (may be
*

*>> this is
*

*>> responsible).
*

*>> Thanks.
*

*>> Neil
*

*>>
*

*>> -- Indranil Bhattacharya
*

*>>
*

*>>
*

*>>
*

*>>
*

*>> --
*

*>> Indranil Bhattacharya
*

*>>
*

--

Nick Holford, Professor Clinical Pharmacology

Dept Pharmacology & Clinical Pharmacology

University of Auckland, 85 Park Rd, Private Bag 92019, Auckland, New =

Zealand

n.holford

mobile: +64 21 46 23 53

http://www.fmhs.auckland.ac.nz/sms/pharmacology/holford

Received on Tue Aug 25 2009 - 18:28:24 EDT

Date: Tue, 25 Aug 2009 18:28:24 -0400

Sorry to back up a few days on this thread, but I could not

resist piling yet another comment on the traditional covariance

success/convergence debate.

From a purely algorithmic perspective, the relationship between =

convergence behavior

and success of the covariance step is extremely tenuous.

NONMEM uses a version of a BFGS quasi Newton method to drive the ELS =

obective function

optimization. At any given stage, the descent direction is of the form

-(H_BFGS)**-1 * g, where g is the gradient of the objective function

and H_BFGS is a positive definite matrix

that captures accumulated curvature information recovered from the =

entire sequence

of previously evaluated gradients at various points. Note that the =

'true'

Newton direction is -H**-1 * g, where H is the Hessian evaluated at the =

current point.

H also captures localized (to the current point) curvature information,

but is difficult and expensive to evaluate, so BFGS methods use the much =

more easily

computed quasi-Newton direction. With accurate gradients, the =

quasi-Newton direction

is necessarily a descent direction (since H_BFGS is guaranteed to be =

positive definite),

while the true Newton direction need not be. Note that H and H_BFGS

do not bear any necessary relation to each other, although H_BFGS is =

often thought

of as a surrogate or approximant for H.

At convergence (usually recognized by the algorithm as the gradient =

becoming

'sufficiently' small in magnitude), the standard errors in principle are =

computed from the square roots of the diagonal elements of H**(-1), =

assuming H

is positive definite (which is almost always the case if at least a

local minimum has been found - there are some degenerate exceptions, =

called

non-Morse points where the Hessian is only positive semidefinite at

the local optimum, but these are rarely encountered).

But in practice the true Hessian H is hard to compute,

and NM uses a numerical approximation to the Hessian

(I believe forward difference, meaning for example in one dimension

the second derivative

at x is approximated by [f(x+2*eps) - 2f(x+eps) + f(x )}/eps^2. Even =

if an optimal

step size eps is used and f can be computed to

full 15 digit double precision accuracy (which is way too optimistic if =

the model

is defined by ODEs which are solved numerically), the best you can do =

numerically

with forward differences is about 5 significant

digits of accuracy of agreement of the numerical hessian with the

true Hessian . Central differences, which are much more expensive, do a =

little bit better.

But for the hessian inverse, if the numerical hessian or the true =

Hessian has a condition

number greater than 10^5 (a condition number of 10^5 is actually quite =

benign for most

matrix computations), then there

are no siginificant digits of agreement of the std errors computed from

the numerical hessian and the standard errors computed from the true

hessian. Moreover, in this case a perfectly respectable

positive definite matrix H can easily become indefinite when =

approximated numerically,

which will cause the covariance step to fail.

Thus failure of the covariance step means only that the numerical =

Hessian at

the converged point is not positive definite, but very little can be =

concluded

from this regarding the true H due to the inherent numerical =

difficulties

and imprecison in numerical

Hessian computations. Moreover, the numerical Hessian plays no role =

whatever

in the convergence of the algorithm. Conversely, success of the =

covariance step

simply means the numerical Hessian is positive definite.

What is well known is that most gradient based (such as simple gradient =

descent,

newton, quasi-Newton, and conjugate gradient) unconstrained

numerical optimization methods typically work best

(speed ,accuracy, and reliability) when the true Hessian H in the =

vicinity of the optimum

has a fairly narrow range of eigenvalues - e.g. when the condition =

number

max eigenvalue/ min eigenvalue is relatively small. (In the case of =

conjugate gradient

methods, it is possible to obtain theoretical bounds on rates of =

convergence

in terms of condition number with the rates being inversely related to

the condition number).

Of course, a good condition number near the optimum

but not near the starting point does not

preclude convergence problems occuring away from the neighborhood

of the optimum.

But all other things being equal, well conditioned true Hessians (small =

condition numbers) in

the neighborhood of the optimum are generally favorable for convergence

behavior. In terms of parameterization,

each additional parameter usually increases the condition number at the =

optimum

(this can be shown to be strictly true in the case of ordinary linear

regression). So in this sense, overparameterization 'typically'

adversely affects convergence behavior. But unfortunately it is not =

clear how

to turn this into a practical numerical overparametization criterion, =

particularly

given the inherent unreliability of information derived from numerical =

Hessians.

Robert H. Leary, PhD

Fellow

Pharsight - A Certara(tm) Company

5625 Dillard Dr., Suite 205

Cary, NC 27511

Phone/Voice Mail: (919) 852-4625, Fax: (919) 859-6871

Email: bleary

the intended recipient and may contain confidential and proprietary =

information. Any disclosure or distribution to third parties that is =

not specifically authorized by the sender is prohibited. If you are not =

the intended recipient, please contact the sender by reply email and =

destroy all copies of the original message.

-----Original Message-----

From: owner-nmusers

[mailto:owner-nmusers

Sent: Friday, August 21, 2009 16:44 PM

To: nmusers

Subject: Re: [NMusers] Linear VS LTBS

Leonid,

You are once again ignoring the actual evidence that NONMEM VI will fail =

to converge or not complete the covariance step more or less at random.

If you bootstrap simulated data in which the model is known and not

overparameterised it has been shown repeatedly that NONMEM VI will

sometimes converge and do the covariance step and sometimes fail to

converge.

Of course, I agree that overparameterisation could be a cause of

convergence problems but I would not agree that this is often the =

reason.

Bob Bauer has made efforts in NONMEM 7 to try to fix the random

termination behaviour and covariance step problems by providing

additional control over numerical tolerances. It remains to be seen by

direct experiment if NONMEM 7 is indeed less random than NONMEM VI.

BTW in this discussion about LTBS I think it is important to point out

that the only systematic study I know of comparing LTBS with

untransformed models was the one you reported at the 2008 PAGE meeting

(www.page-meeting.org/?abstract=1268). My understanding of your =

results

was that there was no clear advantage of LTBS if INTER was used with

non-transformed data:

"Models with exponential residual error presented in the log-transformed =

variables

performed similar to the ones fitted in original variables with INTER

option. For problems with

residual variability exceeding 40%, use of INTER option or

log-transformation was necessary to

obtain unbiased estimates of inter- and intra-subject variability."

Do you know of any other systematic studies comparing LTBS with no

transformation?

Nick

Leonid Gibiansky wrote:

the

NMusers).

morning,

------------------------------------------------------------------------

15

you

it

take

of

unlawful.

interception,

UK

created

data

on

(-0.087).

bound

considered

LTBS?

--

Nick Holford, Professor Clinical Pharmacology

Dept Pharmacology & Clinical Pharmacology

University of Auckland, 85 Park Rd, Private Bag 92019, Auckland, New =

Zealand

n.holford

mobile: +64 21 46 23 53

http://www.fmhs.auckland.ac.nz/sms/pharmacology/holford

Received on Tue Aug 25 2009 - 18:28:24 EDT