From: Matt Hutmacher <*matt.hutmacher*>

Date: Thu, 11 Dec 2008 10:45:23 -0500

Yaning,

Perhaps I was not clear in my email. I should have stated it more explicitly, as follows:

For the normal density case, application of the Laplace approximation yields

$$-2LL = (y - f(\eta))'\,\Sigma^{-1}\,(y - f(\eta)) + \eta'\,\Omega^{-1}\,\eta + \log|\Sigma|$$

where $y$ are the data, $f$ is the mean function, $\eta$ is the subject-specific random effect, $\Sigma$ (SIG) is the intrasubject residual variance, and $\Omega$ (OM) is the between-subject variance of the etas. If SIG depends on eta, then the extended least squares form, i.e.

$$-2LL = (y - f(\hat\eta) + G\hat\eta)'\,M_\Sigma^{-1}\,(y - f(\hat\eta) + G\hat\eta) + \log|M_\Sigma|, \qquad M_\Sigma = G\,\Omega\,G' + \Sigma,$$

with $G = \partial f/\partial\eta$ evaluated at $\hat\eta$, no longer represents a Laplace-based approximation to the marginal distribution of y. It can be made approximately Laplace-based by various procedures, but it is no longer strictly Laplace-based.

See page 345 of Vonesh and Chinchilli (1997). Wolfinger (1993) also shows this derivation.
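Below is a minimal R sketch of the extended least squares form for a single subject. R is used because it runs the S-PLUS-style code in this thread. The sketch drops the constant $n\log(2\pi)$. The function names `els_m2ll` and `m2ll_at`, the linear toy model and all numbers are hypothetical. When the model is linear in eta and SIG does not depend on eta, the linearization is exact. The objective should then equal the direct marginal -2LL at any expansion point etahat. That invariance holds only when the residual is $y - f(\hat\eta) + G\hat\eta$ and the covariance is $G\Omega G' + \Sigma$.

```r
# Linearized (extended least squares) -2LL for one subject, omitting the
# constant n*log(2*pi). All names and numbers here are hypothetical.
els_m2ll <- function(y, f_hat, G, eta_hat, Omega, Sigma) {
  r <- y - f_hat + G %*% eta_hat           # y - f(eta_hat) + G eta_hat
  M <- G %*% Omega %*% t(G) + Sigma        # MSIG = G Omega G' + Sigma
  drop(t(r) %*% solve(M, r)) +
    determinant(M, logarithm = TRUE)$modulus[[1]]   # + log|MSIG|
}

# Toy model linear in eta: f(eta) = a + G eta, Sigma independent of eta.
a     <- c(10, 8, 5)
G     <- matrix(c(1.0, 0.8, 0.5), ncol = 1)
Omega <- matrix(0.2)
Sigma <- diag(0.1, 3)
y     <- c(10.5, 8.1, 4.7)

m2ll_at <- function(eta_hat) {
  f_hat <- a + drop(G %*% eta_hat)         # f evaluated at eta_hat
  els_m2ll(y, f_hat, G, eta_hat, Omega, Sigma)
}

m2ll_at(0)
m2ll_at(0.7)   # same value: the expansion point does not matter here
```

With a nonlinear f, or with SIG depending on eta, the value does change with etahat. That is the situation the email is about.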

Matt

From: owner-nmusers On Behalf Of Wang, Yaning
Sent: Wednesday, December 10, 2008 8:45 PM
To: Matt Hutmacher; Bob Leary; ayyappa.5.chaturvedula
owner-nmusers
Subject: RE: [NMusers] OFV higher with FOCEI than FO

Matt:

That's not true. Those two references discuss the case in which the linearized structural model can also be derived from a direct Laplacian approximation of the marginal likelihood. Suppose there is an interaction between residual and between-subject variability (i.e., the residual error model contains a subject-specific random effect). Then linearizing the structural model around eta_hat can no longer be derived from the Laplacian approximation. In NONMEM, however, FOCE with interaction (when the residual error model contains a subject-specific random effect) is still derived from the Laplacian approximation. In other words, NONMEM does not linearize the structural model in the FOCE-with-interaction case. I discussed this in detail in my paper (1).

If you add the following S-PLUS code to the S-PLUS code in my paper and run the simple numerical example, you can see how NONMEM calculates the objective function for FOCE with interaction. These points are further visualized in my talk recently posted on the ACCP web page (http://www.accp1.org/pharmacometrics/PopPKCourse.html).

Yaning

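# Reproduce the NONMEM result using equation 28 of my paper, which is a
# further approximation of the Laplacian method.
# data, omega and eps come from the S-PLUS example code in the paper.
ofv <- 0                                   # avoid masking the base sum()
for (i in 1:10) {                          # 10 subjects in the example
  data1 <- data[data$ID == i, ]
  n <- nrow(data1)
  # covariance matrix whose log-determinant enters the objective function;
  # diag(x, n) keeps a one-observation subject from becoming an identity matrix
  cov <- data1$fp %*% t(data1$fp) * omega + diag(data1$f^2, n) * eps +
    2 * data1$fp %*% t(data1$fp) * omega * eps
  # proportional residual covariance used to weight the residuals
  cov1 <- diag(data1$f^2, n) * eps
  ginv <- solve(cov1)
  # weighted residual sum of squares plus the eta penalty ETA1^2/omega
  sec <- t(data1$DV - data1$IPRE) %*% ginv %*% (data1$DV - data1$IPRE) +
    data1$ETA1[1]^2 / omega
  frs <- determinant(cov, logarithm = TRUE)$modulus[[1]]
  ofv <- ofv + sec + frs
}
ofv  # 39.45756, same as NONMEM OFV 39.458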
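The loop above depends on `data`, `omega` and `eps` from the paper's S-PLUS code. The sketch below wraps the same per-subject term in a function so it can run on its own, in R. The helper names `subject_ofv` and `total_ofv` and the toy data frame are hypothetical. The numbers are invented, are not the paper's example, and do not reproduce 39.458.

```r
# Hypothetical self-contained version of the per-subject objective in the
# loop above. Column names (ID, DV, IPRE, f, fp, ETA1) follow that code.
subject_ofv <- function(d, omega, eps) {
  n   <- nrow(d)
  fp  <- matrix(d$fp, ncol = 1)
  # covariance whose log-determinant enters the objective
  V   <- fp %*% t(fp) * omega + diag(d$f^2, n) * eps +
         2 * fp %*% t(fp) * omega * eps
  R   <- diag(d$f^2, n) * eps            # proportional residual covariance
  res <- d$DV - d$IPRE
  # weighted residual sum of squares plus eta penalty
  quad <- drop(t(res) %*% solve(R, res)) + d$ETA1[1]^2 / omega
  quad + determinant(V, logarithm = TRUE)$modulus[[1]]
}

# Sum the per-subject contributions over all IDs.
total_ofv <- function(data, omega, eps) {
  sum(vapply(split(data, data$ID), subject_ofv, numeric(1),
             omega = omega, eps = eps))
}

toy <- data.frame(ID   = c(1, 1, 2),
                  DV   = c(2.2, 1.1, 3.0),
                  IPRE = c(2.0, 1.0, 3.1),
                  f    = c(2.0, 1.0, 3.1),
                  fp   = c(1.0, 0.5, 1.5),
                  ETA1 = c(0.1, 0.1, -0.2))
total_ofv(toy, omega = 0.5, eps = 0.01)
```

Subject 2 in `toy` has one observation. This is where `diag(d$f^2, n)` matters. A bare `diag(x)` with a length-one `x` returns an identity matrix whose size is set by the value of `x`.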

1. Yaning Wang. Derivation of various NONMEM estimation methods. Journal of Pharmacokinetics and Pharmacodynamics 34:575-593 (2007).

Yaning Wang, Ph.D.

Team Leader, Pharmacometrics

Office of Clinical Pharmacology

Office of Translational Science

Center for Drug Evaluation and Research

U.S. Food and Drug Administration

Phone: 301-796-1624

Email: yaning.wang

"The contents of this message are mine personally and do not necessarily

reflect any position of the Government or the Food and Drug Administration."

_____

From: owner-nmusers On Behalf Of Matt Hutmacher
Sent: Wednesday, December 10, 2008 2:04 PM
To: 'Bob Leary'; ayyappa.5.chaturvedula
owner-nmusers
Subject: RE: [NMusers] OFV higher with FOCEI than FO

Hi Bob,

I would just add one point of clarification. My understanding is that the FOCE approximation is a Laplace-based approximation (or related to one) only if the within-subject residual error model does not contain any subject-specific random effects.

Wolfinger R (1993). Laplace's approximation for nonlinear mixed models.

Biometrika 80, 791-795.

Vonesh EF, Chinchilli VM (1997). Linear and nonlinear models for the analysis of repeated measurements. Marcel Dekker.

Matt

From: owner-nmusers On Behalf Of Bob Leary
Sent: Wednesday, December 10, 2008 12:11 PM
To: ayyappa.5.chaturvedula
nmusers
Subject: RE: [NMusers] OFV higher with FOCEI than FO

As shown by X. Wang, FO, FOCE and LAPLACE form a hierarchy of approximations.

Both the FO and FOCE methods are based on the same underlying Laplacian approximation to the integral of the joint likelihood function over the random effects (etas).

The basic Laplace approximation requires two pieces of information. The first is the value of the joint likelihood function at its peak. The second is the set of second derivatives at the eta values where the peak is reached.
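The minimal R sketch below makes the Laplace step concrete for a single scalar eta. It finds the peak of the joint likelihood, takes the second derivative there, and replaces the integral with a Gaussian integral. The function name and the one-observation linear-Gaussian toy model are hypothetical. The toy joint log likelihood is exactly quadratic in eta, so the Laplace approximation is exact here and easy to check against `integrate`. With a nonlinear model the two would differ.

```r
# Laplace approximation of log( integral exp(-l(eta)) d eta ) for a scalar
# random effect, where l(eta) is a negative joint log-likelihood.
laplace_log_integral <- function(l, interval = c(-10, 10), h = 1e-3) {
  eta_hat <- optimize(l, interval, tol = 1e-10)$minimum    # peak (EBE)
  # second derivative at the peak by central finite differences
  l2 <- (l(eta_hat + h) - 2 * l(eta_hat) + l(eta_hat - h)) / h^2
  -l(eta_hat) + 0.5 * log(2 * pi / l2)
}

# Hypothetical toy model: y = a + g*eta + e, e ~ N(0, s2), eta ~ N(0, w)
y <- 1.3; a <- 1; g <- 0.8; s2 <- 0.05; w <- 0.3
l <- function(eta) 0.5 * (y - a - g * eta)^2 / s2 + 0.5 * eta^2 / w

laplace_log_integral(l)
log(integrate(function(e) exp(-l(e)), -Inf, Inf)$value)   # numerical check
```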

The FOCE method adds one additional approximation: it obtains the Hessian matrix of second derivatives at the peak of the joint likelihood function from first derivatives. It does, however, accurately determine the position of the peak in random effects (eta) space (the empirical Bayes estimates) and the function value at the peak. This determination of the EBEs is what the 'conditional step' is all about, and it is computationally costly.
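The toy R comparison below assumes that "getting the Hessian from first derivatives" means the usual Gauss-Newton-type expression $G'\Sigma^{-1}G + \Omega^{-1}$. This is a reading of the description above, not NONMEM source. The model is hypothetical: $f_j(\eta) = \mathrm{dose}_j\,e^{\eta}$ with a residual variance that does not depend on eta. For this model the exact second derivative has an extra term, $-\sum_j (y_j - f_j) f_j''/\sigma^2$, which the first-derivative version drops. That term vanishes when the residuals at the evaluation point are zero.

```r
# Exact second derivative of the negative joint log-likelihood versus a
# first-derivative (Gauss-Newton-type) approximation, scalar eta.
# Hypothetical model: f_j(eta) = dose_j * exp(eta), constant residual
# variance s2, eta ~ N(0, omega).
hess_exact <- function(eta, y, dose, s2, omega) {
  f <- dose * exp(eta)                 # for this model f' = f'' = f
  sum(f^2 - (y - f) * f) / s2 + 1 / omega
}
hess_first_order <- function(eta, y, dose, s2, omega) {
  g <- dose * exp(eta)                 # first derivative df/deta
  sum(g^2) / s2 + 1 / omega            # drops the (y - f) * f'' term
}

dose <- c(1, 2, 4); s2 <- 0.1; omega <- 0.2; eta <- 0.3
y <- dose * exp(eta) + c(0.1, -0.2, 0.3)   # invented observations
hess_exact(eta, y, dose, s2, omega)
hess_first_order(eta, y, dose, s2, omega)
```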

The underlying Laplacian approximation is based on the local behavior of the joint log likelihood function in the neighborhood of its peak. FO, however, does not investigate the behavior of the joint likelihood function near its peak at all, which is basically why FO estimates can be arbitrarily poor. Instead, it guesstimates the value at the peak by extrapolating from eta=0, using a single Newton step based on approximate first and second derivatives at eta=0. It also simply uses the FOCE-type approximate second derivatives evaluated at eta=0 in place of those at the peak when evaluating the Laplacian approximation.
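The toy R illustration below shows the FO-style shortcut described above. It takes one Newton step from eta = 0, using first-derivative information at eta = 0, and compares the result with a full search for the peak (the EBE). The helper names and both toy models are hypothetical, and this is not NONMEM's FO implementation. For a model linear in eta the single step lands exactly on the peak. For the nonlinear model it overshoots noticeably.

```r
# One Newton step from eta = 0 versus a full search for the peak (EBE).
# Scalar eta, residual variance s2 independent of eta, eta ~ N(0, omega).
newton_from_zero <- function(y, f, fprime, s2, omega) {
  r0 <- y - f(0)
  g0 <- fprime(0)
  grad <- -sum(r0 * g0) / s2                 # l'(0); eta/omega term is 0
  hess <- sum(g0^2) / s2 + 1 / omega         # first-derivative Hessian at 0
  -grad / hess                               # one Newton step from eta = 0
}
ebe <- function(y, f, s2, omega) {
  l <- function(eta) 0.5 * sum((y - f(eta))^2) / s2 + 0.5 * eta^2 / omega
  optimize(l, c(-5, 5), tol = 1e-10)$minimum
}

s2 <- 0.05; omega <- 0.3
# Linear toy model: the single step lands on the peak.
a <- c(10, 8, 5); g <- c(1, 0.8, 0.5)
f_lin <- function(eta) a + g * eta
y_lin <- c(10.5, 8.1, 4.7)
c(newton_from_zero(y_lin, f_lin, function(eta) g, s2, omega),
  ebe(y_lin, f_lin, s2, omega))
# Nonlinear toy model (df/deta equals f here): the step misses the peak.
dose <- c(1, 2, 4)
f_exp <- function(eta) dose * exp(eta)
y_exp <- dose * exp(1)
c(newton_from_zero(y_exp, f_exp, f_exp, s2, omega),
  ebe(y_exp, f_exp, s2, omega))
```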

These additional approximations, which FO layers on top of the basic Laplacian and FOCE approximations, are quite dubious for significantly nonlinear model functions. They often result in very poor-quality parameter estimates compared to FOCE and Laplace.

Strictly speaking, FOCE and FO objective values cannot be compared in any consistently meaningful sense.

Loosely speaking, however, FO and FOCE share a common base Laplacian approximation, and FO layers additional approximations on top of FOCE. The difference between FO and FOCE objective values therefore reflects the effects of the additional FO approximations. Large differences may suggest that these additional approximations have large effects, which makes the FO estimates even more suspect relative to FOCE.

Robert H. Leary, PhD

Principal Software Engineer

Pharsight Corp.

5520 Dillard Dr., Suite 210

Cary, NC 27511

Phone/Voice Mail: (919) 852-4625, Fax: (919) 859-6871


-----Original Message-----

From: owner-nmusers On Behalf Of ayyappa.5.chaturvedula
Sent: Wednesday, December 10, 2008 9:40 AM
To: owner-nmusers
Subject: [NMusers] OFV higher with FOCEI than FO

Dear All,

I am analyzing a data set pooled from 4 clinical studies with rich sampling. I fit a two-compartment oral absorption model with lag time using FO, and minimization was successful with the COV step. Minimization was not successful, however, when I used the FO parameter estimates as initial estimates for a FOCE run. When I used FOCE with INTER, minimization was successful with the COV step. The OFV, however, is much higher with FOCEI than with FO (~25000 vs ~20000). The parameter estimates make more sense with FOCEI than with FO. My questions are:

Can we get a result like this, or am I missing something here?

Can we compare OFV between different estimation methods? (My understanding is no, and that the OFV from FO does not make a lot of sense.)

Regards,

Ayyappa Chaturvedula

GlaxoSmithKline

1500 Littleton Road,

Parsippany, NJ 07054

Ph:9738892200

Received on Thu Dec 11 2008 - 10:45:23 EST
