From: Matt Hutmacher <*matt.hutmacher*>

Date: Mon, 17 Nov 2008 16:27:24 -0500

Hello all,

This is my understanding of the issue on the influence of non-normal etas

and the quality of the prediction of these, such as measured through

shrinkage, on the estimation of model parameters in NONMEM.

NONMEM uses the extended least squares (ELS) procedure to estimate the fixed

(thetas) and variance components of the random effects (omegas) jointly. If

the data between individuals are independent and normally distributed and

the etas enter the model linearly then ELS estimation is equivalent to

maximum likelihood estimation. This implies the estimates should be

consistent, asymptotically normally distributed, and asymptotically

efficient (small standard errors) - based on sufficient sample size. If the

data are not normally distributed, the ELS estimates are still consistent

and asymptotically normally distributed, but lose some efficiency

(relatively larger standard errors of the estimates). Estimation is no

longer considered maximum likelihood estimation. It is my understanding

that Sheiner and Beal coined the term extended least squares because of the

nice property for non-normal data - that the estimates are still consistent

(unbiased for large sample sizes). So for models linear in the etas, it is

true that the distribution of population residuals (based on the etas and

epsilons) should not adversely affect estimation as long as the mean and

marginal (or population) variance (based on the omegas and sigmas) are

correctly specified. The sandwich estimator of the standard errors,

referred to in NONMEM as R^-1*S*R^-1 ("^" is the exponentiation operator)

ensures consistent estimates of the standard errors for thetas, omegas, and

sigmas in the case of non-normal data (essentially it is robust to the

distributional assumptions of the data).
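To make the R^-1*S*R^-1 idea concrete, here is a minimal one-parameter sketch. It is not NONMEM's implementation: the working model (a mean with an assumed variance of 1), the function name, and the scaling of R and S are all assumptions chosen for illustration. R is the second derivative of the objective and S is the sum of squared per-observation gradients.

```python
def sandwich_variance(y):
    """Variance of the estimated mean under a working N(mu, 1) model (toy example)."""
    n = len(y)
    mu_hat = sum(y) / n                      # minimizes sum of 0.5 * (y_i - mu)^2
    # R: second derivative of the objective at mu_hat; each observation contributes 1.
    r = float(n)
    # S: sum of squared per-observation gradients, g_i = -(y_i - mu_hat).
    s = sum((yi - mu_hat) ** 2 for yi in y)
    model_based = 1.0 / r                    # R^-1: trusts the working variance of 1
    sandwich = s / (r * r)                   # R^-1 * S * R^-1
    return mu_hat, model_based, sandwich


mu_hat, model_based, sandwich = sandwich_variance([0.0, 2.0, 4.0, 6.0])
print(mu_hat, model_based, sandwich)         # prints: 3.0 0.25 1.25
```

The model-based value only reflects the assumed variance, while the sandwich value follows the spread actually seen in the data, which is the sense in which it is robust to the working assumptions.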

When the etas enter the model nonlinearly, things get more complicated in

NONMEM. The etas are assumed to be normally distributed to facilitate a

convenient approximation (FO or the Laplacian based FOCE, etc) to the

marginal likelihood. This approximation appears as a multivariate normal

distribution. This allows the use of ELS to estimate the parameters. Thus,

the assumption of normality of the etas directly relates to the

approximation implemented to estimate the parameters. What happens if the

etas are not normal is not directly clear with respect to the approximation

and hence the estimates. Also, if the distribution of the etas is markedly

skewed, the interpretation of the model prediction with etas=0 as the

"typical individual" model prediction is probably no longer appropriate.

This is because the prediction at eta=0 is no longer at the most likely eta

value (0 is most likely in symmetric distributions). These things are

avoided in the linear case above because the linear model is parameterized

directly with respect to the population mean, so the thetas in that model

are already interpretable with respect to the population of the data despite

the lack of normality.

So, when the etas are nonlinear in the model, the FO or FOCE model is now an

approximate model in that the means and marginal variances are only

approximately correct. For FOCE, how close these are to 'correct' depends

upon how well the etas are predicted, which is a function of the amount of

data within an individual (i.e. data quality), and because the thetas and

omegas apply to all individuals, criteria for sufficient data within all

individuals also need to be met. This result is related to the FOCE versus FO

issue with bias (as data become sparse FOCE approaches FO - perhaps

quantified conveniently by shrinkage) and the WRES versus CWRES for

residuals (CWRES go to WRES as data become sparse). But as stated, we are

fitting the likelihood to an approximate model and so bias can result

because of the approximation, i.e. the (approximately) incorrect mean and

variance (which is a function of the etas and the distributional assumption

of these). This relates to the quality of the etas.
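A small sketch of what "only approximately correct" means for FO, assuming the toy model y = tv*exp(eta) + eps with eta ~ N(0, omega) and eps ~ N(0, sigma). The model, the function names, and the numbers are hypothetical. FO linearizes around eta = 0, so its mean and marginal variance can be compared with the exact lognormal moments.

```python
import math


def fo_moments(tv, omega, sigma):
    """FO approximation: linearize y = tv*exp(eta) + eps around eta = 0."""
    g = tv                                   # derivative of tv*exp(eta) at eta = 0
    return tv, g * g * omega + sigma


def exact_moments(tv, omega, sigma):
    """Exact marginal mean and variance for eta ~ N(0, omega), eps ~ N(0, sigma)."""
    mean = tv * math.exp(omega / 2.0)
    var = tv * tv * math.exp(omega) * (math.exp(omega) - 1.0) + sigma
    return mean, var


for omega in (0.01, 0.09, 0.5):
    fo_mean, fo_var = fo_moments(10.0, omega, 1.0)
    ex_mean, ex_var = exact_moments(10.0, omega, 1.0)
    # Ratios near 1 mean the FO moments are close to the exact ones.
    print(omega, round(fo_mean / ex_mean, 3), round(fo_var / ex_var, 3))
```

In this toy case the ratios drift away from 1 as omega grows, so the approximate model departs further from the true mean and variance when between-subject variability is large.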

However, we should not get too depressed because the Laplace method works

well quite often for estimation, and in my experience, the first place it

tends to fail is with respect to estimating the omegas (compared to better

approximations like adaptive Gaussian quadrature).

References:

NONMEM Users Guides.

LB Sheiner, SL Beal. Pharmacokinetic parameter estimates from several least

squares procedures: superiority of extended least squares. J Pharmacokinet

Biopharm. 1985 Apr;13(2):185-201.

CC Peck, SL Beal, LB Sheiner, AI Nichols. Extended least squares nonlinear

regression: A possible solution to the "choice of weights" problem in

analysis of individual pharmacokinetic data. JPP Volume 12, Number 5 /

October, 1984.

SL Beal. Commentary on Significance Levels for Covariate Effects in NONMEM.

JPP Volume 29, Number 4 / August, 2002.

Vonesh and Chinchilli. Linear and Nonlinear models for the analysis of

repeated measurements. Marcel Dekker.

EF Vonesh. A note on the use of Laplace's approximation for nonlinear

mixed-effects models. Biometrika, June 1996; 83: 447 - 452.

-----Original Message-----

From: owner-nmusers

On Behalf Of Ribbing, Jakob

Sent: Thursday, November 13, 2008 6:28 PM

To: nmusers

Cc: BAE, KYUN-SEOP; XIA LI

Subject: RE: [NMusers] Very small P-Value for ETABAR

Dear all,

First of all, I am not sure that there is any assumption of etas having

a normal distribution when estimating a parametric model in NONMEM. The

variance of eta (OMEGA) does not carry an assumption of normality. I

believe that Stuart used to say the assumption of normality is only when

simulating. I guess the assumption also affects EBEs unless the

individual information is completely dominating? If the assumption of

normality is wrong, the weighting of information may not be optimal, but

as long as the true distribution is symmetric the estimated parameters

are in principle correct (but again, the model may not be suitable for

simulation if the distributional assumption was wrong). I will be off

line for a few days, but I am sure somebody will correct me if I am

wrong about this.

If etas are shrunk, you cannot expect a normal distribution of that

(EBE) eta. That does not invalidate parameterization/distributional

assumptions. Trying other semi-parametric distributions or a

non-parametric distribution (or a mixture model) may give more

confidence in sticking with the original parameterization or else reject

it as inadequate. In the end, you may feel confident about the model

even if the EBE eta distribution is asymmetric and biased (I mentioned

two examples in my earlier posting).

Connecting to how PsN may help in this case: http://psn.sourceforge.net/

In practice to evaluate shrinkage, you would simply give the command

(assuming the model file is called run1.mod):

execute --shrinkage run1.mod
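For orientation, eta-shrinkage is commonly defined as 1 - SD(EBE)/sqrt(omega). The sketch below computes that quantity by hand. It is not PsN's code; the function name, the EBE values, and the use of the sample (n-1) standard deviation are assumptions.

```python
import math
import statistics


def eta_shrinkage(ebes, omega):
    """Eta-shrinkage as 1 - SD(EBEs) / sqrt(omega), omega being the estimated variance."""
    return 1.0 - statistics.stdev(ebes) / math.sqrt(omega)


ebes = [-0.15, -0.05, 0.0, 0.05, 0.15]       # hypothetical empirical Bayes estimates
print(round(eta_shrinkage(ebes, omega=0.09), 3))
```

A value near 0 means the EBEs spread about as widely as omega implies; a value near 1 means they have collapsed toward zero.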

Another quick evaluation that can be made with this program is to

produce mirror plots (PsN links in nicely with Xpose for producing the

diagnostic plots):

execute --mirror=3 run1.mod

This will give you three simulation table files that have been derived

by simulating under the model and then fitting the simulated data using

the same model (using the design of the original data). If you see a

similar pattern in the mirror plots as in the original diagnostic plots,

this gives you more confidence in the model. That brings us back to

Leonid's point about it being more useful to look at diagnostic plots

than eta bar.

Wishing you a great weekend!

Jakob

-----Original Message-----

From: BAE, KYUN-SEOP

Sent: 13 November 2008 22:05

To: Ribbing, Jakob; XIA LI; nmusers

Subject: RE: [NMusers] Very small P-Value for ETABAR

Dear All,

Realized etas (EBEs, MAPs) are estimated under the assumption of a normal

distribution.

However, the resulting distribution of the EBEs may not be normal, or their

mean may not be 0.

To pass the t-test, one may use the "CENTERING" option in $ESTIMATION.

But this practice is discouraged by some (and I agree).

A normal assumption cannot coerce the distribution of the EBEs to be normal,

and furthermore a non-normal (and/or non-zero-mean) distribution of the EBEs

can be nature's nature.

One simple example is a mixture population with polymorphism.

If I could not get normal(?) EBEs even after careful examination of

covariate relationships as others suggested,

I would bear it and assume a nonparametric distribution.

Regards,

Kyun-Seop

=====================

Kyun-Seop Bae MD PhD

Email: kyun-seop.bae

-----Original Message-----

From: owner-nmusers

On Behalf Of Ribbing, Jakob

Sent: Thursday, November 13, 2008 13:19

To: XIA LI; nmusers

Subject: RE: [NMusers] Very small P-Value for ETABAR

Hi Xia,

Just to clarify one thing (I agree with almost everything you said):

The p-value indeed is related to the test of ETABAR=0. However, this is

not a test of normality, only a test that may reject the mean of the

etas being zero (H0). Therefore, shrinkage per se does not lead to

rejection of H0, as long as both tails of the eta distribution are

shrunk to a similar degree.
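A minimal sketch of this point, with hypothetical EBE values. The test below uses a normal approximation for the two-sided p-value of H0: mean eta = 0; that choice, and the function name, are assumptions rather than a description of what NONMEM computes.

```python
import math
import statistics


def etabar_test(ebes):
    """Return (mean, two-sided p-value) for H0: the mean of the etas is zero."""
    n = len(ebes)
    mean = math.fsum(ebes) / n
    se = statistics.stdev(ebes) / math.sqrt(n)
    z = mean / se
    p = math.erfc(abs(z) / math.sqrt(2.0))   # two-sided normal tail probability
    return mean, p


full = [-0.4, -0.2, 0.0, 0.2, 0.4]           # hypothetical, symmetric around zero
shrunk = [0.5 * e for e in full]             # both tails shrunk by the same factor
shifted = [e + 0.5 for e in full]            # mean clearly away from zero

print(etabar_test(shrunk))
print(etabar_test(shifted))
```

Symmetric shrinkage leaves the mean at zero, so H0 is not rejected, while the shifted set gives a small p-value even though its shape is unchanged.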

I agree with the assumption of normality. This comes into play when you

simulate from the model and if you got the distribution of individual

parameters wrong, simulations may not reflect even the data used to fit

the model.

Best Regards

Jakob

-----Original Message-----

From: owner-nmusers

On Behalf Of XIA LI

Sent: 13 November 2008 20:31

To: nmusers

Subject: Re: [NMusers] Very small P-Value for ETABAR

Dear All,

Just some quick statistical points...

A P value is usually associated with a hypothesis test. As far as I know,

NONMEM assumes a normal distribution for ETA, ETA~N(0,omega), which means

the null hypothesis to test is H0: ETABAR=0. A small P value indicates a

significant test: you reject the null hypothesis.

More...

As we all know, ETA is used to capture the variation among individual

parameters and the model's unexplained error. We usually use the function

(or model) parameter=typical value*exp (ETA), which leads to a lognormal

distribution assumption for the individual parameters (e.g., CL, V,

Ka, Ke...).

By some statistical theory, the variance of the individual parameter equals

a function of the typical value and the variance of ETA.

VAR (CL) = (typical value)^2*exp (omega)*(exp (omega) - 1). NO MATH PLS!!

If your typical value captures all overall patterns in the patients'

clearance, then ETA will have a nice symmetric normal distribution with

small variance. Otherwise, you leave too many patterns to ETA and will

see some deviation or shrinkage (whatever you call it).

Why is adding covariates a good way to deal with this situation? Your

model becomes CL=typical value*exp (covariate)*exp (ETA). For a given

covariate value, the variance of the individual parameter becomes:

VAR (CL) = (typical value*exp (covariate))^2*exp (omega)*(exp (omega) - 1).

You have one more term to capture the overall patterns, leaving less to

ETA. So a 'good' covariate will reduce both the magnitude of omega and

ETA's deviation from normal.
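A small simulation sketch of this effect on the log scale, assuming CL = tv*exp(beta*cov)*exp(ETA) with a centered covariate. The values of tv, beta, and omega are hypothetical, and the true beta is used in place of an estimate to keep the example short.

```python
import math
import random
import statistics

random.seed(1)
tv, beta, omega = 5.0, 0.5, 0.04             # hypothetical values
n = 2000
cov = [random.gauss(0.0, 1.0) for _ in range(n)]               # centered covariate
eta = [random.gauss(0.0, math.sqrt(omega)) for _ in range(n)]
log_cl = [math.log(tv) + beta * c + e for c, e in zip(cov, eta)]

# Without the covariate, all between-subject spread in log(CL) is attributed to ETA.
apparent_omega = statistics.variance(log_cl)

# With the covariate effect removed (using the true beta), only ETA remains.
residual = [lc - beta * c for lc, c in zip(log_cl, cov)]
adjusted_omega = statistics.variance(residual)

print(round(apparent_omega, 3), round(adjusted_omega, 3))
```

The first number should land near beta^2*VAR(cov) + omega and the second near omega, which is the reduction in omega that a 'good' covariate buys.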

Understanding this is also useful when you are modeling BOV studies.

When you see the variation of PK parameters decrease with time (or

occasions), adding a covariate that makes physiological sense and also

decreases with time may help your modeling.

Best,

Xia

======================================

Xia Li

Mathematical Science Department

University of Cincinnati

Received on Mon Nov 17 2008 - 16:27:24 EST
