# RE: Very small P-Value for ETABAR

From: Matt Hutmacher <matt.hutmacher>
Date: Mon, 17 Nov 2008 16:27:24 -0500

Hello all,

This is my understanding of the influence of non-normal etas, and of the
quality of their prediction (as measured, for example, by shrinkage), on
the estimation of model parameters in NONMEM.

NONMEM uses the extended least squares (ELS) procedure to estimate the fixed
(thetas) and variance components of the random effects (omegas) jointly. If
the data between individuals are independent and normally distributed and
the etas enter the model linearly then ELS estimation is equivalent to
maximum likelihood estimation. This implies the estimates should be
consistent, asymptotically normally distributed, and asymptotically
efficient (small standard errors), given a sufficient sample size. If the
data are not normally distributed, the ELS estimates are still consistent
and asymptotically normally distributed, but lose some efficiency
(relatively larger standard errors of the estimates). Estimation is no
longer considered maximum likelihood estimation. It is my understanding
that Sheiner and Beal coined the term extended least squares because of the
nice property for non-normal data - that the estimates are still consistent
(unbiased for large sample sizes). So for models linear in the etas, it is
true that the distribution of population residuals (based on the etas and
epsilons) should not adversely affect estimation as long as the mean and
marginal (or population) variance (based on the omegas and sigmas) are
correctly specified. The sandwich estimator of the standard errors,
referred to in NONMEM as R^-1*S*R^-1 (where "^-1" denotes the matrix
inverse), ensures consistent estimates of the standard errors for the
thetas, omegas, and sigmas in the case of non-normal data (essentially it
is robust to the distributional assumptions of the data).
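As an illustrative sketch only (not NONMEM's internal implementation), the sandwich covariance R^-1*S*R^-1 can be assembled from per-subject score vectors and Hessian contributions. The toy mean-estimation model and all names below are assumptions made for the demo:

```python
import numpy as np

def sandwich_cov(scores, hessians):
    """Robust (sandwich) covariance R^-1 * S * R^-1 built from per-subject
    score vectors and Hessian contributions of the negative log-likelihood."""
    n = len(scores)
    S = sum(np.outer(g, g) for g in scores) / n   # average score outer product
    R = sum(hessians) / n                         # average Hessian
    Rinv = np.linalg.inv(R)
    return Rinv @ S @ Rinv / n                    # covariance of the estimates

# Toy check: estimating a mean mu with known unit residual variance.
rng = np.random.default_rng(0)
y = rng.normal(1.0, 1.0, size=500)
mu_hat = y.mean()
scores = [np.array([yi - mu_hat]) for yi in y]    # d logL / d mu, per subject
hessians = [np.array([[1.0]]) for _ in y]         # -d2 logL / d mu^2 = 1
cov = sandwich_cov(scores, hessians)
# For truly normal data this is close to the usual 1/n variance of the mean.
print(cov[0, 0])
```

The point of the sandwich form is that `S` keeps the empirical score variability even when the distributional assumption behind `R` is wrong.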

When the etas enter the model nonlinearly, things get more complicated in
NONMEM. The etas are assumed to be normally distributed to facilitate a
convenient approximation (FO or the Laplacian based FOCE, etc) to the
marginal likelihood. This approximation appears as a multivariate normal
distribution. This allows the use of ELS to estimate the parameters. Thus,
the assumption of normality of the etas directly relates to the
approximation implemented to estimate the parameters. What happens if the
etas are not normal is not directly clear with respect to the approximation
and hence the estimates. Also, if the distribution of the etas is markedly
skewed, the interpretation of the model prediction with etas=0 as the
"typical individual" model prediction is probably no longer appropriate.
This is because the prediction at eta=0 is no longer at the most likely eta
value (0 is the most likely value in symmetric, unimodal distributions).
These things are
avoided in the linear case above because the linear model is parameterized
directly with respect to the population mean, so the thetas in that model
are already interpretable with respect to the population of the data despite
the lack of normality.
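A small simulation (illustrative values only) makes the point: under a symmetric eta distribution the eta = 0 prediction is the median individual, but a skewed eta distribution with mean 0 no longer has its most likely value at 0:

```python
import numpy as np

rng = np.random.default_rng(1)
tv = 10.0

# Symmetric etas: eta = 0 is the mode and median of eta,
# so tv is the median ("typical") clearance.
eta_sym = rng.normal(0.0, 0.5, size=200_000)
print(np.median(tv * np.exp(eta_sym)))   # close to tv

# Skewed etas with mean 0 (a centered exponential): the density peaks
# at -1, not 0, so the eta = 0 prediction is not the most likely one.
eta_skew = rng.exponential(1.0, size=200_000) - 1.0
print(eta_skew.mean())                   # close to 0, yet the mode is -1
```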

So, when the etas are nonlinear in the model, the FO or FOCE model is now an
approximate model in that the means and marginal variances are only
approximately correct. For FOCE, how close these are to 'correct' depends
upon how well the etas are predicted, which is a function of the amount of
data within an individual (i.e. data quality), and because the thetas and
omegas apply to all individuals, criteria for sufficient data within all
individuals also need to be met. This result is related to the FOCE versus FO
issue with bias (as data become sparse FOCE approaches FO - perhaps
quantified conveniently by shrinkage) and the WRES versus CWRES for
residuals (CWRES go to WRES as data become sparse). But as stated, we are
fitting the likelihood to an approximate model and so bias can result
because of the approximation, i.e. the (approximately) incorrect mean and
variance (which is a function of the etas and the distributional assumption
of these). This relates to the quality of the etas.
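A common way to quantify this is the eta-shrinkage statistic, 1 - SD(EBE eta)/sqrt(OMEGA). The sketch below fakes the shrinkage by scaling true etas; the 0.95 and 0.40 scaling factors are purely illustrative, not taken from any real fit:

```python
import numpy as np

def eta_shrinkage(ebe_etas, omega):
    """Eta-shrinkage statistic: 1 - SD(EBE eta) / sqrt(OMEGA)."""
    return 1.0 - np.std(ebe_etas, ddof=1) / np.sqrt(omega)

# Toy illustration: EBEs are pulled toward 0 as individual data get sparse.
rng = np.random.default_rng(2)
omega = 0.09
true_eta = rng.normal(0.0, np.sqrt(omega), size=1000)
rich_ebe = 0.95 * true_eta    # much individual data: little shrinkage
sparse_ebe = 0.40 * true_eta  # sparse data: EBEs strongly shrunk
print(eta_shrinkage(rich_ebe, omega))    # small
print(eta_shrinkage(sparse_ebe, omega))  # large
```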

However, we should not get too depressed because the Laplace method works
well quite often for estimation, and in my experience, the first place it
tends to fail is with respect to estimating the omegas (compared to better

References:
NONMEM Users Guides.
LB Sheiner, SL Beal. Pharmacokinetic parameter estimates from several least
squares procedures: superiority of extended least squares. J Pharmacokinet
Biopharm. 1985 Apr;13(2):185-201.
CC Peck, SL Beal, LB Sheiner, AI Nichols. Extended least squares nonlinear
regression: a possible solution to the "choice of weights" problem in the
analysis of individual pharmacokinetic data. J Pharmacokinet Biopharm. 1984
Oct;12(5).
SL Beal. Commentary on significance levels for covariate effects in NONMEM.
J Pharmacokinet Pharmacodyn. 2002 Aug;29(4).
EF Vonesh, VM Chinchilli. Linear and Nonlinear Models for the Analysis of
Repeated Measurements. Marcel Dekker.
EF Vonesh. A note on the use of Laplace's approximation for nonlinear
mixed-effects models. Biometrika. 1996 Jun;83:447-452.

-----Original Message-----
From: owner-nmusers
Behalf Of Ribbing, Jakob
Sent: Thursday, November 13, 2008 6:28 PM
To: nmusers
Cc: BAE, KYUN-SEOP; XIA LI
Subject: RE: [NMusers] Very small P-Value for ETABAR

Dear all,

First of all, I am not sure that there is any assumption of etas having
a normal distribution when estimating a parametric model in NONMEM. The
variance of eta (OMEGA) does not carry an assumption of normality. I
believe that Stuart used to say the assumption of normality is only when
simulating. I guess the assumption also affects EBEs unless the
individual information is completely dominating? If the assumption of
normality is wrong, the weighting of information may not be optimal, but
as long as the true distribution is symmetric the estimated parameters
are in principle correct (but again, the model may not be suitable for
simulation if the distributional assumption was wrong). I will be offline
for a few days, but I am sure somebody will correct me if I am wrong.

If etas are shrunk, you cannot expect a normal distribution of that
(EBE) eta. That does not invalidate parameterization/distributional
assumptions. Trying other semi-parametric distributions or a
non-parametric distribution (or a mixture model) may give more
confidence in sticking with the original parameterization or else reject
it as inadequate. In the end, you may feel confident about the model
even if the EBE eta distribution is asymmetric and biased (I mentioned
two examples in my earlier posting).

Connecting to how PsN may help in this case: http://psn.sourceforge.net/
In practice to evaluate shrinkage, you would simply give the command
(assuming the model file is called run1.mod):
execute --shrinkage run1.mod

Another quick evaluation that can be made with this program is to
produce mirror plots (PsN links in nicely with Xpose for producing the
diagnostic plots):

execute --mirror=3 run1.mod

This will give you three simulation table files that have been derived
by simulating under the model and then fitting the simulated data using
the same model (using the design of the original data). If you see a
similar pattern in the mirror plots as in the original diagnostic plots,
this gives you more confidence in the model. That brings us back to
Leonid's point about it being more useful to look at diagnostic plots
than eta bar.

Wishing you a great weekend!

Jakob

-----Original Message-----
From: BAE, KYUN-SEOP
Sent: 13 November 2008 22:05
To: Ribbing, Jakob; XIA LI; nmusers
Subject: RE: [NMusers] Very small P-Value for ETABAR

Dear All,

Realized etas (EBEs, MAPs) are estimated under the assumption of a normal
distribution.
However, the resultant distribution of EBEs may not be normal, or their
mean may not be 0.
To pass the t-test, one may use the "CENTERING" option at $ESTIMATION.
But, this practice is discouraged by some (and I agree).

The normal assumption cannot coerce the distribution of the EBEs to be
normal, and furthermore a non-normal (and/or non-zero-mean) distribution
of EBEs can simply reflect nature.
One simple example is mixture population with polymorphism.
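A hedged sketch of that example: a polymorphic (mixture) population can give etas whose mean is essentially zero, so the ETABAR t-test looks fine, while the distribution is clearly bimodal. The mixture fraction and modes below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10_000
is_poor = rng.random(n) < 0.3               # e.g. 30% poor metabolizers
eta = np.where(is_poor,
               rng.normal(-0.7, 0.15, n),   # poor-metabolizer mode
               rng.normal(0.3, 0.15, n))    # extensive-metabolizer mode
# The mixture is balanced so the overall mean is ~0: ETABAR looks fine,
# yet the distribution is bimodal, not normal.
print(eta.mean())
```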

If I could not get normal(?) EBEs even after careful examination of
covariate relationships as others suggested,
I would bear it and assume a nonparametric distribution.

Regards,

Kyun-Seop
=====================
Kyun-Seop Bae MD PhD
Email: kyun-seop.bae

-----Original Message-----
From: owner-nmusers
On Behalf Of Ribbing, Jakob
Sent: Thursday, November 13, 2008 13:19
To: XIA LI; nmusers
Subject: RE: [NMusers] Very small P-Value for ETABAR

Hi Xia,

Just to clarify one thing (I agree with almost everything you said):

The p-value indeed is related to the test of ETABAR=0. However, this is
not a test of normality, only a test that may reject the mean of the
etas being zero (H0). Therefore, shrinkage per se does not lead to
rejection of H0, as long as both tails of the eta distribution are
shrunk to a similar degree.
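A minimal sketch of that test (the EBE etas here are simulated and centered so the illustration is deterministic): a one-sample t-test of mean(eta) = 0 is insensitive to symmetric shrinkage, but rejects when the mean is shifted:

```python
import numpy as np
from scipy import stats

# Sketch of the ETABAR test: a one-sample t-test of mean(EBE eta) = 0.
rng = np.random.default_rng(4)
eta = rng.normal(0.0, 0.3, size=200)
eta -= eta.mean()                 # center, so the illustration is exact

# Symmetric shrinkage of both tails keeps the mean at 0: H0 not rejected.
shrunk = 0.4 * eta
t1, p1 = stats.ttest_1samp(shrunk, 0.0)

# A shifted (biased) eta distribution, however, is rejected.
t2, p2 = stats.ttest_1samp(shrunk + 0.1, 0.0)
print(p1, p2)                     # p1 near 1, p2 near 0
```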

I agree regarding the assumption of normality: this comes into play when
you simulate from the model, and if you got the distribution of individual
parameters wrong, simulations may not reflect even the data used to fit
the model.

Best Regards

Jakob

-----Original Message-----
From: owner-nmusers
On Behalf Of XIA LI
Sent: 13 November 2008 20:31
To: nmusers
Subject: Re: [NMusers] Very small P-Value for ETABAR

Dear All,

Just some quick statistical points...

P-values are usually associated with hypothesis tests. As far as I know,
NONMEM assumes a normal distribution for ETA, ETA ~ N(0, OMEGA), which
means the null hypothesis to test is H0: ETABAR = 0. A small p-value
indicates a significant test: you reject the null hypothesis.

More...
As we all know, ETA is used to capture the variation among individual
parameters and the model's unexplained error. We usually use the function
(or model) parameter = typical value * exp(ETA), which leads to a lognormal
distribution assumption for the individual parameters (i.e., CL, V,
Ka, Ke...).

By some statistical theory, the mean and variance of an individual
parameter are functions of the typical value and the variance of ETA
(omega):

E(CL) = typical value * exp(omega/2)
VAR(CL) = typical value^2 * exp(omega) * (exp(omega) - 1). NO MATH PLS!!
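For the record, the exact lognormal moments under CL = TV * exp(ETA) with ETA ~ N(0, omega) are E(CL) = TV * exp(omega/2) and VAR(CL) = TV^2 * exp(omega) * (exp(omega) - 1). A quick simulation check (the values are arbitrary):

```python
import numpy as np

# Numerical check of the lognormal moments for CL = TV * exp(eta),
# eta ~ N(0, omega), where omega is the variance (NONMEM's OMEGA).
rng = np.random.default_rng(5)
tv, omega = 5.0, 0.2
cl = tv * np.exp(rng.normal(0.0, np.sqrt(omega), size=1_000_000))

print(cl.mean())   # ~ tv * exp(omega / 2)
print(cl.var())    # ~ tv**2 * exp(omega) * (exp(omega) - 1)
```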

If your typical value captures all overall patterns among patients
clearance, then ETA will have a nice symmetric normal distribution with
small variance. Otherwise, you leave too many patterns to ETA and will
see some deviation or shrinkage (whatever you call).

Why is adding covariates a good way to deal with this situation? Your
model becomes CL = typical value * exp(covariate effect) * exp(ETA). The
variance of the individual parameter changes to:

VAR(CL) = (typical value * exp(covariate effect))^2 * exp(omega) *
(exp(omega) - 1).

You have one more term to capture the overall patterns, leaving less to
ETA. So a 'good' covariate will reduce both the magnitude of omega and
ETA's deviation from normality.
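A toy simulation of that idea (the weight effect, exponent, and variances are assumed values, not from any real data set): regressing log(CL) on the covariate absorbs part of the between-subject variability, shrinking the remaining "omega":

```python
import numpy as np

rng = np.random.default_rng(6)
n = 5000
logwt = rng.normal(np.log(70.0), 0.2, size=n)   # body weight (lognormal)
eta = rng.normal(0.0, 0.2, size=n)              # true unexplained variability
logcl = np.log(3.0) + 0.75 * (logwt - np.log(70.0)) + eta

# Without the covariate, all between-subject variability lands in "eta".
omega_no_cov = np.var(logcl - logcl.mean())

# With the covariate, only the residual variability remains.
resid = logcl - np.polyval(np.polyfit(logwt, logcl, 1), logwt)
omega_with_cov = np.var(resid)
print(omega_no_cov, omega_with_cov)   # the second is clearly smaller
```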

Understanding this is also useful when you are modeling BOV studies, when
you see the variation of PK parameters decrease with time (or across
occasions). Adding a covariate that makes physiological sense and also