From: Ahn, Jae Eun <*jaeeun.ahn*>

Date: Fri, 27 Feb 2009 13:20:26 -0500

Dear Jun and NMusers,

I would like to make a correction to my email below. Just to make things clearer, let me restate the way that NONMEM$B!G(Js PHI function works: internally it takes the absolute value of the argument (X) and computes the approximate area of normal density curve from |X| to the positive infinity. Since the standard normal density curve is symmetric around 0, the approximated value is actually the same as the area from minus infinity to -|X|. However, regardless of whether X is positive or negative, the PHI function returns the cdf.

Assuming f(t) is smaller than QL, the argument ((QL-f(t))/g(t)) is positive. The SMALLER f(t) makes the distance from QL larger (assuming g(t) is same or smaller) and so is the argument. Thus, the area between the argument and positive infinity gets smaller and the cdf gets BIGGER. I missed the part that the argument is not just f(t) but the difference from QL. Therefore, as Jun pointed out, the smaller f(t), the higher the cdf. This makes sense because the censored observation with smaller f(t) is more likely to be BQL than the one with bigger f(t) (in M3, the likelihood for BQL observation is the probability that the observation is BQL). For example, if the f(t) is same as QL, there is a 50% chance that an observation is BQL. If f(t) is 0, the probability of being BQL will be 1.

However, it may not necessarily be the case that the censored observation with smaller f(t) will have more contribution to the objective function. For uncensored observations, the closer f(t) to the observations is more favorable as the algorithm tries to maximize the likelihood (actually minimize -2 log likelihood). For censored observations, the smaller f(t) is more favorable as it makes cdf bigger. The answer we will get from the software should be somewhere in between where the likelihood for all the data (the product) is maximized.

Thank you,

Jae Eun

________________________________

From: owner-nmusers

Sent: Monday, February 23, 2009 12:02 PM

To: Jun Shen

Cc: nmusers

Subject: RE: [NMusers] Theoretical questions about Beal's M2 method

Dear Jun,

For your thought # 3, it is not quite true that $B!H(JIf we use cdf to represent the contribution of censored observations, then the smaller concentrations will have larger contribution to the objective function$B!H(J. Assuming f(t) is smaller than QL (which is usually the case for BQL data), the argument for the PHI function becomes a positive number. When the approximated value for cdf is returned, the final output is actually $B!H(J1-phi$B!I(J for a positive argument (note that NONMEM takes the absolute value for the argument when it computes the integral and returns the value according to the sign of the argument; see below). So using your terms, the smaller the f(t), yes, the higher the phi(); however, the cdf will be smaller because it is actually 1-phi().

; below is a part of NONMEM$B!G(Js PHI function.

ARG=X

IF (ARG.LT.0) PHI=PHITL(ARG)

IF (ARG.GT.0) PHI=ONE-PHITL(ARG)

IF (ARG.EQ.0) PHI=P5

Jae Eun Ahn

________________________________

From: owner-nmusers

Sent: Sunday, February 22, 2009 8:09 PM

To: Sebastien Bihorel

Cc: Leonid Gibiansky; nmusers

Subject: Re: [NMusers] Theoretical questions about Beal's M2 method

Sebastien,

I have been reading the BQL papers recently. Here is what I think.

1. M2 can be regarded as a truncated distribution method. You can look at it this way. Any probability density function (pdf) MUST be integrated into a cumulative distribution function (cdf) of 1 on its entire sample space, which is negative infinity to positive infinity for a normal distribution. If the BQL observations are discarded, the sample space is truncated from negative infinity to QL. This results in a cdf less than 1. In order to bring the cdf back to 1 on the new sample space (from QL to positive infinity), the pdf function has to be corrected by ( 1- phi((QL-f(t)/g(t))). Wikipedia has an exellent entry to explain the math (http://en.wikipedia.org/wiki/Truncated_distribution)

2. The question I have is from M3 and M4 where the likelihood of an uncencored observation is expressed in pdf and the likelihood of a cencored observation is in cdf. Well, you can still combine everything together to get an objective function. But what is the definition of likelihood here exactly, pdf or cdf?

3. If we use cdf to represent the contribution of censored observations, then the smaller concentrations will have larger contribution to the objective function. For example, in M3 the likelihood of a censored observation is l(t)=phi((QL-f(t))/sqrt(g(t))). The smaller the f(t), the higher the phi(). Does that make sense?

I believe the clear undersdanding on these theoretical concepts is just as important as to know those "know-how". Appreciate more comments.

--

Jun Shen PhD

PK/PD Scientist

BioPharma Services

Millipore Corporation

15 Research Park Dr.

St Charles, MO 63304

Direct: 636-720-1589

On Mon, Feb 16, 2009 at 4:22 PM, Sebastien Bihorel <Sebastien.Bihorel

Thanks Leonid,

However, there is still one point that is unclear to me. You have demonstrated that p(y | y > LLQ) = p(y) / p(y>LOQ), given the assumptions of the text. Now, this is a discrete probability, while l(t) is a likelihood... How can one mathematically demonstrate the expression of l(t) used by Dr. Beal starting from the previous expression of p(y | y > LLQ)?

*Sebastien Bihorel, PharmD, PhD*

PKPD Scientist

Cognigen Corp

Email: sebastien.bihorel

Phone: (716) 633-3463 ext. 323

Leonid Gibiansky wrote:

You can view it as:

p(y $B"A(J y > LLQ) = 0 when y < LLQ

p(y $B"A(J y > LLQ) = p(y) when y > LLQ

Another way to look on this is to say that

p(y | y > LLQ) is proportional to p(y) and should integrate to 1

integral(p(y)) over y > LLQ is ( 1- phi((LLQ-f(t)/g(t))) that immediately leads to l(t) below.

As to the 0 to 1 restriction, l(t) is the density, not probability. It should integrate to one but can be smaller or greater than 1 (any positive number).

Leonid

--------------------------------------

Leonid Gibiansky, Ph.D.

President, QuantPharm LLC

web: www.quantpharm.com

e-mail: LGibiansky at quantpharm.com

tel: (301) 767 5566

Sebastien Bihorel wrote:

Dear colleagues,

In a paper dated from 2001, Dr. Beal presented several methods to handle data below the quantification limit (Journal of Pharmacokinetics and Pharmacodynamics, Vol. 28, No. 5, October 2001), including the M2 method that can be implemented in NONMEM 6 via the YLO functionnality. I would like to submit some questions to the list about the theory associated to the M2 method.

I quote:

"...the BQL observations can be discarded, and under the assumption that all the D(t) [the distribution of residual errors] are normal, the method of maximum conditional likelihood estimation can be applied to the remaining observations (method M2). With this method, the likelihood for the data, conditional on the fact that by design, all (remaining) observations are above the QL, is maximized with respect to the model parameters. The density function of the distribution on possible observations at time t, evaluated at y(t), is 1/sqrt( 2*pi*g(t) ))*exp( -0.5*( y(t)-f(t) )^2/g(t) ) and the probability that an observation at time t is above the QL is 1- phi((QL-f(t)/g(t))), where phi is the cumulative normal distribution function. Therefore, conditional on the observation at time t being above QL, the likelihood for y(t) is the ratio:

l(t)=(1/sqrt( 2*pi*g(t) ))*exp( -0.5*( y(t)-f(t) )^2/g(t) ) /( 1- phi((QL-f(t)/g(t))) [equation 1]"

Now, lets A and B be two events. The probability of A, given B is: p(A|B) = p(A$B"A(JB) / p(B)

In the context of Dr. Beal's paper, I interpret A as simply the observation y(t) and B as the fact that y(t) is above QL, and thus have the following questions about equation 1:

- it looks like p(A$B"A(JB) in equation 1 simplifies to the probability of y(t) given the model parameters, i.e. p(A). Which part of the problem allows this simplification?

- how can l(t) be constrained between 0 and 1 if both numerator and denominator can vary between 0 and 1?

Any comment from nmusers will be greatly appreciated.

Received on Fri Feb 27 2009 - 13:20:26 EST

Date: Fri, 27 Feb 2009 13:20:26 -0500

Dear Jun and NMusers,

I would like to make a correction to my email below. Just to make things clearer, let me restate the way that NONMEM$B!G(Js PHI function works: internally it takes the absolute value of the argument (X) and computes the approximate area of normal density curve from |X| to the positive infinity. Since the standard normal density curve is symmetric around 0, the approximated value is actually the same as the area from minus infinity to -|X|. However, regardless of whether X is positive or negative, the PHI function returns the cdf.

Assuming f(t) is smaller than QL, the argument ((QL-f(t))/g(t)) is positive. The SMALLER f(t) makes the distance from QL larger (assuming g(t) is same or smaller) and so is the argument. Thus, the area between the argument and positive infinity gets smaller and the cdf gets BIGGER. I missed the part that the argument is not just f(t) but the difference from QL. Therefore, as Jun pointed out, the smaller f(t), the higher the cdf. This makes sense because the censored observation with smaller f(t) is more likely to be BQL than the one with bigger f(t) (in M3, the likelihood for BQL observation is the probability that the observation is BQL). For example, if the f(t) is same as QL, there is a 50% chance that an observation is BQL. If f(t) is 0, the probability of being BQL will be 1.

However, it may not necessarily be the case that the censored observation with smaller f(t) will have more contribution to the objective function. For uncensored observations, the closer f(t) to the observations is more favorable as the algorithm tries to maximize the likelihood (actually minimize -2 log likelihood). For censored observations, the smaller f(t) is more favorable as it makes cdf bigger. The answer we will get from the software should be somewhere in between where the likelihood for all the data (the product) is maximized.

Thank you,

Jae Eun

________________________________

From: owner-nmusers

Sent: Monday, February 23, 2009 12:02 PM

To: Jun Shen

Cc: nmusers

Subject: RE: [NMusers] Theoretical questions about Beal's M2 method

Dear Jun,

For your thought # 3, it is not quite true that $B!H(JIf we use cdf to represent the contribution of censored observations, then the smaller concentrations will have larger contribution to the objective function$B!H(J. Assuming f(t) is smaller than QL (which is usually the case for BQL data), the argument for the PHI function becomes a positive number. When the approximated value for cdf is returned, the final output is actually $B!H(J1-phi$B!I(J for a positive argument (note that NONMEM takes the absolute value for the argument when it computes the integral and returns the value according to the sign of the argument; see below). So using your terms, the smaller the f(t), yes, the higher the phi(); however, the cdf will be smaller because it is actually 1-phi().

; below is a part of NONMEM$B!G(Js PHI function.

ARG=X

IF (ARG.LT.0) PHI=PHITL(ARG)

IF (ARG.GT.0) PHI=ONE-PHITL(ARG)

IF (ARG.EQ.0) PHI=P5

Jae Eun Ahn

________________________________

From: owner-nmusers

Sent: Sunday, February 22, 2009 8:09 PM

To: Sebastien Bihorel

Cc: Leonid Gibiansky; nmusers

Subject: Re: [NMusers] Theoretical questions about Beal's M2 method

Sebastien,

I have been reading the BQL papers recently. Here is what I think.

1. M2 can be regarded as a truncated distribution method. You can look at it this way. Any probability density function (pdf) MUST be integrated into a cumulative distribution function (cdf) of 1 on its entire sample space, which is negative infinity to positive infinity for a normal distribution. If the BQL observations are discarded, the sample space is truncated from negative infinity to QL. This results in a cdf less than 1. In order to bring the cdf back to 1 on the new sample space (from QL to positive infinity), the pdf function has to be corrected by ( 1- phi((QL-f(t)/g(t))). Wikipedia has an exellent entry to explain the math (http://en.wikipedia.org/wiki/Truncated_distribution)

2. The question I have is from M3 and M4 where the likelihood of an uncencored observation is expressed in pdf and the likelihood of a cencored observation is in cdf. Well, you can still combine everything together to get an objective function. But what is the definition of likelihood here exactly, pdf or cdf?

3. If we use cdf to represent the contribution of censored observations, then the smaller concentrations will have larger contribution to the objective function. For example, in M3 the likelihood of a censored observation is l(t)=phi((QL-f(t))/sqrt(g(t))). The smaller the f(t), the higher the phi(). Does that make sense?

I believe the clear undersdanding on these theoretical concepts is just as important as to know those "know-how". Appreciate more comments.

--

Jun Shen PhD

PK/PD Scientist

BioPharma Services

Millipore Corporation

15 Research Park Dr.

St Charles, MO 63304

Direct: 636-720-1589

On Mon, Feb 16, 2009 at 4:22 PM, Sebastien Bihorel <Sebastien.Bihorel

Thanks Leonid,

However, there is still one point that is unclear to me. You have demonstrated that p(y | y > LLQ) = p(y) / p(y>LOQ), given the assumptions of the text. Now, this is a discrete probability, while l(t) is a likelihood... How can one mathematically demonstrate the expression of l(t) used by Dr. Beal starting from the previous expression of p(y | y > LLQ)?

*Sebastien Bihorel, PharmD, PhD*

PKPD Scientist

Cognigen Corp

Email: sebastien.bihorel

Phone: (716) 633-3463 ext. 323

Leonid Gibiansky wrote:

You can view it as:

p(y $B"A(J y > LLQ) = 0 when y < LLQ

p(y $B"A(J y > LLQ) = p(y) when y > LLQ

Another way to look on this is to say that

p(y | y > LLQ) is proportional to p(y) and should integrate to 1

integral(p(y)) over y > LLQ is ( 1- phi((LLQ-f(t)/g(t))) that immediately leads to l(t) below.

As to the 0 to 1 restriction, l(t) is the density, not probability. It should integrate to one but can be smaller or greater than 1 (any positive number).

Leonid

--------------------------------------

Leonid Gibiansky, Ph.D.

President, QuantPharm LLC

web: www.quantpharm.com

e-mail: LGibiansky at quantpharm.com

tel: (301) 767 5566

Sebastien Bihorel wrote:

Dear colleagues,

In a paper dated from 2001, Dr. Beal presented several methods to handle data below the quantification limit (Journal of Pharmacokinetics and Pharmacodynamics, Vol. 28, No. 5, October 2001), including the M2 method that can be implemented in NONMEM 6 via the YLO functionnality. I would like to submit some questions to the list about the theory associated to the M2 method.

I quote:

"...the BQL observations can be discarded, and under the assumption that all the D(t) [the distribution of residual errors] are normal, the method of maximum conditional likelihood estimation can be applied to the remaining observations (method M2). With this method, the likelihood for the data, conditional on the fact that by design, all (remaining) observations are above the QL, is maximized with respect to the model parameters. The density function of the distribution on possible observations at time t, evaluated at y(t), is 1/sqrt( 2*pi*g(t) ))*exp( -0.5*( y(t)-f(t) )^2/g(t) ) and the probability that an observation at time t is above the QL is 1- phi((QL-f(t)/g(t))), where phi is the cumulative normal distribution function. Therefore, conditional on the observation at time t being above QL, the likelihood for y(t) is the ratio:

l(t)=(1/sqrt( 2*pi*g(t) ))*exp( -0.5*( y(t)-f(t) )^2/g(t) ) /( 1- phi((QL-f(t)/g(t))) [equation 1]"

Now, lets A and B be two events. The probability of A, given B is: p(A|B) = p(A$B"A(JB) / p(B)

In the context of Dr. Beal's paper, I interpret A as simply the observation y(t) and B as the fact that y(t) is above QL, and thus have the following questions about equation 1:

- it looks like p(A$B"A(JB) in equation 1 simplifies to the probability of y(t) given the model parameters, i.e. p(A). Which part of the problem allows this simplification?

- how can l(t) be constrained between 0 and 1 if both numerator and denominator can vary between 0 and 1?

Any comment from nmusers will be greatly appreciated.

Received on Fri Feb 27 2009 - 13:20:26 EST