NONMEM Users Network Archive

Hosted by Cognigen

Re: Imputation of multiple categorical covariates with missing data

From: Leonid Gibiansky <lgibiansky>
Date: Wed, 23 Sep 2015 11:53:58 -0400

I would start by assigning a separate level to missing values rather
than running mixture model(s). If the data are missing at random, you
would expect the "missing" level to fall somewhere in the middle of the
other levels. One can try assigning a different (larger) OMEGA to this
level, since by definition it combines subjects with a wider range of
parameter values due to differences in genotype. You should then be able
to identify all strong effects. You may even retain genotypes with a
large fraction of missing values: what matters is only the
representation of each known genotype (each analyzed level should
contain a sufficient number of subjects). The "missing" level should not
be used as the reference. As a common-sense check, the parameter value
for "missing" can be compared with the sum of the parameter values for
all known genotypes, weighted by their observed prevalence.
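
The common-sense check above is a prevalence-weighted average. A minimal
Python sketch follows. It assumes the coding used in the question below
(GENO = 0, 1, 2, with -99 for unknown) and recodes unknowns to a separate
level 3. The function names, the level-3 code and the clearance values
are hypothetical, for illustration only:

```python
from collections import Counter

MISSING = -99       # code for an unknown genotype, as in the question
MISSING_LEVEL = 3   # hypothetical separate category for "missing"


def recode_missing(genotypes):
    """Give unknown genotypes their own level instead of imputing them."""
    return [MISSING_LEVEL if g == MISSING else g for g in genotypes]


def observed_prevalence(genotypes):
    """Proportion of each known genotype level, ignoring unknown values."""
    known = [g for g in genotypes if g != MISSING]
    counts = Counter(known)
    return {level: n / len(known) for level, n in sorted(counts.items())}


def expected_missing_value(prevalence, param_by_level):
    """Prevalence-weighted sum of the per-genotype parameter values."""
    return sum(p * param_by_level[level] for level, p in prevalence.items())


# Hypothetical data: one genotype per subject, and typical CL per known level
geno = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, MISSING, MISSING]
tvcl_by_level = {0: 2.0, 1: 3.0, 2: 5.0}

print(recode_missing(geno))           # unknowns become level 3
prev = observed_prevalence(geno)
print(prev)                           # {0: 0.4, 1: 0.4, 2: 0.2}
print(f"{expected_missing_value(prev, tvcl_by_level):.2f}")  # 3.00
```

A large gap between the estimated parameter for the "missing" level and
this weighted value would suggest that the missing-at-random assumption,
or the model itself, deserves a closer look.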

Leonid Gibiansky, Ph.D.
President, QuantPharm LLC
e-mail: LGibiansky at
tel: (301) 767 5566

On 9/23/2015 4:44 AM, ykl7 . wrote:
> Dear NMusers,
> I would like to investigate the effect of several genotypes on clearance
> in a pop PK model. The issue is that most genotypes have some missing
> data.
> I have discarded the genotypes with too many missing samples (>30%)
> and now want to handle the remaining genotypes appropriately before I
> move on to an automated stepwise covariate search in PsN. A colleague
> informed me that the following mixture model can be used to impute a
> single categorical covariate (let's call it GENO):
> -----------------------------------
> ; In the dataset, the genotype is stored in the variable GENO and coded
> ; -99 if unknown; otherwise it takes the values 0, 1 or 2.
> $PK
> ; Check whether the genotype is available (GENO.EQ.-99 means unknown).
> ; If it is, save it in a new variable GENOME=GENO...
> ; ... otherwise use the mixture subpopulation (MIXNUM) to impute GENOME.
> ; Then use GENOME (not GENO, which is in the dataset) to define CL, or
> ; whichever other parameter you want.
> ; A new variable is needed because NONMEM won't let you change the
> ; value of a data item.
> IF (GENO.EQ.-99) THEN
>   IF (MIXNUM.EQ.1) GENOME=0
>   IF (MIXNUM.EQ.2) GENOME=1
>   IF (MIXNUM.EQ.3) GENOME=2
> ELSE
>   GENOME=GENO
> ENDIF
> TVCL = THETA(1)*((WT/12.5)**0.75)
> TVBIO = 1
> $MIX
> ; Three subpopulations whose proportions are given by the THETAs
> NSPOP=3
> P(1)=THETA(14)
> P(2)=THETA(15)
> P(3)=THETA(16)
> $THETA 0.4 FIX ; GENO = 0, fixed to observed proportion in known genotypes
> $THETA 0.4 FIX ; GENO = 1, fixed to observed proportion in known genotypes
> $THETA 0.2 FIX ; GENO = 2, fixed to observed proportion in known genotypes
> -----------------------------------
> *How can such an approach be repeated when several genotypes with
> missing data (GENO1, GENO2, ..., GENOX) need to be explored?*
> The answer I received from my colleague is that it would be rather
> difficult, as the mixture model would require specifying every possible
> combination of genotypes.
> One approach I am considering is to run the stepwise covariate search
> in PsN (where, by default, missing categorical data are set to the most
> common value). I would then retrace the steps of the search using the
> scm log file and compare the OFV drops and p-values of the selected
> relationships with those that would have been obtained with a mixture
> model approach. If the difference is small, and the selected
> relationships remain clearly separated from any other relationships
> that could have been chosen, I would accept them and build my
> covariate model.
> Any input on this matter would be very much appreciated.
> Have a good day.
> Best regards,
> Yassine
> Roskilde Hospital
> Denmark
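
The colleague's objection about combinations can be made concrete. The
Python sketch below enumerates the subpopulations a joint mixture model
would need. It assumes three levels per genotype and, as a simplifying
and possibly unrealistic assumption, that genotypes are independent, so
each joint proportion is the product of the marginal observed
proportions. The function name is hypothetical:

```python
from itertools import product
from math import prod


def joint_subpopulations(marginals):
    """List every genotype combination with its joint mixture proportion.

    marginals: one dict per genotype, mapping level -> observed proportion.
    Assumes genotypes are independent (a simplification), so each joint
    proportion is the product of the marginal proportions.
    """
    levels = [sorted(m) for m in marginals]
    subpops = []
    for combo in product(*levels):
        p = prod(m[level] for m, level in zip(marginals, combo))
        subpops.append((combo, p))
    return subpops


# Marginal proportions taken from the single-genotype example in the question
geno_marginal = {0: 0.4, 1: 0.4, 2: 0.2}

# Prints 3, 9, 27 and 81 subpopulations, each set summing to 1.000
for k in range(1, 5):
    subpops = joint_subpopulations([geno_marginal] * k)
    total = sum(p for _, p in subpops)
    print(f"{k} genotype(s): {len(subpops)} subpopulations, sum of P = {total:.3f}")
```

With k genotypes the count is 3**k, and each subpopulation would need its
own P(n) in $MIX plus its own branch for imputing each genotype, which is
why a joint mixture quickly becomes unwieldy compared with giving each
genotype a separate "missing" level.
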
Received on Wed Sep 23 2015 - 11:53:58 EDT

The NONMEM Users Network is maintained by ICON plc.