NONMEM Users Network Archive

Hosted by Cognigen

[NMusers] Imputation of multiple categorical covariates with missing data

From: ykl7 . <yassinekamallyauk_at_gmail.com>
Date: Wed, 23 Sep 2015 10:44:13 +0200

Dear NMusers,

I would like to investigate the effect of several genotypes on clearance in
a pop PK model. The issue is that most genotypes have some missing data.

I have discarded the genotypes with too much missing data (>30%) and now
want to handle the remaining genotypes appropriately before I move on to an
automated stepwise covariate search in PsN. A colleague informed me that the
following mixture model can be used to impute a single categorical covariate
(let's call it GENO):

-----------------------------------

; In the dataset, the genotype is saved in the variable GENO and coded -99
; if unknown; otherwise it takes on the values 0, 1, 2
$INPUT ID OCC TIME AMT DV .... GENO ....

$PK

; Here you check whether the genotype is available (GENO.NE.-99). If it is
; available, you save it in the new variable GENOME = GENO ...
IF (GENO.NE.-99) THEN
    GENOME = GENO
; ... otherwise you use the mixture to impute GENOME
ELSE
    IF (MIXNUM.EQ.1) GENOME = 0
    IF (MIXNUM.EQ.2) GENOME = 1
    IF (MIXNUM.EQ.3) GENOME = 2
ENDIF

; Then you use the variable GENOME (not GENO, which was in the dataset) to
; define CL, or whichever other parameter you want.
; You need to use a new variable since NONMEM won't let you change the
; value of one of the fields in the dataset.

IF(GENOME.EQ.0) THEN
   TVCL = THETA(1)*((WT/12.5)**0.75)
   TVBIO = 1
ENDIF

; Three sub-populations whose proportion is given by the THETAs
$MIX NSPOP=3
P(1)=THETA(14)
P(2)=THETA(15)
P(3)=THETA(16)

$THETA 0.4 FIX ; GENO = 0 fixed to observed proportion in known genotype
$THETA 0.4 FIX ; GENO = 1 fixed to observed proportion in known genotype
$THETA 0.2 FIX ; GENO = 2 fixed to observed proportion in known genotype


-----------------------------------
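
The fixed mixture proportions in the snippet above are described as the
observed proportions among subjects with a known genotype. A minimal Python
sketch of that calculation, assuming one genotype value per subject and the
-99 missing code from the dataset (the function name and the example values
are hypothetical, not taken from the actual data):

```python
from collections import Counter

MISSING = -99  # missing-genotype code used in the dataset


def observed_proportions(genotypes, levels=(0, 1, 2)):
    """Proportion of each genotype level among subjects with a known genotype."""
    known = [g for g in genotypes if g != MISSING]
    if not known:
        raise ValueError("no known genotypes to compute proportions from")
    counts = Counter(known)
    return {level: counts[level] / len(known) for level in levels}


# Hypothetical example: 10 subjects with a known genotype, 2 with missing data
geno = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, MISSING, MISSING]
props = observed_proportions(geno)
print(props)
```

The resulting values are what would be entered as the fixed $THETA initial
estimates used in $MIX. Note that the input should contain one record per
subject, not one per observation, so that subjects with many samples are not
over-weighted.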

The question is how to repeat such an approach when there are several
genotypes with missing data (GENO1, GENO2, ..., GENOX) which need to be
explored.

My colleague's answer was that this would be rather difficult, as the
mixture model would require the specification of every possible
combination of the different genotypes.
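
To make the size of that problem concrete, here is a rough Python sketch that
enumerates the sub-populations a joint mixture would need. It assumes, purely
for illustration, that the genotypes are independent, so each joint
probability is the product of the marginal proportions; this will not hold
for linked loci. The function name and the proportions for the second
genotype are hypothetical.

```python
from itertools import product


def joint_mixture(marginals):
    """Enumerate genotype combinations and their joint probabilities.

    marginals: one dict {level: proportion} per genotype.
    ASSUMPTION: genotypes are independent, so the joint probability is the
    product of the marginal proportions.
    """
    level_sets = [sorted(m) for m in marginals]
    subpops = []
    for combo in product(*level_sets):
        p = 1.0
        for marginal, level in zip(marginals, combo):
            p *= marginal[level]
        subpops.append((combo, p))
    return subpops


# Hypothetical marginal proportions for two genotypes with levels 0, 1, 2
marginals = [
    {0: 0.4, 1: 0.4, 2: 0.2},  # GENO1
    {0: 0.5, 1: 0.3, 2: 0.2},  # GENO2 (made-up values)
]
subpops = joint_mixture(marginals)
for mixnum, (combo, p) in enumerate(subpops, start=1):
    print(mixnum, combo, round(p, 4))
print("sub-populations needed:", len(subpops))
```

With X three-level genotypes the number of sub-populations is 3**X, so two
genotypes need 9 and five already need 243, which is why the joint mixture
quickly becomes impractical.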

One approach I am considering is to run the stepwise covariate search in PsN
(where, by default, missing categorical data are set to the most common
value). I would then retrace the steps of the search from the scm log file
and, for each chosen relationship, compare the OFV drop and p-value with
those obtained when the mixture model approach is used instead. If the
differences are small, and the chosen relationship remains clearly ahead of
the other candidate relationships at that step, I accept it and build my
covariate model.
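
The comparison described above could be sketched as follows. This is only an
illustration: the OFV drops are made-up numbers, the function names are
hypothetical, and the 0.05 significance level is an assumption that should be
matched to the setting actually used in the scm run. The p-value is the usual
likelihood ratio test against a chi-square distribution, written out in
closed form for 1 and 2 degrees of freedom.

```python
import math


def lrt_p_value(d_ofv, df):
    """Chi-square upper-tail p-value for an OFV drop (closed form, df 1 or 2)."""
    if d_ofv < 0:
        raise ValueError("OFV drop must be non-negative")
    if df == 1:
        return math.erfc(math.sqrt(d_ofv / 2.0))
    if df == 2:
        return math.exp(-d_ofv / 2.0)
    raise ValueError("only df = 1 or 2 are handled in this sketch")


def same_decision(d_ofv_mode, d_ofv_mixture, df, alpha=0.05):
    """True if both imputation approaches agree on including the covariate."""
    keep_mode = lrt_p_value(d_ofv_mode, df) < alpha
    keep_mixture = lrt_p_value(d_ofv_mixture, df) < alpha
    return keep_mode == keep_mixture


# Hypothetical OFV drops for one genotype-on-CL relationship (df = 2 for a
# three-level genotype): most-common-value imputation vs. mixture model
d_mode, d_mix = 12.3, 10.8
print(lrt_p_value(d_mode, 2), lrt_p_value(d_mix, 2))
print("same decision:", same_decision(d_mode, d_mix, 2))
```

Agreement on significance is only part of the check; the ranking of the
candidate relationships at each step would also need to be compared.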

Any input on this matter would be very much appreciated.

Have a good day.

Best regards,
Yassine
Roskilde Hospital
Denmark

Received on Wed Sep 23 2015 - 04:44:13 EDT

The NONMEM Users Network is maintained by ICON plc. Requests to subscribe to the network should be sent to: nmusers-request_at_iconplc.com. Once subscribed, you may contribute to the discussion by emailing: nmusers_at_globomaxnm.com.