Using FIML to impute missing covariate data

Inga
Joined: 09/02/2015 - 11:44
Using FIML to impute missing covariate data

I am running a simple ACE model with covariates (e.g., sex, age), and I am trying to impute missing covariate data using full information maximum likelihood (FIML), as described here: https://openmx.ssri.psu.edu/docs/OpenMx/2.0.0-3838/FIML_RowFit.html.

To do this, I used an example script posted earlier by Mike Neale (https://openmx.ssri.psu.edu/sites/default/files/UnivariateTwinAnalysis_MatrixRaw-3.R) in response to a post in this forum (https://openmx.ssri.psu.edu/thread/554). To create missing values, I added the following line to the script before running the ACE model with covariates:

dzData[1:3,3] <- NA #delete 3 datapoints from age covariate (first twin)

However, I got the following error message:
Error in runHelper(model, frontendStart, intervals, silent, suppressWarnings, :
Error: NA value for a definition variable is Not Yet Implemented.

which basically says, if I understand it correctly, that it is not possible to use FIML to impute missing data in covariates.
A reviewer told me that it is possible to use FIML to impute missing covariate data, so I want to be sure that I am running the analysis the right way. Is there, for example, a command or option I didn't use, or am I using code that is too old? Or is FIML indeed only usable for the phenotypic (dependent) variables? And if so, are there any alternatives to listwise deletion?

I installed the latest version of OpenMx, using source('https://openmx.ssri.psu.edu/getOpenMx.R') and I am using R version 3.1.2.

Thank you so much for your time and effort.
Kind regards,
Inga Schwabe

AdminRobK
Joined: 01/24/2014 - 12:15
Impute, or just work around missing values?

Hi, Inga. Do you want to actually impute missing values on your covariates? Or do you just want to make full use of a dataset in which the covariates sometimes have missing values? I'm guessing it's probably the latter.

Note that everything I say in this post assumes that you are not using any ordinal variables.

If you're entering the covariates into your MxModel as "definition variables," then you can follow the trick I describe here, which involves setting the missing values on the covariates to a "pseudo-missing value," and setting the corresponding phenotype values to NA. You can then proceed with running the MxModel. Note that FIML, by itself, does not impute anything. Instead, it adjusts the likelihood function for each row of the dataset to be the multivariate-normal density of the subset of the endogenous variables that have non-missing values for that row.
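To make the row-wise likelihood described above concrete, here is a minimal base-R sketch. It is not OpenMx's internal code; the function name fiml_row_loglik and the toy mean vector and covariance matrix are assumptions made up for illustration. For each row, it evaluates the multivariate-normal log-density using only the variables that are observed in that row.

```r
# Illustrative sketch only: row-wise FIML log-likelihood contribution.
# 'fiml_row_loglik', 'mu' and 'Sigma' are hypothetical names/values.
fiml_row_loglik <- function(y, mu, Sigma) {
  obs <- !is.na(y)                      # which variables are observed in this row
  k <- sum(obs)
  if (k == 0) return(0)                 # a fully missing row contributes nothing
  d <- y[obs] - mu[obs]                 # deviations for the observed subset
  S <- Sigma[obs, obs, drop = FALSE]    # matching sub-block of the covariance
  log_det <- as.numeric(determinant(S, logarithm = TRUE)$modulus)
  quad <- sum(d * solve(S, d))          # Mahalanobis-type quadratic form
  -0.5 * (k * log(2 * pi) + log_det + quad)
}

# Toy model-implied moments for two phenotypes (e.g., twin 1 and twin 2)
mu <- c(0, 0)
Sigma <- matrix(c(1, 0.5, 0.5, 1), nrow = 2)

ll_complete <- fiml_row_loglik(c(0.3, -0.2), mu, Sigma)  # bivariate density
ll_partial  <- fiml_row_loglik(c(0.3, NA), mu, Sigma)    # univariate density of twin 1 only
```

With the second phenotype missing, the contribution reduces to the univariate normal log-density of the first phenotype, which is why no value ever needs to be filled in.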

If you're using your covariates only for linear regression as part of specifying the MxModel's means, an alternate way to proceed would be to treat the covariates as endogenous variables instead of definition variables. In this case, you would be modeling the joint distribution of phenotypes and covariates, and you would appropriately write the regression of phenotypes onto covariates into the specification of the MxModel. The simplest way to do that is probably to use RAM path specification, and represent the regression onto covariates by having single-headed paths running from covariates to phenotypes.
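As a rough illustration of treating a covariate as an endogenous variable, the sketch below fits a RAM-type MxModel in which a single phenotype is regressed on age. It is deliberately not a twin (ACE) model; the toy data, the variable names (pheno, age), the path labels, and the starting values are all assumptions chosen for the example. The covariate's missing values stay as NA in the raw data.

```r
library(OpenMx)

# Hypothetical toy data: one phenotype, one continuous covariate.
set.seed(1)
n <- 200
toy <- data.frame(age = rnorm(n, mean = 40, sd = 10))
toy$pheno <- 0.05 * toy$age + rnorm(n)
toy$age[1:3] <- NA   # missing covariate values remain NA

manifests <- c("pheno", "age")

regModel <- mxModel(
  "covariateAsEndogenous", type = "RAM",
  manifestVars = manifests,
  # Single-headed path: regression of the phenotype on the covariate
  mxPath(from = "age", to = "pheno", arrows = 1,
         free = TRUE, values = 0, labels = "bAge"),
  # Two-headed paths: residual variance of pheno, variance of age
  mxPath(from = manifests, arrows = 2,
         free = TRUE, values = c(1, 100), labels = c("vRes", "vAge")),
  # Paths from the constant: intercept of pheno, mean of age
  mxPath(from = "one", to = manifests, arrows = 1,
         free = TRUE, values = c(0, 40), labels = c("b0", "mAge")),
  mxData(observed = toy, type = "raw")
)

fit <- mxRun(regModel)
summary(fit)
```

Because age is now a modeled variable rather than a definition variable, rows with a missing age still contribute the likelihood of their observed phenotype. A twin model would need the same idea applied per twin, together with the ACE covariance structure.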

The error message you're seeing occurs because when doing FIML, OpenMx will work around missing values on the endogenous variables (the phenotypes), but does not tolerate NAs among the definition variables. Both approaches I described side-step the problem of NAs among the definition variables. The first changes the NAs to a pseudo-missing value, and then ensures that the pseudo-missing value will be ignored by setting the appropriate phenotype scores to NA. The second obviates the need for definition variables to begin with.
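The data preparation for the first approach could look something like the following base-R sketch. The column names (pheno1, pheno2, age1, age2) and the pseudo-missing value of -999 are assumptions for illustration, not names from the example script. The sketch also assumes that each twin's covariate enters only that twin's own expected mean; if one covariate affects both twins' means, both phenotypes in the row would need to be set to NA.

```r
# Hypothetical wide-format twin data: one phenotype and one age covariate per twin.
twins <- data.frame(
  pheno1 = c(21.3, 19.8, 24.1, 22.0),
  pheno2 = c(20.7, 22.4, 23.5, 21.1),
  age1   = c(NA, 34, NA, 41),
  age2   = c(30, NA, 28, 41)
)

pseudo_missing <- -999   # arbitrary placeholder; any non-NA number would do

for (tw in 1:2) {
  cov_col <- paste0("age", tw)
  phe_col <- paste0("pheno", tw)
  miss <- is.na(twins[[cov_col]])        # rows where this twin's covariate is missing
  twins[[phe_col]][miss] <- NA           # FIML will then skip this twin's phenotype
  twins[[cov_col]][miss] <- pseudo_missing  # definition variable is no longer NA
}

twins
```

The placeholder never influences the fit, because the only expected mean it enters belongs to a phenotype that is now missing and is therefore dropped from that row's likelihood.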

If for some reason you actually want to impute missing values on the covariates, FIML can help you there, too, but OpenMx does not have any kind of built-in imputation feature. You would start by obtaining a FIML estimate of the means and covariance matrix of the joint distribution of the phenotypes and covariates. I could describe this further if you'd like.
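One common way to go from estimated means and covariances to imputed values is conditional-mean imputation under multivariate normality; whether this is the exact procedure intended here is an assumption. In the base-R sketch below, the function name impute_conditional_mean is hypothetical, and the numbers in mu and Sigma are made up to stand in for FIML estimates of the joint distribution of a phenotype and a covariate.

```r
# Illustrative sketch: fill in missing entries of one row with their
# conditional expectation given the observed entries, assuming normality.
impute_conditional_mean <- function(x, mu, Sigma) {
  mis <- is.na(x)
  if (!any(mis)) return(x)                      # nothing to impute
  if (all(mis)) { x[] <- mu; return(x) }        # no information: use the means
  obs <- !mis
  S_mo <- Sigma[mis, obs, drop = FALSE]         # cov(missing, observed)
  S_oo <- Sigma[obs, obs, drop = FALSE]         # cov(observed, observed)
  x[mis] <- mu[mis] + as.vector(S_mo %*% solve(S_oo, x[obs] - mu[obs]))
  x
}

# Made-up stand-ins for FIML estimates (NOT real results)
mu    <- c(pheno = 22, age = 35)
Sigma <- matrix(c(4, 3,
                  3, 25), nrow = 2, byrow = TRUE,
                dimnames = list(names(mu), names(mu)))

row_in  <- c(pheno = 24, age = NA)
row_out <- impute_conditional_mean(row_in, mu, Sigma)
row_out   # age is filled in with 35 + (3 / 4) * (24 - 22) = 36.5
```

Conditional means understate the variability of the imputed variable, so for anything beyond a quick fill-in one would usually add random draws from the conditional distribution or use multiple imputation.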