doubt with missing data
Posted on
JuanJMV
Joined: 07/20/2016
Forums
Hi everyone,
I have some doubts with regard to missing data and incomplete twin pairs and I do not know what is the best option.
For example, If I am fitting a bivariate model what should I do with one twin pair where one of the twins does not have the information for one of the variables? I mean if the bivariate twin analysis is with variable1 and variable 2 and one twin does not have the information for variable 1. Also, what should I do if one twin pair does not have information for variable 1 but does for variable 2.
Finally, what should I do with unpaired twins, should I remove them from my data set? I would like to know how the program treat missing data and unpaired twins and what is the correct option.
Thank you so much
Don't remove incomplete data rows
If you're analyzing raw data with an MxFitFunctionML object, OpenMx does full-information maximum-likelihood estimation (FIML), which is not biased by missing data unless the missing-data mechanism is NMAR (Not Missing At Random).
For reference, I refer you to:
Rubin, D. B. (1976). Inference and missing data. (1976). Biometrika, 63(3), 581-592.
Schafer, J. L., & Graham, J. W. (2002). Missing data: Our view of the state of the art. Psychological Methods, 7(2), 147-177.
Log in or register to post comments
In reply to Don't remove incomplete data rows by AdminRobK
Thank you so much Rob,
Here the results with all twins:
> mxCompare( fit, subs <- list(fitCov, fitEMTVO, fitEMTVZ) )
base comparison ep minus2LL df AIC diffLL diffdf p
1 twoSATja
2 twoSATja testCov 24 7477.7717 4035 -592.22831 131.8153087 4 1.5927412e-27
3 twoSATja eqThMVarTwin 22 7352.7322 4037 -721.26780 6.7758149 6 3.4207832e-01
4 twoSATja eqThMVarZyg 19 7365.9276 4040 -714.07242 19.9711958 9 1.8091000e-02
Here without unpaired twins:
28 6500.0277 3624 -747.97230 NA NA NA
> mxCompare( fit, subs <- list(fitCov, fitEMTVO, fitEMTVZ) )
base comparison ep minus2LL df AIC diffLL diffdf p
1 twoSATja
2 twoSATja testCov 24 6623.5203 3628 -632.47965 123.4926432 4 9.5830006e-26
3 twoSATja eqThMVarTwin 22 6508.7968 3630 -751.20316 8.7691344 6 1.8698413e-01
4 twoSATja eqThMVarZyg 19 6514.7080 3633 -751.29199 14.6803077 9 1.0010005e-01
So, I thought that maybe unpaired twins could be the problem and I wanted to be sure. However, if the unpaired twins are not the problem I don’t know what should I do.
Here is the full saturated script just in case something is wrong. But I guess the problem should be with the data.
# Select Continuous Variables
vars <- c('V1_Ln') # list of continuous variables names
nvc <- 1 # number of continuous variables
ntvc <- nvc*2 # number of total continuous variables
conVars <- paste(vars,c(rep(1,nvc),rep(2,nvc)),sep="")
# Select Ordinal Variables
nth <- 1 # number of thresholds
vars <- c('V2') # list of ordinal variables names
nvo <- 1 # number of ordinal variables
ntvo <- nvo*2 # number of total ordinal variables
ordVars <- paste(vars,c(rep(1,nvo),rep(2,nvo)),sep="")
ordData <- twinData
#for (i in ordVars) { ordData[,i] <- cut(twinData[,i],
#breaks=quantile(ordData[,i],(0:(nth+1))/(nth+1),na.rm=TRUE), labels=c(0:nth)) }
# Select Variables for Analysis
vars <- c('V1_Ln','V2') # list of variables names
nv <- nvc+nvo # number of variables
ntv <- nv*2 # number of total variables
selVars <- paste(vars,c(rep(1,nv),rep(2,nv)),sep="")
# Select Covariates for Analysis
ordData[,'age'] <- ordData[,'age']/100
#ordData <- ordData[-which(is.na(ordData$age)),]
covVars <- c('age', "Sex1" , "Sex2")
nc <- 2 # number of covariates
# Select Data for Analysis
mzData <- subset(ordData, Zyg==1|Zyg==3 , c(selVars, covVars))
dzData <- subset(ordData, Zyg==2|Zyg==4|Zyg==5, c(selVars, covVars))
mzDataF <- mzData
dzDataF <- dzData
mzDataF$V21 <- mxFactor(mzDataF$V21, levels =c(0,1))
mzDataF$V22 <- mxFactor(mzDataF$V22, levels =c(0,1))
dzDataF$V21 <- mxFactor(dzDataF$V21, levels =c(0,1))
dzDataF$V22 <- mxFactor(dzDataF$V22, levels =c(0,1))
# Generate Descriptive Statistics
apply(mzData,2,myMean)
apply(dzData,2,myMean)
#?apply(mzData,2,myCor)
#?apply(dzData,2,myCor)
# Set Starting Values
frMV <- c(TRUE,FALSE) # free status for variables
frCvD <- diag(frMV,ntv,ntv) # lower bounds for diagonal of covariance matrix
frCvD[lower.tri(frCvD)] <- TRUE # lower bounds for below diagonal elements
frCvD[upper.tri(frCvD)] <- TRUE # lower bounds for above diagonal elements
frCv <- matrix(as.logical(frCvD),4)
svMe <- c(2,0) # start value for means
svVa <- c(2,1) # start value for variance
lbVa <- .0001 # start value for lower bounds
svVas <- diag(svVa,ntv,ntv) # assign start values to diagonal of matrix
lbVas <- diag(lbVa,ntv,ntv) # assign lower bounds to diagonal of matrix
svLTh <- -1.5 # start value for first threshold
svITh <- 1 # start value for increments
svTh <- matrix(rep(c(svLTh,(rep(svITh,nth-1)))),nrow=nth,ncol=nv) # start value for thresholds
lbTh <- matrix(rep(c(-3,(rep(0.001,nth-1))),nv),nrow=nth,ncol=nv) # lower bounds for thresholds
# Create Labels
labMeMZ <- paste("meanMZ",selVars,sep="_")
labMeDZ <- paste("meanDZ",selVars,sep="_")
labMeZ <- paste("meanZ",selVars,sep="_")
labCvMZ <- labLower("covMZ",ntv)
labCvDZ <- labLower("covDZ",ntv)
labCvZ <- labLower("covZ",ntv)
labVaMZ <- labDiag("covMZ",ntv)
labVaDZ <- labDiag("covDZ",ntv)
labVaZ <- labDiag("covZ",ntv)
labThMZ <- paste(paste("t",1:nth,"MZ",sep=""),rep(ordVars,each=nth),sep="_")
labThDZ <- paste(paste("t",1:nth,"DZ",sep=""),rep(ordVars,each=nth),sep="_")
labThZ <- paste(paste("t",1:nth,"Z",sep=""),rep(ordVars,each=nth),sep="_")
labBe <- labFull("beta",nc,1)
# ------------------------------------------------------------------------------
# PREPARE MODEL
# Saturated Model
# Create Matrices for Covariates and linear Regression Coefficients
defAge <- mxMatrix( type="Full", nrow=1, ncol=1, free=FALSE, labels=c("data.age"), name="Age" )
pathB1 <- mxMatrix( type="Full", nrow=nc, ncol=1, free=TRUE, values=.01, label=c("b11","b12"), name="b1" )
defSex <- mxMatrix( type="Full", nrow=1, ncol=4, free=FALSE, labels=c("data.Sex1", "data.Sex1", "data.Sex2", "data.Sex2"), name="Sex" )
pathB2 <- mxMatrix( type="Full", nrow=1, ncol=4, free=TRUE, values=.01, label=c("b21", "b22","b21", "b22"), name="b2" )
# Create Algebra for expected Mean & Threshold Matrices
meanMZ <- mxMatrix( type="Full", nrow=1, ncol=ntv, free=frMV, values=svMe, labels=labMeMZ, name="meanMZ" )
meanDZ <- mxMatrix( type="Full", nrow=1, ncol=ntv, free=frMV, values=svMe, labels=labMeDZ, name="meanDZ" )
expMeanMZ <- mxAlgebra( expression= meanMZ + cbind(t(b1%*%Age),t(b1%*%Age))+b2*Sex, name="expMeanMZ" )
expMeanDZ <- mxAlgebra( expression= meanDZ + cbind(t(b1%*%Age),t(b1%*%Age))+b2*Sex, name="expMeanDZ" )
thinMZ <- mxMatrix( type="Full", nrow=nth, ncol=ntvo, free=TRUE, values=svTh, lbound=lbTh, labels=labThMZ, name="thinMZ" )
thinDZ <- mxMatrix( type="Full", nrow=nth, ncol=ntvo, free=TRUE, values=svTh, lbound=lbTh, labels=labThDZ, name="thinDZ" )
inc <- mxMatrix( type="Lower", nrow=nth, ncol=nth, free=FALSE, values=1, name="inc" )
threMZ <- mxAlgebra( expression= inc %*% thinMZ, name="threMZ" )
threDZ <- mxAlgebra( expression= inc %*% thinDZ, name="threDZ" )
# Create Algebra for expected Covariance Matrices
covMZ <- mxMatrix( type="Symm", nrow=ntv, ncol=ntv, free=frCv, values=svVas, lbound=lbVas, labels=labCvMZ, name="covMZ" )
covDZ <- mxMatrix( type="Symm", nrow=ntv, ncol=ntv, free=frCv, values=svVas, lbound=lbVas, labels=labCvDZ, name="covDZ" )
# Create Data Objects for Multiple Groups
dataMZ <- mxData( observed=mzDataF, type="raw" )
dataDZ <- mxData( observed=dzDataF, type="raw" )
# Create Expectation Objects for Multiple Groups
expMZ <- mxExpectationNormal( covariance="covMZ", means="expMeanMZ", dimnames=selVars, thresholds="threMZ", threshnames=ordVars )
expDZ <- mxExpectationNormal( covariance="covDZ", means="expMeanDZ", dimnames=selVars, thresholds="threDZ", threshnames=ordVars )
funML <- mxFitFunctionML()
# Create Model Objects for Multiple Groups
pars <- list(pathB1,pathB2, inc )
defs <- list(defAge, defSex)
modelMZ <- mxModel( name="MZ", pars, defs, meanMZ, expMeanMZ, covMZ, thinMZ, threMZ, dataMZ, expMZ, funML )
modelDZ <- mxModel( name="DZ", pars, defs, meanDZ, expMeanDZ, covDZ, thinDZ, threDZ, dataDZ, expDZ, funML )
multi <- mxFitFunctionMultigroup( c("MZ","DZ") )
# Create Confidence Interval Objects
ciCor <- mxCI( c('MZ.covMZ','DZ.covDZ' ))
ciThre <- mxCI( c('MZ.threMZ','DZ.threDZ' ))
# Build Saturated Model with Confidence Intervals
model <- mxModel( "twoSATja", pars, modelMZ, modelDZ, multi, ciCor, ciThre )
# ------------------------------------------------------------------------------
# RUN MODEL
# Run Saturated Model
fit <- mxTryHardOrdinal( model, intervals=F, extraTries = 11 )
sum <- summary( fit )
round(fit$output$estimate,4)
fitGofs(fit)
# ------------------------------------------------------------------------------
# RUN SUBMODELS
# Test Significance of Covariates
modelCov <- mxModel( fit, name="testCov" )
modelCov <- omxSetParameters( modelCov, label=c("b11","b12", "b21","b22"), free=FALSE, values=0 )
fitCov <- mxTryHard( modelCov, extraTries = 11 )
mxCompare( fit, fitCov )
# Constrain expected Thresholds to be equal across twin order
# Constrain expected Thresholds, Means and Variances (Continous variable) to be equal across twin order
modelEMTVO <- mxModel( fit, name="eqThMVarTwin" )
modelEMTVO <- omxSetParameters( modelEMTVO, label=c(labMeMZ), newlabels=c("meanMZ_V1_Ln","meanMZ_V2"))
modelEMTVO <- omxSetParameters( modelEMTVO, label=c(labMeDZ), newlabels=c("meanDZ_V1_Ln","meanDZ_V2"))
modelEMTVO <- omxSetParameters( modelEMTVO, label=c(labThMZ), newlabels=c("t1MZ_V2"))
modelEMTVO <- omxSetParameters( modelEMTVO, label=c(labThDZ), newlabels=c("t1DZ_V2"))
modelEMTVO <- omxSetParameters( modelEMTVO, label=c("covMZ_1_1","covMZ_3_3"), newlabels=c("covMZ_1_1"))
modelEMTVO <- omxSetParameters( modelEMTVO, label=c("covDZ_1_1","covDZ_3_3"), newlabels=c("covDZ_1_1"))
modelEMTVO <- omxAssignFirstParameters( modelEMTVO )
fitEMTVO <- mxRun( modelEMTVO )
fitEMTVO <- mxTryHardOrdinal( modelEMTVO, intervals=F, extraTries=10, exhaustive=F, finetuneGradient=F )
summary(fitEMTVO)
mxCompare(fit, fitEMTVO)
# Constrain expected Thresholds, Means and Variances (Continuous variable) to be equal across twin order and zygosity
modelEMTVZ <- mxModel( fitEMTVO, name="eqThMVarZyg" )
modelEMTVZ <- omxSetParameters( modelEMTVZ, label=c("meanMZ_V1_Ln","meanDZ_V1_Ln"), newlabels=c("meanZ_V1_Ln"))
modelEMTVZ <- omxSetParameters( modelEMTVZ, label=c("t1MZ_V2","t1DZ_V2"), newlabels=c("t1Z_V2"))
modelEMTVZ <- omxSetParameters( modelEMTVZ, label=c("covMZ_1_1","covDZ_1_1"), newlabels=c("covZ_1_1"))
modelEMTVZ <- omxAssignFirstParameters( modelEMTVZ )
fitEMTVZ <- mxRun( modelEMTVZ )
fitEMTVZ <- mxTryHardOrdinal( modelEMTVZ, intervals=F, extraTries=10, exhaustive=F, finetuneGradient=F )
summary(fitEMTVZ)
mxCompare(fitEMTVO, fitEMTVZ)
mxCompare( fit, subs <- list(fitCov, fitEMTVO, fitEMTVZ) )
Please let me know if you need something else and million thanks in advance.
Log in or register to post comments
In reply to Thank you so much Rob, by JuanJMV
Are you worried about equal means across zygosity?
Before I go further, let me allow you to expand on your concern. BTW I agree with Rob that FIML with everyone you've got is the way to go.
> mxCompare( fit, subs <- list(fitCov, fitEMTVO, fitEMTVZ) )
base comparison ep minus2LL df AIC diffLL diffdf p
1 twoSATja
2 twoSATja testCov 24 7477.7717 4035 -592.22831 131.8153087 4 1.5927412e-27
3 twoSATja eqThMVarTwin 22 7352.7322 4037 -721.26780 6.7758149 6 3.4207832e-01
4 twoSATja eqThMVarZyg 19 7365.9276 4040 -714.07242 19.9711958 9 1.8091000e-02
Log in or register to post comments
In reply to Are you worried about equal means across zygosity? by neale
Thank you so much for these
Since, the covariates (sex and age) are hugely significant should I think about sex-limitation model? I don’t know when is suitable fit that model. I want to be sure that I am fitting the correct model.
Thank you so much guys again.
Log in or register to post comments
In reply to Thank you so much for these by JuanJMV
Sex limitation is different
Log in or register to post comments
In reply to Sex limitation is different by neale
Thank you so much I really
My means are 0.14 for males and 0.28 for females. Should I do an exploratory GxSex model?
Thank you so much again!
Log in or register to post comments