doubt with missing data

Posted on Wed, 08/16/2017 - 04:11

JuanJMV Joined: 07/20/2016

Forums

Hi everyone,

I have some doubts with regard to missing data and incomplete twin pairs and I do not know what is the best option.
For example, If I am fitting a bivariate model what should I do with one twin pair where one of the twins does not have the information for one of the variables? I mean if the bivariate twin analysis is with variable1 and variable 2 and one twin does not have the information for variable 1. Also, what should I do if one twin pair does not have information for variable 1 but does for variable 2.
Finally, what should I do with unpaired twins, should I remove them from my data set? I would like to know how the program treat missing data and unpaired twins and what is the correct option.

Thank you so much

Replied on Fri, 08/18/2017 - 11:40

AdminRobK Joined: Jan 24, 2014

Don't remove incomplete data rows

Don't remove incomplete data rows from the analysis. Data rows that contain missing phenotype scores--even those containing singleton twins--still contain information about unknown parameters.

If you're analyzing raw data with an MxFitFunctionML object, OpenMx does full-information maximum-likelihood estimation (FIML), which is not biased by missing data unless the missing-data mechanism is NMAR (Not Missing At Random).

For reference, I refer you to:
Rubin, D. B. (1976). Inference and missing data. (1976). Biometrika, 63(3), 581-592.
Schafer, J. L., & Graham, J. W. (2002). Missing data: Our view of the state of the art. Psychological Methods, 7(2), 147-177.

Log in or register to post comments

Replied on Sat, 08/19/2017 - 04:41

JuanJMV Joined: Jul 20, 2016

Thank you so much Rob,

Thank you so much Rob, fantastic answer as always. I used to think the same. However, I had a problem with a bivariate saturated model (1 ordinal variable 1 binary variable) and I decided to try without the unpaired twins.
Here the results with all twins:
> mxCompare( fit, subs <- list(fitCov, fitEMTVO, fitEMTVZ) ) base comparison ep minus2LL df AIC diffLL diffdf p 1 twoSATja 28 7345.9564 4031 -716.04362 NA NA NA 2 twoSATja testCov 24 7477.7717 4035 -592.22831 131.8153087 4 1.5927412e-27 3 twoSATja eqThMVarTwin 22 7352.7322 4037 -721.26780 6.7758149 6 3.4207832e-01 4 twoSATja eqThMVarZyg 19 7365.9276 4040 -714.07242 19.9711958 9 1.8091000e-02

Here without unpaired twins:
> mxCompare( fit, subs <- list(fitCov, fitEMTVO, fitEMTVZ) ) base comparison ep minus2LL df AIC diffLL diffdf p 1 twoSATja 28 6500.0277 3624 -747.97230 NA NA NA 2 twoSATja testCov 24 6623.5203 3628 -632.47965 123.4926432 4 9.5830006e-26 3 twoSATja eqThMVarTwin 22 6508.7968 3630 -751.20316 8.7691344 6 1.8698413e-01 4 twoSATja eqThMVarZyg 19 6514.7080 3633 -751.29199 14.6803077 9 1.0010005e-01

So, I thought that maybe unpaired twins could be the problem and I wanted to be sure. However, if the unpaired twins are not the problem I don’t know what should I do.

Here is the full saturated script just in case something is wrong. But I guess the problem should be with the data.

# Select Continuous Variables vars <- c('V1_Ln') # list of continuous variables names nvc <- 1 # number of continuous variables ntvc <- nvc*2 # number of total continuous variables conVars <- paste(vars,c(rep(1,nvc),rep(2,nvc)),sep="")


# Select Ordinal Variables

nth       <- 1                         # number of thresholds

vars      <- c('V2')                   # list of ordinal variables names

nvo       <- 1                         # number of ordinal variables

ntvo      <- nvo*2                     # number of total ordinal variables

ordVars   <- paste(vars,c(rep(1,nvo),rep(2,nvo)),sep="")

ordData   <- twinData

#for (i in ordVars) { ordData[,i] <- cut(twinData[,i],

#breaks=quantile(ordData[,i],(0:(nth+1))/(nth+1),na.rm=TRUE), labels=c(0:nth)) }
# Select Variables for Analysis

vars      <- c('V1_Ln','V2')              # list of variables names

nv        <- nvc+nvo                   # number of variables

ntv       <- nv*2                      # number of total variables

selVars   <- paste(vars,c(rep(1,nv),rep(2,nv)),sep="")
# Select Covariates for Analysis

ordData[,'age'] <- ordData[,'age']/100

#ordData   <- ordData[-which(is.na(ordData$age)),]

covVars   <- c('age', "Sex1" , "Sex2")

nc        <- 2                         # number of covariates
# Select Data for Analysis

mzData    <- subset(ordData, Zyg==1|Zyg==3 , c(selVars, covVars))

dzData    <- subset(ordData, Zyg==2|Zyg==4|Zyg==5, c(selVars, covVars))

mzDataF   <- mzData

dzDataF   <- dzData

mzDataF$V21 <- mxFactor(mzDataF$V21, levels =c(0,1))

mzDataF$V22 <- mxFactor(mzDataF$V22, levels =c(0,1))

dzDataF$V21 <- mxFactor(dzDataF$V21, levels =c(0,1))

dzDataF$V22 <- mxFactor(dzDataF$V22, levels =c(0,1))

# Generate Descriptive Statistics

apply(mzData,2,myMean)

apply(dzData,2,myMean)

#?apply(mzData,2,myCor)

#?apply(dzData,2,myCor)
# Set Starting Values

frMV      <- c(TRUE,FALSE)             # free status for variables

frCvD     <- diag(frMV,ntv,ntv)        # lower bounds for diagonal of covariance matrix

frCvD[lower.tri(frCvD)] <- TRUE        # lower bounds for below diagonal elements

frCvD[upper.tri(frCvD)] <- TRUE        # lower bounds for above diagonal elements

frCv      <- matrix(as.logical(frCvD),4)

svMe      <- c(2,0)                   # start value for means

svVa      <- c(2,1)                    # start value for variance

lbVa      <- .0001                     # start value for lower bounds

svVas     <- diag(svVa,ntv,ntv)        # assign start values to diagonal of matrix

lbVas     <- diag(lbVa,ntv,ntv)        # assign lower bounds to diagonal of matrix

svLTh     <- -1.5                      # start value for first threshold

svITh     <- 1                         # start value for increments

svTh      <- matrix(rep(c(svLTh,(rep(svITh,nth-1)))),nrow=nth,ncol=nv)     # start value for thresholds

lbTh      <- matrix(rep(c(-3,(rep(0.001,nth-1))),nv),nrow=nth,ncol=nv)     # lower bounds for thresholds
# Create Labels

labMeMZ   <- paste("meanMZ",selVars,sep="_")

labMeDZ   <- paste("meanDZ",selVars,sep="_")

labMeZ    <- paste("meanZ",selVars,sep="_")

labCvMZ   <- labLower("covMZ",ntv)

labCvDZ   <- labLower("covDZ",ntv)

labCvZ    <- labLower("covZ",ntv)

labVaMZ   <- labDiag("covMZ",ntv)

labVaDZ   <- labDiag("covDZ",ntv)

labVaZ    <- labDiag("covZ",ntv)
labThMZ   <- paste(paste("t",1:nth,"MZ",sep=""),rep(ordVars,each=nth),sep="_")

labThDZ   <- paste(paste("t",1:nth,"DZ",sep=""),rep(ordVars,each=nth),sep="_")

labThZ    <- paste(paste("t",1:nth,"Z",sep=""),rep(ordVars,each=nth),sep="_")

labBe     <- labFull("beta",nc,1)
# ------------------------------------------------------------------------------

# PREPARE MODEL
# Saturated Model

# Create Matrices for Covariates and linear Regression Coefficients

defAge    <- mxMatrix( type="Full", nrow=1, ncol=1, free=FALSE, labels=c("data.age"), name="Age" )

pathB1     <- mxMatrix( type="Full", nrow=nc, ncol=1, free=TRUE, values=.01, label=c("b11","b12"), name="b1" )

defSex    <- mxMatrix( type="Full", nrow=1, ncol=4, free=FALSE, labels=c("data.Sex1", "data.Sex1", "data.Sex2", "data.Sex2"), name="Sex" )

pathB2     <- mxMatrix( type="Full", nrow=1, ncol=4, free=TRUE, values=.01, label=c("b21", "b22","b21", "b22"), name="b2" )
# Create Algebra for expected Mean & Threshold Matrices

meanMZ    <- mxMatrix( type="Full", nrow=1, ncol=ntv, free=frMV, values=svMe, labels=labMeMZ, name="meanMZ" )

meanDZ    <- mxMatrix( type="Full", nrow=1, ncol=ntv, free=frMV, values=svMe, labels=labMeDZ, name="meanDZ" )

expMeanMZ <- mxAlgebra( expression= meanMZ + cbind(t(b1%*%Age),t(b1%*%Age))+b2*Sex, name="expMeanMZ" )

expMeanDZ <- mxAlgebra( expression= meanDZ + cbind(t(b1%*%Age),t(b1%*%Age))+b2*Sex, name="expMeanDZ" )

thinMZ    <- mxMatrix( type="Full", nrow=nth, ncol=ntvo, free=TRUE, values=svTh, lbound=lbTh, labels=labThMZ, name="thinMZ" )

thinDZ    <- mxMatrix( type="Full", nrow=nth, ncol=ntvo, free=TRUE, values=svTh, lbound=lbTh, labels=labThDZ, name="thinDZ" )

inc       <- mxMatrix( type="Lower", nrow=nth, ncol=nth, free=FALSE, values=1, name="inc" )

threMZ    <- mxAlgebra( expression= inc %*% thinMZ, name="threMZ" )

threDZ    <- mxAlgebra( expression= inc %*% thinDZ, name="threDZ" )
# Create Algebra for expected Covariance Matrices

covMZ     <- mxMatrix( type="Symm", nrow=ntv, ncol=ntv, free=frCv, values=svVas, lbound=lbVas, labels=labCvMZ, name="covMZ" )

covDZ     <- mxMatrix( type="Symm", nrow=ntv, ncol=ntv, free=frCv, values=svVas, lbound=lbVas, labels=labCvDZ, name="covDZ" )
# Create Data Objects for Multiple Groups

dataMZ    <- mxData( observed=mzDataF, type="raw" )

dataDZ    <- mxData( observed=dzDataF, type="raw" )
# Create Expectation Objects for Multiple Groups

expMZ     <- mxExpectationNormal( covariance="covMZ", means="expMeanMZ", dimnames=selVars, thresholds="threMZ", threshnames=ordVars )

expDZ     <- mxExpectationNormal( covariance="covDZ", means="expMeanDZ", dimnames=selVars, thresholds="threDZ", threshnames=ordVars )

funML     <- mxFitFunctionML()
# Create Model Objects for Multiple Groups

pars      <- list(pathB1,pathB2, inc )

defs      <- list(defAge, defSex)

modelMZ   <- mxModel( name="MZ", pars, defs, meanMZ, expMeanMZ, covMZ, thinMZ, threMZ, dataMZ, expMZ, funML )

modelDZ   <- mxModel( name="DZ", pars, defs, meanDZ, expMeanDZ, covDZ, thinDZ, threDZ, dataDZ, expDZ, funML )

multi     <- mxFitFunctionMultigroup( c("MZ","DZ") )
# Create Confidence Interval Objects

ciCor     <- mxCI( c('MZ.covMZ','DZ.covDZ' ))

ciThre    <- mxCI( c('MZ.threMZ','DZ.threDZ' ))
# Build Saturated Model with Confidence Intervals

model     <- mxModel( "twoSATja", pars, modelMZ, modelDZ, multi, ciCor, ciThre )
# ------------------------------------------------------------------------------

# RUN MODEL
# Run Saturated Model

fit       <- mxTryHardOrdinal( model, intervals=F, extraTries = 11 )

sum       <- summary( fit )

round(fit$output$estimate,4)

fitGofs(fit)
# ------------------------------------------------------------------------------

# RUN SUBMODELS
# Test Significance of Covariates

modelCov  <- mxModel( fit, name="testCov" )

modelCov  <- omxSetParameters( modelCov, label=c("b11","b12", "b21","b22"), free=FALSE, values=0 )

fitCov    <- mxTryHard( modelCov, extraTries = 11 )

mxCompare( fit, fitCov )
# Constrain expected Thresholds to be equal across twin order

# Constrain expected Thresholds, Means and Variances (Continous variable) to be equal across twin order

modelEMTVO  <- mxModel( fit, name="eqThMVarTwin" )

modelEMTVO  <- omxSetParameters( modelEMTVO, label=c(labMeMZ), newlabels=c("meanMZ_V1_Ln","meanMZ_V2"))

modelEMTVO  <- omxSetParameters( modelEMTVO, label=c(labMeDZ), newlabels=c("meanDZ_V1_Ln","meanDZ_V2"))

modelEMTVO  <- omxSetParameters( modelEMTVO, label=c(labThMZ), newlabels=c("t1MZ_V2"))

modelEMTVO  <- omxSetParameters( modelEMTVO, label=c(labThDZ), newlabels=c("t1DZ_V2"))

modelEMTVO  <- omxSetParameters( modelEMTVO, label=c("covMZ_1_1","covMZ_3_3"), newlabels=c("covMZ_1_1"))

modelEMTVO  <- omxSetParameters( modelEMTVO, label=c("covDZ_1_1","covDZ_3_3"), newlabels=c("covDZ_1_1"))

modelEMTVO  <- omxAssignFirstParameters( modelEMTVO )

fitEMTVO    <- mxRun( modelEMTVO )

fitEMTVO    <- mxTryHardOrdinal( modelEMTVO, intervals=F, extraTries=10, exhaustive=F, finetuneGradient=F )

summary(fitEMTVO)

mxCompare(fit, fitEMTVO)

# Constrain expected Thresholds, Means and Variances (Continuous variable) to be equal across twin order and zygosity modelEMTVZ <- mxModel( fitEMTVO, name="eqThMVarZyg" ) modelEMTVZ <- omxSetParameters( modelEMTVZ, label=c("meanMZ_V1_Ln","meanDZ_V1_Ln"), newlabels=c("meanZ_V1_Ln")) modelEMTVZ <- omxSetParameters( modelEMTVZ, label=c("t1MZ_V2","t1DZ_V2"), newlabels=c("t1Z_V2")) modelEMTVZ <- omxSetParameters( modelEMTVZ, label=c("covMZ_1_1","covDZ_1_1"), newlabels=c("covZ_1_1")) modelEMTVZ <- omxAssignFirstParameters( modelEMTVZ ) fitEMTVZ <- mxRun( modelEMTVZ ) fitEMTVZ <- mxTryHardOrdinal( modelEMTVZ, intervals=F, extraTries=10, exhaustive=F, finetuneGradient=F ) summary(fitEMTVZ) mxCompare(fitEMTVO, fitEMTVZ) mxCompare( fit, subs <- list(fitCov, fitEMTVO, fitEMTVZ) )

Please let me know if you need something else and million thanks in advance.

Log in or register to post comments

Replied on Mon, 08/21/2017 - 18:16

neale Joined: Jul 31, 2009

Are you worried about equal means across zygosity?

In your comparison table i don't see major problems. The covariates are hugely significant, the twin 1 vs. twin 2 means are non-significantly different (.342) but there is marginally significant MZ vs. DZ means (.018). Depending on how large your sample size is, this may not be unreasonable. A very small real effect in a very large sample may turn out significant. The only reason it seems not to in the unpaired case is probably just because of throwing about 10% of your data (sample size) away.

Before I go further, let me allow you to expand on your concern. BTW I agree with Rob that FIML with everyone you've got is the way to go.

> mxCompare( fit, subs <- list(fitCov, fitEMTVO, fitEMTVZ) ) base comparison ep minus2LL df AIC diffLL diffdf p 1 twoSATja 28 7345.9564 4031 -716.04362 NA NA NA 2 twoSATja testCov 24 7477.7717 4035 -592.22831 131.8153087 4 1.5927412e-27 3 twoSATja eqThMVarTwin 22 7352.7322 4037 -721.26780 6.7758149 6 3.4207832e-01 4 twoSATja eqThMVarZyg 19 7365.9276 4040 -714.07242 19.9711958 9 1.8091000e-02

Log in or register to post comments

Replied on Fri, 08/25/2017 - 06:04

JuanJMV Joined: Jul 20, 2016

Thank you so much for these

Thank you so much for these really helpful comments I really appreciate it. You are right my sample is big 965 twin pairs and 218 unpaired twins.
Since, the covariates (sex and age) are hugely significant should I think about sex-limitation model? I don’t know when is suitable fit that model. I want to be sure that I am fitting the correct model.

Thank you so much guys again.

Log in or register to post comments

Replied on Tue, 08/29/2017 - 04:11

neale Joined: Jul 31, 2009

Sex limitation is different

Large sex differences in mean do not necessarily imply that variance components will differ. These are different models. Ideally, we should start with a hypothesis about how things work, but an exploratory analysis of GxSex might seem worthwhile since at least one statistic differs. It would also be useful to know about the variances in males vs. females - if the variances differ it is reasonably to ask why - is there more genetic variation in men than women, for example. Note that in multivariate sex limitation modeling there are advantages to a correlated factors model over a Cholesky, because the latter can mask measurement non-invariance between the sexes, and has uncomfortable statistical properties (e.g., changing the order of the variables changes the fit of the model, because it is not saturated across sex, even though it is within each one). See Neale, M.C., Røysamb, E., and Jacobson, K. (2006c). Multivariate genetic analysis of sex limitation and g x e interaction. Twin research and human genetics : the official journal of the International Society for Twin Studies, 9(4):481-89 for details.

Log in or register to post comments

Replied on Wed, 09/13/2017 - 12:36

JuanJMV Joined: Jul 20, 2016

Thank you so much I really

Thank you so much I really appreciate it. Just one more thing, I am working with a dichotomous variable then I think if the mean differs necessarily the variance will differ, am I wrong?
My means are 0.14 for males and 0.28 for females. Should I do an exploratory GxSex model?

Thank you so much again!

Log in or register to post comments

Log in or register to post comments

News

Recent Posts

doubt with missing data

Don't remove incomplete data rows

Thank you so much Rob,

Are you worried about equal means across zygosity?

Thank you so much for these

Sex limitation is different

Thank you so much I really