Model including groups with both binary and continuous/ordinal data

Posted on
Picture of user. ebejer Joined: 03/18/2010
Hi,

I have heard that it is now possible to run a model estimating genetic and environmental effects with twin data that includes separate variables with binary and continuous (or ordinal?) coding.

I have been running a thresholds model with age and sex as covariates (which I will attach) and I would like to include the same behavioural measures from different data sets in the one model.

Two different scales have been used to collect this information (binary and continuousish), and I want to equate the A, C and E paths across datasets with the differing scales. I have been told this will necessitate the inclusion of a group for each dataset. But I am also unsure how to do this.

Is this possible and can you help? - and forgive me if I am missing some important technical detail.

Best wishes
Jane

Replied on Thu, 12/08/2011 - 21:24
Picture of user. neale Joined: 07/31/2009

Jane

Since the scales are different, there is likely to be differences in mean and variance between the two. Possibly, what you want to know is whether the proportions of A C and E are the same for the two scales?

The approach you take depends on whether the two scales were obtained from the same or different pairs of twins (i.e. the same sample for both scales or different samples). In the former case, you would not increase the number of groups, but make a multivariate model. In the latter - different samples - case you would add groups. Equating the proportions of variance could be done at least two ways. The simplest would be to use algebras to compute, e.g., A/(A+C+E) for the two measures, and to add a non-linear constraint that equates the two (all three could be cbind()'ed on each side of the ==, but since they are proportions, two would be sufficient). In the case of measures on the same sample, you would set up a multivariate analysis (say with a Cholesky model) and equate the two diagonal elements of the (now 2x2) A/(A+C+E) matrix and the C/(A+C+E) matrix. Extracting the elements from the diagonal could be done with A[1,1] type notation, so you might try A[1,1]/(A[1,1]+C[1,1}+E[1,1] == A[2,2]/(A[2,2]+C[2,2]+E[2,2]. I think. If it doesn't work then, well then we'll learn something making it do so :).

HTH
Mike

Replied on Sat, 12/10/2011 - 03:26
Picture of user. ebejer Joined: 03/18/2010

In reply to by neale

I am using data from two separate samples so I need to include separate calculation groups. And I do want to compare the genetic and environmental effects across the approximately same variables in the two samples. One variable represents a normal distribution (continuous) and the other is represented with a binary code.

Unfortunately I haven't been able to find a model running both continuous and ordinal /binary data including separate groups. So what I have done is put together a mishmash of script that I though might work - but I don't really understand how the matrices will combine and the use of certain code. I will attach my script.

There is a matrix used to specify the values for each a, c and e path (cholSVnv) I'm not sure why - and I don't understand the abbreviation. I'm also unsure where and how to add additional groups. I thought perhaps as two sets of mz and dz models (one for each sample). And I'm not really up to interpreting the meaning of your comments yet Mike, sorry.

Can you please give me a tip on (1) how to add a calculation group, (2) whether I need a lower cholesky for 2 vars to represent the a, c and e paths (if that's what they are), and (3) what I need to specify in the objective function to include both variables in the model and (4) the order of how the matrices combine to provide the output.

perhaps I'm asking too much?

Jane

Replied on Sat, 12/10/2011 - 14:36
Picture of user. neale Joined: 07/31/2009

In reply to by ebejer

Jane

If you have two univariate scripts, one for the ordinal, and one for the continuous data, I would be tempted to try combining them with something like the following

jointACE <- mxModel(aceOrd, aceCont, mxAlgebraObjective(aceOrd.Objective+aceCont.Objective))

at least this would get you started. In fact I think you are on the right track with your script as is, but you need an mxAlgebraObjective to combine the objective functions from all four groups. Something like

mxAlgebra( expression=MZ1.objective+DZ1.objective+MZ2.objective+DZ2.objective, name="minus2sumloglikelihood" ),
mxAlgebraObjective("minus2sumloglikelihood")

Adding the mxConstraint would be needed to equate the heritabilities, per previous post.

Replied on Mon, 12/12/2011 - 23:21
Picture of user. ebejer Joined: 03/18/2010

In reply to by neale

Hello again,

I have found a model that seems to combine the two separate datasets, however it returns a code red and there are two SD's that are quite large - I have played around with the starting values and the threshold and what I have in the script appears to work best. I am wanting to get some feedback on the 'reasonableness' of the model I have specified and what I might need to change to change the code red status and reduce the standard errors. Any suggestions will be greatly appreciated (I'll attach the output and script below).

Thanks,
Jane

Replied on Wed, 12/14/2011 - 22:18
Picture of user. smedland Joined: 08/04/2009

In reply to by ebejer

We have a version of the script that works now. One data set has continuous data and the other has a binary measure of the same trait. To identify the ordinal model we have fixed e at 1 and estimated a and c.
The only way I can think of to 'equate' the standardised A,C and E estimates is through the mxConstraint function.
This does work and the results look sensible. But because we've used mxConstraint we lose the SE.
We can still get CI but it would be nice to have the se as well.

Can anyone think of an alturnate apporach?

thanks
Sarah (on behalf of Jane)

Replied on Thu, 12/15/2011 - 02:08
Picture of user. neale Joined: 07/31/2009

Set E to 1 for continuous variable also - and estimate an SD parameter for the continuous variable. So you pre- and post-multiply the continuous expCov by diag matrix with sd on diagonal. Then you have the option to equate A and C parameters without nonlinear constraint.
Replied on Fri, 06/15/2018 - 18:41
No user picture. massimiliano.orri Joined: 06/11/2018

Hi,
I am new to openMx, and I cannot figure out what is wrong in my model. I am trying to estimate a bivariate model in which one variable is continuous and the other is binary (for which I preliminarily used mxFactor). When I try to estimate the model, I obtain an error which says that my continuous variable is not categorical and that I should use mxFactor()...
I am pretty sure something is wrong in my model specification.
Here my model. Thanks for any help!
Massimiliano

# Continuous variable
nvC <- 1 # number of continuous variables
ntvC <- nv*2 # number of total continuous variables
selVarsC <- c('varC')

# Binary variable
nvB <- 1 # number of binary variables
ntvB <- nv*2 # number of total binary variables
selVarsB <- c('varB')

# Set Starting Values
svMe <- 7 # start value for means
svTh <- 0
svPa <- .2 # start value for path coefficient
svPe <- .5 # start value for path coefficient for e

# ----------------------------------------------------------------------------------------------------------------------
# PREPARE MODEL

# Create Algebra for expected Mean Matrices
meanG <- mxMatrix( type="Full", nrow=1, ncol=ntvC, free=TRUE, values=svMe, labels=labVars("mean",selVarsC), name="meanG" )

# Create Algebra for expected Thresholds Matrices
threshold <- mxMatrix( type="Full", nrow=1, ncol=ntvB, free=TRUE, values=svTh, labels=labVars("thre", selVarsB), name="threshold" )

# Create Matrices for Variance Components
covA <- mxMatrix( type="Symm", nrow=nv, ncol=nv, free=TRUE, values=valDiag(svPa,nv), labels=labLower("VA",nv), name="VA" )
covC <- mxMatrix( type="Symm", nrow=nv, ncol=nv, free=TRUE, values=valDiag(svPa,nv), labels=labLower("VC",nv), name="VC" )
covE <- mxMatrix( type="Symm", nrow=nv, ncol=nv, free=TRUE, values=valDiag(svPe,nv), labels=labLower("VE",nv), name="VE" )

# Create Algebra for expected Threshold Matrices
expThresh <- mxAlgebra( expression= threshold , name="expThresh" )

# Create Algebra for expected Variance/Covariance Matrices in MZ & DZ twins
covP <- mxAlgebra( expression= VA+VC+VE, name="V" )
covMZ <- mxAlgebra( expression= VA+VC, name="cMZ" )
covDZ <- mxAlgebra( expression= 0.5%x%VA+ VC, name="cDZ" )
expCovMZ <- mxAlgebra( expression= rbind( cbind(V, cMZ), cbind(t(cMZ), V)), name="expCovMZ" )
expCovDZ <- mxAlgebra( expression= rbind( cbind(V, cDZ), cbind(t(cDZ), V)), name="expCovDZ" )

# Create Data Objects for Multiple Groups
dataMZ <- mxData( observed=mzData, type="raw" )
dataDZ <- mxData( observed=dzData, type="raw" )

# Create Expectation Objects for Multiple Groups
expMZ <- mxExpectationNormal( covariance="expCovMZ", means="meanG", dimnames=selVars , thresholds="threshold")
expDZ <- mxExpectationNormal( covariance="expCovDZ", means="meanG", dimnames=selVars , thresholds="threshold")
funML <- mxFitFunctionML()

# Create Model Objects for Multiple Groups
pars <- list( meanG, threshold, covA, covC, covE, covP )
modelMZ <- mxModel( pars, covMZ, expCovMZ, dataMZ, expMZ, funML, name="MZ" )
modelDZ <- mxModel( pars, covDZ, expCovDZ, dataDZ, expDZ, funML, name="DZ" )
multi <- mxFitFunctionMultigroup( c("MZ","DZ") )

# Create Algebra for Standardization
matI <- mxMatrix( type="Iden", nrow=nv, ncol=nv, name="I" )
invSD <- mxAlgebra( expression=solve(sqrt(I*V)), name="iSD" )

# Calculate genetic and environmental correlations
corA <- mxAlgebra( expression=solve(sqrt(I*VA))%&%VA, name ="rA" ) #cov2cor()
corC <- mxAlgebra( expression=solve(sqrt(I*VC))%&%VC, name ="rC" )
corE <- mxAlgebra( expression=solve(sqrt(I*VE))%&%VE, name ="rE" )

# Calculate Phenotypic Correlation

corP <- mxAlgebra (expression=solve(sqrt(I*V)) %*% V %*% solve(sqrt(I*V)), name="rP")

# Create Algebra for Variance Components
rowUV <- rep('UV',nv)
colUV <- rep(c('VA','VC','VE'),each=nv)
estUV <- mxAlgebra( expression=cbind(VA,VC,VE), name="UV", dimnames=list(rowUV,colUV) )
rowSV <- rep('SV',nv)
colSV <- rep(c('SA','SC','SE'),each=nv)
estSV <- mxAlgebra( expression=cbind(VA/V,VC/V,VE/V), name="SV", dimnames=list(rowSV,colSV) )

# Create Confidence Interval Objects
ciUV <- mxCI( c("UV") )
ciSV <- mxCI( c("SV") )

# Build Model with Confidence Intervals
calc <- list( matI, invSD, corA, corC, corE,corP, estUV, estSV, ciUV, ciSV )
modelACE <- mxModel( "twoACEvc", pars, modelMZ, modelDZ, multi, calc )

# ----------------------------------------------------------------------------------------------------------------------
# RUN MODEL

# Run ACE Model
fitACE <- mxRun( modelACE, intervals=T )
sumACE <- summary( fitACE )

Replied on Mon, 06/18/2018 - 10:11
Picture of user. AdminRobK Joined: 01/24/2014

In reply to by massimiliano.orri

You're passing dimnames=selVars as argument to mxExpectationNormal(), but object selVars is nowhere defined in your script. But the fact that your script runs at all means that selVars existed in your R workspace. I bet selVars, however it was defined, doesn't contain the column names for your mxFactor'd binary variable.
Replied on Wed, 06/20/2018 - 11:35
No user picture. massimiliano.orri Joined: 06/11/2018

In reply to by AdminRobK

Thank you very much Rob. You are right, it was indeed because of that. I fixed it.

A follow up question, as I am still struggling specifying this model...I think my problem is that I don't understand how to specify the expectationObject when I have 1 continuous and 1 binary variable. The dimension of the matrices and the dimnames I specified is always wrong.

If I specify only the 2 binary variables (for twin 1 and twin2) in the dimnames option, it seems that I am missing 2 variables, because the expected covariance matrix has 4 rows and columns. I specified as follows:

varsC <- c('y1')
nvC <- 1 # number of continuous variables
ntvC <- nvC*2 # number of total continuous variables
selVarsC <- paste(varsC,c(rep("_T1",nvC),rep("_T2",nvC)),sep="")

varsB <- c('u1')
nvB <- 1 # number of binary variables
ntvB <- nvB*2 # number of total binary variables
selVarsB <- paste(varsB,c(rep("_T1",nvB),rep("_T2",nvB)),sep="")

# Create Matrices for expected Mean Matrices
meanG <- mxMatrix( type="Full", nrow=1, ncol=ntvC, free=TRUE, values=svMe, labels=labVars("mean",selVarsC), name="meanG" )

# Create Matrices for expected Thresholds Matrices
threshold <- mxMatrix( type="Full", nrow=1, ncol=ntvB, free=TRUE, values=svTh, labels=labVars("thre",selVarsB), name="threshold" )

[...]

# Create Expectation Objects for Multiple Groups
expMZ <- mxExpectationNormal( covariance="expCovMZ", means="meanG", dimnames=c(selVarsB), thresholds="threshold")
expDZ <- mxExpectationNormal( covariance="expCovDZ", means="meanG", dimnames=c(selVarsB), thresholds="threshold")

and the error I obtains is:
Error: The expected covariance matrix associated with the expectation function in model 'MZ' is not of the same length as the 'dimnames' argument provided by the expectation function. The 'dimnames' argument is of length 2 and the expected covariance matrix has 4 rows and columns.

If I specify all my variables in the dimnames option (as follows):

expMZ <- mxExpectationNormal( covariance="expCovMZ", means="meanG", dimnames=c(selVarsB,selVarsC), thresholds="threshold")
expDZ <- mxExpectationNormal( covariance="expCovDZ", means="meanG", dimnames=c(selVarsB,selVarsC), thresholds="threshold")

I obtain the following error, of course:
Error: The expected means vector associated with the expectation function in model 'MZ' is not of the same length as the 'dimnames' argument provided by the expectation function. The 'dimnames' argument is of length 4 and the expected means vector has 2 columns.

Should I perhaps have different expectation Objects for continuous and binary variables?
Or how can I specify that 2 variables are referring to the mean matrix while the 2 other variables to the threshold matrix?

I looked for an exemple of the openMx documentation, but I could not resolve the issue so far.

Thanks again for any help!

Replied on Wed, 06/20/2018 - 11:55
No user picture. massimiliano.orri Joined: 06/11/2018

In reply to by massimiliano.orri

Perhaps I should specify the threshnames argument in the mxExpectationNormal function:
It seems that now it's running...


# Create Expectation Objects for Multiple Groups
expMZ <- mxExpectationNormal( covariance="expCovMZ", means="meanG", dimnames=c(selVarsB,selVarsC),
thresholds="threshold", threshnames=selVarsB)
expDZ <- mxExpectationNormal( covariance="expCovDZ", means="meanG", dimnames=c(selVarsB,selVarsC),
thresholds="threshold", threshnames=selVarsB)
funML <- mxFitFunctionML()

Replied on Mon, 06/25/2018 - 14:08
Picture of user. AdminRobK Joined: 01/24/2014

In reply to by massimiliano.orri

Perhaps I should specify the threshnames argument in the mxExpectationNormal function

That's right.

Do I understand correctly that the dimnames you're passing to mxExpectationNormal() are, in order, "u1_T1", "u1_T2", "y1_T1", "y2_T2"? If so, something's wrong. From the way your script is written, it should be both traits for twin #1 followed by both traits for twin #2, e.g. "u1_T1", "y1_T1", "u1_T2", "y1_T2".

I also don't see anything in your script that fixes either the mean or the threshold for the binary trait, nor anything that identifies the scale (variance) of the binary trait. Just to be clear: when you analyze a threshold variable in OpenMx, the variable is treated as though it reflects an unobserved, normally distributed continuum that has been "chopped" into two or more ordered categories. What is really being modeled is that latent Gaussian trait. In the binary case, in order to identify your model, you need to assume some value for either the latent trait's mean or the value of the threshold, and you need to assume a value for some parameter that scales the latent trait. I can't really make any specific suggestions about that until I'm sure that you have the dimnames for your expectations correct. I am actually surprised that your syntax runs with this,

meanG <- mxMatrix( type="Full", nrow=1, ncol=ntvC, free=TRUE, values=svMe, labels=labVars("mean",selVarsC), name="meanG" )

, which only has 2 columns when it ought to have 4 (i.e., the binary variables need model-expected means, too).