You are here

Adding covariates to Cholesky script

10 posts / 0 new
Last post
Mirch's picture
Offline
Joined: 10/26/2018 - 01:21
Adding covariates to Cholesky script

Hi there, I am a new PhD student working on a project with twins for the first time. I've been provided with some older scripts from past students and some example code from a twin analysis workshop in Queensland that I haven't been able to attend yet (which I believe was run by Sarah Medland).

The code from the workshop is for a Cholesky model without covariates. I would like to regress out sex, age, and education from the mean, but otherwise use the same script. I am having trouble incorporating this and can't see what I'm doing wrong based on the other scripts I have that do have covariates (but are not in the same format as what I would like).

This is my edited code from the workshop:

require(OpenMx)
require(psych)
library(tidyverse)
 
# Read in data file 
 
wellbeing <- read.csv("CleanData/FINAL_EC_Cent_twins.csv", header = TRUE)
glimpse(wellbeing)
#summary(wellbeing)
#describe(wellbeing)
 
nv  <- 5          # number of variables
ntv <- 2*nv         # number of variables*max family size
 
# identify variables
Vars    <- c('Z_Wellbeing','Alpha','Beta','Theta','Delta')
selVars <- paste(Vars,c(rep(1,nv),rep(2,nv)),sep="") 
 
# identify covariates
# Covs    <- c('male' 'Age','Education','Beta','Theta','Delta') # use if you want to use same method as for Vars
selCovs <- c('male', 'Age1', 'Education1', 'Age2', 'Education2')
 
#allVars <- c(selVars, selCovs) # list with variabels and covariates
 
# This yields a list of the 8 variables for analysis 
# Satisfaction1   Positivity1   Wellbeing1   Mastery1       Satisfaction2   Positivity2   Wellbeing2   Mastery2
 
mzData <- subset(wellbeing, mz==1, c(selVars, selCovs))
dzData <- subset(wellbeing, mz==0, c(selVars, selCovs))
 
# Create start values 
Stmean <-colMeans(mzData[,1:nv],na.rm=TRUE)
 
# Create Labels for Lower Triangular Matrices
aLabs <- paste("a", do.call(c, sapply(seq(1, nv), function(x){ paste(x:nv, x,sep="") })), sep="")
cLabs <- paste("c", do.call(c, sapply(seq(1, nv), function(x){ paste(x:nv, x,sep="") })), sep="")
eLabs <- paste("e", do.call(c, sapply(seq(1, nv), function(x){ paste(x:nv, x,sep="") })), sep="")
 
# Create Labels for Column and Diagonal Matrices
mLabs       <- paste("m",1:nv,sep="")
 
 
# Specify Cholesky ACE Decomposition 
# -------------------------------------------------------------------------------------------------
 
# Matrices declared to store a, c, and e Path Coefficients
pathA   <- mxMatrix( type="Lower", nrow=nv, ncol=nv, free=TRUE, values=.4, labels=aLabs, name="a" )
pathC   <- mxMatrix( type="Lower", nrow=nv, ncol=nv, free=TRUE, values=.1, labels=cLabs, name="c" )
pathE   <- mxMatrix( type="Lower", nrow=nv, ncol=nv, free=TRUE, values=.2, labels=eLabs, name="e" )
 
# Matrices generated to hold A, C, and E computed Variance Components
covA    <- mxAlgebra( expression=a %*% t(a), name="A" )
covC    <- mxAlgebra( expression=c %*% t(c), name="C" ) 
covE    <- mxAlgebra( expression=e %*% t(e), name="E" )
 
# Algebra to compute total variances and standard deviations (diagonal only)
covP    <- mxAlgebra( expression=A+C+E, name="V" )
matI    <- mxMatrix( type="Iden", nrow=nv, ncol=nv, name="I")
invSD   <- mxAlgebra( expression=solve(sqrt(I*V)), name="iSD")
 
# Algebra to compute Correlations
invSDa<- mxAlgebra( expression=solve(sqrt(I*A)), name="iSDa")
invSDc<- mxAlgebra( expression=solve(sqrt(I*C)), name="iSDc")
invSDe<- mxAlgebra( expression=solve(sqrt(I*E)), name="iSDe")
Rph <- mxAlgebra( expression=iSD  %&% V , name="Phcor")
Rg  <- mxAlgebra( expression=iSDa %&% A , name="Acor")
Rc  <- mxAlgebra( expression=iSDc %&% C , name="Ccor")
Re  <- mxAlgebra( expression=iSDe %&% E , name="Ecor")
 
# Algebras to compute Standardized Variance Components
rowVars <- rep('v',nv)
colVars <- rep(c('A','C','E','h2','c2','e2'),each=nv)
estVars <- mxAlgebra( expression=cbind(A,C,E,A/V,C/V,E/V), name="est", dimnames=list(rowVars,colVars))
 
# Algebra for expected Mean and Variance/Covariance Matrices in MZ & DZ twins
## ----------------------------------------------------------------------
# MeanG   <- mxMatrix( type="Full", nrow=1, ncol=ntv, free=TRUE, values=Stmean, labels=c(mLabs,mLabs), name="expMean" )
 
# Regress covariates out of the mean
# find intercept
meanG <- mxMatrix(type="Full", nrow=1, ncol=ntv, free=TRUE, values=Stmean, labels=c(mLabs,mLabs), name="meanG" )
 
# create matrices to store observed values for each covariate
Male <- mxMatrix(type="Full", nrow=1, ncol=ntv, free=FALSE,
                    labels=c("data.male","data.male"), name="Male") # note I only have 1 column for male, so no need to specify 'male1' and 'male2'
Age <- mxMatrix(type="Full", nrow=1, ncol=ntv, free=FALSE,
                   labels=c("data.Age1","data.Age2"), name="Age")
Edu <- mxMatrix(type="Full", nrow=1, ncol=ntv, free=FALSE,
                   labels=c("data.Education1","data.Education1"), name="Edu")
 
# create matrices to store coefficients
bMale <- mxMatrix(type="Full", nrow=1, ncol=1, free=TRUE,
                 values= .01, label="betaMale", name="bMale")
bAge <- mxMatrix(type="Full", nrow=1, ncol=1, free=TRUE,
                 values= .01, label="betaAge", name="bAge")
bEdu <- mxMatrix(type="Full", nrow=1, ncol=1, free=TRUE,
                 values= .01, label="betaEdu", name="bEdu")
 
# regression algebra for each covariate
# regMale <- mxAlgebra(bMale%*%Male, name="MaleR")
# regAge <- mxAlgebra(bAge%*%Age, name="AgeR")
# regEdu <- mxAlgebra(bEdu%*%Edu, name="EduR")
 
## Add Covariates to the mean
expMean <- mxAlgebra(meanG + (bMale%*%Male) + (bAge%*%Age) + (bEdu%*%Edu), name="expMean")
 
covMZ   <- mxAlgebra( expression= rbind( cbind(A+C+E , A+C),
                                            cbind(A+C , A+C+E)), name="expCovMZ" )
covDZ   <- mxAlgebra( expression= rbind( cbind(A+C+E       , 0.5%x%A+C),
                                            cbind(0.5%x%A+C , A+C+E)), name="expCovDZ" )
 
# Data objects for Multiple Groups
dataMZ  <- mxData( observed=mzData, type="raw" )
dataDZ  <- mxData( observed=dzData, type="raw" )
 
# Objective objects for Multiple Groups
objMZ   <- mxExpectationNormal( covariance="expCovMZ", means="expMean", dimnames=selVars)
objDZ   <- mxExpectationNormal( covariance="expCovDZ", means="expMean", dimnames=selVars)
fit <- mxFitFunctionML()
 
# Combine Groups
pars        <- list( pathA, pathC, pathE, covA, covC, covE, covP, matI, invSD, invSDa,  invSDc, invSDe, Rph, Rg, Rc, Re, estVars ) 
modelMZ <- mxModel( pars, covMZ, expMean, dataMZ, objMZ, fit, name="MZ" )
modelDZ <- mxModel( pars, covDZ, expMean, dataDZ, objDZ, fit, name="DZ" )
fitmg <- mxFitFunctionMultigroup(c('MZ', 'DZ'))
## the below lines were updated using mxFitFunctionMultigroup()
## minus2ll   <- mxAlgebra( expression=MZ.objective + DZ.objective, name="m2LL" )
## obj        <- mxFitFunctionAlgebra( "m2LL" )
 
conf1       <- mxCI (c ('MZ.est[1,13]', 'MZ.est[2,14]', 'MZ.est[3,15]', 'MZ.est[4,16]') )      # h2
conf2       <- mxCI (c ('MZ.est[1,17]', 'MZ.est[2,18]', 'MZ.est[3,19]', 'MZ.est[4,20]') )      # c2
conf3       <- mxCI (c ('MZ.est[1,21]', 'MZ.est[2,22]', 'MZ.est[3,23]', 'MZ.est[4,24]') )      # e2
conf4       <- mxCI (c ('MZ.Acor[2,1]', 'MZ.Acor[3,1]', 'MZ.Acor[4,1]', 'MZ.Acor[3,2]', 'MZ.Acor[4,2]', 'MZ.Acor[4,3]')) #Rg
conf5       <- mxCI (c ('MZ.Ccor[2,1]', 'MZ.Ccor[3,1]', 'MZ.Ccor[4,1]', 'MZ.Ccor[3,2]', 'MZ.Ccor[4,2]', 'MZ.Ccor[4,3]')) #Rc
conf6       <- mxCI (c ('MZ.Ecor[2,1]', 'MZ.Ecor[3,1]', 'MZ.Ecor[4,1]', 'MZ.Ecor[3,2]', 'MZ.Ecor[4,2]', 'MZ.Ecor[4,3]')) #Re
CholAceModel<- mxModel( "CholACE", pars, modelMZ, modelDZ, fitmg, conf1, conf2, conf3, conf4, conf5, conf6)
 
# ----------------------------------------------------------------------------------------------------------------------------------
# (model 2) RUN Cholesky Decomposition ACE Model
 
CholAceFit  <- mxRun(CholAceModel, intervals=F)
CholAceSumm <- summary(CholAceFit)
CholAceSumm 

When I run this, the error I get is:

Error: Unknown expected means name 'expMean' detected in the expectation function of model 'MZ'

I've also tried this:

 # Combine Groups
pars        <- list( pathA, pathC, pathE, covA, covC, covE, covP, matI, invSD, invSDa,  invSDc, invSDe, Rph, Rg, Rc, Re, estVars, bMale, bEdu, bAge, Male, Age, Edu ) 
modelMZ <- mxModel( pars, covMZ, meanG, expMean, dataMZ, objMZ, fit, name="MZ" )
modelDZ <- mxModel( pars, covDZ, meanG, expMean, dataDZ, objDZ, fit, name="DZ" )
fitmg <- mxFitFunctionMultigroup(c('MZ', 'DZ')) 

Which results in the error code:

Error: A definition variable 'data.male' has been declared in model 'CholACE' that does not contain a data set

When I check the names of my data I get:

 > names(mzData)
 [1] "Z_Wellbeing1" "Alpha1"       "Beta1"        "Theta1"       "Delta1"       "Z_Wellbeing2" "Alpha2"      
 [8] "Beta2"        "Theta2"       "Delta2"       "male"         "Age1"         "Education1"   "Age2"        
[15] "Education2"  
> names(dzData)
 [1] "Z_Wellbeing1" "Alpha1"       "Beta1"        "Theta1"       "Delta1"       "Z_Wellbeing2" "Alpha2"      
 [8] "Beta2"        "Theta2"       "Delta2"       "male"         "Age1"         "Education1"   "Age2"        
[15] "Education2"  

I spent hours playing around with it trying to get it to run on Friday and everything I've tried has had an error so any help with the code would be appreciated. This is my first time posting on any code-related forum so please let me know if I need to supply any more details or data.

Thank you!

jpritikin's picture
Offline
Joined: 05/24/2012 - 00:35
please provide a reproducable example

You'll need to attach a complete example that we can run. Include the R script and a few representative rows of data that will allow us to reproduce the error.

AdminRobK's picture
Offline
Joined: 01/24/2014 - 12:15
In this case, I think the

In this case, I think the issue is diagnosable solely from the information in the OP.

AdminRobK's picture
Offline
Joined: 01/24/2014 - 12:15
some advice
When I run this, the error I get is:

Error: Unknown expected means name 'expMean' detected in the expectation function of model 'MZ'

Hmm. That's not a very helpful error message, is it? Anyhow, it looks like the problem is that none of expMean's dependencies is going into the MxModel object anyplace. You need to put meanG, Male, Age, Edu, bMale, bAge, and bEdu into the MZ and DZ MxModels somehow. As you discovered, you can't put them into the pars list if you're going to put pars into the container MxModel "CholACE", because the container MxModel contains no dataset of its own.

So, either put meanG, Male, Age, Edu, bMale, bAge, bEdu, and expMean directly into the MZ and DZ MxModels, or put them into pars and don't put pars into the container MxModel.

You will probably also want to re-think how you're setting up the regression onto age, sex, and education level. For instance, you have the phenotypes ordered so that twin #1's block of 5 is followed by twin #2's block of 5, but the labels in your Age MxMatrix aren't consistent with that:

$labels
     [,1]        [,2]        [,3]        [,4]        [,5]        [,6]        [,7]        [,8]        [,9]        [,10]      
[1,] "data.Age1" "data.Age2" "data.Age1" "data.Age2" "data.Age1" "data.Age2" "data.Age1" "data.Age2" "data.Age1" "data.Age2"

Furthermore, you only use "data.Education1", but not "data.Education2", as a label for Edu (is that deliberate?). Finally, each of your matrices of regression slopes is 1x1, and is being post-multiplied by a 1x10 matrix of definition variables. Do you really want the regression slopes w/r/t a given covariate to be constrained equal across all 5 phenotypes? Perhaps you want to try something like this instead?:

# create matrices to store observed values for each covariate
Male <- mxMatrix(type="Full", nrow=1, ncol=2, free=FALSE,
                                 labels=c("data.male","data.male"), name="Male") # note I only have 1 column for male, so no need to specify 'male1' and 'male2'
Age <- mxMatrix(type="Full", nrow=1, ncol=2, free=FALSE,
                                labels=c("data.Age1","data.Age2"), name="Age")
Edu <- mxMatrix(type="Full", nrow=1, ncol=2, free=FALSE,
                                labels=c("data.Education1","data.Education2"), name="Edu")
 
# create matrices to store coefficients
bMale <- mxMatrix(type="Full", nrow=1, ncol=5, free=TRUE,
                                    values= .01, label=paste("betaMale",1:5,sep=""), name="bMale")
bAge <- mxMatrix(type="Full", nrow=1, ncol=5, free=TRUE,
                                 values= .01, label=paste("betaAge",1:5,sep=""), name="bAge")
bEdu <- mxMatrix(type="Full", nrow=1, ncol=5, free=TRUE,
                                 values= .01, label=paste("betaEdu",1:5,sep=""), name="bEdu")
 
 
## Add Covariates to the mean
##  Note the use of the Kronecker-product operator, %x% :
expMean <- mxAlgebra(meanG + (Male%x%bMale) + (Age%x%bAge) + (Edu%x%bEdu), name="expMean")
Mirch's picture
Offline
Joined: 10/26/2018 - 01:21
Thank you so much for your

Thank you so much for your help, Rob.

You are correct, the double "data.Education1" was a mistake (I've swapped one for "data.Education2"), and I do want my covariates to vary across phenotypes. I've implemented those changes and moved the covariates matrices directly into the MZ and DZ models (and am now using the Kronecker-product as you pointed out).

It all runs now, which is excellent. The only thing I'm not sure about is how to fix the labels? I can't seem to replicate your list of age variables (with 10 values) as I only get

> Age$labels
     [,1]        [,2]       
[1,] "data.Age1" "data.Age2"

But I assume this pattern is just repeating x5 for the phenotypes. Could I just read in my variables using:

 > Vars <- c('Z_Wellbeing','Alpha','Beta','Theta','Delta') %>%
+   rep(rep(1), each = 2)
> selVars <- paste(Vars,c(rep(c(1,2),nv)),sep="")
> selVars
 [1] "Z_Wellbeing1" "Z_Wellbeing2" "Alpha1"       "Alpha2"       "Beta1"        "Beta2"        "Theta1"      
 [8] "Theta2"       "Delta1"       "Delta2"   

So that each variable is repeated for each twin? I've implemented this for now.

Thanks again,
Miranda

Mirch's picture
Offline
Joined: 10/26/2018 - 01:21
I should have run this all

I should have run this all the way through before posting. Changing this is definitely doing something different but I don't think it's correct because I got the following rA matrix:

 > # Print the rG, rC, rE matrices (above, in the summary output you will find them with 95% CI)
> CholAceFit$Acor
mxAlgebra 'Acor' 
$formula:  iSDa %&% A 
$result:
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    1    1    1    1
[2,]    1    1    1    1    1
[3,]    1    1    1    1    1
[4,]    1    1    1    1    1
[5,]    1    1    1    1    1
dimnames: NULL 

The matrix I get with the original code (errors fixed) is:

 mxAlgebra 'Acor' 
$formula:  iSDa %&% A 
$result:
            [,1]        [,2]       [,3]       [,4]       [,5]
[1,]  1.00000000 -0.02157329 -0.1882037 -0.2731677 -0.6819023
[2,] -0.02157329  1.00000000  0.6261026  0.6672235  0.7141160
[3,] -0.18820375  0.62610265  1.0000000  0.5182056  0.6364864
[4,] -0.27316768  0.66722346  0.5182056  1.0000000  0.7815057
[5,] -0.68190235  0.71411600  0.6364864  0.7815057  1.0000000
dimnames: NULL 

The examples I have of multivariate models with covariates (from previous lab members) have done this both ways - one person had variables as c("A1", "A2", "B1", "B2") and another had the variables as c("A1", "B1", "A2", "B2"). Both use the same format when setting up the regression variables (same as what I have used). I didn't use their code because both used a 'classic' Mx style and written in a way that made it quite difficult to add in my additional variables (and it was hard to tell what was going on!)

AdminRobK's picture
Offline
Joined: 01/24/2014 - 12:15
I think you're confused
It all runs now, which is excellent. The only thing I'm not sure about is how to fix the labels? I can't seem to replicate your list of age variables (with 10 values) as I only get

> Age$labels
     [,1]        [,2]       
[1,] "data.Age1" "data.Age2"

But I assume this pattern is just repeating x5 for the phenotypes.

If you're using the syntax I provided you, then Male, Age, and Edu are supposed to only have a two-element vector of labels, as you describe.

Could I just read in my variables using:

 > Vars <- c('Z_Wellbeing','Alpha','Beta','Theta','Delta') %>%
+   rep(rep(1), each = 2)
> selVars <- paste(Vars,c(rep(c(1,2),nv)),sep="")
> selVars
 [1] "Z_Wellbeing1" "Z_Wellbeing2" "Alpha1"       "Alpha2"       "Beta1"        "Beta2"        "Theta1"      
 [8] "Theta2"       "Delta1"       "Delta2"   

So that each variable is repeated for each twin? I've implemented this for now.

You don't want to do that, because if you do, then you'll need to completely respecify your model's covariance structure. For instance, this,

covMZ   <- mxAlgebra( expression= rbind( cbind(A+C+E , A+C),
                                            cbind(A+C , A+C+E)), name="expCovMZ" )
covDZ   <- mxAlgebra( expression= rbind( cbind(A+C+E       , 0.5%x%A+C),
                                            cbind(0.5%x%A+C , A+C+E)), name="expCovDZ" )

only works if the phenotypes are ordered as they were in your opening post. Trust me, the additional effort entailed isn't worth it.

Mirch's picture
Offline
Joined: 10/26/2018 - 01:21
Okay, great! I've gone back

Okay, great! I've gone back to my original running code and will keep the variables in that order.

Thanks for the clarification!

DryLabRebel's picture
Offline
Joined: 03/06/2018 - 01:34
Thanks!

Thanks, this thread was exactly what I needed. I had the same problem!

AdminRobK's picture
Offline
Joined: 01/24/2014 - 12:15
good to hear

Glad to hear it. That's what these forums are for!