Include covariates in multivariate cholesky model

Posted on
No user picture. sealx017 Joined: 08/17/2018

Hi,

I have a dataset of twins on 4 time-points and a covariate which does not vary over time. I can fit the Multivariate cholesky decompositon model with ACE components but stuck at the case where I want to include that covariate at each time-point.

This how I was including the covariate (snp),

defSNP <- mxMatrix( type="Full", nrow=1, ncol=2, free=FALSE, labels=c("data.snp1","data.snp2"), name="snp" ) {note that snp1 corresponds to twin 1 and snp2 corresponds to twin 2)

pathB <- mxMatrix( type="Full", nrow=1, ncol=4, free=TRUE, values=.01, labels=c("b11","b12","b13","b14"), name="b")

meanG <- mxMatrix( type="Full", nrow=1, ncol=8, free=TRUE,
values=svMe, labels=labFull("me",1,4), name="expMean" )
expMean <- mxAlgebra( expression= meanG + cbind(b%*%snp,b%*%snp), name="expMean" )

{have also tried expression=meanG + b%*% snp}

It gives the error,

In model 'CholACE' Optimizer returned a non-zero status code 5. The Hessian at the solution does not appear to be convex (Mx status RED)

and starting values of b as their final estimates and sd 0. Clearly, I have been specifying it wrong, can anybody help?

And another question, how does OpenMx calculates the estimates? In my understanding, these are the MLE's obtained by maximizing a multivariate likelihood? Is that correct?

Replied on Mon, 08/20/2018 - 16:13
Picture of user. AdminRobK Joined: 01/24/2014

It gives the error,

In model 'CholACE' Optimizer returned a non-zero status code 5. The Hessian at the solution does not appear to be convex (Mx status RED)

That's a warning, not an error. If an error had been thrown, `mxRun()` would not have returned a value.

I am surprised your model even ran, because of this:

expMean <- mxAlgebra( expression= meanG + cbind(b%*%snp,b%*%snp), name="expMean" )

Matrix 'b' is 1x4, whereas 'snp' is 1x2. Those two matrices are not conformable for matrix multiplication. The reason why your model actually ran in spite of the matrix nonconformability is probably because of this:

meanG <- mxMatrix( type="Full", nrow=1, ncol=8, free=TRUE,
values=svMe, labels=labFull("me",1,4), name="expMean" )
expMean <- mxAlgebra( expression= meanG + cbind(b%*%snp,b%*%snp), name="expMean" )

You can't give two different objects the same name. Try changing `meanG`'s `name` to "Intercepts" (because that is what it contains), and replace meanG in the definition of `expMean` to `Intercepts`. I bet you gave argument `means="expMean"` to `mxExpectationNormal()` (right?), and OpenMx found the matrix with R symbol `meanG` "first," and used it. That would explain why the regression slopes didn't move from their starting values, and could explain why OpenMx never threw a nonconformability error.

The way to to fix the nonconformability with the fewest changes is probably to change that line to:

expMean <- mxAlgebra( expression= Intercepts + cbind(b*snp[1,1], b*snp[1,2]), name="expMean" )

This new syntax uses elementwise multiplication rather than matrix multiplication. Another possibility would be this:

expMean <- mxAlgebra( expression= Intercepts + (snp %x% b), name="expMean" )

This uses Kronecker multiplication instead.

Be advised that the labels you're providing indicate that each timepoint will have a different regression intercept and a different regression slope onto the SNP. Is that what you want to do?

Replied on Mon, 08/20/2018 - 16:25
No user picture. sealx017 Joined: 08/17/2018

In reply to by AdminRobK

Yes that is what I want to do (different intercept and slope at each time-point)!

I tried your suggestion but getting still the same problem,

name matrix row col Estimate Std.Error A lbound ubound
1 b11 b 1 1 0.01000000 NA !
2 b12 b 1 2 0.01000000 NA !
3 b13 b 1 3 0.01000000 NA !
4 b14 b 1 4 0.01000000 NA !
5 me_1_1 expMean 1 1 0.02109935 6.460310 !
6 me_1_2 expMean 1 2 0.11031838 7.343413
7 me_1_3 expMean 1 3 0.07748710 7.853076 !
8 me_1_4 expMean 1 4 -0.22722456 8.414228 !
9 a_1_1 a 1 1 6.96085244 8.672234 1e-04

Replied on Mon, 08/20/2018 - 16:33
Picture of user. AdminRobK Joined: 01/24/2014

In reply to by sealx017

Yes that is what I want to do (different intercept and slope at each time-point)!

OK, just making sure.

I tried your suggestion but getting still the same problem

To be clear, I'm saying that the syntax defining your model's means should look like this,

meanG <- mxMatrix( type="Full", nrow=1, ncol=8, free=TRUE, values=svMe, labels=labFull("me",1,4), name="Intercepts" )
expMean <- mxAlgebra( expression= Intercepts + cbind(b*snp[1,1], b*snp[1,2]), name="expMean" )

, and that both `meanG` and `expMean` have to be put into your MxModels ("groups"). If that's what you've done, then you'll have to post your full script for me to tell what's going wrong. BTW, what do you get from `mxVersion()`?

Replied on Mon, 08/20/2018 - 16:53
No user picture. sealx017 Joined: 08/17/2018

In reply to by AdminRobK

mxVersion is 2.6.7.

Here is part of my script. Please let me know if you can see obvious errors.


defSNP <- mxMatrix( type="Full", nrow=1, ncol=2, free=FALSE, labels=c("data.snp1","data.snp2"), name="snp" )

pathB <- mxMatrix( type="Full", nrow=1, ncol=nv, free=TRUE, values=.01, labels=c("b11","b12","b13","b14"), name="b")

#pathB <- mxMatrix( type="Full", nrow=1, ncol=1, free=TRUE, values=.01, labels=c("b11"), name="b")

# ------------------------------------------------------------------------------
# PREPARE GENETIC MODEL

# ------------------------------------------------------------------------------
# Cholesky Decomposition ACE Model
# ------------------------------------------------------------------------------
# Set Starting Values
svMe <- c(0,0,0,0) # start value for means
svPa <- valDiag(nv,.6) # start values for parameters on diagonal
lbPa <- valLUDiag(nv,.0001,-10,NA) # lower bounds for parameters on diagonal

# Matrices declared to store a, c, and e Path Coefficients
pathA <- mxMatrix( type="Lower", nrow=nv, ncol=nv, free=TRUE,
values=svPa, labels=labLower("a",nv), lbound=lbPa, name="a" )
pathC <- mxMatrix( type="Lower", nrow=nv, ncol=nv, free=TRUE,
values=svPa, labels=labLower("c",nv), lbound=lbPa, name="c" )
pathE <- mxMatrix( type="Lower", nrow=nv, ncol=nv, free=TRUE,
values=svPa, labels=labLower("e",nv), lbound=lbPa, name="e" )

# Matrices generated to hold A, C, and E computed Variance Components
covA <- mxAlgebra( expression=a %*% t(a), name="A" )
covC <- mxAlgebra( expression=c %*% t(c), name="C" )
covE <- mxAlgebra( expression=e %*% t(e), name="E" )

# Algebra to compute total variances and standard deviations (diagonal only)
covP <- mxAlgebra( expression=A+E, name="V" )
matI <- mxMatrix( type="Iden", nrow=nv, ncol=nv, name="I")
invSD <- mxAlgebra( expression=solve(sqrt(I*V)), name="iSD")

# Algebra for expected Mean and Variance/Covariance Matrices in MZ & DZ twins
meanG <- mxMatrix( type="Full", nrow=1, ncol=8, free=TRUE, values=svMe, labels=labFull("me",1,4), name="Intercepts" )
expMean <- mxAlgebra( expression= Intercepts + snp%*%b, name="expMean" )

covMZ <- mxAlgebra( expression= rbind( cbind(V, A), cbind(A, V)), name="expCovMZ" )
covDZ <- mxAlgebra( expression= rbind( cbind(V, 0.5%x%A), cbind(0.5%x%A, V)), name="expCovDZ" )

# Data objects for Multiple Groups
dataMZ <- mxData( observed=mzData, type="raw" )
dataDZ <- mxData( observed=dzData, type="raw" )

# Objective objects for Multiple Groups
objMZ <- mxFIMLObjective( covariance="expCovMZ", means="expMean", dimnames=selVars )
objDZ <- mxFIMLObjective( covariance="expCovDZ", means="expMean", dimnames=selVars )

# Combine Groups
pars <- list( pathB,pathA, pathE, covA, covE, covP, matI, invSD, expMean, meanG )

modelMZ <- mxModel( pars, covMZ, dataMZ, objMZ, name="MZ" )
modelDZ <- mxModel( pars, covDZ, dataDZ, objDZ, name="DZ" )
minus2ll <- mxAlgebra( expression=MZ.objective + DZ.objective, name="m2LL" )
obj <- mxAlgebraObjective( "m2LL" )
CholAceModel <- mxModel( "CholACE", pars, modelMZ, modelDZ, minus2ll, obj )

CholAceFit <- mxRun(CholAceModel)

Now the error I am getting is : Error: Unknown reference 'snp' detected in the entity 'expMean' in model 'CholACE'

Thanks for the prompt replies!

Replied on Mon, 08/20/2018 - 18:17
Picture of user. AdminRobK Joined: 01/24/2014

In reply to by sealx017

mxVersion is 2.6.7.

The most recent "real" release is 2.9.9. I recommend updating. We repair bugs every release. For right now, for reasons I don't care to explain, I recommend installing our build of OpenMx, rather than CRAN's; that goes double if you use a Mac (note that, from the full output of `mxVersion()`, I would know whether you installed our build or CRAN's, and whether or not you use a Mac).

`defSNP` must be included in the two groups, too. That explains the error you're getting. Redefine `pars` as:

pars <- list( pathB, pathA, pathE, covA, covE, covP, matI, invSD, expMean, meanG, defSNP )

There's no need at all to have `pars` in the container MxModel. In fact, the container need only contain the groups and the omnibus fitfunction. Replace this,

minus2ll <- mxAlgebra( expression=MZ.objective + DZ.objective, name="m2LL" )
obj <- mxAlgebraObjective( "m2LL" )
CholAceModel <- mxModel( "CholACE", pars, modelMZ, modelDZ, minus2ll, obj )

, with this,

obj <- mxFitFunctionMultigroup(c("MZ","DZ"))
CholAceModel <- mxModel( "CholACE", modelMZ, modelDZ, obj )

.

This,

expMean <- mxAlgebra( expression= Intercepts + snp%*%b, name="expMean" )

, still doesn't resolve the nonconformability. 'snp' is 1x2, and 'b' is 1x4. You probably meant to use %x% (the Kronecker operator), rather than %*% (the matrix-multiplication operator), right? That would give a 1x8 result.

I think you should make the elements of `pathC` fixed, rather than free. 'C' is never used in defining 'V', or either of the groups' expected covariance matrices. Consequently, if its elements are free, they will be unidentified parameters.

Per this,

# Objective objects for Multiple Groups
objMZ <- mxFIMLObjective( covariance="expCovMZ", means="expMean", dimnames=selVars )
objDZ <- mxFIMLObjective( covariance="expCovDZ", means="expMean", dimnames=selVars )

, you are indirectly using both `mxExpectationNormal()` and `mxFitFunctionML()`. So, yes, the point estimates maximize a multivariate-normal likelihood function.

Replied on Mon, 08/20/2018 - 23:04
No user picture. sealx017 Joined: 08/17/2018

In reply to by AdminRobK

Are you saying if I define V as A+C+E and the MZ correlation matrix as A+C and DZ correlation as 0.5A+C, i.e
covMZ <- mxAlgebra( expression= rbind( cbind(V, A+C), cbind(A+C, V)), name="expCovMZ" )
covDZ <- mxAlgebra( expression= rbind( cbind(V, 0.5%x%A+C), cbind(0.5%x%A+C, V)), name="expCovDZ" )

there will be identifiability issue?

I did the changes you told like this,

mzData <- subset(testData, zyg2==1, c(selVars,"snp1","snp2"))
dzData <- subset(testData, zyg2==2, c(selVars,"snp1","snp2"))
describe(mzData, skew=F)
describe(dzData, skew=F)
dim(mzData)
dim(dzData)

# Generate Descriptive Statistics
colMeans(mzData,na.rm=TRUE)
colMeans(dzData,na.rm=TRUE)
cov(mzData,use="complete")
cov(dzData,use="complete")
cor(mzData,use="complete")
cor(dzData,use="complete")

# ------------------------------------------------------------------------------
# PREPARE MODEL

# ACE Model
# Create Matrices for Covariates and linear Regression Coefficients
defSNP <- mxMatrix( type="Full", nrow=1, ncol=2, free=FALSE, labels=c("data.snp1","data.snp2"), name="snp" )

pathB <- mxMatrix( type="Full", nrow=1, ncol=nv, free=TRUE, values=.01, labels=c("b11","b12","b13","b14"), name="b")

# ------------------------------------------------------------------------------
# PREPARE GENETIC MODEL

# ------------------------------------------------------------------------------
# Cholesky Decomposition ACE Model
# ------------------------------------------------------------------------------
# Set Starting Values
svMe <- c(0,0,0,0) # start value for means
svPa <- valDiag(nv,.6) # start values for parameters on diagonal
lbPa <- valLUDiag(nv,.0001,-10,NA) # lower bounds for parameters on diagonal

# Matrices declared to store a, c, and e Path Coefficients
pathA <- mxMatrix( type="Lower", nrow=nv, ncol=nv, free=TRUE,
values=svPa, labels=labLower("a",nv), lbound=lbPa, name="a" )
pathC <- mxMatrix( type="Lower", nrow=nv, ncol=nv, free=TRUE,
values=svPa, labels=labLower("c",nv), lbound=lbPa, name="c" )
pathE <- mxMatrix( type="Lower", nrow=nv, ncol=nv, free=TRUE,
values=svPa, labels=labLower("e",nv), lbound=lbPa, name="e" )

# Matrices generated to hold A, C, and E computed Variance Components
covA <- mxAlgebra( expression=a %*% t(a), name="A" )
covC <- mxAlgebra( expression=c %*% t(c), name="C" )
covE <- mxAlgebra( expression=e %*% t(e), name="E" )

# Algebra to compute total variances and standard deviations (diagonal only)
covP <- mxAlgebra( expression=A+E, name="V" )
matI <- mxMatrix( type="Iden", nrow=nv, ncol=nv, name="I")
invSD <- mxAlgebra( expression=solve(sqrt(I*V)), name="iSD")

# Algebra for expected Mean and Variance/Covariance Matrices in MZ & DZ twins
meanG <- mxMatrix( type="Full", nrow=1, ncol=8, free=TRUE, values=svMe, labels=labFull("me",1,4), name="Intercepts" )
expMean <- mxAlgebra( expression= Intercepts + snp%x%b, name="expMean" )
#threG <- mxMatrix( type="Full", nrow=1, ncol=1, free=TRUE, values=svTh, labels="tob", name="threG" )

covMZ <- mxAlgebra( expression= rbind( cbind(V, A), cbind(A, V)), name="expCovMZ" )
covDZ <- mxAlgebra( expression= rbind( cbind(V, 0.5%x%A), cbind(0.5%x%A, V)), name="expCovDZ" )

# Data objects for Multiple Groups
dataMZ <- mxData( observed=mzData, type="raw" )
dataDZ <- mxData( observed=dzData, type="raw" )

# Objective objects for Multiple Groups
objMZ <- mxFIMLObjective( covariance="expCovMZ", means="expMean", dimnames=selVars )
objDZ <- mxFIMLObjective( covariance="expCovDZ", means="expMean", dimnames=selVars )

# Combine Groups
pars <- list( pathB,pathA, pathE, covA, covE, covP, matI, invSD, expMean, meanG,defSNP )

modelMZ <- mxModel( pars, covMZ, dataMZ, objMZ, name="MZ" )
modelDZ <- mxModel( pars, covDZ, dataDZ, objDZ, name="DZ" )

obj <- mxFitFunctionMultigroup(c("MZ","DZ"))
CholAceModel <- mxModel( "CholACE", modelMZ, modelDZ, obj )
# ------------------------------------------------------------------------------
# RUN GENETIC MODEL

# Run Cholesky Decomposition ACE model
CholAceFit <- mxRun(CholAceModel)

It gave me this output which looks completely wrong. I must have been doing something terrible again.

Summary of CholACE

The Hessian at the solution does not appear to be convex. See ?mxCheckIdentification for possible diagnosis (Mx status RED).

free parameters:
name matrix row col Estimate Std.Error A lbound ubound
1 b11 MZ.b 1 1 -27.20759 NA !
2 b12 MZ.b 1 2 -224.11240 NA !
3 b13 MZ.b 1 3 -370.87835 NA !
4 b14 MZ.b 1 4 -112.26553 333.04641 !
5 a_1_1 MZ.a 1 1 4667.34769 86.51661 ! 1e-04
6 a_2_1 MZ.a 2 1 20682.16532 245.52471 ! -10
7 a_3_1 MZ.a 3 1 21722.51086 350.99867 ! -10
8 a_4_1 MZ.a 4 1 15938.90621 508.78834 ! -10
9 a_2_2 MZ.a 2 2 2382.89184 60.56751 ! 1e-04
10 a_3_2 MZ.a 3 2 14565.69906 NA ! -10
11 a_4_2 MZ.a 4 2 15827.37740 376.90650 ! -10
12 a_3_3 MZ.a 3 3 5026.49224 263.77406 ! 1e-04
13 a_4_3 MZ.a 4 3 10470.90915 343.97896 ! -10
14 a_4_4 MZ.a 4 4 4092.21384 112.26283 ! 1e-04
15 e_1_1 MZ.e 1 1 5753.13642 NA ! 1e-04
16 e_2_1 MZ.e 2 1 23778.28209 NA ! -10
17 e_3_1 MZ.e 3 1 20203.34749 345.33080 ! -10
18 e_4_1 MZ.e 4 1 17557.47416 516.47068 ! -10
19 e_2_2 MZ.e 2 2 3975.49983 36.51318 ! 1e-04
20 e_3_2 MZ.e 3 2 11110.07979 208.75055 ! -10
21 e_4_2 MZ.e 4 2 18005.85817 384.88987 ! -10
22 e_3_3 MZ.e 3 3 716.01276 18.64256 ! 1e-04
23 e_4_3 MZ.e 4 3 9470.73575 434.10333 ! -10
24 e_4_4 MZ.e 4 4 608.89909 36.08310 ! 1e-04
25 me_1_1 MZ.Intercepts 1 1 -122.61547 132.30191 !
26 me_1_2 MZ.Intercepts 1 2 -257.22064 524.60315 !
27 me_1_3 MZ.Intercepts 1 3 -401.34008 432.53585 !
28 me_1_4 MZ.Intercepts 1 4 102.38447 NA !

Model Statistics:
| Parameters | Degrees of Freedom | Fit (-2lnL units)
Model: 28 3972 76366.34
Saturated: NA NA NA
Independence: NA NA NA
Number of observations/statistics: 500/4000

** Information matrix is not positive definite (not at a candidate optimum).
Be suspicious of these results. At minimum, do not trust the standard errors.

Information Criteria:
| df Penalty | Parameters Penalty | Sample-Size Adjusted
AIC: 68422.34 76422.34 76425.79
BIC: 51681.92 76540.35 76451.48
CFI: NA
TLI: 1 (also known as NNFI)
RMSEA: 0 [95% CI (NA, NA)]
Prob(RMSEA <= 0.05): NA
To get additional fit indices, see help(mxRefModels)
timestamp: 2018-08-20 22:01:38
Wall clock time: 0.440042 secs
optimizer: CSOLNP
OpenMx version number: 2.9.9
Need help? See help(mxSummary)

Replied on Tue, 08/21/2018 - 10:46
Picture of user. AdminRobK Joined: 01/24/2014

In reply to by sealx017

Concerning identifiability, I realize now that's not a concern, because you don't put `pathC` into your MxModels. Therefore, it doesn't matter if its elements are fixed or free--you're simply not using them.

It gave me this output which looks completely wrong. I must have been doing something terrible again.

Even without knowing the scale your phenotype is on, I'd agree that those results don't look so good. But, I don't see anything obviously wrong with your script. Here are some things you could try:

  • Make sure the model is identified. mxCheckIdentification() can be helpful.
  • Change the start values. You can find plenty of discussion about start values on these forums if you do a content search.
  • Replace mxRun() with mxTryHard().
  • Switch optimizers. mxOption(NULL,"Default optimizer","SLSQP") will switch from CSOLNP to SLSQP. If your OpenMx installation is NPSOL-enabled, mxOption(NULL,"Default optimizer","NPSOL") will switch to NPSOL.
Replied on Tue, 08/21/2018 - 11:35
No user picture. sealx017 Joined: 08/17/2018

In reply to by AdminRobK

Ok I will try that.

Going back to the question of breaking V as, V=A+C+E and specifying MZ cov as A+C and DZ cov as 0.5A+C, will it be unidentifiable in that case? I am asking this because I simulated data from a setup with the above variance-covariance structure i.e [A+C+E A+C; A+C A+C+E] for MZ and [A+C+E 0.5A+C; 0.5A+C A+C+E] for DZ and looked at OpenMx output. Estimate of V looked fine but separately A, C and E estimates were different than the original. Am wondering if this non-uniqueness of A, C and E components is trivial. Or I am doing something wrong.

And also, what does keeping elements free or fixed mean?

Replied on Tue, 08/21/2018 - 12:58
Picture of user. AdminRobK Joined: 01/24/2014

In reply to by sealx017

Going back to the question of breaking V as, V=A+C+E and specifying MZ cov as A+C and DZ cov as 0.5A+C, will it be unidentifiable in that case?

No, I don't see why it would be.

Estimate of V looked fine but separately A, C and E estimates were different than the original.

I'm not sure what you mean by "different from the original", but for a given sample size, the estimate of the total variance will have more statistical precision than any of the variance components.

And also, what does keeping elements free or fixed mean?

Each element of an MxMatrix is either fixed or free, as set by the value provided for argument `free` to `mxMatrix()`. An element that is free is a free parameter the value of which will be changed by the numerical optimizer during its search for a solution. An element that is fixed is treated as constant by the optimizer.

Replied on Mon, 08/20/2018 - 16:18
Picture of user. AdminRobK Joined: 01/24/2014

And another question, how does OpenMx calculates the estimates? In my understanding, these are the MLE's obtained by maximizing a multivariate likelihood? Is that correct?

If you're using `mxFitFunctionML()` (which you probably are), then yes, that's correct.