estimating covariance matrix for saturated multivariate ACE

Posted on
Picture of user. Dorothy Bishop Joined: 02/04/2010

This is an elementary nonurgent question but an answer would help me understand OpenMx better.
In trying to teach myself a bit more I have been playing around with a couple of multivariate ACE twin scripts. There's one by Hermine Maes that I downloaded from:

http://www.vipbg.vcu.edu/~vipbg/OpenMx/MultivariateTwin_MatrixRaw.R

and another I think by Steve Boker that is in the slides from
http://www.vipbg.vcu.edu/~vipbg/OpenMx/OpenMxSession8.pdf

They differ in how they specify covariance matrices for the saturated model:

The Maes script gives:
mxMatrix( type="Lower", nrow=ntv, ncol=ntv, free=TRUE, values=pathstartvalue, name="CholMZ" ),
mxAlgebra( expression=CholMZ %*% t(CholMZ), name="ExpCovMZ" ),
mxAlgebra( expression=diag2vec(ExpCovMZ), name="ExpVarMZ"),

The Boker script gives:
mxMatrix( "Diag", ntv, ntv, free=T, values=meanestvalue, lbound=.001, name="EVmzf" ),
mxMatrix( "Stand", ntv, ntv, free=T, values=.5, lb=-.99, ub=.99, name="ERmzf" ),
mxAlgebra( expression= EVmzf %*% ERmzf %*% EVmzf, name="ECmzf" ),

I assume they are mathematically equivalent, with the Boker script estimating the correlation matrix (ERmzf) with the Stand matrix.
My problem is that when I try to run the Boker script, I get:

Error: The job for model 'multi' exited abnormally with the error message: Expected covariance matrix is not positive definite

I understand what the error message is saying – I'm just wondering why you'd want to estimate the covariance matrix this way, if it can give a 'not positive definite' result?

Or maybe this was an example to demonstrate the perils of not using a regular Cholesky decomposition ?

It's possible that the dataset I'm using is not the one it was written for.

Replied on Thu, 03/25/2010 - 13:16
Picture of user. neale Joined: Jul 31, 2009

Hi Dorothy

I would say that these models are not mathematically equivalent, since, as you have discovered, the correlation + SD approach can generate non-positive definite matrices, whereas the lower triangular (Cholesky) approach cannot, at least as long as the diagonal elements are bounded to be strictly positive.

My sense is that the two models have different intents. One is to estimate the covariance matrix, the other is to estimate the correlation matrix. Both are saturated, in the sense that there are as many parameters used to model the covariance structure as there are observed variances and covariances. They will usually yield the same -2lnL, but the Cholesky is more robust, and it is possible to compute the correlation matrix post-hoc from the Cholesky-generated expected covariance matrix. Note, however, that standard errors of the correlations are obtained directly (via summary()) in the correlation approach, whereas they are not in the Cholesky.

Replied on Thu, 03/25/2010 - 14:42
Picture of user. Dorothy Bishop Joined: Feb 04, 2010

In reply to by neale

Thanks. Admirably clear as always.
I've not enjoyed trying to do the algebra to compute standard errors of terms derived from the estimated paths so can see the wisdom of getting this estimated directly.

Replied on Tue, 03/30/2010 - 09:36
Picture of user. Sanja Joined: Dec 07, 2009

In reply to by Dorothy Bishop

Hi,
I'm trying to estimate a 24x24 covariance matrix using FIML in OpenMx, but am not sure whether OpenMx can handle 24 variables (it runs for about an hour and then kind of crashes). If this is the problem, could classic Mx handle it?
Thanks,
Sanja

Replied on Tue, 03/30/2010 - 09:58
Picture of user. Sanja Joined: Dec 07, 2009

In reply to by Sanja

#Btw I more-less use Hermine's script from Boulder:

library(OpenMx)
library(psych)
source("GenEpiHelperFunctions.R")
nv=12
ntv=nv*2
data=round(t(matrix(scan('s1.dat',skip=1,na.strings='-999'),ntv+1,)),2)
dim(data)

colnames(data)=c('zyg',paste(rep('rak_',6),rep(c('v_','nv_'),each=3),rep('1_',6),rep(c('5','7','10'),2),sep=''),paste(rep(c('wisc_R_','wais_III_'),each=3),rep(c('VCI_','POI_','FDI_'),2),rep(c('1_12','1_18'),each=3),sep=''),paste(rep('rak_',6),rep(c('v_','nv_'),each=3),rep('2_',6),rep(c('5','7','10'),2),sep=''),paste(rep(c('wisc_R_','wais_III_'),each=3),rep(c('VCI_','POI_','FDI_'),2),rep(c('2_12','2_18'),each=3),sep=''))

mzData <- subset(data, data[,1]==1|data[,1]==3,colnames(data)[-1]);dim(mzData)
dzData <- subset(data, data[,1]==2|data[,1]==4|data[,1]==5|data[,1]==6,colnames(data)[-1]);dim(dzData)

# Fit Multivariate Saturated Model
# -----------------------------------------------------------------------
multTwinSatModel <- mxModel("multTwinSat",
mxModel("MZ",
mxMatrix( type="Lower", nrow=ntv, ncol=ntv, free=T, values=5, name="CholMZ" ),
mxAlgebra( expression=CholMZ %*% t(CholMZ), name="expCovMZ" ),
mxMatrix( type="Full", nrow=1, ncol=ntv, free=T, values=20, name="expMeanMZ" ),
mxData( observed=mzData, type="raw" ),
mxFIMLObjective( covariance="expCovMZ", means="expMeanMZ", dimnames=dimnames(mzData)[[2]]),
# Algebras needed for equality constraints
mxAlgebra( expression=expMeanMZ[1,1:nv], name="expMeanMZt1"),
mxAlgebra( expression=expMeanMZ[1,(nv+1):ntv], name="expMeanMZt2"),
mxAlgebra( expression=t(diag2vec(expCovMZ)), name="expVarMZ"),
mxAlgebra( expression=expVarMZ[1,1:nv], name="expVarMZt1"),
mxAlgebra( expression=expVarMZ[1,(nv+1):ntv], name="expVarMZt2")
),
mxModel("DZ",
mxMatrix( type="Lower", nrow=ntv, ncol=ntv, free=T, values=5, name="CholDZ" ),
mxAlgebra( expression=CholDZ %*% t(CholDZ), name="expCovDZ" ),
mxMatrix( type="Full", nrow=1, ncol=ntv, free=T, values=20, name="expMeanDZ" ),
mxData( observed=dzData, type="raw" ),
mxFIMLObjective( covariance="expCovDZ", means="expMeanDZ", dimnames=dimnames(dzData)[[2]]),
# Algebra's needed for equality constraints
mxAlgebra( expression=expMeanDZ[1,1:nv], name="expMeanDZt1"),
mxAlgebra( expression=expMeanDZ[1,(nv+1):ntv], name="expMeanDZt2"),
mxAlgebra( expression=t(diag2vec(expCovDZ)), name="expVarDZ"),
mxAlgebra( expression=expVarDZ[1,1:nv], name="expVarDZt1"),
mxAlgebra( expression=expVarDZ[1,(nv+1):ntv], name="expVarDZt2")
),
mxAlgebra( MZ.objective + DZ.objective, name="minus2sumll" ),
mxAlgebraObjective("minus2sumll")
)

multTwinSatFit <- mxRun(multTwinSatModel)
multTwinSatSumm <- summary(multTwinSatFit)
multTwinSatSumm

Replied on Tue, 03/30/2010 - 09:56
Picture of user. neale Joined: Jul 31, 2009

In reply to by Sanja

So, 24 means and 24*25/2=300 covariances gives 324 parameters altogether. It will certainly take a long time in either OpenMx or ClassicMx. Inverting 24x24 matrices takes a while for each function evaluation, and the optimizer has to invert 324x324 matrices every now and then during its search for the solution.

At present, OpenMx has not been optimized for speed, and it will likely take longer than ClassicMx.