You are here

Cholesky with latent traits

10 posts / 0 new
Last post
Benny's picture
Offline
Joined: 05/15/2020 - 05:00
Cholesky with latent traits
AttachmentSize
PDF icon Model.pdf45.14 KB

Hello, I want to estimate a modified CP model, where I have a Cholesky model with three latent traits (see picture attached).
The d- and e-variables are ordinal (d = 2 categories; e = 3 categories). The s-variable is continuous and in order to integrate it into the LISREL-matrices I define a single factor measurement model with the residual variance set to 0 and the factor loading to 1.
For the ordinal variables I fix the means to zero and the residual variances to 1. I followed the approach proposed here to fix the residual variances of the categorical variables to one instead of the variance of the underlying continuous variable. So I can set the thresholds free.
Do I need an additional constraint for the second order factors?
I get the following error:

All fit attempts resulted in errors - check starting values or model specification

since I have tried to estimate the model with different starting values and I am using mxTryHardOrdinal() I suppose that the specification and not the starting values is the cause of the error.
I used WLS because of the high number of categorical variables: I tried the estimation with FIML but after waiting a long time without results I stopped the process. Since the s-variable has a relative high number of NAs, this may be an issue?

Here is my code:

rm(list=ls())
library(OpenMx)
library(dplyr)
 
# Import Data 
data <- read.csv(file = "data_wide.csv",
                    header = TRUE)
 
# Number of variables 
m <- 6    # lat. end. V.
n <- 18   # lat. ex. V.
p <- 32   # Ind. of lat. end. V.
q <- 0    # Ind. of lat. ex. V. 

# Vectors of variables   
enLV <- c("E","D","S") 
mlab <- c(paste0(enLV,1),paste0(enLV,2))
exLV <- c("A","C","E")
nlab <- c(paste0(exLV[1],enLV,1),paste0(exLV[2],enLV,1),paste0(exLV[3],enLV,1),
          paste0(exLV[1],enLV,2),paste0(exLV[2],enLV,2),paste0(exLV[3],enLV,2))
plab <- c(paste0("e",1:5,"t",1),paste0("d",1:10,"t",1),"st1",paste0("e",1:5,"t",2),paste0("d",1:10,"t",2),"st2")
 
del <- c(plab[c(6:15,22:31)])
ext <- c(plab[c(1:5,17:21)])
 
mzData    <- subset(data, zyg==1, plab)
dzData    <- subset(data, zyg==2, plab)
# define categorical items as mxfactors
  # 1. externalizing items (3 levels)
mzData[ext] <- mxFactor(mzData[ext], levels = 1:3)
dzData[ext] <- mxFactor(dzData[ext], levels = 1:3)
 
  # 2. delinquency items (2 levels)
mzData[del] <- mxFactor(mzData[del], levels = 0:1)
dzData[del] <- mxFactor(dzData[del], levels = 0:1)
 
 
# Define the expected covariance matrix using hte LISREL approach 
pathA <- mxMatrix(type = "Lower", nrow = 3, ncol = 3, byrow = TRUE, 
                  free = c(TRUE,FALSE,FALSE,
                             TRUE,TRUE,FALSE,
                             TRUE,TRUE,TRUE),
                  values = c(.5,0,0,
                             0,.5,0,
                             0,0,.5),
                  labels = c("a11",NA,NA,
                             "a21","a22",NA,
                             "a31","a32","a33"), 
                  lbound = c(.0001,NA,NA,
                              -10,.0001,NA,
                              -10,-10,.0001),
                  name = "A")
pathC <- mxMatrix(type = "Lower", nrow = 3, ncol = 3, byrow = TRUE, 
                  free = c(TRUE,FALSE,FALSE,
                             TRUE,TRUE,FALSE,
                             TRUE,TRUE,TRUE),
                  values = c(.5,0,0,
                             0,.5,0,
                             0,0,.5),
                  labels = c("c11",NA,NA,
                             "c21","a22",NA,
                             "c31","c32","c33"), 
                  lbound = c(.0001,NA,NA,
                              -10,.0001,NA,
                              -10,-10,.0001),
                  name = "C")
pathE <- mxMatrix(type = "Lower", nrow = 3, ncol = 3, byrow = TRUE, 
                  free = c(TRUE,FALSE,FALSE,
                             TRUE,TRUE,FALSE,
                             TRUE,TRUE,TRUE),
                  values = c(1,0,0,
                             0,1,0,
                             0,0,1),
                  labels = c("e11",NA,NA,
                             "e21","e22",NA,
                             "e31","e32","e33"), 
                  lbound = c(.0001,NA,NA,
                              -10,.0001,NA,
                              -10,-10,.0001), 
                  name = "E")
Null <- mxMatrix(type = "Zero", nrow = 3, ncol = 9, name = "Zeros")
Gamma <- mxAlgebra(rbind(cbind(A,C,E,Zeros),
                       cbind(Zeros,A,C,E)), name = "GA")
 
Beta <- mxMatrix(type = "Zero", nrow = m, ncol = m, name = "B")
Id <- mxMatrix(type = "Iden", nrow = m, ncol = m, name = "I")
 
LaExt <- mxMatrix(type = "Full", nrow = 5, ncol = 1, byrow = TRUE,
                    free = c(FALSE,TRUE,TRUE,TRUE,TRUE), 
                    values = c(1,.8,.8,.8,.8), 
                    labels = c("la_ex1","la_ex2","la_ex3","la_ex4","la_ex5"),
                    name = "LExt")
NullExt <- mxMatrix(type = "Zero", nrow = 16-5, ncol = 1, name = "NExt")
 
LaDel <- mxMatrix(type = "Full", nrow = 11, ncol = 1, byrow = TRUE, 
                  free = c(FALSE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,FALSE),
                  values = c(1,.8,.8,.8,.8,.8,.8,.8,.8,.8,0),
                  labels = c("la_de1","la_de2","la_de3","la_de4","la_de5","la_de6","la_de7","la_de8","la_de9","la_de10", NA),
                  name = "LDel")
NullDel <- mxMatrix(type = "Zero", nrow = 5, ncol = 1, name = "NDel")
 
LaSt <- mxMatrix(type = "Full", nrow = 1, ncol = 1, free = FALSE, 
                 values = 1, labels = "la_st", name = "LSt")
NullSt <- mxMatrix(type = "Zero", nrow = 16-1, ncol = 1, name = "NSt")
 
NullFull <- mxMatrix(type = "Zero", nrow = 16, ncol = 1, name = "NFull")
 
 
LaY <- mxAlgebra(expression = cbind(rbind(LExt, NExt, NFull),
                                    rbind(NDel, LDel, NFull),
                                    rbind(NSt, LSt, NFull),
                                    rbind(NFull, LExt, NExt),
                                    rbind(NFull, NDel, LDel),
                                    rbind(NFull, NSt, LSt)), 
                 name = "LY")
errorval <- rep(c(rep(0,15),0),2)
errorlab <- rep(c("eex1","eex2","eex3","eex4","eex5",
                "ed1","ed2","ed3","ed4","ed5","ed6","ed7","ed8","ed9","ed10",
                "es"),2)
Thee <- mxMatrix(type = "Diag", nrow = 32, ncol = 32, byrow = FALSE, 
                 free = FALSE, values = errorval, lbound = 0,
                 labels = errorlab, name = "THE")
 
Psi <- mxMatrix(type = "Diag", nrow = 6, ncol = 6, free = FALSE, values = 0, lbound = 0, 
                labels = rep(c("eEx","eD","eS"),2), name = "PS")
 
PhiVar <- mxMatrix(type = "Iden", ncol = 9, nrow = 9, name = "PHV")
PhiCMZ <- mxMatrix(type = "Diag", ncol = 9, nrow = 9, 
                   values = c(1,1,1,1,1,1,0,0,0), name = "PHCMZ")
PhiCDZ <- mxMatrix(type = "Diag", ncol = 9, nrow = 9, 
                   values = c(.5,.5,.5,1,1,1,0,0,0), name = "PHCDZ")
 
PhiMZ <- mxAlgebra(rbind(cbind(PHV,PHCMZ),
                         cbind(PHCMZ,PHV)), name = "PHMZ")
PhiDZ <- mxAlgebra(rbind(cbind(PHV,PHCDZ),
                         cbind(PHCDZ,PHV)), name = "PHDZ")
 
eCovMZ  <- mxAlgebra(expression= LY%*%solve(I-B)%*%(GA%*%PHMZ%*%t(GA)+PS)%*%t(solve(I-B))%*%t(LY)+THE, name="expCovMZ") # Formula: Bollen (1989)
eCovDZ  <- mxAlgebra(expression= LY%*%solve(I-B)%*%(GA%*%PHDZ%*%t(GA)+PS)%*%t(solve(I-B))%*%t(LY)+THE, name="expCovDZ") 
#eCovMZ  <- mxAlgebra(expression= LY%*%I%*%(GA%*%PHMZ%*%t(GA)+PS)%*%I%*%t(LY)+THE, name="expCovMZ") 
#eCovDZ  <- mxAlgebra(expression= LY%*%I%*%(GA%*%PHDZ%*%t(GA)+PS)%*%I%*%t(LY)+THE, name="expCovDZ") 

# Defining the mean matrix 
  # mean of isei 
mean <- mxMatrix(type = "Full", nrow = 1, ncol = 32, 
                 free = rep(c(rep(FALSE,15),TRUE),2), values = rep(c(rep(0,15),40),2), 
                 labels = rep(c(rep(NA,15),"mst"),2), 
                 name = "expMean")
 
# Defining the threshold matrix 
    # Example of logic of premultiplication of thinG with inc ! -> Ensures ordering of thresholds for items with >1 thresholds
  #start <- matrix(c(rep(-1.5,30),rep(1,30)),nrow = 2, ncol = 30, byrow = TRUE)
  #one <- matrix(c(1,0,1,1), nrow = 2, ncol = 2, byrow = TRUE)
  #one%*%start

thDim <- plab[c(1:5, 17:21, 6:15, 22:31)]
thLab <- c(rep(paste0("th1","e",1:5),2),rep(paste0("th1","d",1:10),2),rep(paste0("th2","e",1:5),2),rep(NA,20))
thFree <- c(rep(TRUE,40),rep(FALSE,20))
thVal <- c(rep(-1.5,30),rep(1,10),rep(NA,20))
thBound <- c(rep(-3,30),rep(0.001,10),rep(NA,20))
thinG     <- mxMatrix( type="Full", nrow=2, ncol=30, byrow = TRUE, free=thFree,
                       values=thVal, lbound=thBound, 
                       labels=thLab, name="thinG" ) # Incremental matrix: 1 row = start values of 1. threshold; next rows = increments departing from upper row (lbound necessary so that there is at least a 0.0001 increase! nad no decrease to insure the order!)
inc       <- mxMatrix( type="Lower", nrow=2, ncol=2, free=FALSE, values=1, name="inc" )
threG     <- mxAlgebra( expression= inc %*% thinG, name="expTh" ) 
 
 
# Define data object
dataMZ    <- mxData( observed=mzData, type="raw" )
dataDZ    <- mxData( observed=dzData, type="raw" )
 
# Define expectation objects 
expMZ     <- mxExpectationNormal(covariance="expCovMZ", means="expMean", 
                                thresholds="expTh",
                               dimnames=plab,
                               threshnames = thDim)
expDZ     <- mxExpectationNormal(covariance="expCovDZ", means="expMean", 
                                 thresholds="expTh",
                               dimnames=plab,
                               threshnames = thDim)
# Fit function (WLS)
fitfun     <- mxFitFunctionWLS()
 
# parameters
pars      <- list(pathA, pathC, pathE, Null, Gamma, Beta, Id, 
                  LaExt, NullExt, LaDel, NullDel, LaSt, NullSt, NullFull, LaY, 
                  Thee, Psi, mean, thinG, inc, threG, PhiVar)
 
# group specific model objects
modelMZ   <- mxModel(pars, PhiMZ, PhiCMZ, eCovMZ, dataMZ, expMZ, fitfun, name="MZ")
modelDZ   <- mxModel(pars, PhiDZ, PhiCDZ, eCovDZ, dataDZ, expDZ, fitfun, name="DZ")
multi     <- mxFitFunctionMultigroup( c("MZ","DZ") )
 
# overall model object
modelACE  <- mxModel( "ACE", pars, modelMZ, modelDZ, multi)
 
# run model
mxOption( NULL , 'Default optimizer' , 'SLSQP' )
set.seed(1)
fitACE    <- mxTryHardOrdinal(modelACE, extraTries = 10, exhaustive = TRUE)
sumACE    <- summary( fitACE )
sumACE

Here is the output:

Summary of ACE 
 
Starting values are not feasible. Consider mxTryHard() 
 
free parameters:
      name  matrix row col    Estimate lbound ubound
1      a11       A   1   1  0.66719606  1e-04       
2      a21       A   2   1  0.24639404    -10       
3      a31       A   3   1  0.01989468    -10       
4      a22       A   2   2  0.58462127  1e-04       
5      a32       A   3   2  0.23233684    -10       
6      a33       A   3   3  0.20895600  1e-04       
7      c11       C   1   1  0.38163935  1e-04       
8      c21       C   2   1 -0.23787426    -10       
9      c31       C   3   1  0.05621162    -10       
10     c32       C   3   2  0.09830437    -10       
11     c33       C   3   3  0.65105809  1e-04       
12     e11       E   1   1  0.78303864  1e-04       
13     e21       E   2   1 -0.17010759    -10       
14     e31       E   3   1  0.24441680    -10       
15     e22       E   2   2  0.81938448  1e-04       
16     e32       E   3   2 -0.15604409    -10       
17     e33       E   3   3  1.10636594  1e-04       
18  la_ex2    LExt   2   1  0.92273536              
19  la_ex3    LExt   3   1  0.88631576              
20  la_ex4    LExt   4   1  1.03505741              
21  la_ex5    LExt   5   1  0.95816739              
22  la_de2    LDel   2   1  0.54168182              
23  la_de3    LDel   3   1  0.65248564              
24  la_de4    LDel   4   1  1.02872796              
25  la_de5    LDel   5   1  0.47255096              
26  la_de6    LDel   6   1  0.58154706              
27  la_de7    LDel   7   1  0.92095349              
28  la_de8    LDel   8   1  0.83431902              
29  la_de9    LDel   9   1  0.75920953              
30 la_de10    LDel  10   1  0.74839136              
31     mst expMean   1  16 48.11579465              
32   th1e1   thinG   1   1 -1.72455967     -3       
33   th2e1   thinG   2   1  0.88925538  0.001       
34   th1e2   thinG   1   2 -1.31678786     -3       
35   th2e2   thinG   2   2  1.29860129  0.001       
36   th1e3   thinG   1   3 -1.97153396     -3       
37   th2e3   thinG   2   3  0.81570670  0.001       
38   th1e4   thinG   1   4 -1.31957163     -3       
39   th2e4   thinG   2   4  1.23111741  0.001       
40   th1e5   thinG   1   5 -1.09611506     -3       
41   th2e5   thinG   2   5  1.07165598  0.001       
42   th1d1   thinG   1  11 -0.91010120     -3       
43   th1d2   thinG   1  12 -1.38949672     -3       
44   th1d3   thinG   1  13 -1.66416074     -3       
45   th1d4   thinG   1  14 -0.97191166     -3       
46   th1d5   thinG   1  15 -1.64255135     -3       
47   th1d6   thinG   1  16 -1.15400035     -3       
48   th1d7   thinG   1  17 -1.11629872     -3       
49   th1d8   thinG   1  18 -1.49572992     -3       
50   th1d9   thinG   1  19 -1.97838685     -3       
51  th1d10   thinG   1  20 -1.98501499     -3       
 
Model Statistics: 
               |  Parameters  |  Degrees of Freedom  |  Fit ( units)
       Model:             51                  62399               NA
   Saturated:             NA                     NA               NA
Independence:             NA                     NA               NA
Number of observations/statistics: 2042/62450
 
timestamp: 2020-11-13 08:57:16 
Wall clock time: 2.551282 secs 
OpenMx version number: 2.18.1 
Need help?  See help(mxSummary)
jpritikin's picture
Offline
Joined: 05/24/2012 - 00:35
SLSQP

Do you get the same problem using CSOLNP?

Benny's picture
Offline
Joined: 05/15/2020 - 05:00
Same problem

Yes, I get the same warning for CSOLNP, SLSQP and NPSOL.

Benny's picture
Offline
Joined: 05/15/2020 - 05:00
Wrong constraints on residual variances - same problem

I just saw that I wrongly set a zero constraint to the residual variances of the categorical variables. So the object "errorval" should be (constraint to 1 for the categorical variables and to zero for the continuous single-indicator model):

errorval <- rep(c(rep(1,15),0),2)

But the problem remains.

AdminNeale's picture
Offline
Joined: 03/01/2013 - 14:09
Check the Expected Covariance matrices

I'm not sure if it will work with a model that cannot get started, but mxGetExpected() could be used to extract the expected covariance matrix of the measures, and the expected means. If you use eigen() on the expected covariance matrix, you can check for pathologies like whether the matrix is positive definite or not. If it isn't, I recommend lowering starting values for stuff that causes covariance (like A and C) and increasing E. Also check that the expected means are reasonable - ideally within 2SD of the observed mean (where the SD is the square root of the expected variance for that variable under the model).

Essentially, one thing that screws up is when the likelihood of a row of data works out to be zero. If that is the case for any of the data rows, model-fitting will usually fail because the log-likelihood is negative infinity and it's hard to go anywhere from there. As mentioned, zero likelihood can happen with singular or negative-definite expected matrices, or starting values for means that are way off.

Benny's picture
Offline
Joined: 05/15/2020 - 05:00
Positive, but strange eigenvalues

I have run the model now with different starting values for the ACE-paths and the means following your suggestions and calculated the expected covariance matrices and their eigenvalues. Also, I used mxTryHardOrdinal().
The warning remains and the eigenvalues are all positive, but at the same time there are a lot of negative estimated variances in the expected covariance matrix. Also, there is a strange behavior with the eigenvalues. The last 26 values are always identical with the constraints for the residual variances of the observed categorical variables. That is definitely strange but I am not sure what might be the conclusion from this observation.

Besides the starting values I have some suspicions regarding possible causes of the warning:
Data: Some e-variables have a very (!) skew distribution, some only have 2 or 10 observations in the third category. Also there is substantial non-response for the s-variable (~ 450 NAs compared to max. 60 NAs for the categorical variables).

Different modeling approach of the ordinal variables: As far as I understood, there are the following approaches:
2 categories:
a) Fixed mean + fixed variance (-> mxConstraint) (and free residual variances) + free Threshold
b) Fixed mean + fixed residual variance (and free variance) + free Threshold
> 2 categories:
a) Fixed mean + fixed variance (-> mxConstraint) (and free residual variances) + free Thresholds
b) Fixed mean + fixed residual variance (and free variance) + free Thresholds
c) Free mean + free variance (-> mxConstraint) (and free residual variances) + first two thresholds fixed
d) Free mean + free residual variance (and free variance) + first two thresholds fixed
Is that true? So I could try some different options for the ordinal variables.

Coding error: Also, I checked the code several times and I only found one bug regarding the residual variances (see post#4), but at the moment I can't see that there is another. Maybe you see some coding error that I don't see?

Benny's picture
Offline
Joined: 05/15/2020 - 05:00
One more question

It seems that I have fixed the bugs of the model but after having read some papers about the inclusion of categorical indicators in SEM models I have a question due to curiosity: The literatureoften tells me that when I fix the scale of y* to one, the residual variance is not a free parameter anymore but is obtained as a simple remainder (Delta parametrization in mplus-terms). See for example Muthén and Asparouhov (2002). In OpenMx it is possible to estimate the residual variance as a free parameter with a SE when I fix the variance of y* to 1 and it counts as a parameter (not e.g. like in Mplus or lavaan). At the same time the d.f. are the same because the constraint "contributes 1 observed statistic". Why is it like that? It would be nice if you could give me some literature tipps where the rationale behind the OpenMx approach is explained and/or further elaborated.

Benny's picture
Offline
Joined: 05/15/2020 - 05:00
Any hints?

Hi, do you have any hints where I can get more information about the differences in the categorical data handling between OpenMx and Mplus/lavaan with respect to the estimation of the residual variance?

AdminRobK's picture
Offline
Joined: 01/24/2014 - 12:15
?

I don't think there's anything all that profound about the differences. As you noted yourself upthread, there is more than one way to identify the location and scale of the latent trait. In OpenMx, it's entirely up to the user which way to choose. MPlus/lavaan accept a narrower range of choices (I think because they are more strongly influenced by the LISREL tradition than OpenMx is?).

AdminNeale's picture
Offline
Joined: 03/01/2013 - 14:09
Binary variable may be the issue

There’s always a handicap with binary variables that the mean (or threshold) and variance are confounded. If you can use at least 3 category response variables this limitation would be overcome. If it has to be binary, I would recommend setting the variance of the binary variable to 1 via a non-linear constraint, or by expressing the residual as 1 - variance_due_to_other_sources using an mxAlgebra. Then the threshold can be estimated, and I think your model’s under-identification issues will be resolved.