Categorical Data in both independent and dependent variables

Posted on
No user picture. rl Joined: 08/26/2014
I have a data set with six binary variables, which I am trying to determine the temporal relationship. I was using lavaan R package, where they suggested to use dummy variable for endogenous variables (independent) and use ordered for exogenous (dependent variables). I was using the model as described in pdf file. I have gotten following results:

lavaan (0.5-16) converged normally after 31 iterations

Number of observations 51

Estimator DWLS Robust
Minimum Function Test Statistic 5.699 7.295
Degrees of freedom 7 7
P-value (Chi-square) 0.575 0.399
Scaling correction factor 1.006
Shift parameter 1.632
for simple second-order correction (Mplus variant)

Model test baseline model:

Minimum Function Test Statistic 61.153 45.447
Degrees of freedom 14 14
P-value 0.000 0.000

User model versus baseline model:

Comparative Fit Index (CFI) 1.000 0.991
Tucker-Lewis Index (TLI) 1.055 0.981

Root Mean Square Error of Approximation:

RMSEA 0.000 0.029
90 Percent Confidence Interval 0.000 0.153 0.000 0.178
P-value RMSEA <= 0.05 0.654 0.486

According to lavaan since it was using DWLS, AIC and BIC are calculated. However, we had someone analyze the data before for us was using AIC and BIC to compare which temporal relationship is better at explaining our data. So I decided to try OpenMx, I am very new to path analysis and OpenMx, so I am not quite sure if I am treating the variables correctly. I have attached the code and dataset and the model was run and gave me some estimates, however, RMSEA was not computed. And the estimates for each path was different compared to lavaan. At this point, I am not sure how to compare my results from these two methods and if it's even comparable since OpenMx I turned everything into ordinal but in lavaan I used dummy variables. And why is that OpenMx didn't calcualte RMSEA for categorical data.

rl

Replied on Thu, 08/28/2014 - 00:08
Picture of user. mhunter Joined: 07/31/2009

I'm not totally sure what your question is, but here are a few answers.

First, you do not have a lot of observations (rows of data). Around 50 observations is rather small for SEM, especially with the number of parameters you are estimating (21).

Second, you may get different estimates with lavaan using diagonally weighted least squares (DWLS) vs OpenMx using full information maximum likelihood (FIML). They are using very different methods with different assumptions. I would guess they would be ballpark similar, but nowhere near identical.

Third, after the model you made runs, it comes back with NA in many standard errors. This generally indicates a problem either in starting values or in model specification. My bet here is model specification. I'd have to look more closely at the model, but I suspect it might not be identified.

Fourth, OpenMx is not reporting the RMSEA because it wants to save estimation time. For raw data, RMSEA, TLI, and CFI require some comparison models to be fit and these may take time equal to the model you're interested in, so we don't fit them by default. If you're using the Beta (and please do), then we have a helper function to make and run these comparison models.


sat.1 <- omxSaturatedModel(fit.1, run=TRUE)
summary(fit.1, SaturatedLikelihood=sat.1[[1]], IndependenceLikelihood=sat.1[[2]])

Hopefully this helps!

Replied on Thu, 08/28/2014 - 18:03
Picture of user. neale Joined: 07/31/2009

In reply to by mhunter

So the variances have to be fixed at 1 for binary data. Possibly we could be smarter about it (detecting binary variables)? I would use the fix the first two thresholds at 0 & 1 for 3+ category data.

The Cholesky used for the saturated model helps a lot with stability, but is problematic to constrain the diagonal of ltCov%*%t(ltCov) to equal 1. Non-linear constraints can be used:

omxsatmod <- mxModel(omxsatmod$Saturated,mxMatrix("Unit",6,1,name="U"),mxConstraint(U==diag2vec(satCov)))

The independence model has the same problem, but it's simpler (just give it a Stand matrix of the right dimensions with free=F). For now, since it's diagonal,

omxsatfit$Independence$indCov$free<-F
omxsatfit$Independence$indCov$values<-diag(6)

Even then, the model isn't very stable unless we make the lbound for the diagonal elements of ltCov say .01 - if they hit closer to zero, non-positive definiteness can cause havoc.

Replied on Thu, 08/28/2014 - 10:11
Picture of user. neale Joined: 07/31/2009

Hi Rufei

We need a saturated model here. One approach is to allow all covariances to be free parameters, while constraining all variances to 1.0. I note also that there were some means free as well as thresholds, so I repaired that by fixing all means to zero. The variance fixing has to be done because the data are binary. So the saturated model looks like this:


dataD <- read.csv("path input alt1.csv")
vars <- c("IFNa1", "IFNa2", "IFNa3", "Ab1", "Ab2", "Ab3")
dataD$Ab1 <- mxFactor(dataD$Ab1, levels = c(0 , 1))
dataD$Ab2 <- mxFactor(dataD$Ab2, levels = c(0 , 1))
dataD$Ab3 <- mxFactor(dataD$Ab3, levels = c(0 , 1))
dataD$IFNa1 <- mxFactor(dataD$IFNa1, levels = c(0 , 1), ordered = T)
dataD$IFNa2 <- mxFactor(dataD$IFNa2, levels = c(0 , 1), ordered = T)
dataD$IFNa3 <- mxFactor(dataD$IFNa3, levels = c(0 , 1), ordered = T)
fmat <- diag(6)[c(2,3,5,6),]
model.1 <- mxModel("Autoantibody before IFNa activity model",
type = "RAM",
mxData(observed = dataD[, vars], type = "raw"),
#Define variables
manifestVars = c("IFNa1", "IFNa2", "IFNa3", "Ab1", "Ab2", "Ab3"),
mxMatrix("Iden",6,6,name="I"), mxMatrix("Unit",4,1,name="u4"), mxMatrix("Full",4,6,values=fmat,name="fmat"),mxAlgebra(diag2vec(fmat%&%(F%&%solve(I-A)%&%S)), name="expVars"), mxConstraint(expVars==u4,name="Varcon"),
mxPath(from = "Ab1", to = "IFNa1", arrows = 2, values = 0, free = T, labels = "v1"),
mxPath(from = "Ab2", to = "IFNa2", arrows = 2, values = 0, free = T, labels = "v2"),
mxPath(from = "Ab3", to = "IFNa3", arrows = 2, values = 0, free = T, labels = "v3"),
mxPath(from = c("IFNa1", "IFNa2", "IFNa3", "Ab1", "Ab2", "Ab3"), arrows = 2, free = c(F, T, T, F, T, T),
values = c(1,1,1,1,1,1), lbound=.01),
#Paths
mxPath(from = c("Ab1", "IFNa1"), to = "IFNa2", arrows = 1, values = 0, free = c(T,T), labels = c("a1", "b1")),
mxPath(from = c("Ab2", "IFNa2"), to = "IFNa3", arrows = 1, values = 0, free = c(T,T), labels = c("a2", "b2")),
mxPath(from = "Ab1", to = "Ab2", arrows = 1, values = 0, free = T, labels = "ab1"),
mxPath(from = "Ab2", to = "Ab3", arrows = 1, values = 0, free = T, labels = "ab2"),
mxPath(from = "IFNa1", to = "IFNa2", arrows = 1, values = 0, free = T, labels = "IFN1"),
mxPath(from = "IFNa2", to = "IFNa3", arrows = 1, values = 0, free = T, labels = "IFN2"),
#Means
mxPath(from = "one", to = c("Ab1", "Ab2", "Ab3", "IFNa1", "IFNa2", "IFNa3"), values = 0,
arrows = 1, free = c(F)),
#Threshold matrix
mxMatrix(type = "Full", nrow = 1, ncol = 6, free = c(T, T, T, T, T, T),
values = c(-0.18173563, -1.27430357,-1.75717640, 0.37933827,-0.01542988, -0.82819033), byrow = T, dimnames = list(c(), c('Ab1','Ab2', 'Ab3', 'IFNa1', 'IFNa2', 'IFNa3')),
name = "Threshold"),
#Means and covaraince matrix
mxRAMObjective(A="A", S="S", F="F", M="M", thresholds = "Threshold")
)
summary(fit.1 <- mxRun(model.1,unsafe=T), SaturatedLikelihood=satfit.1)

We have to add some matrix algebra and nonlinear constraints to force the variances of the dependent (endogenous) variables to 1.0. This is pretty tricky stuff, and I believe that OpenMx should make this much easier than it is. I use a matrix fmat to pluck out the endogenous variables, numbers 2,3,5 and 6.

The model you want to fit looks like this, I think:


fmat <- diag(6)[c(2,3,5,6),]
model.1 <- mxModel("Autoantibody before IFNa activity model",
type = "RAM",
mxData(observed = dataD[, vars], type = "raw"),
#Define variables
manifestVars = c("IFNa1", "IFNa2", "IFNa3", "Ab1", "Ab2", "Ab3"),
mxMatrix("Iden",6,6,name="I"), mxMatrix("Unit",4,1,name="u4"),
mxMatrix("Full",4,6,values=fmat,name="fmat"),
mxAlgebra(diag2vec(fmat%&%(F%&%solve(I-A)%&%S)), name="expVars"),
mxConstraint(expVars==u4,name="Varcon"),
mxPath(from = "Ab1", to = "IFNa1", arrows = 2, values = 0, free = T, labels = "v1"),
mxPath(from = "Ab2", to = "IFNa2", arrows = 2, values = 0, free = T, labels = "v2"),
mxPath(from = "Ab3", to = "IFNa3", arrows = 2, values = 0, free = T, labels = "v3"),
mxPath(from = c("IFNa1", "IFNa2", "IFNa3", "Ab1", "Ab2", "Ab3"), arrows = 2, free = c(F, T, T, F, T, T),
values = c(1,1,1,1,1,1), lbound=.01),
#Paths
mxPath(from = c("Ab1", "IFNa1"), to = "IFNa2", arrows = 1, values = 0, free = c(T,T), labels = c("a1", "b1")),
mxPath(from = c("Ab2", "IFNa2"), to = "IFNa3", arrows = 1, values = 0, free = c(T,T), labels = c("a2", "b2")),
mxPath(from = "Ab1", to = "Ab2", arrows = 1, values = 0, free = T, labels = "ab1"),
mxPath(from = "Ab2", to = "Ab3", arrows = 1, values = 0, free = T, labels = "ab2"),
mxPath(from = "IFNa1", to = "IFNa2", arrows = 1, values = 0, free = T, labels = "IFN1"),
mxPath(from = "IFNa2", to = "IFNa3", arrows = 1, values = 0, free = T, labels = "IFN2"),
#Means
mxPath(from = "one", to = c("Ab1", "Ab2", "Ab3", "IFNa1", "IFNa2", "IFNa3"), values = 0,
arrows = 1, free = c(F)),
#Threshold matrix
mxMatrix(type = "Full", nrow = 1, ncol = 6, free = c(T, T, T, T, T, T),
values = c(-0.18173563, -1.27430357,-1.75717640, 0.37933827,-0.01542988, -0.82819033), byrow = T, dimnames = list(c(), c('Ab1','Ab2', 'Ab3', 'IFNa1', 'IFNa2', 'IFNa3')),
name = "Threshold"),
#Means and covaraince matrix
mxRAMObjective(A="A", S="S", F="F", M="M", thresholds = "Threshold")
)
summary(fit.1 <- mxRun(model.1), SaturatedLikelihood=satfit.1)

The output looks like this to me (and I would like the developers to look at it because I am not sure that the chi-squared df are correct:

> summary(fit.1 <- mxRun(model.1), SaturatedLikelihood=satfit.1)
Running Autoantibody before IFNa activity model
Summary of Autoantibody before IFNa activity model

free parameters:
name matrix row col Estimate lbound ubound
1 IFN1 A IFNa2 IFNa1 0.31453547
2 IFN2 A IFNa3 IFNa2 0.63617617
3 a1 A IFNa2 Ab1 0.32507136
4 ab1 A Ab2 Ab1 0.86859883
5 a2 A IFNa3 Ab2 0.32619533
6 ab2 A Ab3 Ab2 0.90341107
7 Autoantibody before IFNa activity model.S[2,2] S IFNa2 IFNa2 0.73627615 0.01
8 Autoantibody before IFNa activity model.S[3,3] S IFNa3 IFNa3 0.24049468 0.01
9 v1 S IFNa1 Ab1 0.28910522
10 v2 S IFNa2 Ab2 0.23711782
11 Autoantibody before IFNa activity model.S[5,5] S Ab2 Ab2 0.24553605 0.01
12 v3 S IFNa3 Ab3 -0.15010803
13 Autoantibody before IFNa activity model.S[6,6] S Ab3 Ab3 0.18384846 0.01
14 Autoantibody before IFNa activity model.Threshold[1,1] Threshold 1 Ab1 -0.15334071
15 Autoantibody before IFNa activity model.Threshold[1,2] Threshold 1 Ab2 -0.60399633
16 Autoantibody before IFNa activity model.Threshold[1,3] Threshold 1 Ab3 -0.73879246
17 Autoantibody before IFNa activity model.Threshold[1,4] Threshold 1 IFNa1 0.39794957
18 Autoantibody before IFNa activity model.Threshold[1,5] Threshold 1 IFNa2 0.01252314
19 Autoantibody before IFNa activity model.Threshold[1,6] Threshold 1 IFNa3 -0.38241874

observed statistics: 316
Constraint 'Varcon' contributes 4 observed statistics.
estimated parameters: 19
degrees of freedom: 297
-2 log likelihood: 312.2739
saturated -2 log likelihood: 308.9829
number of observations: 52
chi-square: X2 ( df=2 ) = 3.290923, p = 0.1929235
Information Criteria:
| df Penalty | Parameters Penalty | Sample-Size Adjusted
AIC: -281.7261 350.2739 NA
BIC: -861.2455 387.3475 327.6815
CFI: NA
TLI: NA
RMSEA: 0.1114124 [95% CI (0, 0.3506192)]
Some of your fit indices are missing.
To get them, fit saturated and independence models, and include them with
summary(yourModel, SaturatedLikelihood=..., IndependenceLikelihood=...).
timestamp: 2014-08-28 10:04:35
wall clock time: 0.335669 secs
OpenMx version number: 2.0.0.3766
Need help? See help(mxSummary)

Finally, note that OpenMx is using a normal theory threshold model here, effectively working with tetrachoric correlations. These may differ from the statistics being used in other software. Some robust methods use Pearson correlations, which may underestimate the latent correlation relative to those of the threshold model.