Categorical Data in both independent and dependent variables

Posted on Tue, 08/26/2014 - 17:15

rl Joined: 08/26/2014

Attachment

Size

model.R

2.79 KB

Models.pdf

19 KB

path input alt1.csv

707 bytes

Forums

OpenMx General Help

I have a data set with six binary variables, which I am trying to determine the temporal relationship. I was using lavaan R package, where they suggested to use dummy variable for endogenous variables (independent) and use ordered for exogenous (dependent variables). I was using the model as described in pdf file. I have gotten following results:

lavaan (0.5-16) converged normally after 31 iterations

Number of observations 51

Estimator DWLS Robust
Minimum Function Test Statistic 5.699 7.295
Degrees of freedom 7 7
P-value (Chi-square) 0.575 0.399
Scaling correction factor 1.006
Shift parameter 1.632
for simple second-order correction (Mplus variant)

Model test baseline model:

Minimum Function Test Statistic 61.153 45.447
Degrees of freedom 14 14
P-value 0.000 0.000

User model versus baseline model:

Comparative Fit Index (CFI) 1.000 0.991
Tucker-Lewis Index (TLI) 1.055 0.981

Root Mean Square Error of Approximation:

RMSEA 0.000 0.029
90 Percent Confidence Interval 0.000 0.153 0.000 0.178
P-value RMSEA <= 0.05 0.654 0.486

According to lavaan since it was using DWLS, AIC and BIC are calculated. However, we had someone analyze the data before for us was using AIC and BIC to compare which temporal relationship is better at explaining our data. So I decided to try OpenMx, I am very new to path analysis and OpenMx, so I am not quite sure if I am treating the variables correctly. I have attached the code and dataset and the model was run and gave me some estimates, however, RMSEA was not computed. And the estimates for each path was different compared to lavaan. At this point, I am not sure how to compare my results from these two methods and if it's even comparable since OpenMx I turned everything into ordinal but in lavaan I used dummy variables. And why is that OpenMx didn't calcualte RMSEA for categorical data.

rl

Replied on Thu, 08/28/2014 - 00:08

mhunter Joined: 07/31/2009

Clarification?

I'm not totally sure what your question is, but here are a few answers.

First, you do not have a lot of observations (rows of data). Around 50 observations is rather small for SEM, especially with the number of parameters you are estimating (21).

Second, you may get different estimates with lavaan using diagonally weighted least squares (DWLS) vs OpenMx using full information maximum likelihood (FIML). They are using very different methods with different assumptions. I would guess they would be ballpark similar, but nowhere near identical.

Third, after the model you made runs, it comes back with NA in many standard errors. This generally indicates a problem either in starting values or in model specification. My bet here is model specification. I'd have to look more closely at the model, but I suspect it might not be identified.

Fourth, OpenMx is not reporting the RMSEA because it wants to save estimation time. For raw data, RMSEA, TLI, and CFI require some comparison models to be fit and these may take time equal to the model you're interested in, so we don't fit them by default. If you're using the Beta (and please do), then we have a helper function to make and run these comparison models.

sat.1 <- omxSaturatedModel(fit.1, run=TRUE) summary(fit.1, SaturatedLikelihood=sat.1[[1]], IndependenceLikelihood=sat.1[[2]])

Hopefully this helps!

Log in or register to post comments

Replied on Thu, 08/28/2014 - 18:03

neale Joined: 07/31/2009

With binary data omxSaturatedModel builds under-identified model

So the variances have to be fixed at 1 for binary data. Possibly we could be smarter about it (detecting binary variables)? I would use the fix the first two thresholds at 0 & 1 for 3+ category data.

The Cholesky used for the saturated model helps a lot with stability, but is problematic to constrain the diagonal of ltCov%*%t(ltCov) to equal 1. Non-linear constraints can be used:
omxsatmod <- mxModel(omxsatmod$Saturated,mxMatrix("Unit",6,1,name="U"),mxConstraint(U==diag2vec(satCov)))

The independence model has the same problem, but it's simpler (just give it a Stand matrix of the right dimensions with free=F). For now, since it's diagonal,
omxsatfit$Independence$indCov$free<-F omxsatfit$Independence$indCov$values<-diag(6)

Even then, the model isn't very stable unless we make the lbound for the diagonal elements of ltCov say .01 - if they hit closer to zero, non-positive definiteness can cause havoc.

Log in or register to post comments

Replied on Fri, 08/29/2014 - 15:36

rl Joined: 08/26/2014

Thank you. Your answer really

Thank you. Your answer really helps. I will have to look closer at my model and I agree with about the sample size.

Log in or register to post comments

Replied on Thu, 08/28/2014 - 10:11

neale Joined: 07/31/2009

Phew, rather more difficult than it should be.

Hi Rufei

We need a saturated model here. One approach is to allow all covariances to be free parameters, while constraining all variances to 1.0. I note also that there were some means free as well as thresholds, so I repaired that by fixing all means to zero. The variance fixing has to be done because the data are binary. So the saturated model looks like this:

dataD <- read.csv("path input alt1.csv") vars <- c("IFNa1", "IFNa2", "IFNa3", "Ab1", "Ab2", "Ab3") dataD$Ab1 <- mxFactor(dataD$Ab1, levels = c(0 , 1)) dataD$Ab2 <- mxFactor(dataD$Ab2, levels = c(0 , 1)) dataD$Ab3 <- mxFactor(dataD$Ab3, levels = c(0 , 1)) dataD$IFNa1 <- mxFactor(dataD$IFNa1, levels = c(0 , 1), ordered = T) dataD$IFNa2 <- mxFactor(dataD$IFNa2, levels = c(0 , 1), ordered = T) dataD$IFNa3 <- mxFactor(dataD$IFNa3, levels = c(0 , 1), ordered = T) fmat <- diag(6)[c(2,3,5,6),] model.1 <- mxModel("Autoantibody before IFNa activity model", type = "RAM", mxData(observed = dataD[, vars], type = "raw"), #Define variables manifestVars = c("IFNa1", "IFNa2", "IFNa3", "Ab1", "Ab2", "Ab3"), mxMatrix("Iden",6,6,name="I"), mxMatrix("Unit",4,1,name="u4"), mxMatrix("Full",4,6,values=fmat,name="fmat"),mxAlgebra(diag2vec(fmat%&%(F%&%solve(I-A)%&%S)), name="expVars"), mxConstraint(expVars==u4,name="Varcon"), mxPath(from = "Ab1", to = "IFNa1", arrows = 2, values = 0, free = T, labels = "v1"), mxPath(from = "Ab2", to = "IFNa2", arrows = 2, values = 0, free = T, labels = "v2"), mxPath(from = "Ab3", to = "IFNa3", arrows = 2, values = 0, free = T, labels = "v3"), mxPath(from = c("IFNa1", "IFNa2", "IFNa3", "Ab1", "Ab2", "Ab3"), arrows = 2, free = c(F, T, T, F, T, T), values = c(1,1,1,1,1,1), lbound=.01), #Paths mxPath(from = c("Ab1", "IFNa1"), to = "IFNa2", arrows = 1, values = 0, free = c(T,T), labels = c("a1", "b1")), mxPath(from = c("Ab2", "IFNa2"), to = "IFNa3", arrows = 1, values = 0, free = c(T,T), labels = c("a2", "b2")), mxPath(from = "Ab1", to = "Ab2", arrows = 1, values = 0, free = T, labels = "ab1"), mxPath(from = "Ab2", to = "Ab3", arrows = 1, values = 0, free = T, labels = "ab2"), mxPath(from = "IFNa1", to = "IFNa2", arrows = 1, values = 0, free = T, labels = "IFN1"), mxPath(from = "IFNa2", to = "IFNa3", arrows = 1, values = 0, free = T, labels = "IFN2"), #Means mxPath(from = "one", to = c("Ab1", "Ab2", "Ab3", "IFNa1", "IFNa2", "IFNa3"), values = 0, arrows = 1, free = c(F)), #Threshold matrix mxMatrix(type = "Full", nrow = 1, ncol = 6, free = c(T, T, T, T, T, T), values = c(-0.18173563, -1.27430357,-1.75717640, 0.37933827,-0.01542988, -0.82819033), byrow = T, dimnames = list(c(), c('Ab1','Ab2', 'Ab3', 'IFNa1', 'IFNa2', 'IFNa3')), name = "Threshold"), #Means and covaraince matrix mxRAMObjective(A="A", S="S", F="F", M="M", thresholds = "Threshold") ) summary(fit.1 <- mxRun(model.1,unsafe=T), SaturatedLikelihood=satfit.1)

We have to add some matrix algebra and nonlinear constraints to force the variances of the dependent (endogenous) variables to 1.0. This is pretty tricky stuff, and I believe that OpenMx should make this much easier than it is. I use a matrix fmat to pluck out the endogenous variables, numbers 2,3,5 and 6.

The model you want to fit looks like this, I think:

fmat <- diag(6)[c(2,3,5,6),] model.1 <- mxModel("Autoantibody before IFNa activity model", type = "RAM", mxData(observed = dataD[, vars], type = "raw"), #Define variables manifestVars = c("IFNa1", "IFNa2", "IFNa3", "Ab1", "Ab2", "Ab3"), mxMatrix("Iden",6,6,name="I"), mxMatrix("Unit",4,1,name="u4"), mxMatrix("Full",4,6,values=fmat,name="fmat"), mxAlgebra(diag2vec(fmat%&%(F%&%solve(I-A)%&%S)), name="expVars"), mxConstraint(expVars==u4,name="Varcon"), mxPath(from = "Ab1", to = "IFNa1", arrows = 2, values = 0, free = T, labels = "v1"), mxPath(from = "Ab2", to = "IFNa2", arrows = 2, values = 0, free = T, labels = "v2"), mxPath(from = "Ab3", to = "IFNa3", arrows = 2, values = 0, free = T, labels = "v3"), mxPath(from = c("IFNa1", "IFNa2", "IFNa3", "Ab1", "Ab2", "Ab3"), arrows = 2, free = c(F, T, T, F, T, T), values = c(1,1,1,1,1,1), lbound=.01), #Paths mxPath(from = c("Ab1", "IFNa1"), to = "IFNa2", arrows = 1, values = 0, free = c(T,T), labels = c("a1", "b1")), mxPath(from = c("Ab2", "IFNa2"), to = "IFNa3", arrows = 1, values = 0, free = c(T,T), labels = c("a2", "b2")), mxPath(from = "Ab1", to = "Ab2", arrows = 1, values = 0, free = T, labels = "ab1"), mxPath(from = "Ab2", to = "Ab3", arrows = 1, values = 0, free = T, labels = "ab2"), mxPath(from = "IFNa1", to = "IFNa2", arrows = 1, values = 0, free = T, labels = "IFN1"), mxPath(from = "IFNa2", to = "IFNa3", arrows = 1, values = 0, free = T, labels = "IFN2"), #Means mxPath(from = "one", to = c("Ab1", "Ab2", "Ab3", "IFNa1", "IFNa2", "IFNa3"), values = 0, arrows = 1, free = c(F)), #Threshold matrix mxMatrix(type = "Full", nrow = 1, ncol = 6, free = c(T, T, T, T, T, T), values = c(-0.18173563, -1.27430357,-1.75717640, 0.37933827,-0.01542988, -0.82819033), byrow = T, dimnames = list(c(), c('Ab1','Ab2', 'Ab3', 'IFNa1', 'IFNa2', 'IFNa3')), name = "Threshold"), #Means and covaraince matrix mxRAMObjective(A="A", S="S", F="F", M="M", thresholds = "Threshold") ) summary(fit.1 <- mxRun(model.1), SaturatedLikelihood=satfit.1)
The output looks like this to me (and I would like the developers to look at it because I am not sure that the chi-squared df are correct:
> summary(fit.1 <- mxRun(model.1), SaturatedLikelihood=satfit.1) Running Autoantibody before IFNa activity model Summary of Autoantibody before IFNa activity model


free parameters:


1                                                    IFN1 2                                                    IFN2 3                                                      a1 4                                                     ab1 5                                                      a2 6                                                     ab2 7          Autoantibody before IFNa activity model.S[2,2] 8          Autoantibody before IFNa activity model.S[3,3] 9                                                      v1 10                                                     v2 11         Autoantibody before IFNa activity model.S[5,5] 12                                                     v3 13         Autoantibody before IFNa activity model.S[6,6] 14 Autoantibody before IFNa activity model.Threshold[1,1] Threshold 15 Autoantibody before IFNa activity model.Threshold[1,2] Threshold 16 Autoantibody before IFNa activity model.Threshold[1,3] Threshold 17 Autoantibody before IFNa activity model.Threshold[1,4] Threshold 18 Autoantibody before IFNa activity model.Threshold[1,5] Threshold 19 Autoantibody before IFNa activity model.Threshold[1,6] Threshold

name    matrix   row   col    Estimate lbound ubound
 A IFNa2 IFNa1  0.31453547
 A IFNa3 IFNa2  0.63617617
 A IFNa2   Ab1  0.32507136
 A   Ab2   Ab1  0.86859883
 A IFNa3   Ab2  0.32619533
 A   Ab3   Ab2  0.90341107
 S IFNa2 IFNa2  0.73627615   0.01
 S IFNa3 IFNa3  0.24049468   0.01
 S IFNa1   Ab1  0.28910522
 S IFNa2   Ab2  0.23711782
 S   Ab2   Ab2  0.24553605   0.01
 S IFNa3   Ab3 -0.15010803
 S   Ab3   Ab3  0.18384846   0.01
 1   Ab1 -0.15334071
 1   Ab2 -0.60399633
 1   Ab3 -0.73879246
 1 IFNa1  0.39794957
 1 IFNa2  0.01252314
 1 IFNa3 -0.38241874

observed statistics: 316 Constraint 'Varcon' contributes 4 observed statistics. estimated parameters: 19 degrees of freedom: 297 -2 log likelihood: 312.2739 saturated -2 log likelihood: 308.9829 number of observations: 52 chi-square: X2 ( df=2 ) = 3.290923, p = 0.1929235 Information Criteria: | df Penalty | Parameters Penalty | Sample-Size Adjusted AIC: -281.7261 350.2739 NA BIC: -861.2455 387.3475 327.6815 CFI: NA TLI: NA RMSEA: 0.1114124 [95% CI (0, 0.3506192)] Some of your fit indices are missing. To get them, fit saturated and independence models, and include them with summary(yourModel, SaturatedLikelihood=..., IndependenceLikelihood=...). timestamp: 2014-08-28 10:04:35 wall clock time: 0.335669 secs OpenMx version number: 2.0.0.3766 Need help? See help(mxSummary)

Finally, note that OpenMx is using a normal theory threshold model here, effectively working with tetrachoric correlations. These may differ from the statistics being used in other software. Some robust methods use Pearson correlations, which may underestimate the latent correlation relative to those of the threshold model.

Log in or register to post comments

Replied on Fri, 08/29/2014 - 15:31

News

Recent Posts

Categorical Data in both independent and dependent variables

Clarification?

With binary data omxSaturatedModel builds under-identified model

Thank you. Your answer really

Phew, rather more difficult than it should be.

Thank you so much for your

News

Recent Posts