# Categorical Data in both independent and dependent variables

Attachment | Size |
---|---|

model.R | 2.79 KB |

Models.pdf | 19 KB |

path input alt1.csv | 707 bytes |

lavaan (0.5-16) converged normally after 31 iterations

Number of observations 51

Estimator DWLS Robust

Minimum Function Test Statistic 5.699 7.295

Degrees of freedom 7 7

P-value (Chi-square) 0.575 0.399

Scaling correction factor 1.006

Shift parameter 1.632

for simple second-order correction (Mplus variant)

Model test baseline model:

Minimum Function Test Statistic 61.153 45.447

Degrees of freedom 14 14

P-value 0.000 0.000

User model versus baseline model:

Comparative Fit Index (CFI) 1.000 0.991

Tucker-Lewis Index (TLI) 1.055 0.981

Root Mean Square Error of Approximation:

RMSEA 0.000 0.029

90 Percent Confidence Interval 0.000 0.153 0.000 0.178

P-value RMSEA <= 0.05 0.654 0.486

According to lavaan since it was using DWLS, AIC and BIC are calculated. However, we had someone analyze the data before for us was using AIC and BIC to compare which temporal relationship is better at explaining our data. So I decided to try OpenMx, I am very new to path analysis and OpenMx, so I am not quite sure if I am treating the variables correctly. I have attached the code and dataset and the model was run and gave me some estimates, however, RMSEA was not computed. And the estimates for each path was different compared to lavaan. At this point, I am not sure how to compare my results from these two methods and if it's even comparable since OpenMx I turned everything into ordinal but in lavaan I used dummy variables. And why is that OpenMx didn't calcualte RMSEA for categorical data.

rl

## Clarification?

First, you do not have a lot of observations (rows of data). Around 50 observations is rather small for SEM, especially with the number of parameters you are estimating (21).

Second, you may get different estimates with lavaan using diagonally weighted least squares (DWLS) vs OpenMx using full information maximum likelihood (FIML). They are using very different methods with different assumptions. I would guess they would be ballpark similar, but nowhere near identical.

Third, after the model you made runs, it comes back with NA in many standard errors. This generally indicates a problem either in starting values or in model specification. My bet here is model specification. I'd have to look more closely at the model, but I suspect it might not be identified.

Fourth, OpenMx is not reporting the RMSEA because it wants to save estimation time. For raw data, RMSEA, TLI, and CFI require some comparison models to be fit and these may take time equal to the model you're interested in, so we don't fit them by default. If you're using the Beta (and please do), then we have a helper function to make and run these comparison models.

sat.1 <- omxSaturatedModel(fit.1, run=TRUE)

summary(fit.1, SaturatedLikelihood=sat.1[[1]], IndependenceLikelihood=sat.1[[2]])

Hopefully this helps!

Log in or register to post comments

In reply to Clarification? by mhunter

## With binary data omxSaturatedModel builds under-identified model

The Cholesky used for the saturated model helps a lot with stability, but is problematic to constrain the diagonal of ltCov%*%t(ltCov) to equal 1. Non-linear constraints can be used:

omxsatmod <- mxModel(omxsatmod$Saturated,mxMatrix("Unit",6,1,name="U"),mxConstraint(U==diag2vec(satCov)))

The independence model has the same problem, but it's simpler (just give it a Stand matrix of the right dimensions with free=F). For now, since it's diagonal,

omxsatfit$Independence$indCov$free<-F

omxsatfit$Independence$indCov$values<-diag(6)

Even then, the model isn't very stable unless we make the lbound for the diagonal elements of ltCov say .01 - if they hit closer to zero, non-positive definiteness can cause havoc.

Log in or register to post comments

In reply to Clarification? by mhunter

## Thank you. Your answer really

Log in or register to post comments

## Phew, rather more difficult than it should be.

We need a saturated model here. One approach is to allow all covariances to be free parameters, while constraining all variances to 1.0. I note also that there were some means free as well as thresholds, so I repaired that by fixing all means to zero. The variance fixing has to be done because the data are binary. So the saturated model looks like this:

dataD <- read.csv("path input alt1.csv")

vars <- c("IFNa1", "IFNa2", "IFNa3", "Ab1", "Ab2", "Ab3")

dataD$Ab1 <- mxFactor(dataD$Ab1, levels = c(0 , 1))

dataD$Ab2 <- mxFactor(dataD$Ab2, levels = c(0 , 1))

dataD$Ab3 <- mxFactor(dataD$Ab3, levels = c(0 , 1))

dataD$IFNa1 <- mxFactor(dataD$IFNa1, levels = c(0 , 1), ordered = T)

dataD$IFNa2 <- mxFactor(dataD$IFNa2, levels = c(0 , 1), ordered = T)

dataD$IFNa3 <- mxFactor(dataD$IFNa3, levels = c(0 , 1), ordered = T)

fmat <- diag(6)[c(2,3,5,6),]

model.1 <- mxModel("Autoantibody before IFNa activity model",

type = "RAM",

mxData(observed = dataD[, vars], type = "raw"),

#Define variables

manifestVars = c("IFNa1", "IFNa2", "IFNa3", "Ab1", "Ab2", "Ab3"),

mxMatrix("Iden",6,6,name="I"), mxMatrix("Unit",4,1,name="u4"), mxMatrix("Full",4,6,values=fmat,name="fmat"),mxAlgebra(diag2vec(fmat%&%(F%&%solve(I-A)%&%S)), name="expVars"), mxConstraint(expVars==u4,name="Varcon"),

mxPath(from = "Ab1", to = "IFNa1", arrows = 2, values = 0, free = T, labels = "v1"),

mxPath(from = "Ab2", to = "IFNa2", arrows = 2, values = 0, free = T, labels = "v2"),

mxPath(from = "Ab3", to = "IFNa3", arrows = 2, values = 0, free = T, labels = "v3"),

mxPath(from = c("IFNa1", "IFNa2", "IFNa3", "Ab1", "Ab2", "Ab3"), arrows = 2, free = c(F, T, T, F, T, T),

values = c(1,1,1,1,1,1), lbound=.01),

#Paths

mxPath(from = c("Ab1", "IFNa1"), to = "IFNa2", arrows = 1, values = 0, free = c(T,T), labels = c("a1", "b1")),

mxPath(from = c("Ab2", "IFNa2"), to = "IFNa3", arrows = 1, values = 0, free = c(T,T), labels = c("a2", "b2")),

mxPath(from = "Ab1", to = "Ab2", arrows = 1, values = 0, free = T, labels = "ab1"),

mxPath(from = "Ab2", to = "Ab3", arrows = 1, values = 0, free = T, labels = "ab2"),

mxPath(from = "IFNa1", to = "IFNa2", arrows = 1, values = 0, free = T, labels = "IFN1"),

mxPath(from = "IFNa2", to = "IFNa3", arrows = 1, values = 0, free = T, labels = "IFN2"),

#Means

mxPath(from = "one", to = c("Ab1", "Ab2", "Ab3", "IFNa1", "IFNa2", "IFNa3"), values = 0,

arrows = 1, free = c(F)),

#Threshold matrix

mxMatrix(type = "Full", nrow = 1, ncol = 6, free = c(T, T, T, T, T, T),

values = c(-0.18173563, -1.27430357,-1.75717640, 0.37933827,-0.01542988, -0.82819033), byrow = T, dimnames = list(c(), c('Ab1','Ab2', 'Ab3', 'IFNa1', 'IFNa2', 'IFNa3')),

name = "Threshold"),

#Means and covaraince matrix

mxRAMObjective(A="A", S="S", F="F", M="M", thresholds = "Threshold")

)

summary(fit.1 <- mxRun(model.1,unsafe=T), SaturatedLikelihood=satfit.1)

We have to add some matrix algebra and nonlinear constraints to force the variances of the dependent (endogenous) variables to 1.0. This is pretty tricky stuff, and I believe that OpenMx should make this much easier than it is. I use a matrix fmat to pluck out the endogenous variables, numbers 2,3,5 and 6.

The model you want to fit looks like this, I think:

fmat <- diag(6)[c(2,3,5,6),]

model.1 <- mxModel("Autoantibody before IFNa activity model",

type = "RAM",

mxData(observed = dataD[, vars], type = "raw"),

#Define variables

manifestVars = c("IFNa1", "IFNa2", "IFNa3", "Ab1", "Ab2", "Ab3"),

mxMatrix("Iden",6,6,name="I"), mxMatrix("Unit",4,1,name="u4"),

mxMatrix("Full",4,6,values=fmat,name="fmat"),

mxAlgebra(diag2vec(fmat%&%(F%&%solve(I-A)%&%S)), name="expVars"),

mxConstraint(expVars==u4,name="Varcon"),

mxPath(from = "Ab1", to = "IFNa1", arrows = 2, values = 0, free = T, labels = "v1"),

mxPath(from = "Ab2", to = "IFNa2", arrows = 2, values = 0, free = T, labels = "v2"),

mxPath(from = "Ab3", to = "IFNa3", arrows = 2, values = 0, free = T, labels = "v3"),

mxPath(from = c("IFNa1", "IFNa2", "IFNa3", "Ab1", "Ab2", "Ab3"), arrows = 2, free = c(F, T, T, F, T, T),

values = c(1,1,1,1,1,1), lbound=.01),

#Paths

mxPath(from = c("Ab1", "IFNa1"), to = "IFNa2", arrows = 1, values = 0, free = c(T,T), labels = c("a1", "b1")),

mxPath(from = c("Ab2", "IFNa2"), to = "IFNa3", arrows = 1, values = 0, free = c(T,T), labels = c("a2", "b2")),

mxPath(from = "Ab1", to = "Ab2", arrows = 1, values = 0, free = T, labels = "ab1"),

mxPath(from = "Ab2", to = "Ab3", arrows = 1, values = 0, free = T, labels = "ab2"),

mxPath(from = "IFNa1", to = "IFNa2", arrows = 1, values = 0, free = T, labels = "IFN1"),

mxPath(from = "IFNa2", to = "IFNa3", arrows = 1, values = 0, free = T, labels = "IFN2"),

#Means

mxPath(from = "one", to = c("Ab1", "Ab2", "Ab3", "IFNa1", "IFNa2", "IFNa3"), values = 0,

arrows = 1, free = c(F)),

#Threshold matrix

mxMatrix(type = "Full", nrow = 1, ncol = 6, free = c(T, T, T, T, T, T),

values = c(-0.18173563, -1.27430357,-1.75717640, 0.37933827,-0.01542988, -0.82819033), byrow = T, dimnames = list(c(), c('Ab1','Ab2', 'Ab3', 'IFNa1', 'IFNa2', 'IFNa3')),

name = "Threshold"),

#Means and covaraince matrix

mxRAMObjective(A="A", S="S", F="F", M="M", thresholds = "Threshold")

)

summary(fit.1 <- mxRun(model.1), SaturatedLikelihood=satfit.1)

The output looks like this to me (and I would like the developers to look at it because I am not sure that the chi-squared df are correct:

> summary(fit.1 <- mxRun(model.1), SaturatedLikelihood=satfit.1)

Running Autoantibody before IFNa activity model

Summary of Autoantibody before IFNa activity model

free parameters:

name matrix row col Estimate lbound ubound

1 IFN1 A IFNa2 IFNa1 0.31453547

2 IFN2 A IFNa3 IFNa2 0.63617617

3 a1 A IFNa2 Ab1 0.32507136

4 ab1 A Ab2 Ab1 0.86859883

5 a2 A IFNa3 Ab2 0.32619533

6 ab2 A Ab3 Ab2 0.90341107

7 Autoantibody before IFNa activity model.S[2,2] S IFNa2 IFNa2 0.73627615 0.01

8 Autoantibody before IFNa activity model.S[3,3] S IFNa3 IFNa3 0.24049468 0.01

9 v1 S IFNa1 Ab1 0.28910522

10 v2 S IFNa2 Ab2 0.23711782

11 Autoantibody before IFNa activity model.S[5,5] S Ab2 Ab2 0.24553605 0.01

12 v3 S IFNa3 Ab3 -0.15010803

13 Autoantibody before IFNa activity model.S[6,6] S Ab3 Ab3 0.18384846 0.01

14 Autoantibody before IFNa activity model.Threshold[1,1] Threshold 1 Ab1 -0.15334071

15 Autoantibody before IFNa activity model.Threshold[1,2] Threshold 1 Ab2 -0.60399633

16 Autoantibody before IFNa activity model.Threshold[1,3] Threshold 1 Ab3 -0.73879246

17 Autoantibody before IFNa activity model.Threshold[1,4] Threshold 1 IFNa1 0.39794957

18 Autoantibody before IFNa activity model.Threshold[1,5] Threshold 1 IFNa2 0.01252314

19 Autoantibody before IFNa activity model.Threshold[1,6] Threshold 1 IFNa3 -0.38241874

`observed statistics: 316`

Constraint 'Varcon' contributes 4 observed statistics.

estimated parameters: 19

degrees of freedom: 297

-2 log likelihood: 312.2739

saturated -2 log likelihood: 308.9829

number of observations: 52

chi-square: X2 ( df=2 ) = 3.290923, p = 0.1929235

Information Criteria:

| df Penalty | Parameters Penalty | Sample-Size Adjusted

AIC: -281.7261 350.2739 NA

BIC: -861.2455 387.3475 327.6815

CFI: NA

TLI: NA

RMSEA: 0.1114124 [95% CI (0, 0.3506192)]

Some of your fit indices are missing.

To get them, fit saturated and independence models, and include them with

summary(yourModel, SaturatedLikelihood=..., IndependenceLikelihood=...).

timestamp: 2014-08-28 10:04:35

wall clock time: 0.335669 secs

OpenMx version number: 2.0.0.3766

Need help? See help(mxSummary)

Finally, note that OpenMx is using a normal theory threshold model here, effectively working with tetrachoric correlations. These may differ from the statistics being used in other software. Some robust methods use Pearson correlations, which may underestimate the latent correlation relative to those of the threshold model.

Log in or register to post comments

In reply to Phew, rather more difficult than it should be. by neale

## Thank you so much for your

Log in or register to post comments