You are here

Categorical Data in both independent and dependent variables

6 posts / 0 new
Last post
rl's picture
rl
Offline
Joined: 08/26/2014 - 15:42
Categorical Data in both independent and dependent variables
AttachmentSize
Binary Data model.R2.79 KB
PDF icon Models.pdf19 KB
File path input alt1.csv707 bytes

I have a data set with six binary variables, which I am trying to determine the temporal relationship. I was using lavaan R package, where they suggested to use dummy variable for endogenous variables (independent) and use ordered for exogenous (dependent variables). I was using the model as described in pdf file. I have gotten following results:

lavaan (0.5-16) converged normally after 31 iterations

Number of observations 51

Estimator DWLS Robust
Minimum Function Test Statistic 5.699 7.295
Degrees of freedom 7 7
P-value (Chi-square) 0.575 0.399
Scaling correction factor 1.006
Shift parameter 1.632
for simple second-order correction (Mplus variant)

Model test baseline model:

Minimum Function Test Statistic 61.153 45.447
Degrees of freedom 14 14
P-value 0.000 0.000

User model versus baseline model:

Comparative Fit Index (CFI) 1.000 0.991
Tucker-Lewis Index (TLI) 1.055 0.981

Root Mean Square Error of Approximation:

RMSEA 0.000 0.029
90 Percent Confidence Interval 0.000 0.153 0.000 0.178
P-value RMSEA <= 0.05 0.654 0.486

According to lavaan since it was using DWLS, AIC and BIC are calculated. However, we had someone analyze the data before for us was using AIC and BIC to compare which temporal relationship is better at explaining our data. So I decided to try OpenMx, I am very new to path analysis and OpenMx, so I am not quite sure if I am treating the variables correctly. I have attached the code and dataset and the model was run and gave me some estimates, however, RMSEA was not computed. And the estimates for each path was different compared to lavaan. At this point, I am not sure how to compare my results from these two methods and if it's even comparable since OpenMx I turned everything into ordinal but in lavaan I used dummy variables. And why is that OpenMx didn't calcualte RMSEA for categorical data.

rl

mhunter's picture
Offline
Joined: 07/31/2009 - 15:26
Clarification?

I'm not totally sure what your question is, but here are a few answers.

First, you do not have a lot of observations (rows of data). Around 50 observations is rather small for SEM, especially with the number of parameters you are estimating (21).

Second, you may get different estimates with lavaan using diagonally weighted least squares (DWLS) vs OpenMx using full information maximum likelihood (FIML). They are using very different methods with different assumptions. I would guess they would be ballpark similar, but nowhere near identical.

Third, after the model you made runs, it comes back with NA in many standard errors. This generally indicates a problem either in starting values or in model specification. My bet here is model specification. I'd have to look more closely at the model, but I suspect it might not be identified.

Fourth, OpenMx is not reporting the RMSEA because it wants to save estimation time. For raw data, RMSEA, TLI, and CFI require some comparison models to be fit and these may take time equal to the model you're interested in, so we don't fit them by default. If you're using the Beta (and please do), then we have a helper function to make and run these comparison models.

sat.1 <- omxSaturatedModel(fit.1, run=TRUE)
summary(fit.1, SaturatedLikelihood=sat.1[[1]], IndependenceLikelihood=sat.1[[2]])

Hopefully this helps!

neale's picture
Offline
Joined: 07/31/2009 - 15:14
With binary data omxSaturatedModel builds under-identified model

So the variances have to be fixed at 1 for binary data. Possibly we could be smarter about it (detecting binary variables)? I would use the fix the first two thresholds at 0 & 1 for 3+ category data.

The Cholesky used for the saturated model helps a lot with stability, but is problematic to constrain the diagonal of ltCov%*%t(ltCov) to equal 1. Non-linear constraints can be used:

 omxsatmod <- mxModel(omxsatmod$Saturated,mxMatrix("Unit",6,1,name="U"),mxConstraint(U==diag2vec(satCov)))

The independence model has the same problem, but it's simpler (just give it a Stand matrix of the right dimensions with free=F). For now, since it's diagonal,

omxsatfit$Independence$indCov$free<-F
omxsatfit$Independence$indCov$values<-diag(6)

Even then, the model isn't very stable unless we make the lbound for the diagonal elements of ltCov say .01 - if they hit closer to zero, non-positive definiteness can cause havoc.

rl's picture
rl
Offline
Joined: 08/26/2014 - 15:42
Thank you. Your answer really

Thank you. Your answer really helps. I will have to look closer at my model and I agree with about the sample size.

neale's picture
Offline
Joined: 07/31/2009 - 15:14
Phew, rather more difficult than it should be.

Hi Rufei

We need a saturated model here. One approach is to allow all covariances to be free parameters, while constraining all variances to 1.0. I note also that there were some means free as well as thresholds, so I repaired that by fixing all means to zero. The variance fixing has to be done because the data are binary. So the saturated model looks like this:

dataD <- read.csv("path input alt1.csv")
vars <- c("IFNa1", "IFNa2", "IFNa3", "Ab1", "Ab2", "Ab3")
dataD$Ab1 <- mxFactor(dataD$Ab1, levels = c(0 , 1))
dataD$Ab2 <- mxFactor(dataD$Ab2, levels = c(0 , 1))
dataD$Ab3 <- mxFactor(dataD$Ab3, levels = c(0 , 1))
dataD$IFNa1 <- mxFactor(dataD$IFNa1, levels = c(0 , 1), ordered = T)
dataD$IFNa2 <- mxFactor(dataD$IFNa2, levels = c(0 , 1), ordered = T)
dataD$IFNa3 <- mxFactor(dataD$IFNa3, levels = c(0 , 1), ordered = T)
fmat <- diag(6)[c(2,3,5,6),]
model.1 <- mxModel("Autoantibody before IFNa activity model", 
                   type = "RAM",
                   mxData(observed = dataD[, vars], type = "raw"),
                   #Define variables
                   manifestVars = c("IFNa1", "IFNa2", "IFNa3", "Ab1", "Ab2", "Ab3"),
                   mxMatrix("Iden",6,6,name="I"), mxMatrix("Unit",4,1,name="u4"), mxMatrix("Full",4,6,values=fmat,name="fmat"),mxAlgebra(diag2vec(fmat%&%(F%&%solve(I-A)%&%S)), name="expVars"), mxConstraint(expVars==u4,name="Varcon"),
                   mxPath(from = "Ab1", to = "IFNa1", arrows = 2, values = 0, free = T, labels = "v1"),
                   mxPath(from = "Ab2", to = "IFNa2", arrows = 2, values = 0, free = T, labels = "v2"),
                   mxPath(from = "Ab3", to = "IFNa3", arrows = 2, values = 0, free = T, labels = "v3"),
                   mxPath(from = c("IFNa1", "IFNa2", "IFNa3", "Ab1", "Ab2", "Ab3"), arrows = 2, free = c(F, T, T, F, T, T),
                          values = c(1,1,1,1,1,1), lbound=.01),
                   #Paths
                   mxPath(from = c("Ab1", "IFNa1"), to = "IFNa2", arrows = 1, values = 0, free = c(T,T), labels = c("a1", "b1")),
                   mxPath(from = c("Ab2", "IFNa2"), to = "IFNa3", arrows = 1, values = 0, free = c(T,T), labels = c("a2", "b2")),
                   mxPath(from = "Ab1", to = "Ab2", arrows = 1, values = 0, free = T, labels = "ab1"),
                   mxPath(from = "Ab2", to = "Ab3", arrows = 1, values = 0, free = T, labels = "ab2"),
                   mxPath(from = "IFNa1", to = "IFNa2", arrows = 1, values = 0, free = T, labels = "IFN1"),
                   mxPath(from = "IFNa2", to = "IFNa3", arrows = 1, values = 0, free = T, labels = "IFN2"),
                   #Means
                   mxPath(from = "one", to = c("Ab1", "Ab2", "Ab3", "IFNa1", "IFNa2", "IFNa3"), values = 0,
                          arrows = 1, free = c(F)),
                   #Threshold matrix
                   mxMatrix(type = "Full", nrow = 1, ncol = 6, free = c(T, T, T, T, T, T),
                            values = c(-0.18173563, -1.27430357,-1.75717640,  0.37933827,-0.01542988,  -0.82819033), byrow = T, dimnames = list(c(), c('Ab1','Ab2', 'Ab3', 'IFNa1', 'IFNa2', 'IFNa3')),
                            name = "Threshold"),
                   #Means and covaraince matrix
                   mxRAMObjective(A="A", S="S", F="F", M="M", thresholds = "Threshold")
)
summary(fit.1 <- mxRun(model.1,unsafe=T), SaturatedLikelihood=satfit.1)

We have to add some matrix algebra and nonlinear constraints to force the variances of the dependent (endogenous) variables to 1.0. This is pretty tricky stuff, and I believe that OpenMx should make this much easier than it is. I use a matrix fmat to pluck out the endogenous variables, numbers 2,3,5 and 6.

The model you want to fit looks like this, I think:

fmat <- diag(6)[c(2,3,5,6),]
model.1 <- mxModel("Autoantibody before IFNa activity model", 
                   type = "RAM",
                   mxData(observed = dataD[, vars], type = "raw"),
                   #Define variables
                   manifestVars = c("IFNa1", "IFNa2", "IFNa3", "Ab1", "Ab2", "Ab3"),
                   mxMatrix("Iden",6,6,name="I"), mxMatrix("Unit",4,1,name="u4"),
                     mxMatrix("Full",4,6,values=fmat,name="fmat"),
                     mxAlgebra(diag2vec(fmat%&%(F%&%solve(I-A)%&%S)), name="expVars"),
                     mxConstraint(expVars==u4,name="Varcon"),
                   mxPath(from = "Ab1", to = "IFNa1", arrows = 2, values = 0, free = T, labels = "v1"),
                   mxPath(from = "Ab2", to = "IFNa2", arrows = 2, values = 0, free = T, labels = "v2"),
                   mxPath(from = "Ab3", to = "IFNa3", arrows = 2, values = 0, free = T, labels = "v3"),
                   mxPath(from = c("IFNa1", "IFNa2", "IFNa3", "Ab1", "Ab2", "Ab3"), arrows = 2, free = c(F, T, T, F, T, T),
                          values = c(1,1,1,1,1,1), lbound=.01),
                   #Paths
                   mxPath(from = c("Ab1", "IFNa1"), to = "IFNa2", arrows = 1, values = 0, free = c(T,T), labels = c("a1", "b1")),
                   mxPath(from = c("Ab2", "IFNa2"), to = "IFNa3", arrows = 1, values = 0, free = c(T,T), labels = c("a2", "b2")),
                   mxPath(from = "Ab1", to = "Ab2", arrows = 1, values = 0, free = T, labels = "ab1"),
                   mxPath(from = "Ab2", to = "Ab3", arrows = 1, values = 0, free = T, labels = "ab2"),
                   mxPath(from = "IFNa1", to = "IFNa2", arrows = 1, values = 0, free = T, labels = "IFN1"),
                   mxPath(from = "IFNa2", to = "IFNa3", arrows = 1, values = 0, free = T, labels = "IFN2"),
                   #Means
                   mxPath(from = "one", to = c("Ab1", "Ab2", "Ab3", "IFNa1", "IFNa2", "IFNa3"), values = 0,
                          arrows = 1, free = c(F)),
                   #Threshold matrix
                   mxMatrix(type = "Full", nrow = 1, ncol = 6, free = c(T, T, T, T, T, T),
                            values = c(-0.18173563, -1.27430357,-1.75717640,  0.37933827,-0.01542988,  -0.82819033), byrow = T, dimnames = list(c(), c('Ab1','Ab2', 'Ab3', 'IFNa1', 'IFNa2', 'IFNa3')),
                            name = "Threshold"),
                   #Means and covaraince matrix
                   mxRAMObjective(A="A", S="S", F="F", M="M", thresholds = "Threshold")
)
summary(fit.1 <- mxRun(model.1), SaturatedLikelihood=satfit.1)

The output looks like this to me (and I would like the developers to look at it because I am not sure that the chi-squared df are correct:

> summary(fit.1 <- mxRun(model.1), SaturatedLikelihood=satfit.1)
Running Autoantibody before IFNa activity model 
Summary of Autoantibody before IFNa activity model 
 
free parameters:
                                                     name    matrix   row   col    Estimate lbound ubound
1                                                    IFN1         A IFNa2 IFNa1  0.31453547              
2                                                    IFN2         A IFNa3 IFNa2  0.63617617              
3                                                      a1         A IFNa2   Ab1  0.32507136              
4                                                     ab1         A   Ab2   Ab1  0.86859883              
5                                                      a2         A IFNa3   Ab2  0.32619533              
6                                                     ab2         A   Ab3   Ab2  0.90341107              
7          Autoantibody before IFNa activity model.S[2,2]         S IFNa2 IFNa2  0.73627615   0.01       
8          Autoantibody before IFNa activity model.S[3,3]         S IFNa3 IFNa3  0.24049468   0.01       
9                                                      v1         S IFNa1   Ab1  0.28910522              
10                                                     v2         S IFNa2   Ab2  0.23711782              
11         Autoantibody before IFNa activity model.S[5,5]         S   Ab2   Ab2  0.24553605   0.01       
12                                                     v3         S IFNa3   Ab3 -0.15010803              
13         Autoantibody before IFNa activity model.S[6,6]         S   Ab3   Ab3  0.18384846   0.01       
14 Autoantibody before IFNa activity model.Threshold[1,1] Threshold     1   Ab1 -0.15334071              
15 Autoantibody before IFNa activity model.Threshold[1,2] Threshold     1   Ab2 -0.60399633              
16 Autoantibody before IFNa activity model.Threshold[1,3] Threshold     1   Ab3 -0.73879246              
17 Autoantibody before IFNa activity model.Threshold[1,4] Threshold     1 IFNa1  0.39794957              
18 Autoantibody before IFNa activity model.Threshold[1,5] Threshold     1 IFNa2  0.01252314              
19 Autoantibody before IFNa activity model.Threshold[1,6] Threshold     1 IFNa3 -0.38241874              
 
observed statistics:  316 
Constraint 'Varcon' contributes 4 observed statistics. 
estimated parameters:  19 
degrees of freedom:  297 
-2 log likelihood:  312.2739 
saturated -2 log likelihood:  308.9829 
number of observations:  52 
chi-square:  X2 ( df=2 ) = 3.290923,  p = 0.1929235
Information Criteria: 
      |  df Penalty  |  Parameters Penalty  |  Sample-Size Adjusted
AIC:      -281.7261               350.2739                       NA
BIC:      -861.2455               387.3475                 327.6815
CFI: NA 
TLI: NA 
RMSEA:  0.1114124  [95% CI (0, 0.3506192)]
Some of your fit indices are missing.
  To get them, fit saturated and independence models, and include them with
  summary(yourModel, SaturatedLikelihood=..., IndependenceLikelihood=...). 
timestamp: 2014-08-28 10:04:35 
wall clock time: 0.335669 secs 
OpenMx version number: 2.0.0.3766 
Need help?  See help(mxSummary) 

Finally, note that OpenMx is using a normal theory threshold model here, effectively working with tetrachoric correlations. These may differ from the statistics being used in other software. Some robust methods use Pearson correlations, which may underestimate the latent correlation relative to those of the threshold model.

rl's picture
rl
Offline
Joined: 08/26/2014 - 15:42
Thank you so much for your

Thank you so much for your help.