Measurement invariance in CFA with ordinal data (convergence issues)

I have been trying to replicate the measurement invariance analysis for the CFA model with ordinal data (theta parameterization) in Kline (2015), but have failed due to convergence issues (even after roughly adjusting the starting values for the thresholds based on lavaan output). I also tried mxAutoStart(). I am wondering if I am missing something.
library(OpenMx)
radloffdata <- read.table("https://www.guilford.com/add/kline/radloff-lavaan.txt", sep="\t", header = FALSE)
colnames(radloffdata) <- c("x1", "x2", "x3", "x4", "x5", "g")
radloffdata[,1] <- ordered(radloffdata[,1])
radloffdata[,2] <- ordered(radloffdata[,2])
radloffdata[,3] <- ordered(radloffdata[,3])
radloffdata[,4] <- ordered(radloffdata[,4])
radloffdata[,5] <- ordered(radloffdata[,5])
rawData1 <- subset(radloffdata, g == 1)
rawData2 <- subset(radloffdata, g == 2)
# common parameters
alphas <- mxPath(from = "one", to = c("x1", "x2", "x3", "x4", "x5"), arrows = 1,
free = FALSE, values = 0)
varDep <- mxPath(from = "depression", arrows = 2, free = TRUE, values = .5)
lambdas <- mxPath(from = "depression", to = c("x1", "x2", "x3", "x4", "x5"), arrows = 1,
free = c(FALSE, TRUE, TRUE, TRUE, TRUE), values = c(1, 1, 1, 1, 1))
# model for group 1
dataRaw1 <- mxData(observed = rawData1, type = "raw")
kappa1 <- mxPath(from = "one", to = "depression", arrows = 1, free = FALSE, values = 0, labels = "kappa1")
thetas1 <- mxPath(from = c("x1", "x2", "x3", "x4", "x5"), arrows = 2, free = FALSE, values = 1)
thresholds1 <- mxThreshold(vars = c("x1", "x2", "x3", "x4", "x5"), nThresh = c(3, 3, 3, 3, 3),
free = TRUE, values = c(1, 1.5, 2.5,
1.5, 2, 2.5,
1, 2, 2.5,
.5, 1.5, 2,
1, 2, 2.5),
labels = c("t11", "t12", "t13", "t21", "t22", "t23", "t31", "t32", "t33",
"t41", "t42", "t43", "t51", "t52", "t53"))
m1 <- mxModel("m1", type = "RAM",
manifestVars = c("x1", "x2", "x3", "x4", "x5"), latentVars = "depression",
dataRaw1, alphas, kappa1, lambdas, thetas1, varDep, thresholds1)
# model for group 2
dataRaw2 <- mxData(observed = rawData2, type = "raw")
kappa2 <- mxPath(from = "one", to = "depression", arrows = 1, free = TRUE, values = .5, labels = "kappa2")
thetas2 <- mxPath(from = c("x1", "x2", "x3", "x4", "x5"), arrows = 2, free = TRUE, values = .7)
thresholds2 <- mxThreshold(vars = c("x1", "x2", "x3", "x4", "x5"), nThresh = c(3, 3, 3, 3, 3),
free = TRUE, values = c(1, 1.5, 2,
1.5, 2.2, 2.7,
1, 1.5, 2,
.5, .7, 1,
1, 1.5, 2),
labels = c("t11", "t12", "t2_13", "t21", "t2_22", "t2_23", "t31", "t2_32", "t2_33",
"t41", "t2_42", "t2_43", "t51", "t2_52", "t2_53"))
m2 <- mxModel("m2", type = "RAM",
manifestVars = c("x1", "x2", "x3", "x4", "x5"), latentVars = "depression",
dataRaw2, alphas, kappa2, lambdas, thetas2, varDep, thresholds2)
# combine and run models
MGOrd <- mxModel("MG Configural Invariance",
m1, m2,
mxFitFunctionMultigroup(c("m1.fitfunction", "m2.fitfunction")))
MGOrdFit <- mxRun(MGOrd)
MGOrdRef <- mxRefModels(MGOrdFit, run = TRUE)
summary(MGOrdFit, refModels = MGOrdRef)
lavaan code for the same example
mxVersion() ?
In reply to mxVersion() ? by AdminRobK
mxVersion() output
> mxVersion()
OpenMx version: 2.11.5 [GIT v2.11.5]
R version: R version 3.5.0 (2018-04-23)
Platform: x86_64-w64-mingw32
Default optimiser: CSOLNP
NPSOL-enabled?: No
OpenMP-enabled?: No
Warning message:
package ‘OpenMx’ was built under R version 3.5.1
Hi
I think there are two problems. One is that the model as specified is not identified; this can be deduced from its number of free parameters exceeding that of the saturated model. The main reason for the under-identification is that when thresholds are estimated, it is not possible to simultaneously estimate residual variances. Mehta, P. D., Neale, M. C., & Flay, B. R. (2004). Squeezing interval change from ordinal panel data: Latent growth curves with ordinal outcomes. Psychological Methods, 9, 301-333, describes how to avoid the issue by fixing the first two thresholds of each item to arbitrary values, which identifies the means and variances. I have modified your script to fix those thresholds (0 and 1 are more commonly used for this purpose, but the values you supplied work as well, and it is a matter of a simple transform to get back to the original z-score scale for the thresholds if needed). You may also wish to free the parameters for the item means - zero is probably a poor choice :) and likely contributes to the poor fit of the model.
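[Editor's note: to make that "simple transform" concrete, here is a minimal post-hoc sketch, not part of the original post. It assumes OpenMx's default RAM matrix names (A, S, M, Thresholds) and the fitted object MGOrdFit produced by the modified script below, and rescales the group-1 thresholds for x1 to the z-score metric of the latent response.]
A   <- mxEval(m1.A, MGOrdFit)            # loadings
S   <- mxEval(m1.S, MGOrdFit)            # variances/covariances
M   <- mxEval(m1.M, MGOrdFit)            # means
tau <- mxEval(m1.Thresholds, MGOrdFit)   # thresholds on the theta metric
lam   <- A["x1", "depression"]
muX1  <- M[1, "x1"] + lam * M[1, "depression"]                  # implied mean of x1*
varX1 <- lam^2 * S["depression", "depression"] + S["x1", "x1"]  # implied variance of x1*
(tau[, "x1"] - muX1) / sqrt(varX1)       # x1 thresholds on the z-score scale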
The second issue is that the starting values were not that great, and with ordinal data it can be difficult to land on the solution precisely on the first try. I therefore substituted mxTryHard() for mxRun(), which seemed to solve the optimization issue. There is also mxTryHardOrdinal(), which might have worked even better, but I forgot to use it here.
radloffdata <- read.table("https://www.guilford.com/add/kline/radloff-lavaan.txt", sep="\t", header = FALSE)
colnames(radloffdata) <- c("x1", "x2", "x3", "x4", "x5", "g")
radloffdata[,1] <- ordered(radloffdata[,1])
radloffdata[,2] <- ordered(radloffdata[,2])
radloffdata[,3] <- ordered(radloffdata[,3])
radloffdata[,4] <- ordered(radloffdata[,4])
radloffdata[,5] <- ordered(radloffdata[,5])
rawData1 <- subset(radloffdata, g == 1)
rawData2 <- subset(radloffdata, g == 2)
# common parameters
alphas <- mxPath(from = "one", to = c("x1", "x2", "x3", "x4", "x5"), arrows = 1,
free = FALSE, values = 0)
varDep <- mxPath(from = "depression", arrows = 2, free = TRUE, values = .5)
lambdas <- mxPath(from = "depression", to = c("x1", "x2", "x3", "x4", "x5"), arrows = 1,
free = c(FALSE, TRUE, TRUE, TRUE, TRUE), values = c(1, 1, 1, 1, 1))
# model for group 1
dataRaw1 <- mxData(observed = rawData1, type = "raw")
kappa1 <- mxPath(from = "one", to = "depression", arrows = 1, free = FALSE, values = 0, labels = "kappa1")
thetas1 <- mxPath(from = c("x1", "x2", "x3", "x4", "x5"), arrows = 2, free = FALSE, values = 1)
thresholds1 <- mxThreshold(vars = c("x1", "x2", "x3", "x4", "x5"), nThresh = c(3, 3, 3, 3, 3),
free = rep(c(FALSE, FALSE, TRUE), 5), values = c(1, 1.5, 2.5,
1.5, 2, 2.5,
1, 2, 2.5,
.5, 1.5, 2,
1, 2, 2.5),
labels = c("t11", "t12", "t13", "t21", "t22", "t23", "t31", "t32", "t33",
"t41", "t42", "t43", "t51", "t52", "t53"))
m1 <- mxModel("m1", type = "RAM",
manifestVars = c("x1", "x2", "x3", "x4", "x5"), latentVars = "depression",
dataRaw1, alphas, kappa1, lambdas, thetas1, varDep, thresholds1)
# model for group 2
dataRaw2 <- mxData(observed = rawData2, type = "raw")
kappa2 <- mxPath(from = "one", to = "depression", arrows = 1, free = TRUE, values = .5, labels = "kappa2")
thetas2 <- mxPath(from = c("x1", "x2", "x3", "x4", "x5"), arrows = 2, free = TRUE, values = .7)
thresholds2 <- mxThreshold(vars = c("x1", "x2", "x3", "x4", "x5"), nThresh = c(3, 3, 3, 3, 3),
free = rep(c(FALSE, FALSE, TRUE), 5), values = c(1, 1.5, 2,
1.5, 2.2, 2.7,
1, 1.5, 2,
.5, .7, 1,
1, 1.5, 2),
labels = c("t11", "t12", "t2_13", "t21", "t2_22", "t2_23", "t31", "t2_32", "t2_33",
"t41", "t2_42", "t2_43", "t51", "t2_52", "t2_53"))
m2 <- mxModel("m2", type = "RAM",
manifestVars = c("x1", "x2", "x3", "x4", "x5"), latentVars = "depression",
dataRaw2, alphas, kappa2, lambdas, thetas2, varDep, thresholds2)
# combine and run models
MGOrd <- mxModel("MG Configural Invariance",
m1, m2,
mxFitFunctionMultigroup(c("m1.fitfunction", "m2.fitfunction")))
MGOrdFit <- mxTryHard(MGOrd)
MGOrdRef <- mxRefModels(MGOrdFit, run = TRUE)
summary(MGOrdFit, refModels = MGOrdRef)
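[Editor's note: a small optional addition, not from the original post. After fitting, it can be worth checking the optimizer status code and, if it is not 0 or 1, retrying with the ordinal-specific wrapper mentioned above; the extraTries value is an arbitrary choice.]
MGOrdFit$output$status$code              # 0 or 1 indicates an acceptable solution
# if not, retry with the ordinal-aware wrapper and extra random restarts
MGOrdFit <- mxTryHardOrdinal(MGOrd, extraTries = 30)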
Should each submodel be identified on its own?
Does that mean each model (m1 and m2) should be identified on its own in OpenMx (unlike in lavaan or Mplus)?
If we fix the first two thresholds for each item in each group, would this still give the result for configural invariance, since those thresholds are not freely estimated?
I am sorry to trouble you with this, but I am wondering whether there is something unique to OpenMx when multiple groups are of interest, or whether it is something I overlooked but could not pin down.
In reply to Should each submodel be identified on its own? by metinbulus
lavaan syntax / output
library(OpenMx)   # mxFactor() used below comes from OpenMx
radloffdata <- read.table("https://www.guilford.com/add/kline/radloff-lavaan.txt", sep="\t", header = FALSE)
colnames(radloffdata) <- c("x1", "x2", "x3", "x4", "x5", "g")
radloffdata$x1 <- mxFactor(radloffdata$x1, levels = c(0, 1, 2, 3))
radloffdata$x2 <- mxFactor(radloffdata$x2, levels = c(0, 1, 2, 3))
radloffdata$x3 <- mxFactor(radloffdata$x3, levels = c(0, 1, 2, 3))
radloffdata$x4 <- mxFactor(radloffdata$x4, levels = c(0, 1, 2, 3))
radloffdata$x5 <- mxFactor(radloffdata$x5, levels = c(0, 1, 2, 3))
library(lavaan)
# model 1
# configural invariance
# includes just the minimal constraints needed for identification
Model1 <- "
dep =~ c(1, 1)*x1 + x2 + x3 + x4 + x5
x1 | c(t11, t11)*t1 + c(t12, t12)*t2 + t3
x2 | c(t21, t21)*t1 + t2 + t3
x3 | c(t31, t31)*t1 + t2 + t3
x4 | c(t41, t41)*t1 + t2 + t3
x5 | c(t51, t51)*t1 + t2 + t3
dep ~~ NA*dep
dep ~ c(0, NA)*1
x1 ~~ c(1, NA)*x1
x2 ~~ c(1, NA)*x2
x3 ~~ c(1, NA)*x3
x4 ~~ c(1, NA)*x4
x5 ~~ c(1, NA)*x5
"
outModel1 <- cfa(Model1, data=radloffdata, group="g", parameterization="theta", estimator="wlsmv")
summary(outModel1, fit=TRUE, standardized=TRUE, rsquare=TRUE)
lavaan 0.6-3 ended normally after 176 iterations
Optimization method NLMINB
Number of free parameters 46
Number of equality constraints 6
Number of observations per group
1 2004
2 248
Estimator DWLS Robust
Model Fit Test Statistic 14.222 25.162
Degrees of freedom 10 10
P-value (Chi-square) 0.163 0.005
Scaling correction factor 0.575
Shift parameter for each group:
1 0.372
2 0.046
for simple second-order correction (Mplus variant)
Chi-square for each group:
1 8.786 15.658
2 5.436 9.504
Model test baseline model:
Minimum Function Test Statistic 3408.088 2704.055
Degrees of freedom 20 20
P-value 0.000 0.000
User model versus baseline model:
Comparative Fit Index (CFI) 0.999 0.994
Tucker-Lewis Index (TLI) 0.998 0.989
Robust Comparative Fit Index (CFI) NA
Robust Tucker-Lewis Index (TLI) NA
Root Mean Square Error of Approximation:
RMSEA 0.019 0.037
90 Percent Confidence Interval 0.000 0.040 0.019 0.055
P-value RMSEA <= 0.05 0.995 0.878
Robust RMSEA NA
90 Percent Confidence Interval NA NA
Standardized Root Mean Square Residual:
SRMR 0.025 0.025
Parameter Estimates:
Information Expected
Information saturated (h1) model Unstructured
Standard Errors Robust.sem
Group 1 [1]:
Latent Variables:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
dep =~
x1 1.000 0.767 0.609
x2 1.118 0.112 9.963 0.000 0.858 0.651
x3 1.637 0.154 10.629 0.000 1.255 0.782
x4 1.006 0.089 11.265 0.000 0.771 0.611
x5 1.577 0.151 10.441 0.000 1.210 0.771
Intercepts:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
dep 0.000 0.000 0.000
.x1 0.000 0.000 0.000
.x2 0.000 0.000 0.000
.x3 0.000 0.000 0.000
.x4 0.000 0.000 0.000
.x5 0.000 0.000 0.000
Thresholds:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
x1|t1 (t11) 0.973 0.047 20.639 0.000 0.973 0.772
x1|t2 (t12) 1.789 0.065 27.575 0.000 1.789 1.420
x1|t3 2.362 0.085 27.839 0.000 2.362 1.874
x2|t1 (t21) 1.376 0.063 21.940 0.000 1.376 1.044
x2|t2 2.033 0.080 25.291 0.000 2.033 1.543
x2|t3 2.469 0.095 25.929 0.000 2.469 1.874
x3|t1 (t31) 0.868 0.061 14.273 0.000 0.868 0.541
x3|t2 1.848 0.088 20.956 0.000 1.848 1.152
x3|t3 2.413 0.106 22.714 0.000 2.413 1.503
x4|t1 (t41) 0.364 0.037 9.739 0.000 0.364 0.288
x4|t2 1.263 0.048 26.433 0.000 1.263 1.000
x4|t3 1.894 0.061 31.121 0.000 1.894 1.500
x5|t1 (t51) 0.876 0.060 14.479 0.000 0.876 0.558
x5|t2 1.965 0.093 21.158 0.000 1.965 1.252
x5|t3 2.687 0.120 22.476 0.000 2.687 1.712
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
dep 0.588 0.085 6.885 0.000 1.000 1.000
.x1 1.000 1.000 0.630
.x2 1.000 1.000 0.576
.x3 1.000 1.000 0.388
.x4 1.000 1.000 0.627
.x5 1.000 1.000 0.406
Scales y*:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
x1 0.794 0.794 1.000
x2 0.759 0.759 1.000
x3 0.623 0.623 1.000
x4 0.792 0.792 1.000
x5 0.637 0.637 1.000
R-Square:
Estimate
x1 0.370
x2 0.424
x3 0.612
x4 0.373
x5 0.594
Group 2 [2]:
Latent Variables:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
dep =~
x1 1.000 0.633 0.630
x2 1.267 0.228 5.563 0.000 0.802 0.603
x3 1.661 0.473 3.510 0.000 1.051 0.653
x4 0.531 0.133 3.993 0.000 0.336 0.588
x5 1.190 0.226 5.269 0.000 0.753 0.760
Intercepts:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
dep 0.295 0.156 1.893 0.058 0.466 0.466
.x1 0.000 0.000 0.000
.x2 0.000 0.000 0.000
.x3 0.000 0.000 0.000
.x4 0.000 0.000 0.000
.x5 0.000 0.000 0.000
Thresholds:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
x1|t1 (t11) 0.973 0.047 20.639 0.000 0.973 0.968
x1|t2 (t12) 1.789 0.065 27.575 0.000 1.789 1.780
x1|t3 2.153 0.154 13.951 0.000 2.153 2.142
x2|t1 (t21) 1.376 0.063 21.940 0.000 1.376 1.034
x2|t2 2.352 0.269 8.758 0.000 2.352 1.768
x2|t3 2.532 0.310 8.160 0.000 2.532 1.903
x3|t1 (t31) 0.868 0.061 14.273 0.000 0.868 0.539
x3|t2 1.658 0.339 4.891 0.000 1.658 1.031
x3|t3 2.055 0.494 4.160 0.000 2.055 1.277
x4|t1 (t41) 0.364 0.037 9.739 0.000 0.364 0.635
x4|t2 0.762 0.152 5.016 0.000 0.762 1.331
x4|t3 0.943 0.212 4.456 0.000 0.943 1.648
x5|t1 (t51) 0.876 0.060 14.479 0.000 0.876 0.883
x5|t2 1.573 0.241 6.518 0.000 1.573 1.587
x5|t3 1.998 0.353 5.656 0.000 1.998 2.015
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
dep 0.401 0.163 2.455 0.014 1.000 1.000
.x1 0.610 0.195 3.125 0.002 0.610 0.603
.x2 1.128 0.481 2.345 0.019 1.128 0.637
.x3 1.484 1.186 1.251 0.211 1.484 0.573
.x4 0.214 0.141 1.517 0.129 0.214 0.655
.x5 0.416 0.261 1.597 0.110 0.416 0.423
Scales y*:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
x1 0.995 0.995 1.000
x2 0.751 0.751 1.000
x3 0.621 0.621 1.000
x4 1.748 1.748 1.000
x5 1.008 1.008 1.000
R-Square:
Estimate
x1 0.397
x2 0.363
x3 0.427
x4 0.345
x5 0.577
bug in mxRefModels() ?
OpenMx version: 2.11.5.95 [GIT v2.11.5-95-gbef086ef-dirty]
R version: R version 3.4.4 (2018-03-15)
Platform: x86_64-pc-linux-gnu
Default optimizer: CSOLNP
NPSOL-enabled?: Yes
OpenMP-enabled?: Yes
No. You can indeed use cross-group equality constraints to identify the overall fitted model, and what you have done appears to be sufficient to identify it. At least, the `mxCheckIdentification()` function says your model is identified (both at the start values and at the dubious solution). I also tried using `mxCheckIdentification()` to check your model under the v2.11.5 release, but there was a bug in that function in the release that has since been repaired in the source repository. However, there appears to be a bug in `mxRefModels()` as well, which has NOT been repaired since v2.11.5. The saturated model in your case should have 50 free parameters, not 35: fifteen thresholds and 10 polychoric correlations in each of the 2 groups. But here are the free parameters in the "saturated" model:
free parameters:
name matrix row col Estimate lbound ubound
1 Saturated m1.satCov[1,2] Saturated m1.satCov x1 x2 0.4211160
2 Saturated m1.satCov[1,3] Saturated m1.satCov x1 x3 0.4658516
3 Saturated m1.satCov[2,3] Saturated m1.satCov x2 x3 0.4776376
4 Saturated m1.satCov[1,4] Saturated m1.satCov x1 x4 0.3975597
5 Saturated m1.satCov[2,4] Saturated m1.satCov x2 x4 0.4093033
6 Saturated m1.satCov[3,4] Saturated m1.satCov x3 x4 0.4575572
7 Saturated m1.satCov[1,5] Saturated m1.satCov x1 x5 0.4202486
8 Saturated m1.satCov[2,5] Saturated m1.satCov x2 x5 0.4876613
9 Saturated m1.satCov[3,5] Saturated m1.satCov x3 x5 0.6360375
10 Saturated m1.satCov[4,5] Saturated m1.satCov x4 x5 0.4631449
11 x1ThrDev1 Saturated m1.thresholdDeviations 1 x1 0.7544172 -Inf
12 x1ThrDev2 Saturated m1.thresholdDeviations 2 x1 0.6649991 0.01
13 x1ThrDev3 Saturated m1.thresholdDeviations 3 x1 0.4499984 0.01
14 x2ThrDev1 Saturated m1.thresholdDeviations 1 x2 1.0062369 -Inf
15 x2ThrDev2 Saturated m1.thresholdDeviations 2 x2 0.5285165 0.01
16 x2ThrDev3 Saturated m1.thresholdDeviations 3 x2 0.2991975 0.01
17 x3ThrDev1 Saturated m1.thresholdDeviations 1 x3 0.5143386 -Inf
18 x3ThrDev2 Saturated m1.thresholdDeviations 2 x3 0.5868987 0.01
19 x3ThrDev3 Saturated m1.thresholdDeviations 3 x3 0.3138043 0.01
20 x4ThrDev1 Saturated m1.thresholdDeviations 1 x4 0.2883979 -Inf
21 x4ThrDev2 Saturated m1.thresholdDeviations 2 x4 0.7056041 0.01
22 x4ThrDev3 Saturated m1.thresholdDeviations 3 x4 0.4837302 0.01
23 x5ThrDev1 Saturated m1.thresholdDeviations 1 x5 0.5507459 -Inf
24 x5ThrDev2 Saturated m1.thresholdDeviations 2 x5 0.6843228 0.01
25 x5ThrDev3 Saturated m1.thresholdDeviations 3 x5 0.4456533 0.01
26 Saturated m2.satCov[1,2] Saturated m2.satCov x1 x2 0.5204439
27 Saturated m2.satCov[1,3] Saturated m2.satCov x1 x3 0.3353616
28 Saturated m2.satCov[2,3] Saturated m2.satCov x2 x3 0.3702252
29 Saturated m2.satCov[1,4] Saturated m2.satCov x1 x4 0.3284878
30 Saturated m2.satCov[2,4] Saturated m2.satCov x2 x4 0.3170401
31 Saturated m2.satCov[3,4] Saturated m2.satCov x3 x4 0.3350630
32 Saturated m2.satCov[1,5] Saturated m2.satCov x1 x5 0.4858893
33 Saturated m2.satCov[2,5] Saturated m2.satCov x2 x5 0.3613383
34 Saturated m2.satCov[3,5] Saturated m2.satCov x3 x5 0.4695005
35 Saturated m2.satCov[4,5] Saturated m2.satCov x4 x5 0.4674975
What's happening in the "saturated" model is that all the thresholds are constrained equal across groups. I verified that by examining MGOrdRef$Saturated$'Saturated m1'$thresholdDeviations and MGOrdRef$Saturated$'Saturated m2'$thresholdDeviations; both had the same set of labels. So now I don't think identification is your problem. I think it's just a matter of getting an analysis of ordinal data to converge (which can be tricky). I have a few tips that I'll put in a subsequent post.
In reply to bug in mxRefModels() ? by AdminRobK
fixed
My mistake
convergence advice for analysis of ordinal data
Here are some things you can do (they are not necessarily mutually exclusive):
1. SADMVN, the algorithm OpenMx uses for the multivariate-normal probability integral, can be given an upper limit, 'mvnRelEps', on how much error is acceptable in the probabilities it reports to OpenMx. You can reduce this limit (i.e., request less error from SADMVN) with something like
MGOrd <- mxOption(MGOrd, 'mvnRelEps', mxOption(MGOrd, 'mvnRelEps')/5)
before you run your MxModel. Reducing 'mvnRelEps' will make SADMVN, and therefore OpenMx, run more slowly. Be careful: it is possible to set 'mvnRelEps' so low that it is mathematically impossible for SADMVN to satisfy it; in that case OpenMx will warn you, but possibly only after running for a very long time.
2. You can replace `mxRun()` with `mxTryHardOrdinal()`.
3. You can run your model through `mxAutoStart()` before running it.
A minimal sketch combining these three options follows below.
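[Editor's note: this sketch is my own summary of the three tips above, not part of the original post. It reuses the MGOrd model from the scripts earlier in the thread; the divisor for 'mvnRelEps' and the extraTries value are arbitrary choices.]
# 1. request less error from SADMVN (tighter 'mvnRelEps')
MGOrd <- mxOption(MGOrd, 'mvnRelEps', mxOption(MGOrd, 'mvnRelEps')/5)
# 3. let mxAutoStart() refine the starting values
MGOrd <- mxAutoStart(MGOrd)
# 2. use the ordinal-aware wrapper instead of mxRun()
MGOrdFit <- mxTryHardOrdinal(MGOrd, extraTries = 20)
summary(MGOrdFit)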