Identification of CFA with continuous vs. ordinal variables

Posted on
pehkawn Joined: 05/24/2020
As a step to building a larger SEM, I am currently trying to fit a MIMIC CFA with ordinal indicators.

As the sample script and dataset demonstrate, the model with continuous variables (fit with ML) is identified and no error is thrown.
If I recode the variables as ordinal and add thresholds to the model, I get error messages that differ depending on the fit function.

1. The continuous model gives estimates within a range that seems reasonable, the model is locally identified, and no error messages are returned.
2. An equivalent ordinal model, with added thresholds, fit with ML, returns the error: "`Information matrix is not positive definite (not at a candidate optimum). Be suspicious of these results. At minimum, do not trust the standard errors.`" The model is locally identified (`mxCheckIdentification()`), but the estimates are unreasonable.

3. An equivalent model fit with WLS returns the same error as the ordinal model fit with ML. Only now I also get the following error from `mxCheckIdentification()` and `mxStandardizeRAMPaths()`: "`Error in solve.default(I - A) : system is computationally singular: reciprocal condition number = 1.92319e-31`"

I am trying to understand what these error messages mean, and why they occur with ordinal variables but not with continuous data. Is there a way to avoid this? For example, does the data lack sufficient power to handle the added thresholds?
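For anyone who lands here with the same `solve.default(I - A)` message: below is a minimal base-R sketch of one way that error can arise. It assumes that an estimate in the asymmetric-paths matrix `A` has drifted to an extreme value. The 2 x 2 matrix and the value `1e16` are made up for illustration and are not taken from the attached script.

```r
# Hypothetical asymmetric-paths matrix with one runaway path estimate.
A <- matrix(0, nrow = 2, ncol = 2)
A[1, 2] <- 1e16

# RAM-style expectations need the inverse of (I - A).
I_minus_A <- diag(2) - A

# Reciprocal condition number: values far below machine precision
# mean the inverse cannot be computed reliably.
rc <- rcond(I_minus_A)

# solve() refuses to invert such a matrix; catch the error to inspect it.
res <- tryCatch(solve(I_minus_A), error = function(e) e)

print(rc)
print(conditionMessage(res))
```

If something like this is what is happening, the singularity is a symptom of wild estimates rather than a separate problem, so better starting values or a different identification of the ordinal indicators would be the place to look.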

---
###### **Supplementary information:**
In the larger SEM I'm trying to model, this factor is the main predictor. Since it has multiple factors with ordinal indicators, WLS is realistically the only way I can fit the larger model. The SEM keeps throwing error messages and returning estimates that are both disproportionate and inconsistent when I try to fit the data with this factor and its ordinal indicators. The problem seems to be with this particular factor (see attached script and dataset), as the same errors are thrown when I run it as a CFA. Also, the SEM will not throw an error if this factor and its indicators are removed, or if its indicators are added as continuous.

Replied on Mon, 04/25/2022 - 18:34
AdminHunter Joined: 03/01/2013

Thank you for the full script! It really helps to see what is going on.

If I understand the intended model correctly, then you have x1, x2, and x5 as predictors of a latent factor L1. L1 has indicators x3 and x4.

The script does not run as is.

1. You need to `load('df_cont')`, not `readRDS`.
2. It's missing `man_var`. I assumed you intended to have residual variances on the manifest variables, and added these paths.
3. All of the residual means were fixed to 1. For the continuous variable model, this causes terrible misfit, which leads to many problems.
4. The starting values were not particularly reasonable, so I modified them.
5. It's not clear to me how you intend to identify the ordinal variables. Some strategies for the ordinal variables might not work for the binary ones.
6. For speed, I'd recommend trying the WLS ordinal model before the ML ordinal model.
7. You might want to consider x1, x2, and x5 as "exogenous" predictors with the `data.`-style syntax instead of assuming any distribution for them. You could dummy code them if you think they are not really continuous.
8. I don't know what a "standardized" (e.g., via `mxStandardizeRAMPaths`) MIMIC model implies about the regression effects going into the latent factors.
9. I wouldn't call the regression weights for the variables that predict the latent factor "loadings". It's confusing to call them that in the comments.

I've attached a revised script that fixes some of these problems, but not all of them. Does that take you a step closer to a solution?

Replied on Tue, 04/26/2022 - 08:57
pehkawn Joined: 05/24/2020

In reply to by AdminHunter

Thanks for your detailed reply. Your revisions definitely helped. The errors thrown by my initial model are now gone.

Regarding your first and second points, thanks for pointing out the code errors and lack of specification. These mainly crept in while I was extracting the relevant code into something presentable here.

1. I initially tried uploading the data as RDS, but the web site doesn't accept '.rds' files. I forgot to change from `readRDS` to `load`.
2. I did have the residual variances in there, but I must have accidentally removed them while editing.

I still have some questions related to your subsequent points, however. I have listed comments and questions corresponding to your numbering:

3. Since the indicators are actually ordinal, I originally fixed the residual means to **0**, and the *residual variances were fixed to 1* (although `man_var` was unintentionally dropped from the sample script I sent) according to the [example on ordinal model specification in the documentation](https://vipbg.vcu.edu/vipbg/OpenMx2/docs//OpenMx/latest/Ordinal_Path.html#ordinal-data). By fixing the means and residual variances to 0 and 1, respectively, the intention was to coerce the latent continuous variables underlying the observed ordinal variables to a similar scale for easier comparison. Am I making the correct assumptions here? (The continuous model was only intended as an intermediary step to see if it could help model fit, but I can see that fixing the means would be problematic in this case.) As an alternative to fixing the means, I noticed you fixed one threshold per variable instead. I presume this is necessary for identification, but how does that affect threshold estimation? This leads me to your fourth point;
4. I get that the starting values are likely to be off. Again, I was just following a tutorial on the subject. Most tutorials I've seen just use seemingly arbitrary values, which is why I ran the model using `mxAutoStart` to help find better starting values. Could you recommend some good practices / rules-of-thumb for setting reasonable starting values?
5. Could you elaborate on what strategies would be good for identifying ordinal and binary variables?
6. I can see this would be appropriate. With regard to point 4, isn't this sort of what `mxAutoStart` does?
7. Unfortunately, I'm not familiar with `data.`-style syntax. Could you refer a tutorial, provide an example, or similar?
8. [Kline (2016)](https://books.google.no/books?id=Q61ECgAAQBAJ&lpg=PP1&ots=jFin3pz9sg&dq=kline%202016%20principles%20and%20practice%20structural%20equation%20modeling&lr&pg=PP1#v=onepage&q=kline%202016%20principles%20and%20practice%20structural%20equation%20modeling&f=false) states that *in a standardized solution where all variables have unit variance (1.0), standardized pattern coefficients for simple indicators (they depend on a single factor) are estimated Pearson correlations. In this case, squared standardized pattern coefficients are proportions of explained variance.* However, I gather from the context that this is for *effect indicators*. Due to opposite directionality, would the Pearson correlation for a formative indicator then be the path coefficient's inverse ($\frac{1}{\lambda}$)?
9. Thanks for the correction. I've been gathering information from multiple sources, and unfortunately, there is variable use of terminology between sources, and somewhat between SEM and CFA. Kline (2016) recommends the term *path coefficient*, so I can use this term from now on.
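On the identification question in point 3, the usual threshold model can be written out as a worked equation. This is a sketch under the standard assumption of a normally distributed latent response behind each ordinal indicator. The symbols $\tau_k$ (thresholds), $\mu$ and $\sigma$ (mean and standard deviation of the latent response) are notation introduced here, not taken from the script.

$$
P(y = k) = \Phi\!\left(\frac{\tau_k - \mu}{\sigma}\right) - \Phi\!\left(\frac{\tau_{k-1} - \mu}{\sigma}\right)
$$

For any shift $a$ and any scale $b > 0$, replacing $(\tau_k, \mu, \sigma)$ with $(a + b\tau_k,\; a + b\mu,\; b\sigma)$ leaves every ratio unchanged:

$$
\frac{(a + b\tau_k) - (a + b\mu)}{b\sigma} = \frac{\tau_k - \mu}{\sigma}
$$

So the category probabilities cannot distinguish these parameter sets, and each ordinal variable needs one location constraint and one scale constraint. Fixing the mean to 0 and the variance to 1 is one such pair. Fixing two thresholds while freeing the mean and variance is another. A binary variable has only one threshold, so the two-threshold option is unavailable for it, which is presumably why some strategies that work for ordinal indicators do not carry over to binary ones.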

Replied on Fri, 04/29/2022 - 11:13
AdminNeale Joined: 03/01/2013

In reply to by pehkawn

Hi

For #4, starting values, I usually try to set them up so that i) expected means are about right; ii) expected variances are about right, maybe a bit bigger than sample variances; and iii) expected covariances are close to zero. The last of these is unlikely to be close to the observed covariances, but I still like this set of starting values. For one, it is unlikely that an extreme outlier is found. Two, the expected covariance matrix is near diagonal, and thus strongly positive definite. Optimization gets into difficulties when the expected covariance matrix isn't positive definite. With zero covariances between variables, the likelihood is simply the product of the univariate likelihoods, which will normally be greater than zero and a good place for optimization to begin. OpenMx has helper functions for this sort of thing: to inspect expected covariances/means you can use `mxGetExpected(model, "covariance")`, for example, and `eigen(mxGetExpected(model, "covariance"))`.
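To make the three rules of thumb concrete, here is a minimal base-R sketch. The simulated data frame `dat`, the variable names, and the inflation factor `1.2` are assumptions for illustration only. With a real model, you would run the eigenvalue check on the model-implied covariance matrix at the starting values rather than on a hand-built one.

```r
# Hypothetical continuous data standing in for the real dataset.
set.seed(1)
n <- 200
f <- rnorm(n)
dat <- data.frame(
  x1 = f + rnorm(n),
  x2 = 0.5 * f + rnorm(n),
  x3 = rnorm(n, mean = 2)
)

# i) starting means: the sample means
start_means <- colMeans(dat)

# ii) starting variances: a bit bigger than the sample variances
start_vars <- 1.2 * apply(dat, 2, var)

# iii) starting covariances: zero, so the implied matrix is diagonal
start_cov <- diag(start_vars)
dimnames(start_cov) <- list(names(dat), names(dat))

# A covariance matrix is positive definite when all eigenvalues are > 0.
ev <- eigen(start_cov, symmetric = TRUE)$values
print(start_means)
print(start_cov)
print(all(ev > 0))
```

Because the matrix is diagonal, its eigenvalues are just the starting variances, so positive definiteness is guaranteed as long as every starting variance is positive.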