Using Correlation Matrices v. Raw Data as Input

Attachment: estimates.txt (1.71 KB)

While it's relevant to both the Behavioral Genetics and OpenMx General Help forums, the overall question is rather general, so I figured it should be posted here.

I'm currently fitting a multivariate twin model to ordinal data in OpenMx. The eventual goal is to look at common and independent pathway models for 8 ordinal measures. Before scaling up to that, I started with just 5 of the variables. Fitting an ACE model took roughly 10 minutes and returned status code RED. I then ran Ryne's start value loop (THANK YOU) with a few modifications, to see whether I could either clear the code RED or at least find a best model over the 50 iterations. Every run ended in code RED, and the -2LL varied across all runs.
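
For readers less familiar with the model: a multivariate ACE model splits the covariance of each twin's measures (for ordinal data, of the latent liabilities behind the categories) into additive genetic (A), shared environment (C) and unique environment (E) parts, and the MZ and DZ models differ only in how much of A the co-twins share. A minimal numpy sketch of the expected twin-pair matrices under a Cholesky ACE parameterization, assuming lower-triangular path matrices `a`, `c`, `e`; the function name and the path values are hypothetical, not taken from the OpenMx script in this post:

```python
import numpy as np

def ace_expected_covs(a, c, e):
    """Expected twin-pair covariance matrices under a Cholesky ACE model.

    a, c, e: p x p lower-triangular path matrices for the A, C and E factors.
    Returns (sigma_mz, sigma_dz), each 2p x 2p with twin 1's variables first.
    """
    A, C, E = a @ a.T, c @ c.T, e @ e.T
    within = A + C + E        # covariance among one twin's own measures
    cross_mz = A + C          # MZ co-twins share all of A and all of C
    cross_dz = 0.5 * A + C    # DZ co-twins share half of A on average, all of C
    sigma_mz = np.block([[within, cross_mz], [cross_mz, within]])
    sigma_dz = np.block([[within, cross_dz], [cross_dz, within]])
    return sigma_mz, sigma_dz

# Hypothetical path values for 3 variables (not estimates from this post).
a = np.array([[0.6, 0.0, 0.0], [0.2, 0.5, 0.0], [0.3, 0.1, 0.6]])
c = np.array([[0.3, 0.0, 0.0], [0.1, 0.3, 0.0], [0.0, 0.2, 0.2]])
e = np.array([[0.7, 0.0, 0.0], [0.1, 0.7, 0.0], [0.2, 0.1, 0.6]])
sigma_mz, sigma_dz = ace_expected_covs(a, c, e)
print(np.round(sigma_mz, 3))
```

With ordinal measures the liability variances (the diagonal of `within`) are commonly constrained to 1 so that the model is identified, which makes both expected matrices correlation matrices.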

I scaled back again, to 3 variables, and found similar issues. The runtime was shorter, but I still got code RED. This time, however, 8 of the 50 start value runs came up error-free (code GREEN) with the same answer, so I can at least be confident in the estimates from that best model.
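
Ryne's loop is OpenMx/R code and isn't reproduced here, but the idea behind any start value loop can be sketched in Python: refit from many randomized starts, keep the lowest -2LL, and count how many starts reached that same minimum (the "8 of 50 agree" check above). The objective is a hypothetical one-parameter function with two local minima standing in for a model's -2LL, and plain gradient descent stands in for the real optimizer:

```python
import random

def neg2ll(x):
    # Hypothetical stand-in for a model's -2LL: local minima near x = -1.30 and x = 1.13.
    return x**4 - 3 * x**2 + x

def gradient(x):
    return 4 * x**3 - 6 * x + 1

def local_fit(x0, lr=0.01, steps=2000):
    # Plain gradient descent standing in for the real optimizer.
    x = x0
    for _ in range(steps):
        x -= lr * gradient(x)
    return x, neg2ll(x)

def start_value_loop(n_starts=50, lo=-3.0, hi=3.0, tol=1e-6, seed=1):
    """Refit from random starts; return the best fit and how many starts agree with it."""
    rng = random.Random(seed)
    fits = [local_fit(rng.uniform(lo, hi)) for _ in range(n_starts)]
    best_x, best_f = min(fits, key=lambda fit: fit[1])
    n_agree = sum(1 for _, f in fits if abs(f - best_f) < tol)
    return best_x, best_f, n_agree

best_x, best_f, n_agree = start_value_loop()
print(f"best -2LL = {best_f:.4f} at x = {best_x:.4f}; {n_agree} of 50 starts agree")
```

Starts that fall into the other basin settle on the worse local minimum, which is the toy analogue of -2LL values that vary from run to run.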

My coworker asked whether I could fit the model to the polychoric correlation matrices instead of the raw data to get around some of these issues. I generated MZ and DZ correlation matrices using hetcor from the polycor package. Here is how the MZ matrix compares to the correlation matrix from the saturated model:

MZ correlations from the saturated model
       var1A  var2A  var3A  var1B  var2B  var3B
var1A  1.000  0.256  0.284  0.626  0.051  0.343
var2A  0.256  1.000  0.448  0.107  0.418  0.383
var3A  0.284  0.448  1.000  0.079  0.287  0.721
var1B  0.626  0.107  0.079  1.000  0.285  0.326
var2B  0.051  0.418  0.287  0.285  1.000  0.543
var3B  0.343  0.383  0.721  0.326  0.543  1.000

MZ correlations from hetcor
       var1A  var2A  var3A  var1B  var2B  var3B
var1A  1.000  0.279  0.374  0.627  0.075  0.380
var2A  0.279  1.000  0.436  0.101  0.421  0.356
var3A  0.374  0.436  1.000  0.143  0.265  0.733
var1B  0.627  0.101  0.143  1.000  0.264  0.325
var2B  0.075  0.421  0.265  0.264  1.000  0.549
var3B  0.380  0.356  0.733  0.325  0.549  1.000
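
How far apart the two MZ tables are can be checked directly. A small pure-Python sketch, with the values copied from the two tables above, that finds the largest absolute difference between corresponding correlations (the function name is hypothetical):

```python
# MZ correlations copied from the two tables above.
labels = ["var1A", "var2A", "var3A", "var1B", "var2B", "var3B"]
saturated = [
    [1.000, 0.256, 0.284, 0.626, 0.051, 0.343],
    [0.256, 1.000, 0.448, 0.107, 0.418, 0.383],
    [0.284, 0.448, 1.000, 0.079, 0.287, 0.721],
    [0.626, 0.107, 0.079, 1.000, 0.285, 0.326],
    [0.051, 0.418, 0.287, 0.285, 1.000, 0.543],
    [0.343, 0.383, 0.721, 0.326, 0.543, 1.000],
]
hetcor_mz = [
    [1.000, 0.279, 0.374, 0.627, 0.075, 0.380],
    [0.279, 1.000, 0.436, 0.101, 0.421, 0.356],
    [0.374, 0.436, 1.000, 0.143, 0.265, 0.733],
    [0.627, 0.101, 0.143, 1.000, 0.264, 0.325],
    [0.075, 0.421, 0.265, 0.264, 1.000, 0.549],
    [0.380, 0.356, 0.733, 0.325, 0.549, 1.000],
]

def largest_difference(m1, m2):
    """Largest absolute off-diagonal difference between two symmetric matrices."""
    best = (0.0, 0, 0)
    for i in range(len(m1)):
        for j in range(i + 1, len(m1)):  # upper triangle is enough for symmetric input
            diff = abs(m1[i][j] - m2[i][j])
            if diff > best[0]:
                best = (diff, i, j)
    return best

d, i, j = largest_difference(saturated, hetcor_mz)
print(f"largest |difference| = {d:.3f} between {labels[i]} and {labels[j]}")
# prints: largest |difference| = 0.090 between var1A and var3A
```

The same check could be run on the DZ matrices, which are not shown here.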

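hetcor returns a polychoric correlation for each pair of ordinal variables: the correlation of two latent normal variables that, cut at estimated thresholds, would reproduce the observed contingency table. A minimal pure-Python sketch of the common two-step estimator (thresholds from each variable's marginal proportions, then the correlation that maximizes the multinomial likelihood), for illustration only; it is not polycor's implementation, and every function name here is hypothetical:

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def norm_ppf(p):
    # Inverse of norm_cdf by bisection; slow but adequate for a sketch.
    lo, hi = -10.0, 10.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if norm_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def bvn_cdf(h, k, rho, n=200):
    # P(X <= h, Y <= k) for a standard bivariate normal with correlation rho:
    # Simpson's rule on the integral of phi(x) * Phi((k - rho*x) / sqrt(1 - rho^2)).
    hi = min(h, 8.0)
    if hi <= -8.0 or k == -math.inf:
        return 0.0
    s = math.sqrt(1.0 - rho * rho)
    step = (hi + 8.0) / n
    total = 0.0
    for i in range(n + 1):
        x = -8.0 + i * step
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        phi = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
        total += weight * phi * norm_cdf((k - rho * x) / s)
    return total * step / 3.0

def thresholds(margin):
    # Cut points on the latent normal scale from cumulative marginal proportions.
    total, cum, cuts = sum(margin), 0.0, [-math.inf]
    for count in margin[:-1]:
        cum += count
        cuts.append(norm_ppf(cum / total))
    return cuts + [math.inf]

def cell_probs(row_cuts, col_cuts, rho):
    F = [[bvn_cdf(a, b, rho) for b in col_cuts] for a in row_cuts]
    return [[F[i + 1][j + 1] - F[i][j + 1] - F[i + 1][j] + F[i][j]
             for j in range(len(col_cuts) - 1)]
            for i in range(len(row_cuts) - 1)]

def polychoric(table):
    """Two-step polychoric correlation for an R x C table of counts."""
    row_cuts = thresholds([sum(row) for row in table])
    col_cuts = thresholds([sum(col) for col in zip(*table)])

    def loglik(rho):
        p = cell_probs(row_cuts, col_cuts, rho)
        return sum(n * math.log(max(p[i][j], 1e-300))
                   for i, row in enumerate(table)
                   for j, n in enumerate(row) if n > 0)

    # Golden-section search for the rho maximizing the log-likelihood.
    g = (math.sqrt(5.0) - 1.0) / 2.0
    lo, hi = -0.99, 0.99
    x1, x2 = hi - g * (hi - lo), lo + g * (hi - lo)
    f1, f2 = loglik(x1), loglik(x2)
    while hi - lo > 1e-6:
        if f1 < f2:  # maximum lies in [x1, hi]
            lo, x1, f1 = x1, x2, f2
            x2 = lo + g * (hi - lo)
            f2 = loglik(x2)
        else:        # maximum lies in [lo, x2]
            hi, x2, f2 = x2, x1, f1
            x1 = hi - g * (hi - lo)
            f1 = loglik(x1)
    return 0.5 * (lo + hi)

# Hypothetical 3 x 3 table of counts for one pair of ordinal variables.
counts = [[40, 15, 5], [20, 30, 15], [5, 15, 35]]
print(round(polychoric(counts), 3))
```

Because each correlation is estimated one pair at a time, a matrix assembled this way is not guaranteed to be positive definite, which is worth checking before fitting a model to it.
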
Fitting to the correlation matrices took a matter of seconds and produced no error code. (The attached file compares the estimates from the "best" model after the 50 start value runs with those from the correlation-matrix run; some estimates differ by more than .1.)
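
Fitting to a summary matrix replaces the raw-data likelihood with a discrepancy between the observed matrix S and the model-implied matrix Σ. The sketch below shows the standard maximum-likelihood discrepancy used for covariance-matrix input in SEM generally; it is not OpenMx's internal code, and the function name and example matrix are hypothetical. Evaluating it only involves p × p matrices, whereas ordinal raw-data FIML has to evaluate multivariate normal probabilities for the observed data rows, which is presumably part of why the matrix fits ran in seconds:

```python
import numpy as np

def f_ml(S, Sigma):
    """ML discrepancy F = log|Sigma| + tr(S Sigma^-1) - log|S| - p.

    S: observed covariance/correlation matrix; Sigma: model-implied matrix.
    F is 0 when Sigma reproduces S exactly and positive otherwise.
    """
    for m in (S, Sigma):
        np.linalg.cholesky(m)  # raises LinAlgError if m is not positive definite
    p = S.shape[0]
    _, logdet_sigma = np.linalg.slogdet(Sigma)
    _, logdet_s = np.linalg.slogdet(S)
    # tr(S Sigma^-1) = tr(Sigma^-1 S); solve() avoids forming the inverse explicitly.
    return logdet_sigma + np.trace(np.linalg.solve(Sigma, S)) - logdet_s - p

# Hypothetical 3 x 3 observed correlation matrix, compared with an independence model.
S = np.array([[1.0, 0.5, 0.3], [0.5, 1.0, 0.4], [0.3, 0.4, 1.0]])
print(f_ml(S, S), f_ml(S, np.eye(3)))
```

Multiplying F by N - 1 (some programs use N) gives the usual likelihood-ratio chi-square for the model.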

Finally, my question: is it valid to use the correlation matrices like this when we have the raw data? I'm a bit concerned given the differences in the estimates, but I know many people fit structural equation models to correlation or covariance matrices. Any insight would be appreciated.

We've also tried this with just 4 variables, and I couldn't get rid of the code RED or get the -2LL to agree across the start value loop. I expect this problem to get much, much worse as we add variables, so it would be lovely if we could spend a fraction of the time fitting correlation matrices instead.