| Attachment | Size |
|---|---|
| Saturated_Beta_OpenMx.R | 15.31 KB |
Hi,
I am trying to fit a saturated twin model to 9 ordinal variables (representing age at observation), in which participants have either 1, 2 or 3 observations each. Thus, there is quite a bit of missing data.
When running the model I get the following error: Sat5.fitfunction is not finite.
Here is info about version and platform:
OpenMx version: 2.0.1.4157
R version: R version 3.1.1 (2014-07-10)
Platform: x86_64-w64-mingw32
Default optimiser: NPSOL
Does this mean that there could be a problem due to too many NAs in the data? I expected FIML to take care of this. I saw another thread #3940 about the same issue, but couldn't really work out what was causing the error.
I tried the CSOLNP optimizer and mxTryHard() with the same result. Could it be bad starting values that are causing the error in my case?
When running it in OpenMx 1.4-3060 I get the following error message: Objective function returned a value of NaN at iteration 23.63.
I have attached the script in case that helps to clarify anything.
Many thanks,
Karoline
Let me make sure I understand. You have 9 ordinal variables, each of which, according to your script, has 7 levels (6 thresholds). Each participant has non-missing data for no fewer than 1 and no more than 3 of the 9 variables. Finally, participants are clustered in twin pairs. Is this right?
If I understand correctly, then yes, I would not be at all surprised if your problems are due to lots of missing data. The model may be empirically unidentified if, for example, not a single person has non-missing data for both variables #5 and #6, since the data would then contain no information whatsoever about the polychoric correlation between those two variables. The data might also be too sparse to adequately estimate 6*9 = 54 thresholds per zygosity-sex group. Basically, my suspicion is that this fully saturated model is trying to estimate more parameters than is realistic for the given dataset. How big are the sample and the zygosity-sex groups?
There are a few things you could try:
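One quick diagnostic for the empirical-identification concern above is to count, for every pair of variables, how many rows have both values observed. A minimal sketch in base R (the data frame and variable names here are made up for illustration; substitute your own 9 ordinal variables):

```r
# Toy data frame standing in for the 9 ordinal variables; NA = not observed.
dat <- data.frame(v1 = c(1, 2, NA, 3),
                  v2 = c(NA, 1, 2, 2),
                  v3 = c(1, NA, NA, 3))

obs <- !is.na(dat)        # TRUE where a value is present
pairN <- crossprod(obs)   # pairN[i, j] = number of rows with both i and j observed
pairN

# Any zero off the diagonal means the corresponding polychoric correlation
# is completely unidentified by the data.
any(pairN[upper.tri(pairN)] == 0)
```

Run on the real data (one such table per zygosity-sex group), small or zero off-diagonal counts point directly at the correlations the saturated model cannot estimate.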
Hi Rob and Mike,
Thanks for the great replies. I have now tried several things.
After this procedure the model did run, although with code red. But I thought I'd try to see if different starting values would help. Or try the Cholesky approach.
Thanks,
Karoline
Hi Karoline
I agree with Rob that using a Stand matrix type would be better. I'd typically start correlations at zero rather than .8, because that is further from being non-positive definite. This is likely why the model wanders into regions where the likelihood is not finite. It may prove difficult to avoid this altogether with direct estimation of the correlations as you have it, because it doesn't take much for them to get 'out of sync' with each other (if A and B correlate .99 and A and C correlate .8, then B and C correlating -.5 would not be consistent).
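The 'out of sync' example can be checked numerically: each of those three correlations is legal on its own, but together they imply a correlation matrix that is not positive definite, so no data could actually produce it.

```r
# Pairwise correlations from the example: AB = .99, AC = .8, BC = -.5.
R <- matrix(c(  1, .99,  .8,
              .99,   1, -.5,
               .8, -.5,   1), nrow = 3, byrow = TRUE)

# A valid correlation matrix has all eigenvalues >= 0; this one does not.
min(eigen(R)$values)   # negative, so the matrix is not positive definite
```

This is exactly the region an unconstrained optimizer can wander into when each correlation is a separate free parameter.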
One way to avoid this is to use the Cholesky decomposition (this was the original purpose of the Cholesky). You could consider fixing the first two thresholds of every variable at 0 and 1, estimating means, and estimating the covariance matrix as L %*% t(L), where L is a lower triangular matrix. The results you'd get would not be standardized, but they could easily be standardized after the fact (or even in the script) as stnd(L %*% t(L)). Given reasonable starting values, the model should run without trouble.
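The idea behind this reparameterization can be illustrated in a few lines of base R (this is only a sketch of the principle, not the OpenMx model itself; in OpenMx, L would typically be an mxMatrix of type "Lower" and the product an mxAlgebra):

```r
# Any lower-triangular L with a positive diagonal yields a positive-definite
# L %*% t(L), so the optimizer can never reach an impossible covariance matrix.
L <- matrix(c( 1.0,   0,   0,
               0.4, 0.9,   0,
              -0.2, 0.3, 0.8), nrow = 3, byrow = TRUE)

Sigma <- L %*% t(L)            # always a valid covariance matrix
Rcorr <- cov2cor(Sigma)        # standardized after the fact, as described
min(eigen(Sigma)$values) > 0   # TRUE for any such L
```

The free parameters become the entries of L rather than the correlations themselves, which is what keeps the likelihood finite throughout the optimization.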
A disadvantage is that the standard errors would be of the Cholesky parameters, not the correlations. A few options might be viable to overcome this. One would be to request confidence intervals (though this might take a long time to run). Another would be to use the correlation estimates as starting values for the standardized model you first thought of; this is easier than it sounds, since they could be calculated after the first step and plugged directly into the second. A third would be to get some very clever person like Rob to work out what the SEs of a correlation matrix should be, given the SEs of its unstandardized Cholesky.