
Mis-specification and model fit interpretation of univariate ACE

quino
Joined: 03/12/2020 - 08:02

Dear all,

I am new to statistical modelling for genetic analysis. After running a round of univariate analyses with the umxACE function (ahead of a future multivariate one), I have a few questions I would like help with.

I have a data set of 20 variables from a neuroimaging study, with a sample size that is modest by twin-study standards (MZ = 132 pairs, DZ = 72 pairs). I regressed out sex, age and other relevant variables and, following the methodological literature (ADE models for traits whose MZ correlation is more than twice the DZ correlation), I estimated all possible univariate models (ACE, CE, AE, E, and likewise for ADE). Since I was asked to report fit measures besides AIC and -2LL, I used the mxRefModels function, and for only SOME variables I got the warning:

In computeFitStatistics(likelihood, DoF, chi, chiDoF, retval[["numObs"]], :
Your model may be mis-specified (and fit worse than an independence model), or you may be using the wrong independence model

I had a similar problem before and fixed it by changing the optimizer, but this time the problem persists. Also, why does the warning appear for only some variables and not all of them? I'd be glad to get some input on how to solve this.
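The warning concerns a comparison against the independence model. As an illustration only, here is a minimal Python sketch of how two common incremental indices (CFI and TLI) are computed from -2LL values, together with the situation the warning describes (the target model's chi-square exceeding the independence model's). The `Fit` class and function names are hypothetical, the -2LL values are invented, the parameter counts assume a univariate twin design with one shared mean, and we have not checked that OpenMx's internal test uses exactly this condition.

```python
from dataclasses import dataclass


@dataclass
class Fit:
    minus2ll: float  # -2 log-likelihood of the fitted model
    n_params: int    # number of free parameters


def chi_square(model: Fit, saturated: Fit) -> tuple:
    """Likelihood-ratio chi-square and df of a model against the saturated model."""
    return model.minus2ll - saturated.minus2ll, saturated.n_params - model.n_params


def incremental_fit(model: Fit, independence: Fit, saturated: Fit) -> dict:
    chi_m, df_m = chi_square(model, saturated)
    chi_i, df_i = chi_square(independence, saturated)
    # CFI: share of the independence model's misfit that the target model removes.
    denom = max(chi_i - df_i, chi_m - df_m, 0.0)
    cfi = 1.0 if denom == 0 else 1.0 - max(chi_m - df_m, 0.0) / denom
    # TLI (non-normed fit index): compares chi-square / df ratios.
    ratio_i = chi_i / df_i
    tli = float("nan") if ratio_i == 1.0 else (ratio_i - chi_m / df_m) / (ratio_i - 1.0)
    return {
        "chi_m": chi_m, "df_m": df_m, "chi_i": chi_i, "df_i": df_i,
        "cfi": cfi, "tli": tli,
        # Hypothetical flag for the case the warning describes.
        "worse_than_independence": chi_m > chi_i,
    }


if __name__ == "__main__":
    saturated = Fit(1000.0, 10)  # invented values
    print(incremental_fit(Fit(1010.0, 4), Fit(1004.0, 8), saturated))
    print(incremental_fit(Fit(1005.0, 4), Fit(1100.0, 8), saturated))
```

In the first call the independence model barely differs from the saturated one, as can happen when twin correlations are near zero, so the ACE-style model's chi-square exceeds it and CFI drops to 0.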

In addition, for some of the models another message appears:

In model 'CE' Optimizer returned a non-zero status code 5. The Hessian at the solution does not appear to be convex. See ?mxCheckIdentification for possible diagnosis (Mx status RED)

Once again, I don't understand why this problem occurs for only a small set of variables and models. I'd appreciate help in this regard.
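Status code 5 means the Hessian at the solution is not positive definite, so the solution may not be a well-defined minimum or some parameter may not be identified; in OpenMx, mxCheckIdentification() is the diagnostic the message points to. The sketch below only shows the underlying numerical check: a Cholesky factorization, which succeeds exactly when a symmetric matrix is positive definite. The function name, tolerance and example matrices are hypothetical.

```python
import math


def is_positive_definite(hessian, tol=1e-10):
    """Return True if the (symmetrized) matrix admits a Cholesky factorization.

    A zero or negative pivot means the surface is flat or curves downward
    in some direction, i.e. the matrix is not positive definite.
    """
    n = len(hessian)
    # Average with the transpose to remove small numerical asymmetries.
    h = [[(hessian[i][j] + hessian[j][i]) / 2 for j in range(n)] for i in range(n)]
    chol = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(chol[i][k] * chol[j][k] for k in range(j))
            if i == j:
                pivot = h[i][i] - s
                if pivot <= tol:
                    return False
                chol[i][i] = math.sqrt(pivot)
            else:
                chol[i][j] = (h[i][j] - s) / chol[j][j]
    return True


if __name__ == "__main__":
    # Hypothetical 3x3 Hessians for three parameters.
    ok = [[4.0, 1.0, 0.0], [1.0, 3.0, 0.0], [0.0, 0.0, 2.0]]
    flat = [[4.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 2.0]]  # one flat direction
    print(is_positive_definite(ok), is_positive_definite(flat))
```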

Finally, despite the previous warnings, I ran all models (results are shown in the attached image). If I want to choose the right model for a publication, how should this be done? I have seen some people choose the model with the smallest AIC, whereas others select the one that has a smaller AIC AND is not significantly different (p > 0.05 in the model-comparison test) from the original model. In other words, what are the selection criteria, and which fit measures should be reported?
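The two selection rules described here can be made explicit. The sketch below computes, for each nested submodel, the chi-square difference test against the full model and the AIC difference, using AIC = -2LL + 2k. OpenMx also reports a df-penalty form of AIC; for models fitted to the same data the two forms differ only by a constant, so AIC differences are the same under either. The model names and -2LL values are hypothetical, `chi2_sf` is a closed form for integer df, and in practice mxCompare() reports these quantities for fitted OpenMx models. This illustrates the arithmetic only; it does not recommend one rule over the other.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variate with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    h = x / 2
    if df % 2 == 0:
        # Even df: exp(-h) * sum_{i < df/2} h**i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= h / i
            total += term
        return math.exp(-h) * total
    # Odd df: erfc(sqrt(h)) plus a finite series in sqrt(x)
    total = math.erfc(math.sqrt(h))
    term = math.sqrt(2 * x / math.pi) * math.exp(-h)
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return total


def compare_to_full(full, submodels, alpha=0.05):
    """full and submodels are (name, minus2ll, n_params) tuples."""
    _, m2ll_f, k_f = full
    aic_f = m2ll_f + 2 * k_f  # parameter-penalty AIC
    rows = []
    for name, m2ll, k in submodels:
        delta = m2ll - m2ll_f           # chi-square difference
        ddf = k_f - k                   # number of parameters dropped
        p = chi2_sf(delta, ddf)
        d_aic = (m2ll + 2 * k) - aic_f  # negative = lower AIC than the full model
        rows.append({"model": name, "delta_chi2": delta, "ddf": ddf, "p": p,
                     "delta_aic": d_aic, "not_worse": p > alpha and d_aic < 0})
    return rows


def best_by_aic(models):
    """Rule 1: the model with the smallest AIC."""
    return min(models, key=lambda m: m[1] + 2 * m[2])[0]


if __name__ == "__main__":
    full = ("ACE", 1000.0, 4)  # hypothetical fits
    subs = [("AE", 1000.5, 3), ("CE", 1006.0, 3), ("E", 1030.0, 2)]
    for row in compare_to_full(full, subs):
        print(row)
    print("smallest AIC:", best_by_aic([full] + subs))
```

With these made-up numbers, AE is the only submodel that is both not significantly worse than ACE and lower in AIC, and it also has the smallest AIC overall, so both rules agree; with real data they need not.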

I should also mention that the original measures were in some cases very noisy, leading to very low, and in some cases negative, correlations between siblings.
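The ADE rule of thumb mentioned in the post (MZ correlation more than twice the DZ correlation) follows from the classical twin expectations, and the same expectations show what very low or negative correlations do to the estimates. A minimal sketch, assuming standardized (residualized) phenotypes; the function name and the correlations are hypothetical and not taken from the poster's data.

```python
def twin_moment_estimates(r_mz: float, r_dz: float) -> dict:
    """Rough variance proportions from MZ and DZ twin correlations.

    Assumes standardized phenotypes and the classical expectations:
      ACE: r_MZ = a2 + c2,  r_DZ = a2/2 + c2
      ADE: r_MZ = a2 + d2,  r_DZ = a2/2 + d2/4
    """
    # Rule of thumb from the post: prefer ADE when r_MZ > 2 * r_DZ.
    if r_mz > 2 * r_dz:
        a2 = 4 * r_dz - r_mz        # solved from the ADE expectations
        d2 = 2 * r_mz - 4 * r_dz
        return {"model": "ADE", "a2": a2, "d2": d2, "e2": 1 - r_mz}
    a2 = 2 * (r_mz - r_dz)          # Falconer's formula
    c2 = 2 * r_dz - r_mz
    return {"model": "ACE", "a2": a2, "c2": c2, "e2": 1 - r_mz}


if __name__ == "__main__":
    # Hypothetical correlations.
    for r_mz, r_dz in [(0.6, 0.4), (0.6, 0.2), (0.3, -0.1)]:
        print(r_mz, r_dz, twin_moment_estimates(r_mz, r_dz))
```

The third pair shows that with a negative DZ correlation the moment estimates leave the [0, 1] range (a2 comes out negative); a model that squares its path coefficients cannot represent a negative component.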

Thank you very much in advance.


tbates
Joined: 07/31/2009 - 14:25
mxRefModels doesn't understand twin models

mxRefModels doesn't understand twin models (see text from the help below).

IMHO, the appropriate saturated model for twin models is the umxACEv() model.
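One way to see this point for a single variable: if means and variances are equated across twin order and zygosity, the remaining statistics are the total variance and the MZ and DZ cross-twin covariances, and A, C and E estimated as unbounded variances map one-to-one onto them. The sketch below (hypothetical function names and statistics) solves that mapping and checks the round trip. It assumes umxACEv estimates A, C and E as variances that may be negative, which is our reading of its documentation.

```python
def acev_components(var: float, cov_mz: float, cov_dz: float) -> tuple:
    """Unbounded A, C, E variance components reproducing the equated statistics.

    Expectations: var = A + C + E, cov_mz = A + C, cov_dz = A/2 + C.
    """
    a = 2 * (cov_mz - cov_dz)
    c = 2 * cov_dz - cov_mz
    e = var - cov_mz
    return a, c, e


def implied_statistics(a: float, c: float, e: float) -> tuple:
    """Variance, MZ covariance and DZ covariance implied by A, C, E."""
    return a + c + e, a + c, a / 2 + c


if __name__ == "__main__":
    # Hypothetical statistics with a negative DZ covariance.
    stats = (1.0, 0.2, -0.1)
    comps = acev_components(*stats)
    print("A, C, E:", comps)                       # C comes out negative
    print("implied:", implied_statistics(*comps))  # matches stats up to rounding
```

Because C is negative for these statistics, a path-coefficient ACE model, whose components are squares, could not reproduce them.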

You might get some of what you(r reviewer) wants from umxFitIndices(), which got a nice upgrade from @BrentonWiernik and contains dozens of fit indices.

One potentially important limitation of the mxRefModels function is for behavior-genetic models. If variables 'x', 'y', and 'z' are measured on twins 1 and 2, creating the modeled variables 'x1', 'y1', 'z1', 'x2', 'y2', 'z2', then this function may not create the intended saturated or independence models. In particular, the means of 'x1' and 'x2' are estimated separately. Similarly, the covariances of 'x1' with 'y1' and of 'x2' with 'y2' are allowed to be distinct: cov(x1, y1) != cov(x2, y2). Moreover, the cross-twin covariances are estimated: e.g. cov(x1, y2) != 0.
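To see what this means for the size of the reference models, here is a hypothetical parameter count. It assumes that each zygosity group gets its own saturated model (every mean and covariance of the twice-n_vars twin variables free) and its own independence model (means and variances free, all covariances fixed at zero); this is our reading of the help text, not an inspection of the mxRefModels code. The function name is hypothetical.

```python
def ref_model_sizes(n_vars: int, n_groups: int = 2) -> dict:
    """Free-parameter counts for per-group reference models on twin data.

    Assumes each group (e.g. MZ, DZ) models 2 * n_vars variables
    (x1, y1, ..., x2, y2, ...) with its own saturated and independence model.
    """
    m = 2 * n_vars                     # modeled variables per group
    means = m
    covariances = m * (m + 1) // 2     # variances plus all covariances
    saturated = n_groups * (means + covariances)  # every mean and covariance free
    independence = n_groups * (means + m)         # means and variances only
    return {
        "saturated": saturated,
        "independence": independence,
        "independence_df": saturated - independence,
    }


if __name__ == "__main__":
    print(ref_model_sizes(1))   # one variable
    print(ref_model_sizes(20))  # twenty variables analyzed jointly
```

For a single variable, the independence model omits only the two cross-twin covariances (one per zygosity group) while freely estimating four means and four variances. A univariate ACE model with, for example, one shared mean has four parameters (a hypothetical count; check your own fit), so when twin correlations are near zero or negative it can plausibly end up with a higher -2LL than this independence model, which would be consistent with the warning appearing for only some variables.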

AdminNeale
Joined: 03/01/2013 - 14:09
Saturated models

I disagree about using umxACEv as a saturated model for the purpose of evaluating fit. That model does help to show how well an ACE-type model can possibly fit, but it is different from a model in which every variance, covariance and mean has its own free parameter. The value of the fully saturated model lies in the degrees of freedom associated with duplicate statistics: the variances of T1 and T2 for MZ and DZ twins are all predicted to be the same, and so are the means. In principle, if that prediction holds, failures to equate these variances and covariances should occur only at the nominal alpha rate. A large decrease in likelihood when they are equated tells us something about the data, which is a Good Thing. Usually such failures arise because the multivariate normality assumption is wrong. For continuous variables it is relatively easy to find the source of the problem. It can also be tested for ordinal data (using the multinomial, in which every cell frequency but one has its own estimated parameter).
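For a single variable, the duplicate statistics can be counted explicitly. Assuming a saturated model with a mean and a variance for each twin in each zygosity group plus one cross-twin covariance per group, and an equated model with one mean, one variance and separate MZ and DZ cross-twin covariances, the test has 6 df. The sketch below (hypothetical names and -2LL values) counts those df and computes the chi-square p-value with an even-df closed form.

```python
import math


def duplicate_statistics_df(n_groups: int = 2) -> int:
    """df for equating means and variances in a univariate twin design.

    Saturated: per group, 2 means + 2 variances + 1 cross-twin covariance.
    Equated:   1 mean + 1 variance shared by everyone, plus one
               cross-twin covariance per group (MZ, DZ).
    """
    saturated = n_groups * 5
    equated = 2 + n_groups
    return saturated - equated


def chi2_sf_even(x: float, df: int) -> float:
    """Upper-tail chi-square probability for even df (closed form)."""
    if df < 2 or df % 2:
        raise ValueError("this sketch only handles even df")
    if x <= 0:
        return 1.0
    h = x / 2
    term = total = 1.0
    for i in range(1, df // 2):
        term *= h / i
        total += term
    return math.exp(-h) * total


if __name__ == "__main__":
    df = duplicate_statistics_df()  # 10 - 4 = 6
    # Hypothetical -2LL values for the saturated and equated models.
    delta = 1014.0 - 1000.0
    print(df, chi2_sf_even(delta, df))
```

If the equated model is correct, this p-value falls below alpha only at the nominal rate; a very small value points to the kind of data problem described in the post.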

However, if there are definition variables, the equivalent of a saturated model can become degenerate (more parameters than statistics), although some limited cases clearly have solutions. IMO, work on a saturated model for models with definition variables would make a good thesis topic for someone.

quino
Joined: 03/12/2020 - 08:02

Thank you both for your helpful insights. I am working with continuous variables and, interestingly, the warning appears for only a few of the 20 variables. I would say these are the ones with the lowest MZ and DZ twin correlations and the most incomplete pairs, since some measurements were removed during outlier analysis. Based on your feedback I am trying to use the umxFitIndices function. However, since it has not been fully tested on non-RAM models, I am running into several difficulties, mostly due to differences in the structure of the model S4 object and the fact that it contains two data frames (one for MZ and one for DZ). While trying to adapt the function to work with ACE models, I realized that for some calculations it calls mxRefModels (which, according to your answer @tbates, is not a good idea for ACE models), so it yields similar results. In fact, the resulting chi-square is the same as with mxRefModels. How should I approach this situation? A related question: putting aside the indices I have been asked to provide, wouldn't it be sufficient to work only with AIC and -2LL as fit estimates and keep the paths that don't significantly worsen the AIC fit?

Thank you very much,