Interpretation of chi-square goodness-of-fit

Karin's picture
Joined: 02/11/2011 - 07:37
I am fitting ACE and ADE families of models and comparing them to the fully saturated model. I am getting results like this:

observed statistics: 2965
estimated parameters: 4
degrees of freedom: 2961
-2 log likelihood: 848.4525
saturated -2 log likelihood: 844.858
number of observations: 2266
chi-square: 3.594549
p: 1

I am a bit confused by the interpretation of the chi-squared statistic and its degrees of freedom. Before, I have seen output from classic Mx where the df for the chi-square statistic would be three for an ACE model, because it had six observed statistics (i.e. four variances and two covariances from the MZ and DZ covariance matrices) and three estimated parameters (not estimating means).

Here, the df are given as 2961 and all of the p-values for the different models round to 1, because of the very high df. This makes the chi-squared test virtually useless for assessing model fit. Am I missing something here? Is there anything else I could do?

Thank you

Karin

neale's picture
Joined: 07/31/2009 - 15:14
Probably the wrong df

So the df for the chi-squared should really be the difference between the number of parameters of the saturated model and the number of parameters of the fitted model. If OpenMx is not using that number, then the p-value is incorrect.

You could use mxCompare() to check this. I suspect that you have 8 parameters in the saturated models (2 x 3 covariance parameters and 2 x 1 mean parameters), so the df for the reported chi-sq should be 4. Indeed, it looks like the reported p-value of 1 is wrong and this is a bug.

Karin's picture
Joined: 02/11/2011 - 07:37
mxCompare works

Thanks for the recommendation for mxCompare. That seems to work:

         base comparison ep minus2LL   df       AIC   diffLL diffdf         p
1 univTwinSat       <NA> 10 844.8580 2955 -5065.142       NA     NA        NA
2 univTwinSat    univACE  4 848.4525 2961 -5073.547 3.594549      6 0.7313508

Looks like my df is 6....

Thank you

Karin
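For reference, the p-value in the mxCompare table above can be checked by hand in base R. This is only a sketch using the numbers quoted in this thread; the variable names are made up for illustration, and pchisq() is the standard chi-square distribution function from the stats package.

```r
# Chi-square value as reported by summary() and mxCompare() in this thread
chisq <- 3.594549

# p-value using the df from mxCompare (difference in estimated parameters)
p_compare <- pchisq(chisq, df = 6, lower.tail = FALSE)

# p-value using the df that summary() reported (observed statistics minus parameters)
p_summary <- pchisq(chisq, df = 2961, lower.tail = FALSE)

print(p_compare)  # about 0.7314, matching the mxCompare output
print(p_summary)  # rounds to 1, matching the summary() output
```

With a chi-square this small, any df in the thousands gives a p-value that rounds to 1, which is why every model looked like a perfect fit in summary().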

CharlesD's picture
Joined: 04/30/2013 - 11:05
As far as I can see this is still a problem with the summary() output...

mhunter's picture
Joined: 07/31/2009 - 15:26
Fixed in revision 3536

Hi Charles,

Thanks for pointing this out. The degrees of freedom reporting was a bit confusing. The degrees of freedom reported in summary() are basically "number of observed statistics" minus "number of estimated parameters". These are not the same as the degrees of freedom for the Chi-square test, which are "number of estimated parameters in the saturated model" minus "number of estimated parameters in the fitted model", or equivalently "model degrees of freedom" minus "saturated degrees of freedom". In future versions of OpenMx, summary() will report the degrees of freedom used for the Chi-square test alongside the Chi-square value to avoid this confusion.
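To make the two kinds of df concrete, here is a small base R sketch of the bookkeeping, using the values Karin posted earlier in the thread (2965 observed statistics, 4 parameters in the ACE model, 10 in the saturated model). The variable names are illustrative only and are not OpenMx objects.

```r
n_obs_stats  <- 2965  # "observed statistics" from summary()
ep_fitted    <- 4     # estimated parameters, ACE model
ep_saturated <- 10    # estimated parameters, saturated model (from mxCompare)

# Degrees of freedom as reported by summary(): observed statistics minus parameters
df_fitted    <- n_obs_stats - ep_fitted     # 2961
df_saturated <- n_obs_stats - ep_saturated  # 2955

# Degrees of freedom for the chi-square test, computed both ways
df_chisq_from_df <- df_fitted - df_saturated  # 6
df_chisq_from_ep <- ep_saturated - ep_fitted  # 6

print(c(df_fitted, df_saturated, df_chisq_from_df, df_chisq_from_ep))
```

Both routes give 6 because the number of observed statistics cancels out of the difference.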

CharlesD's picture
Joined: 04/30/2013 - 11:05
Great, looks better... but I think number of observed statistics is also coming up wrong, which is throwing things off. I was just running a model for 30 subjects with 2 variables at each time point and 6 time points, and observed statistics is outputting as 360... when as far as I can see this should be (6*2)^2 + 6 ? Am I thinking about this wrong?

AdminNeale's picture
Joined: 03/01/2013 - 14:09
Not wrong for summary statistics

Hi Charles

You're not thinking about it wrong, so much as OpenMx is thinking about it differently. With raw data, OpenMx counts every observation in the dataset as a statistic. The rationale for this is that it is possible to add numerous individual-level parameters to a model, moderators of paths for example, which would drive the degrees of freedom negative, despite there being plenty of information to estimate the parameters of the model. Note also that if all the observations in a column (i.e., for a variable) were missing, the traditional way of computing df would also be incorrect. Yet the model being fitted might again be perfectly reasonable to fit.

Note also that there are models other than those for means and covariances that can be specified with OpenMx, and the metric of degrees of freedom in terms of the number of observed data points is a good one to hold constant across covariance structure and other types of statistical model, since this will always be "sensical".
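As a rough illustration of the counting described above, the sketch below uses simulated data with the dimensions Charles mentioned (30 subjects, 2 variables at each of 6 time points). It assumes that missing cells are not counted as observations, which is my reading of the explanation rather than something confirmed in this thread. The covariance-based count shown for contrast is the usual number of unique variances, covariances and means.

```r
n_subjects <- 30
n_vars     <- 2 * 6  # 2 variables at each of 6 time points

set.seed(1)
dat <- matrix(rnorm(n_subjects * n_vars), nrow = n_subjects, ncol = n_vars)

# Raw-data style count: every non-missing cell is one observed statistic
raw_count <- sum(!is.na(dat))  # 30 * 12 = 360

# Assumption: missing cells simply drop out of the count
dat_missing <- dat
dat_missing[1:5, 1] <- NA
raw_count_missing <- sum(!is.na(dat_missing))  # 355

# Summary-statistics style count: unique (co)variances plus means
summary_count <- n_vars * (n_vars + 1) / 2 + n_vars  # 78 + 12 = 90

print(c(raw_count, raw_count_missing, summary_count))
```

Because the same observed-statistics count enters both the fitted and the saturated df, it drops out of the chi-square df whichever counting convention is used.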

CharlesD's picture
Joined: 04/30/2013 - 11:05
Ok, sure. Then this is fine, but the chi sq df are not. I've been adding a $SaturatedLikelihood value to the fit myself, but I can't see where the df of 149 that I get for the chi sq test are taken from? Can I set this somehow? Everything comes out fine with mxCompare...

mhunter's picture
Joined: 07/31/2009 - 15:26
Use mxCompare

mxCompare is the preferred way to compare models, including the null hypothesis significance testing case in which you compare the saturated model to your candidate model, as the Chi-squared goodness-of-fit test does.

If you want summary to work for this purpose, first check out the help page. Unfortunately, this help page is too hard to get to. Second, give summary your saturated model. Alternatively, give summary the saturated likelihood and the saturated degrees of freedom. In code this looks like:

?mxSummary  # open the help page for summary() on OpenMx models
# YourFittedModel: an mxModel that has been run
# YourSatModel: the saturated mxModel, also already run
summary(YourFittedModel, SaturatedLikelihood=YourSatModel)
# Proper Chi-Square results as of svn revision 3536

# Alternative
# SatLike: single number, saturated likelihood
# SatDoF: single number, saturated degrees of freedom (number of observed stats minus number of estimated params)
summary(YourFittedModel, SaturatedLikelihood=SatLike, SaturatedDoF=SatDoF)
# Proper Chi-Square results whenever your saturated likelihood and degrees of freedom are correct

HTH!

CharlesD's picture
Joined: 04/30/2013 - 11:05
Thanks. Yes, the summary help is troublesome; I was just reading the code to try to understand it... There is a slot in the fit object that works for SaturatedLikelihood, but when I tried SaturatedDoF this didn't seem to help. It would be convenient if I could add such a parameter to the fit object. Is this possible and I'm just doing it wrong, or is it not implemented? Not a big deal, just a small nicety :)

AdminHunter's picture
Joined: 03/01/2013 - 11:03
In general we don't support or recommend modifying the slots of OpenMx objects directly. Sometimes it is possible and sometimes it is not, and in either case it is liable to break in the future. Sorry!

CharlesD's picture
Joined: 04/30/2013 - 11:05
This makes sense :) cheers