chi-square degrees of freedom

kgrimm
chi-square degrees of freedom

> mxVersion()
OpenMx version: 2.0.0.3838
R version: R version 3.1.1 (2014-07-10)
Platform: x86_64-w64-mingw32
Default optimiser: CSOLNP

I'm using the SaturatedLikelihood= and IndependenceLikelihood= arguments to summary() to obtain global fit indices, and I noticed that the degrees of freedom reported for the chi-square statistic are incorrect. The fit of the saturated model is:

observed statistics: 2221
estimated parameters: 35
degrees of freedom: 2186
-2 log likelihood: 15732.61
number of observations: 933

and the fit of the fitted model is:

observed statistics: 2221
estimated parameters: 3
degrees of freedom: 2218
-2 log likelihood: 17491.9
saturated -2 log likelihood: 15732.61
number of observations: 933
chi-square: X2 ( df=752 ) = 1759.291, p = 1.751331e-82

The degrees of freedom for the chi-square statistic should be the difference between the degrees of freedom of the two models (2218-2186=32), but the reported df for the chi-square is 752, which leads to an incorrect RMSEA.
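As a quick check of the arithmetic, here is a minimal base R sketch using only the figures quoted in the two summaries above. The `rmsea()` helper is a hypothetical name and uses one common textbook formula; that OpenMx computes RMSEA in exactly this way is an assumption, so treat the comparison as illustrative only.

```r
# Figures copied from the two summaries above.
m2ll_fitted    <- 17491.9
m2ll_saturated <- 15732.61
df_fitted      <- 2218
df_saturated   <- 2186
n_obs          <- 933

# Likelihood-ratio chi-square and the df it should carry.
chisq    <- m2ll_fitted - m2ll_saturated
df_chisq <- df_fitted - df_saturated

# Hypothetical helper: one common RMSEA formula (assumption, may differ
# from what OpenMx uses internally).
rmsea <- function(chisq, df, n) sqrt(max(chisq - df, 0) / (df * (n - 1)))

rmsea_expected    <- rmsea(chisq, df_chisq, n_obs)  # with df = 32
rmsea_reported_df <- rmsea(chisq, 752, n_obs)       # with the reported df = 752

print(c(chisq = chisq, df = df_chisq))
print(c(expected = rmsea_expected, with_reported_df = rmsea_reported_df))
```

Under that formula, the inflated df makes the fit look far better than it is, which is why the wrong df matters in practice.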

mhunter
Model or number?

Well, that's no good! The Chi-squared DoF are computed as DoF-satDoF, so it seems that the satDoF are not being processed correctly. When using SaturatedLikelihood=, are you giving it the saturated model or a number (the -2 log likelihood of the saturated model)?

When giving it the model, summary() should extract the degrees of freedom from that model. When using the number, summary uses your model to figure out what the DoF for the saturated model should be. One of these may be broken, based on your feedback. Are there any definition variables in your model?

A quick fix would be to also give summary the saturated degrees of freedom with the argument SaturatedDoF=. So you'd have something like

summary(yourModelRun, SaturatedLikelihood=, SaturatedDoF=, IndependenceLikelihood=)
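To make that template concrete, here is a sketch filled in with the saturated-model figures quoted earlier in the thread. The wrapper name `summary_with_sat` is hypothetical, `yourModelRun` is the placeholder from above, and the call assumes OpenMx is loaded so that summary() dispatches to its method for fitted models.

```r
# Values copied from the saturated-model summary quoted earlier in the thread.
sat_m2ll <- 15732.61  # -2 log likelihood of the saturated model
sat_df   <- 2186      # degrees of freedom of the saturated model

# Hypothetical wrapper: passes both values explicitly so summary() does not
# have to work out the saturated DoF on its own.
summary_with_sat <- function(fit) {
  summary(fit, SaturatedLikelihood = sat_m2ll, SaturatedDoF = sat_df)
}

# Usage (requires library(OpenMx) and a fitted model):
# summary_with_sat(yourModelRun)
```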

The mxRefModels() function handles this better, but only works for some models (just about anything unless it has definition variables, but see its help page).

ref <- mxRefModels(yourModelRun, run=TRUE)
summary(yourModelRun, refModels=ref)

Hopefully, something in this helps!

Cheers,
Mike Hunter

kgrimm
Model

Thanks Mike. I'm giving the model. First, I specify the saturated model -

sat.math.omx <- mxModel("Saturated Model",
type="RAM", mxData(observed=nlsy_math_wide, type="raw" ),
manifestVars=c("math2","math3","math4","math5","math6","math7","math8"),

# variance and covariance paths

mxPath(from=c("math2","math3","math4","math5","math6","math7","math8"),
arrows=2, free=TRUE, connect='unique.pairs',
values = c(80, 0, 0, 0, 0, 0, 0,
80, 0, 0, 0, 0, 0,
80, 0, 0, 0, 0,
80, 0, 0, 0,
80, 0, 0,
80, 0,
80)),

# means and intercepts

mxPath(from="one", to=c("math2","math3","math4","math5","math6","math7","math8"),
arrows=1, free=TRUE, values=40)

) # close model

sat.math.fit = mxRun(sat.math.omx)

and then I'm running my model (ng.math.fit) and then specifying

summary(ng.math.fit, SaturatedLikelihood=sat.math.fit)

I tried mxRefModels, but it's having trouble (potentially due to the large amounts of incomplete data). Using SaturatedLikelihood= and SaturatedDoF= works fine. If I specify the SaturatedLikelihood, but don't specify SaturatedDoF, then it reports incorrect degrees of freedom.

So, it seems like it's picking up the wrong information.

Thanks,
Kevin

mhunter
Bug

Thanks for the further information! I have a hunch about the problem. I think we're using all the variables in your data to calculate the saturated DoF. What does the following give you?

ncol(nlsy_math_wide)

I'm betting it's 39. How did I know?!? There are two bugs that I found. First, when summary is given SaturatedLikelihood=SomeMxModel, OpenMx does not grab the degrees of freedom from that model. I think it should, so I'm calling that a bug. Second, the calculation of Saturated DoF (when it is not given as an argument to summary or part of the refModels argument) is incorrect when only some of the variables in the data are used. For example, if you model 4 continuous (non-ordinal) variables but have 5 in your mxData object, then the saturated DoF are calculated as

n <- 5 #number of variables in data
m <- 4 #number of used variables in data
n*(n-1)/2 + 2*m

This is wrong. It should be m*(m-1)/2 + 2*m. Given your observed statistics (2221) and this bug, I was able to work backward to the number of columns in your data:

DoF <- 2221 - 3 #obs stats minus estimated params
n <- 39 #number of variables that must be in data
m <- 7 #number of variables used in model
satDoF <- 2221 - (n*(n-1)/2 + 2*m)
DoF-satDoF #Reported incorrectly as 752 by OpenMx.  Should be 32
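The same reasoning can be run in both directions. The base R sketch below recovers the 39 columns from the reported df of 752 and then applies the corrected formula. The helper names (`sat_params_buggy`, `sat_params_correct`, `chisq_df`) are made up for illustration and are not OpenMx functions.

```r
obs_stats  <- 2221   # observed statistics from the summary
est_params <- 3      # estimated parameters in the fitted model
m          <- 7      # variables actually used in the model
dof        <- obs_stats - est_params

# Hypothetical helpers mirroring the two formulas discussed above.
sat_params_buggy   <- function(n, m) n * (n - 1) / 2 + 2 * m  # uses all n data columns
sat_params_correct <- function(m) m * (m - 1) / 2 + 2 * m     # uses only modeled variables

# Chi-square df = model DoF minus saturated DoF.
chisq_df <- function(sat_params) dof - (obs_stats - sat_params)

# Which column count reproduces the reported df of 752?
candidates <- m:100
n_implied  <- candidates[chisq_df(sat_params_buggy(candidates, m)) == 752]

# What the corrected formula gives.
df_fixed <- chisq_df(sat_params_correct(m))

print(n_implied)
print(df_fixed)
```

Note that `sat_params_correct(7)` equals 35, the number of estimated parameters in Kevin's saturated model, which is a useful sanity check on the corrected formula.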

Thanks for pointing this out! I'll patch in a fix soon, and it will be a part of the next release of OpenMx.

Cheers,
Mike Hunter

kgrimm
Correct on all accounts

Hi Mike. I input a reduced dataset with only the variables used in the model and everything came out correctly. Thanks for taking a look at this, finding the solution, and patching in a fix.
Kevin
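For anyone who hits this before the fix is released, the reduced-dataset workaround can be sketched in base R as follows. The data frame `toy_wide` is simulated stand-in data with 39 columns (the real nlsy_math_wide is not reproduced here), and the `other*` column names are invented for illustration.

```r
# Names of the variables used in the model, as in the saturated model above.
manifests <- paste0("math", 2:8)

# Simulated stand-in for a wide data set with 39 columns (assumption: the
# extra column names are made up; only the column count matches the thread).
set.seed(1)
toy_wide <- as.data.frame(matrix(rnorm(10 * 39), nrow = 10))
names(toy_wide) <- c(manifests, paste0("other", 1:32))

# Keep only the modeled variables before handing the data to mxData().
reduced <- toy_wide[, manifests, drop = FALSE]
print(ncol(reduced))

# Usage with OpenMx (not run here):
# mxData(observed = reduced, type = "raw")
```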