Should the degrees of freedom depend on how the data are input?
Here is the output for the raw data:
observed statistics: 120
estimated parameters: 9
degrees of freedom: 111
-2 log likelihood: 894.0595
saturated -2 log likelihood: NA
number of observations: 20
chi-square: NA
p: NA
AIC (Mx): 672.0595
BIC (Mx): 280.7666
adjusted BIC:
RMSEA: NA
timestamp: 2010-11-09 16:20:59
frontend time: 0.3233688 secs
backend time: 0.03640819 secs
independent submodels time: 8.201599e-05 secs
wall clock time: 0.359859 secs
cpu time: 0.359859 secs
openmx version number: 1.0.2-1497
Note that the chi-square and RMSEA are not estimated.
If instead I input the covariance matrix and means vector:
observed statistics: 27
estimated parameters: 9
degrees of freedom: 18
-2 log likelihood: 645.686
saturated -2 log likelihood: 574.1632
number of observations: 20
chi-square: 71.52271
p: 2.492757e-08
AIC (Mx): 35.52271
BIC (Mx): 8.799766
adjusted BIC:
RMSEA: 0.3855829
timestamp: 2010-11-09 16:23:32
frontend time: 0.1366770 secs
backend time: 0.01501584 secs
independent submodels time: 8.106232e-05 secs
wall clock time: 0.1517739 secs
cpu time: 0.1517739 secs
openmx version number: 1.0.2-1497
The parameter estimates are similar but not identical between the two runs (I'm not sure why, since the only difference is the input format). The degrees of freedom differ, which is what I would expect when I input the covariance matrix, and the chi-square and RMSEA are now computed.
What am I missing?
Yes, there are different
Degrees of freedom for both types of data are defined as the number of observed statistics minus the number of free parameters (which is the same in both runs). With k variables and n rows, the observed statistics for covariance data number k*(k+1)/2, with an extra k statistics for the means. Observed statistics for raw data are found by adding up the number of non-missing observations for each variable, which is n*k when there is no missing data. The degrees of freedom differ because the datasets are very different: the moment matrices are sufficient for any linear relationship between the variables, but there's a lot more you can do with the extra information in the raw data. This is a notable difference between OpenMx and other SEM programs; the raw-data df are consistent with a GLM approach, whereas other programs treat every model as a structural equation model.
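Both counting rules reproduce the reported degrees of freedom exactly. A quick sketch, assuming n = 20 rows and k = 6 variables (inferred from the output above, since 120 = n*k):

```r
# Observed-statistics counts for n rows of k variables with no missing data
n <- 20; k <- 6
p <- 9                               # free parameters in the fitted model

raw_stats <- n * k                   # one statistic per non-missing cell
cov_stats <- k * (k + 1) / 2 + k     # unique (co)variances plus k means

c(raw = raw_stats, cov = cov_stats)              # 120 and 27
c(raw_df = raw_stats - p, cov_df = cov_stats - p)  # 111 and 18
```

These match the two summaries in the question: 111 df for raw data, 18 df for the covariance matrix and means vector.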
You get more fit statistics with the moment matrices because some of them (chi-square and RMSEA) depend on a comparison with a fully saturated model. With the covariance matrix and means vector, there's really only one version of the saturated model, and it has an analytic solution. With raw data, one could define several versions of a "saturated" model, including the SEM saturated model but also models that contain all possible parameters. People are free to specify whatever saturated model they want, estimate it as a new MxModel object, and supply either its likelihood or the fitted MxModel to the summary function.
In reply to Yes, there are different by Ryne
I appreciate the in-depth
In reply to I appreciate the in-depth by rabil
"If I can get chi-square and
Great question. Despite the relative simplicity of each of the group models (where simplicity means "we can calculate your saturated model"), combining them makes the model too complex for OpenMx to make assumptions about your saturated model. The objective function for your multiple-group model is the mxAlgebraObjective, which depends on an MxAlgebra that sums the objectives of your individual groups. There may be parameter dependencies across groups, and each group has its own dataset. The shortish answer is that beyond the simple single-model case with moment-matrix data, there are options as to what the saturated model could be, so we don't assume one.
You can always supply your own saturated model for comparison. In a simple multiple-group, you'd do something kinda like this:
# Saturated model for group 1: a fully free covariance matrix and means vector
modelA <- mxModel("Sat1",
    mxData(group1, "cov", group1mean, 60),           # covariance matrix, means, N = 60
    mxMatrix("Symm", 2, 2, TRUE, diag(2), name="cov1"),  # free symmetric covariance matrix
    mxMatrix("Full", 1, 2, TRUE, name="mean1"),          # free means vector
    mxMLObjective("cov1", "mean1", dimnames=dimnames(group1)[[1]])
)
# Saturated model for group 2, built the same way
modelB <- mxModel("Sat2",
    mxData(group2, "cov", group2mean, 60),
    mxMatrix("Symm", 2, 2, TRUE, diag(2), name="cov2"),
    mxMatrix("Full", 1, 2, TRUE, name="mean2"),
    mxMLObjective("cov2", "mean2", dimnames=dimnames(group2)[[1]])
)
# Combine the groups by summing their objectives
satModel <- mxModel("Saturated",
    modelA, modelB,
    mxAlgebra(Sat1.objective + Sat2.objective, name="obj"),
    mxAlgebraObjective("obj")
)
satRes <- mxRun(satModel)
satRes now contains a saturated model that you can compare your two-group model to. If your fitted model is in an object called 'multipleGroup', you can compare them like so:
summary(multipleGroup, SaturatedLikelihood=satRes)
In reply to Yes, there are different by Ryne
I can understand how the
In reply to I can understand how the by rabil
Typically, the number of
Conversely, one could imagine a raw dataset on two variables in which the data are:
X Y
.5 NA
.3 NA
NA .1
NA .6
In this case there is no information about the covariance between X and Y. True, using summary statistics would not make a lot of sense here (except perhaps as two means and two variances).
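The raw-data counting rule handles this dataset naturally: it counts one statistic per non-missing cell, regardless of which moments are actually estimable. A small sketch of the toy data above:

```r
# Toy dataset from the example: X and Y are never observed together
dat <- data.frame(X = c(.5, .3, NA, NA), Y = c(NA, NA, .1, .6))

sum(!is.na(dat))   # 4 observed statistics, one per non-missing cell
```

A moment-matrix count would instead claim 2*(2+1)/2 + 2 = 5 statistics, even though cov(X, Y) cannot be estimated from these rows.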
Nevertheless, I think you are advocating for counting the number of statistics based on the number of means and covariances, and for certain purposes I agree that this would be helpful: it would put goodness-of-fit statistics on the same metric. Doing so would, however, make it possible to specify a fully identified model whose degrees of freedom are negative. That approach has its problems too.
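The counting asymmetry behind the negative-df concern is easy to show with hypothetical numbers (these are illustrative only; whether such a model is actually identified depends on its structure, e.g. whether it uses information beyond the moments, such as definition variables):

```r
# Hypothetical: k = 2 variables, n = 50 rows, p = 7 free parameters
k <- 2; n <- 50; p <- 7

raw_df <- n * k - p                    # 100 - 7 = 93 under the raw-data count
cov_df <- (k * (k + 1) / 2 + k) - p    #   5 - 7 = -2 under the moment-matrix count

c(raw_df = raw_df, cov_df = cov_df)
```

The same model has comfortably positive df by the raw-data rule but negative df by the moment-matrix rule.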