Hi,

Is there any guidance on how people should get fit statistics for their models, especially multigroup models?

I'm running an ACE-type model, and the summary doesn't calculate any fit statistics. Nor does it seem to know how many observed statistics there are, so the degrees of freedom come out as just minus the number of estimated parameters:

    > summary(fit)
       name       matrix      row col  parameter estimate  error estimate
    1             all.a_c     1   1          2.384409e-01      0.22324048
    2             all.a_c     2   1          4.199091e-01      0.21043198
    3             all.a_c     3   1          2.366656e-01      0.17050014
    4             all.c_c     1   1          4.794349e-01      0.16473834
    5             all.c_c     2   1          1.533741e-01      0.14155519
    6             all.c_c     3   1          1.806579e-01      0.13491689
    7             all.e_c     1   1          1.838531e-01      0.08764916
    8             all.e_c     2   1          3.122362e-01      0.12072371
    9             all.e_c     3   1          4.914389e-01      0.15620607
    10            all.a       1   1          4.452644e-01      0.20181557
    11            all.a       2   2          3.546483e-01      0.19978474
    12            all.a       3   3          4.331486e-01      0.07694248
    13            all.c       1   1          1.310573e-05      1.85878064
    14            all.c       2   2         -1.006976e-06      0.56912795
    15            all.c       3   3         -2.730591e-06      0.56363373
    16            all.e       1   1          5.441768e-01      0.05386086
    17            all.e       2   2          4.934105e-01      0.07569937
    18            all.e       3   3          4.127164e-01      0.17458587
    19 Trait1mean all.expMean 1   1          2.497818e+00      0.07429617
    20 Trait2mean all.expMean 1   2          2.976677e+00      0.06878513
    21 Trait3mean all.expMean 1   3          2.082244e+00      0.05968714

    Observed statistics: 0
    Estimated parameters: 21
    Degrees of freedom: -21
    -2 log likelihood: 8259.418
    Saturated -2 log likelihood:
    Chi-Square:
    p:
    AIC (Mx):
    BIC (Mx):
    adjusted BIC:
    RMSEA:

The submodels don't know this either:

    > summary(fit@submodels$MZ)
    Observed statistics:
    Estimated parameters:
    Degrees of freedom:
    -2 log likelihood:
    Saturated -2 log likelihood:
    Chi-Square:
    p:
    AIC (Mx):
    BIC (Mx):
    adjusted BIC:
    RMSEA:

bump... please :-)

OK, by the end of the day I should have finished square-bracket substitution, and then I can look at this issue. Some questions need answering. Say I have two submodels that use a FIML objective function, and a top model that uses an MxAlgebra objective function. Do I compute fit statistics for the submodels? Do I compute fit statistics for the top model? How do I compute fit statistics for the top model when its objective function is an arbitrary algebra (pretend you don't know it's a "+")?

Warning: I am not a great person to ask here...

> Do I compute fit statistics for the submodels?

Yes. That would help people see at a glance which parts of the supermodel are contributing to poor fit.

> Do I compute fit statistics for the top model?

That's the goal.

> How do I compute fit when the top model has an arbitrary algebra as the objective?

I think, in the first instance, it would be fine to assume the user knew what they were doing when specifying the objective, so that the algebra yields a correctly scaled likelihood.
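Under that assumption, and taking the simple case where the top-model algebra just adds the group -2LLs, the top-level statistics could be assembled from the submodels roughly as follows. This is a minimal Python sketch rather than OpenMx code: the `Group` record, its fields, the parameter labels and the toy numbers are all hypothetical. The point it illustrates is that observed statistics add across groups, while a parameter equated across groups (as the a, c and e paths typically are across MZ and DZ) must be counted only once.

```python
from dataclasses import dataclass


@dataclass
class Group:
    """Hypothetical per-group fit summary (e.g. the MZ or DZ submodel)."""
    name: str
    minus2ll: float         # -2 log-likelihood of this group's data
    observed_stats: int     # number of observed statistics in this group
    free_params: frozenset  # labels of the free parameters this group uses


def aggregate_fit(groups):
    """Combine group fits, assuming the top objective is the sum of group -2LLs."""
    minus2ll = sum(g.minus2ll for g in groups)
    observed = sum(g.observed_stats for g in groups)
    # Parameters equated across groups share a label, so take the union
    # of labels rather than summing the per-group counts.
    params = frozenset().union(*(g.free_params for g in groups))
    return {
        "minus2ll": minus2ll,
        "observed_stats": observed,
        "estimated_params": len(params),
        "df": observed - len(params),
    }


# Toy values only; labels are hypothetical.
shared = frozenset({"a", "c", "e", "mean"})
mz = Group("MZ", 1000.0, 500, shared)
dz = Group("DZ", 1200.0, 600, shared)
print(aggregate_fit([mz, dz]))
```

Summing the per-group parameter counts here would report 8 parameters instead of 4 and understate the df by 4.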

A concrete example is the OpenMx script:

trunk/models/passing/univACEP.R

and its Mx 1 counterpart

trunk/models/passing/mx-scripts/univACE.mx

Mx 1.x lets the user supply the -2LL and df of a saturated model, and then reports the following:

    Your model has 4 estimated parameters and 1777 Observed statistics

    -2 times log-likelihood of data >>> 4067.663
    Degrees of freedom >>>>>>>>>>>>>>>> 1773

    Saturated model fit* >>>>>>>>>>> 4055.935
    Saturated model df* >>>>>>>>>>> 1767
    Difference Chi-squared >>>>>>>> 11.728
    Difference d.f. >>>>>>>>>>>>>>> 6
    Probability >>>>>>>>>>>>>>>>>>>> .068
    Akaike's Information Criterion > -.272
    * Saturated model statistic supplied by user
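The arithmetic behind that Mx 1 report is easy to reproduce once the saturated -2LL and df are known, which is what OpenMx would need in order to print the same lines. A minimal Python sketch using only the standard library; the chi-square tail probability uses the closed forms for integer df, and `chisq_sf` and `fit_vs_saturated` are illustrative names, not part of OpenMx. The AIC line follows the convention the report implies (chi-square minus twice the df: 11.728 - 12 = -0.272).

```python
import math


def chisq_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + exp(-x/2) * sum_{k=1}^{(df-1)/2} (x/2)^(k-1/2) / Gamma(k+1/2)
    term = math.sqrt(half) / math.gamma(1.5)
    total = 0.0
    for k in range(1, (df - 1) // 2 + 1):
        total += term
        term *= half / (k + 0.5)
    return math.erfc(math.sqrt(half)) + math.exp(-half) * total


def fit_vs_saturated(m2ll, df, sat_m2ll, sat_df):
    """Likelihood-ratio comparison of a fitted model against a saturated model."""
    chisq = m2ll - sat_m2ll
    ddf = df - sat_df
    return {
        "chisq": chisq,
        "df": ddf,
        "p": chisq_sf(chisq, ddf),
        "aic_mx": chisq - 2 * ddf,  # Mx-style AIC: chi-square minus 2 * df
    }


# Figures taken from the Mx 1.x univACE output above.
observed_stats, est_params = 1777, 4
model_df = observed_stats - est_params  # 1773
result = fit_vs_saturated(4067.663, model_df, 4055.935, 1767)
print({k: round(v, 3) for k, v in result.items()})
```

Rounded to three places, the printed values match the Mx 1 report: chi-square 11.728 on 6 df, p = .068, AIC -.272.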

OpenMx reports

It would be good to get the observed statistics right, which would then flow through to the df.
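For raw data fitted by FIML, one way to get the observed statistics right is to count the non-missing data values each group actually uses. As far as I know this is the convention Mx 1 follows for raw data, but treat that as an assumption. A minimal Python sketch; the column names, the toy twin data and the parameter count are hypothetical.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing (R's NA would map to one of these)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def count_observed_stats(rows, columns):
    """Count the non-missing values in the columns a model actually uses."""
    return sum(1 for row in rows for col in columns if not is_missing(row[col]))


# Toy twin data in wide format: one row per pair; column names are hypothetical.
mz_rows = [
    {"t1": 2.1, "t2": 2.4},
    {"t1": 1.7, "t2": None},          # co-twin missing
    {"t1": float("nan"), "t2": 3.0},  # twin 1 missing
]
dz_rows = [
    {"t1": 2.9, "t2": 2.2},
    {"t1": 1.1, "t2": 1.5},
]
used = ["t1", "t2"]
n_obs = count_observed_stats(mz_rows, used) + count_observed_stats(dz_rows, used)
est_params = 4  # e.g. a, c, e and one mean, equated across groups
print(n_obs, n_obs - est_params)
```

With the observed statistics counted this way, the df is no longer just minus the number of estimated parameters.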

Hi,

I agree; at the very least we should get the number of observed statistics and the df right. It should also be possible to extract those independently of the summary, e.g. using mxEval, so that one can compute chi-square and p-values with R commands.

Ah. Tim Bates' example, along with Mike Neale's input, was very helpful. There is now some partial support for multigroup models checked into the Subversion repository. Run

to view the new output. Several questions remain:


I think it is reasonable to compute the total number of observed statistics being used. The tricky case is a mixture distribution, in which the same data are used several times (once in each component of the mixture). So it is probably necessary to check that each dataset is not the same as one already counted.

How to do this check cleanly is not clear. It was fairly simple in Mx1 because the components of a mixture were always specified in one data group, which had one dataset attached to it. In OpenMx things are a good deal more flexible: the same or different datasets could be applied to different components of a mixture. Ordinarily, it would not be a mixture distribution if different datasets were being applied, so mixture distributions could in principle be used in a different way in OpenMx; whether this is a good or bad thing is open to question. It is therefore not sufficient just to examine the dataframe, and the variables within it, that a particular model uses for its data. In principle, the dataframe could be named differently and could have variables with different names, yet contain exactly the same data.

So I would recommend some form of is.samebloodything(dataframe1, dataframe2) function, which would first test whether the datasets are the same by dataframe name and variable names. If these differ, we can then perform the more costly check of whether they are physically identical (same values down each column). Note, however, that this check is still not sufficient, because the columns could be reordered from one frame to the next. So every column in dataframe1 needs to be compared with every column in dataframe2. If there is a partial match (say column 2 in dataframe1 is the same as column 3 in dataframe2, but everything else is unique), the statistics in the matching columns should not be added to the total count a second time. Phew, this additional flexibility is quite expensive at times... Luckily such tests only have to be carried out once for each model.
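A minimal Python sketch of the column-level check described above, assuming each dataset can be viewed as a mapping from column name to a list of values. Instead of the name-first, values-second sequence proposed above, it simply matches columns by content (a hashable tuple of values), which catches renamed data frames, renamed variables, reordered columns and partial overlaps in one pass. `count_distinct_stats` is a hypothetical helper, not an OpenMx function.

```python
import math


def _column_key(values):
    """Hashable key for a column; NaN is normalised to None so missing cells compare equal."""
    return tuple(None if isinstance(v, float) and math.isnan(v) else v for v in values)


def count_distinct_stats(datasets):
    """Count non-missing values across datasets, skipping columns seen in an earlier dataset.

    Columns are matched by content only, so renamed data frames, renamed
    variables and reordered columns are all detected. Reordered rows are not.
    """
    seen = set()
    total = 0
    for columns in datasets:  # each dataset: {column name: list of values}
        new_keys = []
        for values in columns.values():
            key = _column_key(values)
            if key in seen:  # identical column already counted in an earlier dataset
                continue
            total += sum(v is not None for v in key)
            new_keys.append(key)
        # Update only after the whole dataset, so duplicate columns within
        # one dataset are still each counted.
        seen.update(new_keys)
    return total


original = {"t1": [2.1, 1.7, None], "t2": [2.4, None, 3.0]}
# Same data under another frame, with renamed and reordered columns.
renamed = {"y2": [2.4, None, 3.0], "y1": [2.1, 1.7, None]}
# Partial overlap: one shared column plus one new one.
partial = {"x": [2.1, 1.7, None], "z": [0.5, 0.6, 0.7]}
print(count_distinct_stats([original, renamed, partial]))
```

Because matching is purely by content, two genuinely different variables that happen to hold identical values in different datasets would be counted once; that, and the blindness to reordered rows, are limitations of this sketch.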

It might be worth just making this a documented limitation: if you are fitting mixtures and want the fit computed correctly, don't shuffle your data columns, rows, or dataframe names when you reuse the same data :-)