
fit statistics for multigroup models?

tbates:

Hi,
Is there any guidance on how people should compute the fit of their models, especially multigroup models?
Running an ACE-type model, summary() doesn't calculate any fit statistics, nor does it seem to know how many observations were made, so the degrees of freedom come out as just minus the number of estimated parameters:

summary(fit)
         name      matrix row col parameter estimate error estimate
1             all.a_c   1   1       2.384409e-01     0.22324048
2             all.a_c   2   1       4.199091e-01     0.21043198
3             all.a_c   3   1       2.366656e-01     0.17050014
4             all.c_c   1   1       4.794349e-01     0.16473834
5             all.c_c   2   1       1.533741e-01     0.14155519
6             all.c_c   3   1       1.806579e-01     0.13491689
7             all.e_c   1   1       1.838531e-01     0.08764916
8             all.e_c   2   1       3.122362e-01     0.12072371
9             all.e_c   3   1       4.914389e-01     0.15620607
10              all.a   1   1       4.452644e-01     0.20181557
11              all.a   2   2       3.546483e-01     0.19978474
12              all.a   3   3       4.331486e-01     0.07694248
13              all.c   1   1       1.310573e-05     1.85878064
14              all.c   2   2      -1.006976e-06     0.56912795
15              all.c   3   3      -2.730591e-06     0.56363373
16              all.e   1   1       5.441768e-01     0.05386086
17              all.e   2   2       4.934105e-01     0.07569937
18              all.e   3   3       4.127164e-01     0.17458587
19 Trait1mean all.expMean   1   1       2.497818e+00     0.07429617
20 Trait2mean all.expMean   1   2       2.976677e+00     0.06878513
21 Trait3mean all.expMean   1   3       2.082244e+00     0.05968714

Observed statistics:  0 
Estimated parameters:  21 
Degrees of freedom:  -21 
-2 log likelihood:  8259.418 
Saturated -2 log likelihood:  
Chi-Square:  
p:  
AIC (Mx):  
BIC (Mx):  
adjusted BIC: 
RMSEA:  

The submodels don't know this either:
> summary(fit@submodels$MZ)
Observed statistics:
Estimated parameters:
Degrees of freedom:
-2 log likelihood:
Saturated -2 log likelihood:
Chi-Square:
p:
AIC (Mx):
BIC (Mx):
adjusted BIC:
RMSEA:

tbates:

bump... please :-)

mspiegel:

OK, by the end of the day I should have finished square-bracket substitution, and then I can take a look at this issue. Some questions need to be answered first. Let's say I have two submodels that use a FIML objective function, and a top model that uses an MxAlgebra objective function. Do I compute fit statistics for the submodels? Do I compute fit statistics for the top model? How do I compute fit statistics for the top model with an arbitrary algebra as the objective function (pretend you don't know it's a "+")?
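
For concreteness, here is a minimal runnable sketch of that structure using the objective-style API (toy data and names, not a real ACE model; just the two-FIML-submodels-plus-algebra-top pattern under discussion):

require(OpenMx)
set.seed(1)

# Toy stand-ins for the MZ and DZ groups
mzData <- data.frame(x = rnorm(100))
dzData <- data.frame(x = rnorm(100))

mzModel <- mxModel("MZ",
    mxMatrix("Full", nrow = 1, ncol = 1, free = TRUE, values = 1, name = "expCov"),
    mxMatrix("Full", nrow = 1, ncol = 1, free = TRUE, values = 0, name = "expMean"),
    mxData(mzData, type = "raw"),
    mxFIMLObjective(covariance = "expCov", means = "expMean", dimnames = "x"))

# Same structure for the second group, with its own data
dzModel <- mxModel(mzModel, name = "DZ", mxData(dzData, type = "raw"))

# Supermodel: the objective is an arbitrary algebra over the submodel objectives
top <- mxModel("top", mzModel, dzModel,
    mxAlgebra(MZ.objective + DZ.objective, name = "minus2LL"),
    mxAlgebraObjective("minus2LL"))

fit <- mxRun(top)
summary(fit)   # currently reports no fit statistics for the supermodel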

tbates:

Warning: I am not the best person to ask here...

> Do I compute fit statistics for the sub-models?

That would be helpful for people wanting to see straightforwardly which parts of the supermodel were contributing to bad fit.

> Do I compute fit statistics for the top model?

That's the goal.

> How do I compute fit when the top model has an arbitrary algebra as the objective?

I think in the first instance it would be fine to assume the user knew what they were doing when specifying the objective, so its likelihood is correctly scaled.

tbates:

A concrete example, using the OpenMx script
trunk/models/passing/univACEP.R
and its Mx 1 counterpart
trunk/models/passing/mx-scripts/univACE.mx

Mx 1.x allows the user to pass in a -2LL and df from a saturated model, and reports the following:

Your model has 4 estimated parameters and 1777 Observed statistics

-2 times log-likelihood of data >>> 4067.663
Degrees of freedom >>>>>>>>>>>>>>>> 1773

Saturated model fit* >>>>>>>>>>> 4055.935
Saturated model df* >>>>>>>>>>> 1767
Difference Chi-squared >>>>>>>> 11.728
Difference d.f. >>>>>>>>>>>>>>> 6
Probability >>>>>>>>>>>>>>>>>>>> .068
Akaike's Information Criterion > -.272
* Saturated model statistic supplied by user


OpenMx reports:

Observed statistics:  0 
Estimated parameters:  4 
Degrees of freedom:  -4 
-2 log likelihood:  4067.663 
Saturated -2 log likelihood:  
Chi-Square:  
p:  
AIC (Mx):  
BIC (Mx):  
adjusted BIC: 
RMSEA: 

It would be good to get the observed statistics right, which would then flow through to the degrees of freedom.
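
As a sanity check, the Mx 1 numbers above tie together with plain R arithmetic (every value is taken from the Mx output):

1777 - 4                                   # 1773, degrees of freedom
chisq  <- 4067.663 - 4055.935              # 11.728, difference from the saturated model
dfDiff <- 1773 - 1767                      # 6, difference in degrees of freedom
pchisq(chisq, dfDiff, lower.tail = FALSE)  # ~.068, the reported probability
chisq - 2 * dfDiff                         # -0.272, the reported AIC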

irebollo:

Hi,
I agree; at the very least we should get the number of observed statistics and the df right. It should also be possible to get those out of the summary independently, e.g. with mxEval(), so that one can compute chi-square and p values using R commands.
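
For example, given the user-supplied saturated -2LL and df difference from Tim's example above, something like this should work by hand (assuming mxEval() can pull the objective value out of a fitted model called fit):

minus2LL <- mxEval(objective, fit)         # -2 log-likelihood of the fitted model
chisq    <- minus2LL - 4055.935            # difference from the saturated -2LL
pchisq(chisq, df = 6, lower.tail = FALSE)  # p value for the difference test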

mspiegel:

Ah. Tim Bates' example was very helpful, along with the input from Mike Neale. There is some partial support for multigroup models checked into the subversion repository. Run

summary(twinACEFit, SaturatedLikelihood=4055.935)

to view the new output. Several questions remain:


  • How to calculate the number of observations? The current approach is to sum up the number of observations across all data sets. If two data sets are identical, I should probably not count them twice (TODO). However, if two data sets contain some identical columns and some non-identical columns, what to do? The number of observations may be manually specified using a 'numObs=' argument to the summary function (see the usage sketch after this list). Note that this is a harder problem than calculating degrees of freedom (where each column can be checked independently). For calculating degrees of freedom I used Mike Neale's suggestions.
  • How to compute the F value? The current approach is to use '-2 log-likelihood' if all datasets are raw, or 'chi' if all datasets are covariance matrices, or 'NA' otherwise.
  • How do I use the saturated model degrees of freedom?
  • How do I correct the probability and AIC calculations?

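For instance, both overrides can be supplied together (the numObs value here is purely illustrative):

summary(twinACEFit, SaturatedLikelihood = 4055.935, numObs = 400)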

neale:

I think it is reasonable to compute the total number of statistics being used. What is tricky is the mixture distribution case, in which the same data are used multiple times (and mentioned in each of the components of the mixture). So it is probably necessary to check that each dataset is not the same as a previous one.

How to do this check cleanly is not clear. It was pretty simple in Mx1, because the components of a mixture were always specified in one data group, which had one dataset attached to it. In OpenMx things are a good deal more flexible; the same or different datasets could be applied to different components of a mixture. Ordinarily it would not be a mixture distribution if different datasets are being applied, so mixture distributions could in principle be used in a different way in OpenMx - whether this is a good or bad thing is open to question.

So it is not sufficient to just examine the dataframe, and the variables within it, that are being used as data for a particular model. In principle the dataframe could be named differently, and could have variables with different names, yet be exactly the same data. So I would recommend some form of is.samebloodything(dataframe1, dataframe2) function, which would first test whether the datasets are the same by dataframe name and variables. If these differ, we can then perform the more costly check that they are physically identical (same values down each column). Note, however, that even this check is not sufficient, because the columns could be reordered from one frame to the next; a loop over all columns in dataframe1 needs to compare each against all columns in dataframe2. In the event of a partial match (say column 2 in dataframe1 is the same as column 3 in dataframe2, but otherwise everything is unique), that number of statistics should not be added to the total count.

Phew, quite expensive at times, this additional flexibility... Luckily such tests only have to be carried out once for each model.
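
A rough sketch of such a check in plain R (is.samebloodything is of course a hypothetical name, and this version only answers TRUE/FALSE): do the cheap identical() test first, then the costly column-by-column matching that catches renamed or reordered columns.

# Hypothetical helper: are df1 and df2 the same data, possibly with
# renamed and/or reordered columns?
is.samebloodything <- function(df1, df2) {
    if (identical(df1, df2)) return(TRUE)    # cheap test: same names and values
    if (nrow(df1) != nrow(df2) || ncol(df1) != ncol(df2)) return(FALSE)
    used <- rep(FALSE, ncol(df2))            # columns of df2 already matched
    for (i in seq_len(ncol(df1))) {          # compare every column of df1 ...
        hit <- FALSE
        for (j in seq_len(ncol(df2))) {      # ... against every column of df2
            if (!used[j] &&
                isTRUE(all.equal(df1[[i]], df2[[j]], check.attributes = FALSE))) {
                used[j] <- TRUE
                hit <- TRUE
                break
            }
        }
        if (!hit) return(FALSE)
    }
    TRUE
}

A partial match (only some columns shared) would need finer-grained bookkeeping than this yes/no answer, so that only the genuinely duplicated columns are dropped from the statistics count.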

tbates:

It might be worth just making this a limitation: if you are doing mixtures and you want the fit computed correctly, don't shuffle your data columns, rows, or dataframe names when you are using the same data :-)