# Computing and Reporting fit statistics

18 posts / 0 new
Offline
Joined: 07/31/2009 - 14:25
Computing and Reporting fit statistics

A place to discuss models current and desired og fit, df, AIC, etc and how to make basic functionality happen

Offline
Joined: 07/31/2009 - 14:25
Several models do not report

Several models do not report fit statistics. What should be happening here? Do we need to attach them here the summary function can be improved, or should summary report the problems it found that stopped it computing the fit (couldn't be sure of df etc.) and then users can manually set these parameters in the model (df=xxx, ) to allow the existing summary to work?

Offline
Joined: 07/31/2009 - 14:25
Front page factor model in

Front page factor model in pathicCov, matricCov, and matricRaw forms as an example of differences in fit (models attached).

One-factor pathic model of covariance data

    # Observed statistics:  16
# Estimated parameters:  10
# Degrees of freedom:  6
# -2 log likelihood:  -3648.281
# Saturated -2 log likelihood:  -3655.665
# Chi-Square:  7.384002
# p:  0.2867893
# AIC (Mx):  -4.615998
# BIC (Mx):  -14.95182
# RMSEA:  0.02147869


Now a Matrix model of the cov data

    # Observed statistics:  16
# Estimated parameters:  0
# Degrees of freedom:  16
# -2 log likelihood:  972.155
# Saturated -2 log likelihood:  -3655.665
# Chi-Square:  4627.819
# p:  0
# AIC (Mx):  4595.819
# BIC (Mx):  2264.193
# RMSEA:  0.7592611


NB: same model of the same data as the pathic, but RMSEA now suggests terrible fit, p differs etc. etc.

Next try a matrix model with full information

    # Observed statistics:  2500
# Estimated parameters:  0
# Degrees of freedom:  2500
# -2 log likelihood:  216183.3
# Saturated -2 log likelihood:
# Chi-Square:
# p:
# AIC (Mx):  211183.3
# BIC (Mx):  100323.4
# RMSEA:  0


Same model, but raw data. RMSEA now improbably = 0, most statistics not computed

Offline
Joined: 07/30/2009 - 14:03
Are you sure these are

Are you sure these are actually the same models? The non-pathic models are reporting 0 estimated parameters. That's just a count of the number of elements in the parameter vector passed to the estimator. Zero free parameters means that the estimator just calculated the -2LL of the starting values and returned. Fit stats will be horrible for such a model (unless you are very lucky with your starting values!)

Offline
Joined: 07/30/2009 - 14:03
I tried running the code in

I tried running the code in ML_vs_FIML.R and the ML matrix example came up with the same results as the pathic example. The FIML example returned an error because "names" was not defined.

Once I set

names <- manifests


your code ran and came up with the same parameter estimates and reasonable fit statistics.

Observed statistics:  2500
Estimated parameters:  15
Degrees of freedom:  2485
-2 log likelihood:  934.096
Saturated -2 log likelihood:
Chi-Square:
p:
AIC (Mx):  -4035.904
BIC (Mx):  -7254.603
RMSEA:  0

Offline
Joined: 07/31/2009 - 14:25
Hi Steve, Thanks for running

Hi Steve,
Thanks for running the script. I did it again from a clean boot at home and all is well. Can't see how the cov matrix version ever got 0 estimated but as you say... it runs fine.

For the FIML, I must have had the "names" variable set from some other model... A lesson in ensuring all used variables are set in the job, not just hanging around. oops.

For the missing Saturated -2 log likelihood in the FIML model, could mxRun() get this by estimating a saturated model (for means, variances, and covariances of the data)? Then we could have Chi-Square in the FIML model?

Offline
Joined: 07/30/2009 - 14:03
Interesting proposal. Does

Interesting proposal. Does anyone have objections?

Offline
Joined: 07/31/2009 - 15:14
There are reasons that Mx1

There are reasons that Mx1 does not fit the saturated model for FIML. One is because fitting the saturated model (a free parameter for every mean, variance and covariance) is computationally intensive. Ideally it should be done only once. Mx1 has two ways of using the results of such a model: Option issat, which assumes that later in the same script there is a model of interest for which chi-sq fit will be computed; and ii) option saturated=-2lnL,df which allows the user to supply fit statistics which have been computed in a separate job.

Two, when there are definition variables, what the saturated model actually is becomes obscure, since any parameter of the model might be any polynomial function of any or all of the definition variables.

Note that there isn't a shortcut to computing the likelihood when there are missing data, although it is possible to supply decent starting values as long as the data are MCAR; when data are MAR the observed means, variances and covariances are not the same as the MLE's.

I am not sure whether we should just have an mxCompare(model1,model2,...) function to compare models, or if an additional argument to mxModel(compare=list) might allow for comparative statistics to be computed on the fly. I lean towards mxCompare, albeit at a cost of additional commands.

Offline
Joined: 07/30/2009 - 14:03
I would like to see an

I would like to see an mxCompare function. As you say, one of the things it could do is compare to a saturated model.

Tim Brick and I had a conversation yesterday about the saturated model problem. I continue to think that the obscure cases are only maybe 5% of actual usage in SEM. So, we should expect to need to override something that can be automatically computed, and we need to throw an error when a person asks for automatic computation and there are potential pitfalls such as definition variables in the model. For the vast majority of SEMers, an auto saturated model computation (expensive though it might be) would be useful. Whether it should be on by default is another question about which I remain unsure.

Offline
Joined: 07/31/2009 - 14:25
Especially as the saturated

Especially as the saturated model can be computed in parallel, so on machines where the user has a few spare cores, will add no time.

Maybe something like

saturated=(T|F) as default, and if the user provides a numeric value, that is used as the saturated value of the objective?

Offline
Joined: 09/24/2009 - 15:41
Hi Mike, I'm finally starting

Hi Mike,
I'm finally starting with OpenMx, and I must say that I love it!
Have you given any thoughts to the mxCompare function?
In the meantime, is there a way to get the degrees of freedom using mxEval, in the same way that we get the LL. If that's possible then one can simply compute the LRT and the difference in degrees of freedom, with the corresponding p value using R commands.

Thanks!

Offline
Joined: 07/31/2009 - 14:25
Dear Irene, You can get the

Dear Irene,
You can get the degrees of freedom from summary().
assuming you have a fitted model in "fit"
> a = summary(fit)
> names(a)
[1] "parameters" "Minus2LogLikelihood" "SaturatedLikelihood" "numObs"
[5] "estimatedParameters" "observedStatistics" "degreesOfFreedom" "Chi"
[9] "p" "AIC.Mx" "BIC.Mx" "RMSEA"
[13] "dataSummary" "frontendTime" "backendTime" "mxVersion"

> a\$Minus2LogLikelihood
1066

Offline
Joined: 07/31/2009 - 14:25
hmm. In at work again and

hmm. In at work again and running the covariance model. Building today's R, shutting down and reloading R.app (today's build)... I get a model with 0 df.

require(OpenMx);
data(demoOneFactor)
factorModel <- mxModel("One Factor",
mxMatrix("Full", 5, 1, values=0.2, free=T, name="A"),
mxMatrix("Symm", 1, 1, values=1,   free=F, name="L"),
mxMatrix("Diag", 5, 5, values=1,   free=T, name="U"),
mxAlgebra(A %*% L %*% t(A) + U, name="R", dimnames = list(names(demoOneFactor), names(demoOneFactor)) ),
mxMLObjective("R"),
mxData(cov(demoOneFactor), type="cov", numObs=500)
)
summary(mxRun(factorModel))

Running One Factor
x1               x2               x3               x4               x5
Min.   :0.1985   Min.   :0.2000   Min.   :0.2312   Min.   :0.2784   Min.   :0.3156
1st Qu.:0.2000   1st Qu.:0.2917   1st Qu.:0.2925   1st Qu.:0.3515   1st Qu.:0.4019
Median :0.2312   Median :0.2925   Median :0.3740   Median :0.4061   Median :0.4574
Mean   :0.2447   Mean   :0.3075   Mean   :0.3522   Mean   :0.4261   Mean   :0.4813
3rd Qu.:0.2784   3rd Qu.:0.3515   3rd Qu.:0.4061   3rd Qu.:0.5333   3rd Qu.:0.5611
Max.   :0.3156   Max.   :0.4019   Max.   :0.4574   Max.   :0.5611   Max.   :0.6703

Observed statistics:  16
Estimated parameters:  0
Degrees of freedom:  16
-2 log likelihood:  972.155
Saturated -2 log likelihood:  -3655.665
Chi-Square:  4627.819
p:  0
AIC (Mx):  4595.819
BIC (Mx):  2264.193
RMSEA:  0.7592611


get the same result just copying the front page ALA'+U model into R and pressing return... Baffled, as it worked at home (intel laptop) last night.
Can some other people try this please?

Offline
Joined: 07/30/2009 - 14:03
Just did an svn up cd

Just did an

svn up
cd trunk
make clean install
R

That installed rev 803.

Copied the model from your post.

I got exactly what I'd expect:

[6]sboker@localhost:svn/OpenMx/trunk % R

R version 2.9.2 (2009-08-24)
Copyright (C) 2009 The R Foundation for Statistical Computing
ISBN 3-900051-07-0

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

Natural language support but running in an English locale

R is a collaborative project with many contributors.
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> require(OpenMx);
snowfall 1.70 initialized: sequential execution, one CPU.

> data(demoOneFactor)
> factorModel <- mxModel("One Factor",
+       mxMatrix("Full", 5, 1, values=0.2, free=T, name="A"),
+       mxMatrix("Symm", 1, 1, values=1,   free=F, name="L"),
+       mxMatrix("Diag", 5, 5, values=1,   free=T, name="U"),
+       mxAlgebra(A %*% L %*% t(A) + U, name="R", dimnames = list(names(demoOneFactor), names(demoOneFactor)) ),
+       mxMLObjective("R"),
+       mxData(cov(demoOneFactor), type="cov", numObs=500)
+ )
> summary(mxRun(factorModel))
Running One Factor
x1               x2               x3               x4
Min.   :0.1985   Min.   :0.2000   Min.   :0.2312   Min.   :0.2784
1st Qu.:0.2000   1st Qu.:0.2917   1st Qu.:0.2925   1st Qu.:0.3515
Median :0.2312   Median :0.2925   Median :0.3740   Median :0.4061
Mean   :0.2447   Mean   :0.3075   Mean   :0.3522   Mean   :0.4261
3rd Qu.:0.2784   3rd Qu.:0.3515   3rd Qu.:0.4061   3rd Qu.:0.5333
Max.   :0.3156   Max.   :0.4019   Max.   :0.4574   Max.   :0.5611
x5
Min.   :0.3156
1st Qu.:0.4019
Median :0.4574
Mean   :0.4813
3rd Qu.:0.5611
Max.   :0.6703

name matrix row col parameter estimate error estimate
1        A   1   1         0.39715251    0.010964307
2        A   2   1         0.50366157    0.012783503
3        A   3   1         0.57724193    0.014400946
4        A   4   1         0.70277434    0.016884658
5        A   5   1         0.79625074    0.018766326
6        U   1   1         0.04081420    0.001953551
7        U   2   2         0.03801999    0.001723759
8        U   3   3         0.04082719    0.002185719
9        U   4   4         0.03938707    0.002396026
10       U   5   5         0.03628712    0.002584026

Observed statistics:  16
Estimated parameters:  10
Degrees of freedom:  6
-2 log likelihood:  -3648.281
Saturated -2 log likelihood:  -3655.665
Chi-Square:  7.384002
p:  0.2867893
AIC (Mx):  -4.615998
BIC (Mx):  -14.95182
RMSEA:  0.02147869


Offline
Joined: 07/31/2009 - 14:25
yes, and works fine or me at

yes, and works fine or me at home (intel macbook).

I wonder if it is something about the PowerPC (that's my work machine) NPSOL?

Offline
Joined: 07/31/2009 - 15:24
No it's not a PPC issue.

No it's not a PPC issue. Your model has been specified with no free parameters. I took the original model and replaced all the free parameters with fixed parameters and I got the same summary statistics as you reported when running the model. Try executing rm(list=ls()) in your workspace before running the model (I'm assuming you don't store critical data or code in your workspace). Hmm, I believe you had another forum post where you assigned T <- FALSE as an example of R weirdness. If that assignment was done in your workspace, that would explain the behavior you are seeing.

Offline
Joined: 07/31/2009 - 14:25
hi mike, just came here to

hi mike, just came here to report that exact solution... beat me too it.

This has to be a great reason for stopping using variables as native booleans in example code...

Offline
Joined: 07/30/2009 - 14:03
The reason I did it on the

The reason I did it on the front page code is... well... it's the front page and I was trying to fit a whole factor model into one column.

I think we should have a short check script for R newbies (and wicked beta testers who do bad things on purpose).

Something like

T==TRUE
F==FALSE
c(1,2,3)==1:3
t(matrix(c(1,2,3,4), 2, 2))==matrix(c(1,3,2,4),2,2)


and so forth so that if you reassign 'T' or 't' or 'F' or 'c', etc. we can catch it.

I mean, we can be verbose and protect people against 'T', but we can't protect people against 't'. So, I'm not sure I even buy the logic of never ever using 'T'.

Instead, let's put together an omnibus script that we recommend bug reporters run to verify that they have a working instance of .RData.