Computing and Reporting fit statistics

Posted on Fri, 09/04/2009 - 10:40

tbates Joined: 07/31/2009

Forums

R Functions and User Interface

A place to discuss models current and desired og fit, df, AIC, etc and how to make basic functionality happen

Replied on Fri, 09/04/2009 - 11:51

tbates Joined: Jul 31, 2009

Several models do not report

Several models do not report fit statistics. What should be happening here? Do we need to attach them here the summary function can be improved, or should summary report the problems it found that stopped it computing the fit (couldn't be sure of df etc.) and then users can manually set these parameters in the model (df=xxx, ) to allow the existing summary to work?

Log in or register to post comments

Replied on Fri, 09/04/2009 - 12:56

tbates Joined: Jul 31, 2009

Front page factor model in

Front page factor model in pathicCov, matricCov, and matricRaw forms as an example of differences in fit (models attached).

One-factor pathic model of covariance data

	# Observed statistics:  16 
	# Estimated parameters:  10 
	# Degrees of freedom:  6 
	# -2 log likelihood:  -3648.281 
	# Saturated -2 log likelihood:  -3655.665 
	# Chi-Square:  7.384002 
	# p:  0.2867893 
	# AIC (Mx):  -4.615998 
	# BIC (Mx):  -14.95182 
	# adjusted BIC: 
	# RMSEA:  0.02147869

Now a Matrix model of the cov data

	# Observed statistics:  16 
	# Estimated parameters:  0 
	# Degrees of freedom:  16 
	# -2 log likelihood:  972.155 
	# Saturated -2 log likelihood:  -3655.665 
	# Chi-Square:  4627.819 
	# p:  0 
	# AIC (Mx):  4595.819 
	# BIC (Mx):  2264.193 
	# adjusted BIC: 
	# RMSEA:  0.7592611

NB: same model of the same data as the pathic, but RMSEA now suggests terrible fit, p differs etc. etc.

Next try a matrix model with full information

	# Observed statistics:  2500 
	# Estimated parameters:  0 
	# Degrees of freedom:  2500 
	# -2 log likelihood:  216183.3 
	# Saturated -2 log likelihood:  
	# Chi-Square:   
	# p:   
	# AIC (Mx):  211183.3 
	# BIC (Mx):  100323.4 
	# adjusted BIC: 
	# RMSEA:  0

Same model, but raw data. RMSEA now improbably = 0, most statistics not computed

Log in or register to post comments

Replied on Sat, 09/05/2009 - 11:47

Steve Joined: Jul 30, 2009

Are you sure these are

Are you sure these are actually the same models? The non-pathic models are reporting 0 estimated parameters. That's just a count of the number of elements in the parameter vector passed to the estimator. Zero free parameters means that the estimator just calculated the -2LL of the starting values and returned. Fit stats will be horrible for such a model (unless you are very lucky with your starting values!)

Log in or register to post comments

Replied on Sat, 09/05/2009 - 12:21

Steve Joined: Jul 30, 2009

I tried running the code in

I tried running the code in ML_vs_FIML.R and the ML matrix example came up with the same results as the pathic example. The FIML example returned an error because "names" was not defined.

Once I set

names <- manifests

your code ran and came up with the same parameter estimates and reasonable fit statistics.

Observed statistics:  2500 
Estimated parameters:  15 
Degrees of freedom:  2485 
-2 log likelihood:  934.096 
Saturated -2 log likelihood:  
Chi-Square:   
p:   
AIC (Mx):  -4035.904 
BIC (Mx):  -7254.603 
adjusted BIC: 
RMSEA:  0

Log in or register to post comments

Replied on Sat, 09/05/2009 - 16:51

tbates Joined: Jul 31, 2009

Hi Steve, Thanks for running

Hi Steve,
Thanks for running the script. I did it again from a clean boot at home and all is well. Can't see how the cov matrix version ever got 0 estimated but as you say... it runs fine.

For the FIML, I must have had the "names" variable set from some other model... A lesson in ensuring all used variables are set in the job, not just hanging around. oops.

For the missing Saturated -2 log likelihood in the FIML model, could mxRun() get this by estimating a saturated model (for means, variances, and covariances of the data)? Then we could have Chi-Square in the FIML model?

Log in or register to post comments

Replied on Sat, 09/05/2009 - 21:48

Steve Joined: Jul 30, 2009

Interesting proposal. Does

Interesting proposal. Does anyone have objections?

Log in or register to post comments

Replied on Tue, 09/08/2009 - 15:23

neale Joined: Jul 31, 2009

There are reasons that Mx1

There are reasons that Mx1 does not fit the saturated model for FIML. One is because fitting the saturated model (a free parameter for every mean, variance and covariance) is computationally intensive. Ideally it should be done only once. Mx1 has two ways of using the results of such a model: Option issat, which assumes that later in the same script there is a model of interest for which chi-sq fit will be computed; and ii) option saturated=-2lnL,df which allows the user to supply fit statistics which have been computed in a separate job.

Two, when there are definition variables, what the saturated model actually is becomes obscure, since any parameter of the model might be any polynomial function of any or all of the definition variables.

Note that there isn't a shortcut to computing the likelihood when there are missing data, although it is possible to supply decent starting values as long as the data are MCAR; when data are MAR the observed means, variances and covariances are not the same as the MLE's.

I am not sure whether we should just have an mxCompare(model1,model2,...) function to compare models, or if an additional argument to mxModel(compare=list) might allow for comparative statistics to be computed on the fly. I lean towards mxCompare, albeit at a cost of additional commands.

Log in or register to post comments

Replied on Tue, 09/08/2009 - 16:42

Steve Joined: Jul 30, 2009

I would like to see an

I would like to see an mxCompare function. As you say, one of the things it could do is compare to a saturated model.

Tim Brick and I had a conversation yesterday about the saturated model problem. I continue to think that the obscure cases are only maybe 5% of actual usage in SEM. So, we should expect to need to override something that can be automatically computed, and we need to throw an error when a person asks for automatic computation and there are potential pitfalls such as definition variables in the model. For the vast majority of SEMers, an auto saturated model computation (expensive though it might be) would be useful. Whether it should be on by default is another question about which I remain unsure.

Log in or register to post comments

Replied on Wed, 09/09/2009 - 11:24

tbates Joined: Jul 31, 2009

Especially as the saturated

Especially as the saturated model can be computed in parallel, so on machines where the user has a few spare cores, will add no time.

Maybe something like

saturated=(T|F) as default, and if the user provides a numeric value, that is used as the saturated value of the objective?

Log in or register to post comments

Replied on Fri, 12/11/2009 - 05:26

irebollo Joined: Sep 24, 2009

Hi Mike, I'm finally starting

Hi Mike,
I'm finally starting with OpenMx, and I must say that I love it!
Have you given any thoughts to the mxCompare function?
In the meantime, is there a way to get the degrees of freedom using mxEval, in the same way that we get the LL. If that's possible then one can simply compute the LRT and the difference in degrees of freedom, with the corresponding p value using R commands.

Thanks!

Log in or register to post comments

Replied on Thu, 02/11/2010 - 10:36

tbates Joined: Jul 31, 2009

Dear Irene, You can get the

Dear Irene,
You can get the degrees of freedom from summary().
assuming you have a fitted model in "fit"
> a = summary(fit)
> names(a)
[1] "parameters" "Minus2LogLikelihood" "SaturatedLikelihood" "numObs"
[5] "estimatedParameters" "observedStatistics" "degreesOfFreedom" "Chi"
[9] "p" "AIC.Mx" "BIC.Mx" "RMSEA"
[13] "dataSummary" "frontendTime" "backendTime" "mxVersion"

> a$Minus2LogLikelihood
1066

Log in or register to post comments

Replied on Mon, 09/07/2009 - 09:57

tbates Joined: Jul 31, 2009

hmm. In at work again and

hmm. In at work again and running the covariance model. Building today's R, shutting down and reloading R.app (today's build)... I get a model with 0 df.

require(OpenMx);
data(demoOneFactor)
factorModel <- mxModel("One Factor",
      mxMatrix("Full", 5, 1, values=0.2, free=T, name="A"),
      mxMatrix("Symm", 1, 1, values=1,   free=F, name="L"),
      mxMatrix("Diag", 5, 5, values=1,   free=T, name="U"),
      mxAlgebra(A %*% L %*% t(A) + U, name="R", dimnames = list(names(demoOneFactor), names(demoOneFactor)) ),
      mxMLObjective("R"),
      mxData(cov(demoOneFactor), type="cov", numObs=500)
)
summary(mxRun(factorModel))

Running One Factor 
       x1               x2               x3               x4               x5        
 Min.   :0.1985   Min.   :0.2000   Min.   :0.2312   Min.   :0.2784   Min.   :0.3156  
 1st Qu.:0.2000   1st Qu.:0.2917   1st Qu.:0.2925   1st Qu.:0.3515   1st Qu.:0.4019  
 Median :0.2312   Median :0.2925   Median :0.3740   Median :0.4061   Median :0.4574  
 Mean   :0.2447   Mean   :0.3075   Mean   :0.3522   Mean   :0.4261   Mean   :0.4813  
 3rd Qu.:0.2784   3rd Qu.:0.3515   3rd Qu.:0.4061   3rd Qu.:0.5333   3rd Qu.:0.5611  
 Max.   :0.3156   Max.   :0.4019   Max.   :0.4574   Max.   :0.5611   Max.   :0.6703  

Observed statistics:  16 
Estimated parameters:  0 
Degrees of freedom:  16 
-2 log likelihood:  972.155 
Saturated -2 log likelihood:  -3655.665 
Chi-Square:  4627.819 
p:  0 
AIC (Mx):  4595.819 
BIC (Mx):  2264.193 
adjusted BIC: 
RMSEA:  0.7592611

get the same result just copying the front page ALA'+U model into R and pressing return... Baffled, as it worked at home (intel laptop) last night.
Can some other people try this please?

Log in or register to post comments

Replied on Mon, 09/07/2009 - 20:30

Steve Joined: Jul 30, 2009

Just did an svn up cd

Just did an

svn up
cd trunk
make clean install
R

That installed rev 803.

Copied the model from your post.

I got exactly what I'd expect:

[6]sboker@localhost:svn/OpenMx/trunk % R

R version 2.9.2 (2009-08-24)
Copyright (C) 2009 The R Foundation for Statistical Computing
ISBN 3-900051-07-0

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> require(OpenMx);
Loading required package: OpenMx
Loading required package: snowfall
Loading required package: snow
snowfall 1.70 initialized: sequential execution, one CPU.

> data(demoOneFactor)
> factorModel <- mxModel("One Factor",
+       mxMatrix("Full", 5, 1, values=0.2, free=T, name="A"),
+       mxMatrix("Symm", 1, 1, values=1,   free=F, name="L"),
+       mxMatrix("Diag", 5, 5, values=1,   free=T, name="U"),
+       mxAlgebra(A %*% L %*% t(A) + U, name="R", dimnames = list(names(demoOneFactor), names(demoOneFactor)) ),
+       mxMLObjective("R"),
+       mxData(cov(demoOneFactor), type="cov", numObs=500)
+ )
> summary(mxRun(factorModel))
Running One Factor 
       x1               x2               x3               x4        
 Min.   :0.1985   Min.   :0.2000   Min.   :0.2312   Min.   :0.2784  
 1st Qu.:0.2000   1st Qu.:0.2917   1st Qu.:0.2925   1st Qu.:0.3515  
 Median :0.2312   Median :0.2925   Median :0.3740   Median :0.4061  
 Mean   :0.2447   Mean   :0.3075   Mean   :0.3522   Mean   :0.4261  
 3rd Qu.:0.2784   3rd Qu.:0.3515   3rd Qu.:0.4061   3rd Qu.:0.5333  
 Max.   :0.3156   Max.   :0.4019   Max.   :0.4574   Max.   :0.5611  
       x5        
 Min.   :0.3156  
 1st Qu.:0.4019  
 Median :0.4574  
 Mean   :0.4813  
 3rd Qu.:0.5611  
 Max.   :0.6703  

   name matrix row col parameter estimate error estimate
1        A   1   1         0.39715251    0.010964307
2        A   2   1         0.50366157    0.012783503
3        A   3   1         0.57724193    0.014400946
4        A   4   1         0.70277434    0.016884658
5        A   5   1         0.79625074    0.018766326
6        U   1   1         0.04081420    0.001953551
7        U   2   2         0.03801999    0.001723759
8        U   3   3         0.04082719    0.002185719
9        U   4   4         0.03938707    0.002396026
10       U   5   5         0.03628712    0.002584026

Observed statistics:  16 
Estimated parameters:  10 
Degrees of freedom:  6 
-2 log likelihood:  -3648.281 
Saturated -2 log likelihood:  -3655.665 
Chi-Square:  7.384002 
p:  0.2867893 
AIC (Mx):  -4.615998 
BIC (Mx):  -14.95182 
adjusted BIC: 
RMSEA:  0.02147869

Log in or register to post comments

Replied on Mon, 09/07/2009 - 20:40

tbates Joined: Jul 31, 2009

yes, and works fine or me at

yes, and works fine or me at home (intel macbook).

I wonder if it is something about the PowerPC (that's my work machine) NPSOL?

Log in or register to post comments

Replied on Mon, 09/07/2009 - 21:10

mspiegel Joined: Jul 31, 2009

No it's not a PPC issue.

No it's not a PPC issue. Your model has been specified with no free parameters. I took the original model and replaced all the free parameters with fixed parameters and I got the same summary statistics as you reported when running the model. Try executing rm(list=ls()) in your workspace before running the model (I'm assuming you don't store critical data or code in your workspace). Hmm, I believe you had another forum post where you assigned T <- FALSE as an example of R weirdness. If that assignment was done in your workspace, that would explain the behavior you are seeing.

Log in or register to post comments

Replied on Tue, 09/08/2009 - 06:00

tbates Joined: Jul 31, 2009

hi mike, just came here to

hi mike, just came here to report that exact solution... beat me too it.

This has to be a _great_ reason for stopping using variables as native booleans in example code...

Log in or register to post comments

Replied on Tue, 09/08/2009 - 07:51

Steve Joined: Jul 30, 2009

The reason I did it on the

The reason I did it on the front page code is... well... it's the front page and I was trying to fit a whole factor model into one column.

I think we should have a short check script for R newbies (and wicked beta testers who do bad things on purpose).

Something like

T==TRUE
F==FALSE
c(1,2,3)==1:3
t(matrix(c(1,2,3,4), 2, 2))==matrix(c(1,3,2,4),2,2)

and so forth so that if you reassign 'T' or 't' or 'F' or 'c', etc. we can catch it.

I mean, we can be verbose and protect people against 'T', but we can't protect people against 't'. So, I'm not sure I even buy the logic of never ever using 'T'.

Instead, let's put together an omnibus script that we recommend bug reporters run to verify that they have a working instance of .RData.

Log in or register to post comments

Log in or register to post comments

News

Recent Posts

Computing and Reporting fit statistics

Several models do not report

Front page factor model in

Are you sure these are

I tried running the code in

Hi Steve, Thanks for running

Interesting proposal. Does

There are reasons that Mx1

I would like to see an

Especially as the saturated

Hi Mike, I'm finally starting

Dear Irene, You can get the

hmm. In at work again and

Just did an svn up cd

yes, and works fine or me at

No it's not a PPC issue.

hi mike, just came here to

The reason I did it on the

News

Recent Posts