Several models do not report fit statistics. What should be happening here? Do we need to attach them here the summary function can be improved, or should summary report the problems it found that stopped it computing the fit (couldn't be sure of df etc.) and then users can manually set these parameters in the model (df=xxx, ) to allow the existing summary to work?
Are you sure these are actually the same models? The non-pathic models are reporting 0 estimated parameters. That's just a count of the number of elements in the parameter vector passed to the estimator. Zero free parameters means that the estimator just calculated the -2LL of the starting values and returned. Fit stats will be horrible for such a model (unless you are very lucky with your starting values!)
I tried running the code in ML_vs_FIML.R and the ML matrix example came up with the same results as the pathic example. The FIML example returned an error because "names" was not defined.
Once I set
names <- manifests
your code ran and came up with the same parameter estimates and reasonable fit statistics.
Hi Steve,
Thanks for running the script. I did it again from a clean boot at home and all is well. Can't see how the cov matrix version ever got 0 estimated but as you say... it runs fine.
For the FIML, I must have had the "names" variable set from some other model... A lesson in ensuring all used variables are set in the job, not just hanging around. oops.
For the missing Saturated -2 log likelihood in the FIML model, could mxRun() get this by estimating a saturated model (for means, variances, and covariances of the data)? Then we could have Chi-Square in the FIML model?
There are reasons that Mx1 does not fit the saturated model for FIML. One is because fitting the saturated model (a free parameter for every mean, variance and covariance) is computationally intensive. Ideally it should be done only once. Mx1 has two ways of using the results of such a model: Option issat, which assumes that later in the same script there is a model of interest for which chi-sq fit will be computed; and ii) option saturated=-2lnL,df which allows the user to supply fit statistics which have been computed in a separate job.
Two, when there are definition variables, what the saturated model actually is becomes obscure, since any parameter of the model might be any polynomial function of any or all of the definition variables.
Note that there isn't a shortcut to computing the likelihood when there are missing data, although it is possible to supply decent starting values as long as the data are MCAR; when data are MAR the observed means, variances and covariances are not the same as the MLE's.
I am not sure whether we should just have an mxCompare(model1,model2,...) function to compare models, or if an additional argument to mxModel(compare=list) might allow for comparative statistics to be computed on the fly. I lean towards mxCompare, albeit at a cost of additional commands.
I would like to see an mxCompare function. As you say, one of the things it could do is compare to a saturated model.
Tim Brick and I had a conversation yesterday about the saturated model problem. I continue to think that the obscure cases are only maybe 5% of actual usage in SEM. So, we should expect to need to override something that can be automatically computed, and we need to throw an error when a person asks for automatic computation and there are potential pitfalls such as definition variables in the model. For the vast majority of SEMers, an auto saturated model computation (expensive though it might be) would be useful. Whether it should be on by default is another question about which I remain unsure.
Hi Mike,
I'm finally starting with OpenMx, and I must say that I love it!
Have you given any thoughts to the mxCompare function?
In the meantime, is there a way to get the degrees of freedom using mxEval, in the same way that we get the LL. If that's possible then one can simply compute the LRT and the difference in degrees of freedom, with the corresponding p value using R commands.
Dear Irene,
You can get the degrees of freedom from summary().
assuming you have a fitted model in "fit"
> a = summary(fit)
> names(a)
[1] "parameters" "Minus2LogLikelihood" "SaturatedLikelihood" "numObs"
[5] "estimatedParameters" "observedStatistics" "degreesOfFreedom" "Chi"
[9] "p" "AIC.Mx" "BIC.Mx" "RMSEA"
[13] "dataSummary" "frontendTime" "backendTime" "mxVersion"
hmm. In at work again and running the covariance model. Building today's R, shutting down and reloading R.app (today's build)... I get a model with 0 df.
require(OpenMx);
data(demoOneFactor)
factorModel <- mxModel("One Factor",
mxMatrix("Full", 5, 1, values=0.2, free=T, name="A"),
mxMatrix("Symm", 1, 1, values=1, free=F, name="L"),
mxMatrix("Diag", 5, 5, values=1, free=T, name="U"),
mxAlgebra(A %*% L %*% t(A) + U, name="R", dimnames = list(names(demoOneFactor), names(demoOneFactor)) ),
mxMLObjective("R"),
mxData(cov(demoOneFactor), type="cov", numObs=500)
)
summary(mxRun(factorModel))
Running One Factor
x1 x2 x3 x4 x5
Min. :0.1985 Min. :0.2000 Min. :0.2312 Min. :0.2784 Min. :0.3156
1st Qu.:0.2000 1st Qu.:0.2917 1st Qu.:0.2925 1st Qu.:0.3515 1st Qu.:0.4019
Median :0.2312 Median :0.2925 Median :0.3740 Median :0.4061 Median :0.4574
Mean :0.2447 Mean :0.3075 Mean :0.3522 Mean :0.4261 Mean :0.4813
3rd Qu.:0.2784 3rd Qu.:0.3515 3rd Qu.:0.4061 3rd Qu.:0.5333 3rd Qu.:0.5611
Max. :0.3156 Max. :0.4019 Max. :0.4574 Max. :0.5611 Max. :0.6703
Observed statistics: 16
Estimated parameters: 0
Degrees of freedom: 16
-2 log likelihood: 972.155
Saturated -2 log likelihood: -3655.665
Chi-Square: 4627.819
p: 0
AIC (Mx): 4595.819
BIC (Mx): 2264.193
adjusted BIC:
RMSEA: 0.7592611
get the same result just copying the front page ALA'+U model into R and pressing return... Baffled, as it worked at home (intel laptop) last night.
Can some other people try this please?
[6]sboker@localhost:svn/OpenMx/trunk % R
R version 2.9.2 (2009-08-24)
Copyright (C) 2009 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
Natural language support but running in an English locale
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> require(OpenMx);
Loading required package: OpenMx
Loading required package: snowfall
Loading required package: snow
snowfall 1.70 initialized: sequential execution, one CPU.
> data(demoOneFactor)
> factorModel <- mxModel("One Factor",
+ mxMatrix("Full", 5, 1, values=0.2, free=T, name="A"),
+ mxMatrix("Symm", 1, 1, values=1, free=F, name="L"),
+ mxMatrix("Diag", 5, 5, values=1, free=T, name="U"),
+ mxAlgebra(A %*% L %*% t(A) + U, name="R", dimnames = list(names(demoOneFactor), names(demoOneFactor)) ),
+ mxMLObjective("R"),
+ mxData(cov(demoOneFactor), type="cov", numObs=500)
+ )
> summary(mxRun(factorModel))
Running One Factor
x1 x2 x3 x4
Min. :0.1985 Min. :0.2000 Min. :0.2312 Min. :0.2784
1st Qu.:0.2000 1st Qu.:0.2917 1st Qu.:0.2925 1st Qu.:0.3515
Median :0.2312 Median :0.2925 Median :0.3740 Median :0.4061
Mean :0.2447 Mean :0.3075 Mean :0.3522 Mean :0.4261
3rd Qu.:0.2784 3rd Qu.:0.3515 3rd Qu.:0.4061 3rd Qu.:0.5333
Max. :0.3156 Max. :0.4019 Max. :0.4574 Max. :0.5611
x5
Min. :0.3156
1st Qu.:0.4019
Median :0.4574
Mean :0.4813
3rd Qu.:0.5611
Max. :0.6703
name matrix row col parameter estimate error estimate
1 A 1 1 0.39715251 0.010964307
2 A 2 1 0.50366157 0.012783503
3 A 3 1 0.57724193 0.014400946
4 A 4 1 0.70277434 0.016884658
5 A 5 1 0.79625074 0.018766326
6 U 1 1 0.04081420 0.001953551
7 U 2 2 0.03801999 0.001723759
8 U 3 3 0.04082719 0.002185719
9 U 4 4 0.03938707 0.002396026
10 U 5 5 0.03628712 0.002584026
Observed statistics: 16
Estimated parameters: 10
Degrees of freedom: 6
-2 log likelihood: -3648.281
Saturated -2 log likelihood: -3655.665
Chi-Square: 7.384002
p: 0.2867893
AIC (Mx): -4.615998
BIC (Mx): -14.95182
adjusted BIC:
RMSEA: 0.02147869
No it's not a PPC issue. Your model has been specified with no free parameters. I took the original model and replaced all the free parameters with fixed parameters and I got the same summary statistics as you reported when running the model. Try executing rm(list=ls()) in your workspace before running the model (I'm assuming you don't store critical data or code in your workspace). Hmm, I believe you had another forum post where you assigned T <- FALSE as an example of R weirdness. If that assignment was done in your workspace, that would explain the behavior you are seeing.
and so forth so that if you reassign 'T' or 't' or 'F' or 'c', etc. we can catch it.
I mean, we can be verbose and protect people against 'T', but we can't protect people against 't'. So, I'm not sure I even buy the logic of never ever using 'T'.
Instead, let's put together an omnibus script that we recommend bug reporters run to verify that they have a working instance of .RData.
Several models do not report fit statistics. What should be happening here? Do we need to attach them here the summary function can be improved, or should summary report the problems it found that stopped it computing the fit (couldn't be sure of df etc.) and then users can manually set these parameters in the model (df=xxx, ) to allow the existing summary to work?
Front page factor model in pathicCov, matricCov, and matricRaw forms as an example of differences in fit (models attached).
One-factor pathic model of covariance data
Now a Matrix model of the cov data
NB: same model of the same data as the pathic, but RMSEA now suggests terrible fit, p differs etc. etc.
Next try a matrix model with full information
Same model, but raw data. RMSEA now improbably = 0, most statistics not computed
Are you sure these are actually the same models? The non-pathic models are reporting 0 estimated parameters. That's just a count of the number of elements in the parameter vector passed to the estimator. Zero free parameters means that the estimator just calculated the -2LL of the starting values and returned. Fit stats will be horrible for such a model (unless you are very lucky with your starting values!)
I tried running the code in ML_vs_FIML.R and the ML matrix example came up with the same results as the pathic example. The FIML example returned an error because "names" was not defined.
Once I set
your code ran and came up with the same parameter estimates and reasonable fit statistics.
Hi Steve,
Thanks for running the script. I did it again from a clean boot at home and all is well. Can't see how the cov matrix version ever got 0 estimated but as you say... it runs fine.
For the FIML, I must have had the "names" variable set from some other model... A lesson in ensuring all used variables are set in the job, not just hanging around. oops.
For the missing Saturated -2 log likelihood in the FIML model, could mxRun() get this by estimating a saturated model (for means, variances, and covariances of the data)? Then we could have Chi-Square in the FIML model?
Interesting proposal. Does anyone have objections?
There are reasons that Mx1 does not fit the saturated model for FIML. One is because fitting the saturated model (a free parameter for every mean, variance and covariance) is computationally intensive. Ideally it should be done only once. Mx1 has two ways of using the results of such a model: Option issat, which assumes that later in the same script there is a model of interest for which chi-sq fit will be computed; and ii) option saturated=-2lnL,df which allows the user to supply fit statistics which have been computed in a separate job.
Two, when there are definition variables, what the saturated model actually is becomes obscure, since any parameter of the model might be any polynomial function of any or all of the definition variables.
Note that there isn't a shortcut to computing the likelihood when there are missing data, although it is possible to supply decent starting values as long as the data are MCAR; when data are MAR the observed means, variances and covariances are not the same as the MLE's.
I am not sure whether we should just have an mxCompare(model1,model2,...) function to compare models, or if an additional argument to mxModel(compare=list) might allow for comparative statistics to be computed on the fly. I lean towards mxCompare, albeit at a cost of additional commands.
I would like to see an mxCompare function. As you say, one of the things it could do is compare to a saturated model.
Tim Brick and I had a conversation yesterday about the saturated model problem. I continue to think that the obscure cases are only maybe 5% of actual usage in SEM. So, we should expect to need to override something that can be automatically computed, and we need to throw an error when a person asks for automatic computation and there are potential pitfalls such as definition variables in the model. For the vast majority of SEMers, an auto saturated model computation (expensive though it might be) would be useful. Whether it should be on by default is another question about which I remain unsure.
Especially as the saturated model can be computed in parallel, so on machines where the user has a few spare cores, will add no time.
Maybe something like
saturated=(T|F) as default, and if the user provides a numeric value, that is used as the saturated value of the objective?
Hi Mike,
I'm finally starting with OpenMx, and I must say that I love it!
Have you given any thoughts to the mxCompare function?
In the meantime, is there a way to get the degrees of freedom using mxEval, in the same way that we get the LL. If that's possible then one can simply compute the LRT and the difference in degrees of freedom, with the corresponding p value using R commands.
Thanks!
Dear Irene,
You can get the degrees of freedom from summary().
assuming you have a fitted model in "fit"
> a = summary(fit)
> names(a)
[1] "parameters" "Minus2LogLikelihood" "SaturatedLikelihood" "numObs"
[5] "estimatedParameters" "observedStatistics" "degreesOfFreedom" "Chi"
[9] "p" "AIC.Mx" "BIC.Mx" "RMSEA"
[13] "dataSummary" "frontendTime" "backendTime" "mxVersion"
> a$Minus2LogLikelihood
1066
hmm. In at work again and running the covariance model. Building today's R, shutting down and reloading R.app (today's build)... I get a model with 0 df.
get the same result just copying the front page ALA'+U model into R and pressing return... Baffled, as it worked at home (intel laptop) last night.
Can some other people try this please?
Just did an
svn up
cd trunk
make clean install
R
That installed rev 803.
Copied the model from your post.
I got exactly what I'd expect:
yes, and works fine or me at home (intel macbook).
I wonder if it is something about the PowerPC (that's my work machine) NPSOL?
No it's not a PPC issue. Your model has been specified with no free parameters. I took the original model and replaced all the free parameters with fixed parameters and I got the same summary statistics as you reported when running the model. Try executing rm(list=ls()) in your workspace before running the model (I'm assuming you don't store critical data or code in your workspace). Hmm, I believe you had another forum post where you assigned T <- FALSE as an example of R weirdness. If that assignment was done in your workspace, that would explain the behavior you are seeing.
hi mike, just came here to report that exact solution... beat me too it.
This has to be a great reason for stopping using variables as native booleans in example code...
The reason I did it on the front page code is... well... it's the front page and I was trying to fit a whole factor model into one column.
I think we should have a short check script for R newbies (and wicked beta testers who do bad things on purpose).
Something like
and so forth so that if you reassign 'T' or 't' or 'F' or 'c', etc. we can catch it.
I mean, we can be verbose and protect people against 'T', but we can't protect people against 't'. So, I'm not sure I even buy the logic of never ever using 'T'.
Instead, let's put together an omnibus script that we recommend bug reporters run to verify that they have a working instance of .RData.