Hi, the current trunk of OpenMx (999.0.0-3521) does not seem to be estimating parameters correctly.
I've uploaded a gist demonstrating the issue (link below). It's a simple independence model with 3 variables from the built-in mtcars data set.
https://gist.github.com/tbates/d2261e8c5daf426e1c8e
You can see that the means are borderline OK, but the variances are way off.
```r
# ===================================
# = Means and variances in the data =
# ===================================
round(diag(cov(mtcars[, manifests])), 2)
#      mpg     disp   gear
#    36.32 15360.80   0.54

colMeans(mtcars[, manifests])
#    mpg   disp  gear
#  20.09 230.72  3.69

summary(m1)$parameters
#         name matrix row  col     Estimate  Std.Error lbound ubound
# 1 ind.S[1,1]      S  mpg  mpg 6.955250e+05         NA
# 2 ind.S[2,2]      S disp disp 1.089073e+08         NA
# 3 ind.S[3,3]      S gear gear 1.879208e+04         NA
# 4 ind.M[1,1]      M    1  mpg 2.001955e+01  145.28108
# 5 ind.M[1,2]      M    1 disp 1.307563e+02 1833.42676
# 6 ind.M[1,3]      M    1 gear 3.693204e+00   24.14331
```
Comments
#1
I took a look at this today. It is a CSOLNP issue. NPSOL miraculously reaches the correct solution from the awful starting values provided (essentially zeroes for everything); CSOLNP does not.
Note that the correct ML estimates of the variances deviate from those calculated using var() or cov() by a non-trivial factor (31/32), because ML divides by N rather than N-1 and N here is only 32. This is definitely not the whole issue with CSOLNP, though.
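For reference, the N/(N-1) relationship between the ML estimate and var() can be checked directly in plain R, no OpenMx needed:

```r
# var() divides by N - 1 (unbiased); the ML estimate divides by N,
# so with N = 32 the ML variance is 31/32 of the var() value.
n <- nrow(mtcars)                 # 32
unbiased <- var(mtcars$mpg)       # ~36.32
ml <- unbiased * (n - 1) / n      # ~35.19, the value ML estimation should recover
```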
Even given halfway decent starting values, CSOLNP fails to get var(disp) correct:
Although the estimates are off, the -2lnL is only off by about 0.2; this is likely due to flattish gradients on account of the small sample. But it should do better.
This is a useful test case; it should probably go in models/failing as CSOLNPmtcars.R.
#2
This is not a raw data issue. The problem is the same with covariance data.
Scaling is almost certainly the major problem. When free parameters span five orders of magnitude (1e-1 to 1e+4), only the most robust optimizers will be able to handle that on their own without blinking. It really screws with the magnitude of the gradient because you have to make tiny steps in some directions (for the 1e-1 parameters) and giant steps in other directions (for the 1e+4 parameters). Physicists suggest that you re-parameterize the problem to avoid parameter estimation at very different orders of magnitude.
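A minimal sketch of that kind of re-parameterization (the variable names are from the gist; the /100 divisor is just one plausible choice to bring disp into the same range as the other variables):

```r
manifests <- c("mpg", "disp", "gear")
scaled <- mtcars[, manifests]
scaled$disp <- scaled$disp / 100  # variance drops from ~15361 to ~1.54

# Fit the model to 'scaled' instead of the raw data, then recover the
# original scale afterwards:
#   var(disp)  = var(disp/100)  * 100^2
#   mean(disp) = mean(disp/100) * 100
```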
Another part of the problem is definitely the starting values. Zero is just about the worst starting value for a variance. With that being said, CSOLNP does wiggle off from zero starting values. It just doesn't wiggle far enough.
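One sketch of data-derived starting values, assuming a RAM-style independence model along the lines of the gist (the model structure and names here are my guess at its setup, not a copy of it):

```r
library(OpenMx)
manifests <- c("mpg", "disp", "gear")
m1 <- mxModel("ind", type = "RAM",
    manifestVars = manifests,
    # start the variances at the observed values rather than zero
    mxPath(from = manifests, arrows = 2,
           values = diag(cov(mtcars[, manifests]))),
    # start the means at the observed column means
    mxPath(from = "one", to = manifests,
           values = colMeans(mtcars[, manifests])),
    mxData(mtcars[, manifests], type = "raw")
)
m1Fit <- mxRun(m1)
```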
Re-scaling the data (à la physics) and letting CSOLNP wiggle the starting values off zero resolves the behavioral issue.
Of course, we would probably want CSOLNP to find the correct solution more robustly. Does anyone know whether NPSOL rescales the gradient similarly to CSOLNP?
#3
Is it OK to use library(testthat) in failing models?
If so, I can upload this as "models/failing/CSOLNPmtcars.R".
#4
Oh, we use our own omx routine for that, which reduces the number of required libraries.
E.g.,
so a translation would be:
This passes with NPSOL but not with CSOLNP.
However, omxCheckCloseEnough() only works on absolute differences, whereas testthat can check relative differences as well. I like the flexibility of testthat, though -- not sure whether it's worth switching over to it more broadly; a question for the developers generally.
#5
OpenMx has omxCheckCloseEnough() for absolute differences and omxCheckWithinPercentError() for relative (percent) differences. These functions are defined in trunk/R/MxUnitTesting.R.
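For example, using the ML variance of mpg (36.32 * 31/32 ≈ 35.19) as the expected value; the parameter name and fitted-model object are assumptions based on the gist:

```r
est <- coef(m1Fit)["ind.S[1,1]"]  # fitted variance of mpg (name assumed from the gist)

omxCheckCloseEnough(est, 35.19, epsilon = 0.1)       # absolute tolerance
omxCheckWithinPercentError(est, 35.19, percent = 1)  # relative (1%) tolerance
```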
#6
OK, uploaded the failing model using omxCheckWithinPercentError().
#7
testthat also has a wealth of other expectations: boolean, warning, string, error, comparison, type, name, class, etc.
Really worthwhile.
Plus it dovetails with devtools.
#8
OpenMx has
As far as I can tell we've had all of these functions for 3+ years.
#9
Need to be better at starting... sorry