I've uploaded a gist that shows the issues here: It's a simple independence model with 3 variables in the built-in mtcars data set.
https://gist.github.com/tbates/d2261e8c5daf426e1c8e
You can see that the means are borderline OK, but the variances are way off.
# ===================================
# = Means and variances in the data =
# ===================================
round(diag(cov(mtcars[,manifests])),2)
mpg disp gear
36.32 15360.80 0.54
colMeans(mtcars[,manifests])
mpg disp gear
20.09 230.72 3.69
summary(m1)$parameters
name matrix row col Estimate Std.Error lbound ubound
1 ind.S[1,1] S mpg mpg 6.955250e+05 NA
2 ind.S[2,2] S disp disp 1.089073e+08 NA
3 ind.S[3,3] S gear gear 1.879208e+04 NA
4 ind.M[1,1] M 1 mpg 2.001955e+01 145.28108
5 ind.M[1,2] M 1 disp 1.307563e+02 1833.42676
6 ind.M[1,3] M 1 gear 3.693204e+00 24.14331
#1
Note that the correct ML estimates of the variances deviate from those calculated using var() or cov() a non-trivial amount (31/32) due to the small N of 32. This is definitely not the whole issue with CSOLNP.
> s = summary(m1)$parameters; s$Estimate = round(s$Estimate,2); s[,3:5]; diag(cov(mtcars[,manifests]))*31/32; colMeans(mtcars[,manifests])
row col Estimate
1 mpg mpg 35.19
2 disp disp 14880.81
3 gear gear 0.53
4 1 mpg 20.09
5 1 disp 230.72
6 1 gear 3.69
mpg disp gear
3.518897e+01 1.488077e+04 5.273438e-01
mpg disp gear
20.09062 230.72188 3.68750
Even given halfway decent starting values, CSOLNP fails to get var(disp) correct:
mxOption(NULL,"Default optimizer", "CSOLNP")
manifests = c("mpg", "disp", "gear")
# =============================
# = simple independence model =
# =============================
m1 <- mxModel("ind", type = "RAM",
manifestVars = manifests,
mxPath(from = manifests, arrows = 2, values=diag(cov(mtcars[,manifests]))*1.05),
mxPath(from = "one", to = manifests, values=colMeans(mtcars[,manifests])*1.1),
mxData(mtcars[,manifests], type="raw")
)
m1 = mxRun(m1)
# ===========================
# = Our parameter estiamtes =
# ===========================
> s = summary(m1)$parameters; s$Estimate = round(s$Estimate,2); s[,3:5];
row col Estimate
1 mpg mpg 35.19
2 disp disp 16128.81
3 gear gear 0.53
4 1 mpg 20.09
5 1 disp 230.72
6 1 gear 3.69
Although the estimates are off, the -2lnL is only off by .2 or so, this is likely due to flattish gradients on account of the lack of data. But it should do better.
Useful test case, should probably go in models/failing in a CSOLNPmtcars.R
mxOption(NULL,"Default optimizer", "CSOLNP")
manifests = c("mpg", "disp", "gear")
# =============================
# = simple independence model =
# =============================
m1 <- mxModel("ind", type = "RAM",
manifestVars = manifests,
mxPath(from = manifests, arrows = 2, values=diag(cov(mtcars[,manifests]))*1.05),
mxPath(from = "one", to = manifests, values=colMeans(mtcars[,manifests])*1.1),
mxData(mtcars[,manifests], type="raw")
)
m1 = mxRun(m1)
# ===========================
# = Our parameter estiamtes =
# ===========================
s = summary(m1)$parameters; s$Estimate = round(s$Estimate,2); s[,3:5]; summary(m1); diag(cov(mtcars[,manifests]))*31/32; colMeans(mtcars[,manifests])
Log in or register to post comments
#2
# =============================
m1 <- mxModel("ind", type = "RAM",
manifestVars = manifests,
mxPath(from = manifests, arrows = 2),
mxPath(from = "one", to = manifests),
mxData(cov(mtcars[,manifests]), means=colMeans(mtcars[,manifests]), numObs=nrow(mtcars), type="cov")
)
m1 = mxRun(m1)
#Running ind
s = summary(m1)$parameters; s$Estimate = round(s$Estimate,2); s[,3:5]
# row col Estimate
#1 mpg mpg 695012.58
#2 disp disp 108907335.59
#3 gear gear 0.54
#4 1 mpg 20.09
#5 1 disp 230.59
#6 1 gear 3.69
Scaling is almost certainly the major problem. When free parameters span five orders of magnitude (1e-1 to 1e+4), only the most robust optimizers will be able to handle that on their own without blinking. It really screws with the magnitude of the gradient because you have to make tiny steps in some directions (for the 1e-1 parameters) and giant steps in other directions (for the 1e+4 parameters). Physicists suggest that you re-parameterize the problem to avoid parameter estimation at very different orders of magnitude.
Another part of the problem is definitely the starting values. Zero is just about the worst starting value for a variance. With that being said, CSOLNP does wiggle off from zero starting values. It just doesn't wiggle far enough.
Re-scaling the data (ala physics) and letting CSOLNP wiggle the starting values off zero resolves the behavioral issue.
library(OpenMx)
manifests = c("mpg", "disp", "gear")
mtcars$mpg <- mtcars$mpg/10
mtcars$disp <- mtcars$disp/300
mtcars$gear <- mtcars$gear
# =============================
# = simple independence model =
# =============================
m1 <- mxModel("ind", type = "RAM",
manifestVars = manifests,
mxPath(from = manifests, arrows = 2, values=0),
mxPath(from = "one", to = manifests),
mxData(mtcars[,manifests], type="raw")
)
m1 = mxRun(m1)
# ===========================
# = Our parameter estiamtes =
# ===========================
s = summary(m1)$parameters; s$Estimate = round(s$Estimate,2); s[,3:5]
# row col Estimate
#1 mpg mpg 35.19
#2 disp disp 16.53
#3 gear gear 52.73
#4 1 mpg 20.09
#5 1 disp 7.69
#6 1 gear 36.87
round(diag(cov(mtcars[,manifests])),2)
# mpg disp gear
#36.32 17.07 54.44
Of course, we would probably want CSOLNP to more robustly find the correct solution. Does anyone know if NPSOL rescales the gradient similar to CSOLNP?
Log in or register to post comments
#3
if so, I can upload this to "models/failing/CSOLNPmtcars.R"
# models/failing/CSOLNPmtcars.R
# Not taking big enough steps?
library(OpenMx)
library(testthat)
# ===================
# = test with NPSOL =
# ===================
mxOption(NULL, "Default optimizer", "NPSOL")
# ====================================
# = make a simple independence model =
# ====================================
manifests = c("mpg", "disp", "gear")
m1 <- mxModel("ind", type = "RAM",
manifestVars = manifests,
mxPath(from = manifests, arrows = 2),
mxPath(from = "one", to = manifests),
mxData(mtcars[,manifests], type="raw")
)
# ======================================
# = get parameter estimated parameters =
# ======================================
m1 = mxRun(m1)
obtainedVars = summary(m1)$parameters[1:3,5]
expectedVars = as.numeric(diag(cov(mtcars[,manifests])))
mlexpected = expectedVars*(31/32) # n-1/n
testthat::expect_equal(obtainedVars[1], expected = mlexpected[1], tolerance = 0.01)
testthat::expect_equal(obtainedVars[2], expected = mlexpected[2], tolerance = 0.01)
testthat::expect_equal(obtainedVars[3], expected = mlexpected[3], tolerance = 0.01)
# ====================
# = switch to CSOLNP =
# ====================
mxOption(NULL, "Default optimizer", "CSOLNP")
manifests = c("mpg", "disp", "gear")
# re-build to set to crappy starts that are fine in NPSOL
m1 <- mxModel("ind", type = "RAM",
manifestVars = manifests,
mxPath(from = manifests, arrows = 2),
mxPath(from = "one", to = manifests),
mxData(mtcars[,manifests], type="raw")
)
m1 = mxRun(m1)
obtainedVars = summary(m1)$parameters[1:3,5]
expectedVars = as.numeric(diag(cov(mtcars[,manifests])))
mlexpected = expectedVars*(31/32) # n-1/n
testthat::expect_equal(obtainedVars[1], expected = mlexpected[1], tolerance = 0.01)
testthat::expect_equal(obtainedVars[2], expected = mlexpected[2], tolerance = 0.01)
testthat::expect_equal(obtainedVars[3], expected = mlexpected[3], tolerance = 0.01)
# e.g
# Error: obtainedVars[3] not equal to expectedVars[3]
# Mean relative difference: 34520.75
Log in or register to post comments
#4
E.g.,
omxCheckCloseEnough(growthCurveFit$output$estimate[["meani"]], 9.930, 0.01)
so a translation would be:
omxCheckCloseEnough(obtainedVars[1], mlexpected[1], epsilon = 0.01)
omxCheckCloseEnough(obtainedVars[2], mlexpected[2], epsilon = 0.05)
omxCheckCloseEnough(obtainedVars[3], mlexpected[3], epsilon = 0.01)
This passes with NPSOL but not with CSOLNP.
However, omxCheckCloseEnough() only works on absolute differences, whereas testthat() can do relative as well. I like the flexibility of testthat() though -- not sure if it's worth changing over to using it more broadly, a question for the developers generally.
Log in or register to post comments
#5
Log in or register to post comments
#6
tim:OpenMx tim$ svn commit trunk/models/failing/CSOLNPmtcars.R -m "Estimates are good in NPSOL but poor in CSOLNP - step size issue?"
Adding trunk/models/failing/CSOLNPmtcars.R
Transmitting file data .
Committed revision 3526.
Log in or register to post comments
#7
Really worthwhile.
Plus dovetails with devtools
Log in or register to post comments
#8
omxCheckIdentical # uses R's identical() function
omxCheckEquals # uses R's == function
omxCheckSetEquals # checks for equality of sets (order independent)
omxCheckTrue # similar to boolean test
omxCheckCloseEnough # approximate equality on absolute scale
omxCheckWithinPercentError # approximate equality on relative scale
omxCheckWarning # verify existence of warning and the right message is printed
omxCheckError # verify existence of error and the right message is printed
As far as I can tell we've had all of these functions for 3+ years.
Log in or register to post comments
#9
Log in or register to post comments