
Alternative estimate of the Hessian

Following some recent dialogue with Greg Carey, I suggest that we change OpenMx's default Hessian estimation. Rather than using the estimates from NPSOL to compute the standard errors, we should use the numDeriv package.

The attached R script, when run, shows that NPSOL's Hessian* is incorrect when the model is underidentified (more free parameters than observed statistics). As a result, a user could be misled into thinking that a model is identified when it is not. A better picture emerges when numDeriv is used.

Exactly how to implement/integrate numDeriv's hessian() is not clear to me. The example sets up the fit function directly in R code; we would want to use the built-in backend code for this purpose, so as to take advantage of its computational efficiency. Basically, one supplies the function to evaluate and a vector of parameter estimates at which the Hessian is required.
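For concreteness, here is a minimal sketch of the numDeriv interface, using a stand-in fit function (the toy -2 log-likelihood below is illustrative only; it is not the OpenMx backend objective):

require(numDeriv)

# Toy -2*log-likelihood for a normal(mu, sigma2) sample -- a stand-in
# for the real OpenMx fit function
fit <- function(theta, dat) {
  mu <- theta[1]; sigma2 <- theta[2]
  length(dat) * log(2 * pi * sigma2) + sum((dat - mu)^2) / sigma2
}

set.seed(1)
dat <- rnorm(100, mean = 5, sd = 2)
est <- c(mean(dat), mean((dat - mean(dat))^2))  # ML estimates

# hessian() takes the function to evaluate and the parameter vector;
# further named arguments (here the data) are passed through to the function
H <- hessian(func = fit, x = est, dat = dat)

# for a -2*lnL fit function, Cov(theta-hat) is approximately 2 * solve(H)
sqrt(diag(2 * solve(H)))  # standard errors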

I'm not sure that it will work with constraints; that should be investigated.

*The Hessian is the matrix of second partial derivatives of the fit function with respect to the free parameters; its inverse gives, up to a scaling factor, the covariance matrix of the parameter estimates.
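In symbols, writing F for the fit function (here -2 ln L) and theta for the free parameters:

H_{ij} = \partial^2 F / (\partial\theta_i \, \partial\theta_j), \qquad \mathrm{Cov}(\hat\theta) \approx 2 H^{-1}

the factor of 2 arising from the -2 scaling of the log-likelihood.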

Attachment: omxTest2.R (2.34 KB)
Reporter:
Created: Tue, 12/08/2009 - 11:03
Updated: Wed, 05/07/2014 - 11:03

Comments

Mike,

(a) Is there any way to flag an underidentified model as such? Perhaps using some heuristic? It will not work in all cases, but may catch most culprits!?
(b) I thought the NPSOL Hessian would not be PD if the model is underidentified. Why is that not the case?

paras

Related to Paras' second point, why would we care about standard errors for an underidentified model? By definition, an underidentified model doesn't have a unique solution, so I'm not sure I understand the utility of the confidence region of any one possible set of parameter values.

the "hessian" from NPSOL is actually not a true hessian. it is an approximation to a hessian matrix that is deliberately constructed to be positive definite (technically, a Broyden-Fletcher-Goldfarb-Shanno, or BFGS, update). hence, in unidentified models, save for numerical error, it will be positive definite.
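for reference, the standard BFGS update of the approximate hessian B_k (generic notation, not anything NPSOL-specific) is:

B_{k+1} = B_k + \frac{y_k y_k^T}{y_k^T s_k} - \frac{B_k s_k s_k^T B_k}{s_k^T B_k s_k}

with s_k = \theta_{k+1} - \theta_k and y_k = \nabla f(\theta_{k+1}) - \nabla f(\theta_k). the update stays positive definite whenever y_k^T s_k > 0, regardless of the curvature of the true hessian at the solution.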

there are two criteria for testing whether a solution is at a local minimum: (1) all gradient elements are close to 0; (2) the hessian--either analytic or numerically estimated--is positive definite. one cannot trust the hessian from NPSOL to examine condition (2). i suspect that it would suffice when a model is truly identified, but it is definitely inappropriate in at least some cases of unidentified models.

to assess convergence, i use a routine that examines OpenMx output for (1) elements in which the gradient is exactly 0 (useful to isolate a parameter that does not contribute to the function value); (2) the maximum absolute value of the gradient elements; (3) the number of gradient elements with an absolute value greater than .001; and (4), when a numerical estimate of the hessian is available, a check on whether it is positive definite.
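in sketch form (illustrative code, not the actual routine), those four checks might look like:

checkConvergence <- function(grad, hess, tol = .001) {
  list(
    zeroGradient     = names(grad)[grad == 0],  # (1) parameters that never move the fit
    maxAbsGradient   = max(abs(grad)),          # (2) largest absolute gradient element
    nLargeGradient   = sum(abs(grad) > tol),    # (3) elements above the tolerance
    positiveDefinite = all(eigen(hess, symmetric = TRUE,
                                 only.values = TRUE)$values > 0)  # (4) PD check
  )
}

where grad and hess are pulled from the fitted model's output (how, exactly, depends on the OpenMx version).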

greg

True, we don't really care about standard errors for an underidentified model. They would best be returned as NAs. In its present form, however, OpenMx will return incorrect errors in such an instance. For example, adding a second factor to the homepage model:

require(OpenMx)
data(demoOneFactor)
manifests <- names(demoOneFactor)
latents <- c("G", "G2")   # adding a second factor makes the model underidentified
factorModel <- mxModel("One Factor", type="RAM",
    manifestVars = manifests,
    latentVars = latents,
    mxPath(from=latents, to=manifests, all=TRUE),  # loadings on both factors
    mxPath(from=manifests, arrows=2),              # residual variances
    mxPath(from=latents, arrows=2,
           free=FALSE, values=1.0),                # latent variances fixed to 1
    mxData(cov(demoOneFactor), type="cov",
           numObs=500))
summary(mxRun(factorModel))

gives us:

name matrix row col Estimate Std.Error
1 A x1 G 0.38992864 0.226214872
2 A x2 G 0.49334864 0.278522409
3 A x3 G 0.56885880 0.338047426
4 A x4 G 0.68604328 0.381755391
5 A x5 G 1.01690047 2.978002493
6 A x1 G2 -0.07950142 1.112480039
7 A x2 G2 -0.09795222 1.408279928
8 A x3 G2 -0.11894525 1.622112067
9 A x4 G2 -0.13411732 1.958912336
10 A x5 G2 1.01822349 2.548480441
11 S x1 x1 0.04018047 0.002068288
12 S x2 x2 0.03870883 0.002239603
13 S x3 x3 0.03628890 0.002446600
14 S x4 x4 0.04463908 0.003227481
15 S x5 x5 -1.40056041 0.869262743

Clearly the second factor is struggling, and different starting values might yield a different solution. But it would be better if underidentification were automatically detected and flagged in the Summary, rather than giving the user a false sense of security.
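One cheap way to implement such a flag, in the spirit of returning NAs: test the numeric Hessian for positive definiteness before inverting it. A sketch (the tolerance, and the factor of 2 appropriate for a -2*lnL fit function, are assumptions):

safeStdErrors <- function(hess, tol = 1e-8) {
  ev <- eigen(hess, symmetric = TRUE, only.values = TRUE)$values
  if (min(ev) < tol * max(abs(ev))) {
    warning("Hessian not positive definite; model may be underidentified")
    return(rep(NA_real_, nrow(hess)))  # NA standard errors instead of nonsense
  }
  sqrt(diag(2 * solve(hess)))  # standard errors
}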

OpenMx currently uses a numeric Hessian for standard error calculation. NPSOL's Hessian is also returned in @output, but it is not used. I'm not sure why this was still open, so I'm closing it.
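For anyone verifying this, both matrices can be inspected on a fitted model. The slot names below are my recollection and worth double-checking against the current OpenMx release:

fit <- mxRun(factorModel)
fit@output$calculatedHessian  # numeric Hessian, used for the standard errors
fit@output$hessian            # optimizer's (NPSOL) approximation, not used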

NPSOL returns a reduced Hessian when it is able to, effectively fixing some of the parameters to resolve the underidentification. This may result in warnings (e.g., status code 6). It probably returns other information to indicate that something like this has happened, and it would be good to let the user know when there seems to be an underidentification issue. Exact derivatives would obviously help, but we have no solution for the general arbitrary-algebra case.