Optimizers w/ numerical vs. analytical derivatives

Posted on
falkcarl Joined: 10/29/2015
I'm experimenting with some novel item models for use with OpenMx's item factor analysis capabilities. I was wondering what optimization methods are currently available (e.g., for the M-step when doing Bock-Aitkin EM), and which one(s) can make use of analytical derivatives (which tend to be faster, at least in my experience). I've previously used my own hand-rolled Newton-Raphson code and discovered that N-R has trouble with some of these item models and usually requires reducing the step size or some backtracking, and sometimes ridging the Hessian early in the EM cycles. I've been looking at mxComputeNewtonRaphson, but I don't see in the documentation whether it just tries to take the full step each time or does anything else.

I also see mxComputeGradientDescent, which optionally uses the analytical gradient?

Did I miss any other optimization options?

Many thanks,
Carl

Replied on Sat, 11/14/2015 - 06:44
jpritikin Joined: 05/23/2012

> Did I miss any other optimization options?

No, I don't think so.

> mxComputeGradientDescent, which optionally uses the analytical gradient?

Yeah, it can use an analytic gradient. The use of an analytic gradient is implemented for NPSOL and could be implemented for SLSQP very easily.
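
For example, a rough sketch of selecting the engine (argument name per the mxComputeGradientDescent help page; whether the analytic gradient is actually used depends on whether the fit function supplies one, as noted above):

library(OpenMx)
# Sketch only: pick NPSOL as the optimizer. When the fit function supplies
# an analytic gradient, NPSOL can use it; otherwise it falls back to
# numerical differences.
mstep <- mxComputeGradientDescent(engine = "NPSOL")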

> discovered that N-R has trouble with some of these item models and usually requires reducing the step size or some backtracking, and sometimes ridging the Hessian early in the EM cycles. I've been looking at mxComputeNewtonRaphson, but I don't see in the documentation whether it just tries to take the full step each time or does anything else.

I wrote the N-R optimizer. It includes some heuristics, but I'm not sure whether it will perform adequately on your model. Certainly we can add whatever heuristics you need. You can enable some diagnostics by passing verbose=2 or verbose=3. I recommend you try it out. If you can find a reproducible example of poor performance and suggest how to fix it then I can investigate.
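
A rough sketch of the kind of compute plan I mean (the mxComputeEM signature here follows the IFA demos shipped with the package, and 'itemModel' is a hypothetical MxModel with an item factor analysis expectation already attached; check the help pages for your version):

library(OpenMx)
# Sketch only: EM with Newton-Raphson as the M-step optimizer and verbose
# diagnostics turned on, followed by standard errors from the information
# matrix.
plan <- mxComputeSequence(list(
  mxComputeEM('expectation', 'scores',
              mxComputeNewtonRaphson(verbose = 2L)),
  mxComputeOnce('fitfunction', 'information', "meat"),
  mxComputeStandardError(),
  mxComputeHessianQuality()
))
fit <- mxRun(mxModel(itemModel, plan))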

Replied on Sat, 11/14/2015 - 23:32
falkcarl Joined: 10/29/2015

In reply to by jpritikin

Thank you! So far I have it working with SLSQP (with some Gaussian priors to stabilize some parameters) and am getting close to replicating my own code - all parameter estimates agree to within .001 or so, except for one or two that just happen to be unstable for the example data I generated. The -2LL seems to be off a bit, though - does OpenMx somehow include the effect of the priors? I'll be in touch again when I try analytical derivatives and may post code if I run into any difficulties.

Replied on Tue, 11/17/2015 - 15:24
falkcarl Joined: 10/29/2015

In reply to by jpritikin

What is sometimes done is to substitute the Bayesian estimates into the -2LL (not including the prior) as a way of comparing the Bayesian estimates to the ML estimates, e.g., to see whether the -2LL changes much with the Bayesian estimates (it often doesn't). For example, see the bottom of p. 190 of the reference below:

Mislevy, R. J. (1986). Bayes modal estimation in item response models. Psychometrika, 51(2), 177-195.

I believe certain other software programs do this when calculating -2LL (and can provide an example if requested), though I cannot say for sure whether it is conventional across all other programs.
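
In OpenMx terms, I imagine that would look roughly like this (a sketch with hypothetical object names 'modelNoPriors' and 'mapFit'; not tested):

library(OpenMx)
# Sketch only: plug the Bayesian (MAP) estimates into a copy of the model
# that has no priors attached, then evaluate -2LL once without re-optimizing.
est <- omxGetParameters(mapFit)                     # MAP estimates
atMAP <- omxSetParameters(modelNoPriors,
                          labels = names(est), values = est)
atMAP <- mxModel(atMAP, mxComputeOnce('fitfunction', 'fit'))
mxRun(atMAP)$output$fit                             # -2LL, priors excluded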

P.S. In rpf.rparam(), when using the dichotomous response model and a 1-dimensional model,
such as:

rpf.rparam(rpf.drm(1))

should the second element returned be labeled "c" instead of "b"? I ask because the documentation lists the slope/intercept parameterization and uses "c", but the output is labeled as "b".

P.P.S. Do you want future questions regarding IFA in OpenMx to be posted to this main forum, or Categorical Outcomes, or...?

Replied on Tue, 11/17/2015 - 15:43
jpritikin Joined: 05/23/2012

In reply to by falkcarl

> I believe certain other software programs do this when calculating -2LL (and can provide an example if requested), though I cannot say for sure whether it is conventional across all other programs.

Hm... well, to return separate values for the likelihood and prior, we'd have to distinguish them in the backend. We don't do that now. With respect to estimation, we just optimize the whole posterior. If you set up a multigroup model with the prior in one group and the likelihood in the other, then it is pretty easy to pull out the separate values. There is an example in OpenMx/inst/models/passing/fm-example2-2.R, but I admit that it is a bit cluttered and not easy to read.
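
Very roughly, the multigroup idea looks like this ('itemModel' and 'priorModel' are hypothetical submodels, each with its own fit function; see the fm-example script for a real case):

library(OpenMx)
# Sketch only: keep the likelihood and the prior in separate submodels so
# their fit values can be pulled out separately after the run.
container <- mxModel("container", itemModel, priorModel,
                     mxFitFunctionMultigroup(c("itemModel.fitfunction",
                                               "priorModel.fitfunction")))
fit <- mxRun(container)
fit$itemModel$fitfunction$result    # fit value without the prior
fit$priorModel$fitfunction$result   # contribution of the prior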

> should the second element returned be labeled "c" instead of "b"? I ask because the documentation lists the slope/intercept parameterization and uses "c", but the output is labeled as "b".

Yeah, the parameter labels do not match convention. I guess I'll accept a patch to fix the label. All the current models use slope/intercept parameterization because it is more consistent between single and multidimensional models.
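
For reference, the algebra relating the two conventions (standard 2PL identities, nothing rpf-specific):

# Slope/intercept writes the logit as a*theta + c; the classic difficulty
# form writes it as a*(theta - b), so b = -c/a.
a <- 1.2; cpar <- 0.5
b <- -cpar / a
theta <- 0.3
all.equal(plogis(a * theta + cpar), plogis(a * (theta - b)))  # TRUE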

> Do you want future questions regarding IFA in OpenMx to be posted to this main forum, or Categorical Outcomes, or...?

It doesn't really matter. We don't have enough traffic to make it inconvenient to monitor all the forums together.

Replied on Wed, 11/18/2015 - 09:37
falkcarl Joined: 10/29/2015

In reply to by jpritikin

Yeah, as long as I at least know what OpenMx is doing, that's perfectly fine. I'm sure there are ways or tricks for calculating -2LL without including the priors. Thanks again!