potential optimization problem
There MAY be a problem with optimization under some likelihood functions in OpenMx. Specifically, the gradient and the standard errors in the output slots of an MxModel are congruent with minimizing -log(L). The calculatedHessian, however, is congruent with minimizing -2log(L). The attached R code provides an illustration.
The potential problem involves which function (-log(L) or -2log(L)) NPSOL minimizes. If it is -log(L), then everything is fine with respect to optimization, but OpenMx is incorrectly calculating the Hessian. If it is -2log(L), however, things MAY get gnarly. The estimated gradient elements will be twice as big as they should be: i.e., d(-2log(L))/dX = 2 d(-log(L))/dX. One of the convergence criteria is the norm of the gradient. If g is the gradient, then the norm equals t(g) %*% g. If the function minimized is -2log(L), then the norm of the gradient will be FOUR times its appropriate value. This could lead to convergence problems, particularly with ill-conditioned problems.
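To make the factors of two and four concrete, here is a minimal numeric sketch (not the attached code), assuming a normal model with known sigma = 1 and a single free parameter mu; the central-difference gradients of -log(L) and -2log(L) differ by a factor of two, and their norms t(g) %*% g by a factor of four:

set.seed(1)
y <- rnorm(100, mean = 0.3)
negLogL  <- function(mu) -sum(dnorm(y, mean = mu, log = TRUE))
neg2LogL <- function(mu) 2 * negLogL(mu)
mu0 <- 0      # evaluate away from the optimum so the gradient is nonzero
h   <- 1e-6
g1 <- (negLogL(mu0 + h)  - negLogL(mu0 - h))  / (2 * h)   # gradient of -log(L)
g2 <- (neg2LogL(mu0 + h) - neg2LogL(mu0 - h)) / (2 * h)   # gradient of -2log(L)
g2 / g1                  # about 2
(g2 * g2) / (g1 * g1)    # about 4: the inflation in t(g) %*% g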
Best,
Greg
I apologize for the delay in
oops! sorry about that.
I apologize for missing this
With the standard objective functions (FIML, ML, and RAM), we minimize -2log(L). The Hessian reflects this calculation, and the standard error calculation takes this into account. As a result, the standard errors line up with what you expect, and the Hessian is congruent with a calculation of -2log(L).
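In other words, a minimal sketch of the bookkeeping, assuming fit is an MxModel returned by mxRun(): since calculatedHessian is the Hessian of -2log(L), the Hessian of -log(L) is half of it, so the parameter covariance matrix is 2 * solve(H):

H  <- fit@output$calculatedHessian   # Hessian of -2log(L)
se <- sqrt(2 * diag(solve(H)))       # should agree with fit@output$standardErrors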
I think what you're arguing here is that if convergence is based on the gradient norm being smaller than some absolute epsilon, using -2log(L) makes the convergence check more conservative than using -log(L). More precisely, to maintain the same convergence requirements for -log(L) and -2log(L), the convergence criterion has to be adjusted accordingly.
The real question, then, is whether we're choosing an appropriate optimality tolerance given that we're calculating -2log(L) instead of -log(L). This is one of the optimizer options we don't presently expose to the user, but we could if you think people would want to see and manipulate it.
So I guess I have two questions. First, is there a problem with the existing optimality tolerance? Second, would it be helpful to expose the parameters of the tolerance calculation to the user so that it can be adjusted if needed? In my experience, this is one of the parameters that's rarely adjusted, but that doesn't mean it wouldn't be useful.
In reply to I apologize for missing this by tbrick
Overall, it's a bit difficult
In reply to I apologize for missing this by tbrick
thanks for the response,
you are correct in surmising that minimizing -2logL is more conservative than minimizing -logL, but i am not certain that adjusting the optimality tolerance is the best solution. equations 8.1 and 8.2a in the NPSOL documentation show that the optimality tolerance influences both the convergence criterion in 8.1 (the change in parameter values from one iteration to the next) and the "optimality condition" in 8.2a (a function of the norm of the gradient). one might simultaneously vary the function precision parameter, given that -2logL can take on large values when sample sizes are large.
with most problems in the behavioral sciences, there will be no substantive difference. some genetic problems, however, can have very large sample sizes and be poorly conditioned. here, i agree with mike that all options should be made available to the user.
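for concreteness, a hedged sketch of what exposing those controls through mxOption() might look like (option names follow NPSOL; whether a given option is accepted depends on the OpenMx build):

# a sketch, assuming 'model' is an MxModel and these NPSOL options are exposed
model <- mxOption(model, "Function precision", 1e-11)
model <- mxOption(model, "Optimality tolerance", 1e-9)
fit <- mxRun(model)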
a more radical suggestion is to permit the user to specify the function to minimize. many of us have found that minimizing -logL/-logL0, where -logL0 is minus the log likelihood at the initial set of parameter estimates, can help solve some gnarly problems.
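a hedged sketch of that rescaling using the 1.x algebra objective (names are illustrative; fitModel is a submodel whose objective is the usual -2logL):

# evaluate the objective at the starting values to get the scaling constant
logL0 <- mxEval(objective, fitModel, compute = TRUE)
scaled <- mxModel("scaled", fitModel,
    mxMatrix("Full", 1, 1, values = as.numeric(logL0), name = "L0"),
    mxAlgebra(fitModel.objective / L0, name = "relLL"),
    mxAlgebraObjective("relLL"))
run <- mxRun(scaled)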
In reply to thanks for the response, by carey
I think we have this covered;
In reply to I think we have this covered; by neale
You're right, but this could
Also, Tim and I discovered yesterday that the algebra objective treats the user-specified objective function as a -2LL when calculating standard errors. We'll discuss this in the meeting, but we might want to find a way to disable that while maintaining as much backwards compatibility as possible.
In reply to I think we have this covered; by neale
many thanks, mike & ryne (1)
(1) always wondered what mxAlgebraObjective() was all about. suggest that you make that more specific in the documentation.
(2) just ran a problem with 90 parameters; rescaling the function so that it is close to 1 reduced the run time by one third.
greg
In reply to many thanks, mike & ryne (1) by carey
Interesting. For the record,
For even greater flexibility in specifying functions, mxRowObjective() is also available, but it needs a few helper functions to make it convenient to use for cases with NAs in the data frame.
In reply to Interesting. For the record, by neale
any documentation on
In reply to any documentation on by carey
The docs for mxRowObjective
As to why the functions are there:
All three of the self-specified objective functions (mxAlgebraObjective, mxRObjective, and mxRowObjective) are there to cover the things we haven't implemented directly in OpenMx, such as the likelihood weighting you just did. Alternatively, if you wanted to use a different objective function entirely (say, weighted least squares), you could implement that as an algebra and do things the same way.
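As a hedged sketch of that idea, here is an unweighted least-squares fit function written as an algebra with the 1.x API (all names here are illustrative, not part of OpenMx):

lsModel <- mxModel("uls",
    mxMatrix("Symm", 2, 2, free = TRUE, values = c(1, 0.2, 1), name = "expCov"),
    mxMatrix("Symm", 2, 2, values = c(1.4, 0.5, 1.1), name = "obsCov"),
    # sum of squared differences between observed and expected covariances
    mxAlgebra(tr((obsCov - expCov) %*% (obsCov - expCov)), name = "uls"),
    mxAlgebraObjective("uls"))
lsFit <- mxRun(lsModel)

Note that, per the caveat above, standard errors from such a fit would currently be computed as though the algebra were a -2logL.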
We also wanted to make a rapid prototyping sequence available for folks who wanted to quickly implement and test new objective functions to expand OpenMx or to develop new techniques for the field.
mxAlgebraObjective and mxRowObjective are for new objective functions that can be expressed algebraically using the OpenMx operators, either for a moment matrix calculation (like ML) or a row-by-row calculation (like FIML). Again, as of 1.0.3, mxRowObjective is still awaiting the completion of a few more functions to handle missing data before it can be used for full information methods. mxRObjective is there for new functions that can't be easily expressed by the OpenMx operators.
The hope is that methods developers will be able to use these to quickly build and test their own functions. The functions can then be implemented in C in the kernel to make them faster. We already have a few folks working on some methods now.
In reply to The docs for mxRowObjective by tbrick
muchas gracias, tim. that
that was very informative.
if you are checking on documentation (or can send this to someone who does that), i've noticed that the Code demos link and the demos() link on http://openmx.psyc.virginia.edu/docs/OpenMx/latest/_static/Rdoc/00Index.html are broken.
also, the documentation of an MxModel object (http://openmx.psyc.virginia.edu/docs/OpenMx/latest/_static/Rdoc/MxModel-class.html) does not include the "intervals" slot.
In reply to The docs for mxRowObjective by tbrick
The documentation is not
In reply to The documentation is not by mspiegel
Documentation for