
Optimization issues - binary with low prevalence


iloo
Optimization issues - binary with low prevalence
Attachment: OptimizationIssuesOpenMx.R (2.82 KB)

Hey,
I work with OpenMx using somewhat different data than most others; the data often come from a full population and have many rows (up to 3 million). A common type of analysis is of relatives with one or more binary variables, e.g. an observed disease diagnosis, where the prevalence is low, e.g. 1% to 0.05%. The complexity of the models varies from simple 2x2 covariance matrices without any definition variables to 8x8 covariance matrices with several definition variables adjusting the means/thresholds.

I've been doing this since OpenMx 1.x and have encountered optimization issues throughout. When I do the analyses myself I can usually handle these issues by varying starting values and changing some options for the optimizer, mainly "mxOption( NULL , 'Line search tolerance' , .4 )", and by comparing the resulting expected prevalences and covariances/correlations with the non-modelled ones, as well as looking at the likelihood values, to ensure a global optimum (even when there are warnings from the optimizer). But I'm no computer scientist, and don't really know how to tweak the optimizer to handle my issues.
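
As a concrete illustration of the kind of tuning I do - not an official recipe - here is a minimal sketch. The model object name 'twinMod' is hypothetical, and NPSOL is only available in NPSOL-enabled builds.

library(OpenMx)

## Loosen the line search, as mentioned above (applies to NPSOL/SLSQP)
mxOption(NULL, 'Line search tolerance', 0.4)

## Fit under two optimizers and compare -2lnL to guard against local optima
mxOption(NULL, 'Default optimizer', 'NPSOL')
fitNPSOL <- mxRun(twinMod, suppressWarnings=TRUE)

mxOption(NULL, 'Default optimizer', 'SLSQP')
fitSLSQP <- mxRun(twinMod, suppressWarnings=TRUE)

c(NPSOL = summary(fitNPSOL)$Minus2LogLikelihood,
  SLSQP = summary(fitSLSQP)$Minus2LogLikelihood)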

Now I'm teaching a lot of students who want to use similar data, and since I don't have a solution that makes the optimization work "all the time" I'm asking for help: general ways to make the optimization work for this type of data, and possibly for even more complex data.

Unfortunately I can't share data because of ethical issues, but I attach some generated data with optimization issues. It's indeed a very simple model, and I cannot see why it would be a problem with regards to optimization.

Also, the issue does not depend on whether I use NPSOL or SLSQP; it may appear in both, or in one but not the other (I usually vary this as well, to ensure a global maximum).

I'm currently running on a Linux server; software info where I have these issues is below (but they occur on other software/hardware as well):
R version 3.3.1 (2016-06-21)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux
OpenMx_2.6.9

jpritikin
ordinal

If all of your data are ordinal then you might consider using item factor analysis:

https://journal.r-project.org/archive/2016-1/pritikin-schmidt.pdf

iloo
Clarification and way forward.

Hey All,
Thanks for the quick feedback. Perhaps I wasn't clear in my original post, though. Clarifications:

  1. The attached code just exemplifies where my intuition about optimization fails me; the model is so simple that I can't see why it has problems.

  2. The solution is not for me, but rather for the students, who have had maybe a one-week course in classic twin modelling and want to use standard methodology for this type of data (i.e., many observations, >1 million). The students aren't typically interested in what's going on under the hood, and since they often have a non-technical background they are mainly focused on the statistical theory - expected covariance matrices etc. If I can't help them with this, I will eventually have to double-check their solutions anyway - not very helpful for them, and time-consuming for me - and I don't know what simple hints to give them so they can solve the issues themselves. (I have multiple ways of checking that a solution is correct, including leaving OpenMx for basic R functions :( )

  3. Often at least one variable per individual is binary (i.e., two if pairs are analyzed), but there may also be continuous variables, or ordinal variables with many levels.

  4. I thought the issue might be that the optimization is well suited to smaller data, and perhaps "sees" the likelihood as very flat around the maximum with bigger data?

So, if I understand your answers correctly, a potential way forward is to:

A. If there is no missingness in the data, use WLS.
B. Use mxTryHardOrdinal() rather than mxTryHard() if the data are ordinal only.
C. Check if CSOLNP helps.

Finally, if I understand you correctly, Mike, the issue is most likely imprecision in the numerical integration, traded off to save time? If so, it should be possible to increase that precision - how do I do that? (Most students have access to a server, so they can use multiple [~10] cores if needed.)
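
For reference, the kind of options I imagine are involved look something like the sketch below. The option names ('mvnAbsEps', 'mvnRelEps', 'mvnMaxPointsA', 'Number of Threads') are the ones I found in the mxOption documentation, and the values are only guesses, so corrections welcome.

library(OpenMx)

## Tighter tolerances for the multivariate-normal integration (illustrative values)
mxOption(NULL, 'mvnAbsEps', 1e-5)
mxOption(NULL, 'mvnRelEps', 1e-5)

## Allow more integration points (illustrative value)
mxOption(NULL, 'mvnMaxPointsA', 4e5)

## Use the ~10 cores mentioned above
mxOption(NULL, 'Number of Threads', 10)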

Again, thanks for input.

neale
We tried

We have increased the numerical precision of the integration until it really began to climb the quadratic (or exponentially steeper) curve of CPU time against precision. It didn't seem to help a whole lot. We are looking at other numerical integration routines. In your case the problem is especially difficult because the probabilities can be very small, which means that during optimization a small change in parameters may generate very large changes in the fit function in the smaller-probability direction, but much less change in the other direction.

Your ABC's look right, except that A may be a larger category than you think. If the missingness is completely at random (MCAR - see https://en.wikipedia.org/wiki/Missing_data), WLS is still fine.

neale
Integration precision

We make life difficult for the optimizer with FIML ordinal data analysis, because the numerical integration required is somewhat inaccurate. Full accuracy would take ages to evaluate. As a result we are walking a tightrope between integration precision that is good enough for optimization to work a reasonable proportion of the time and evaluation that is fast enough to be practical. The function mxTryHard() can be useful to validate solutions. Nevertheless, I feel your pain, and you should know that the development team has been working directly on this problem for some time. The CSOLNP optimizer seems to work better than the others in certain test cases, but not all. We have experimented with altering the numerical precision of the integration and are preparing a publication on those results. However, the false-positive rate of "code red" (IFAIL=6) is still too high. In my opinion, rather like pain and inflammation, it is better to have the alarm system tuned to many more false positives than false negatives (which carry the greater risk of publishing incorrect results).

Another approach you might try is WLS, which is not widely touted in the documentation yet, but which can work very efficiently. There are limitations: it is known to be biased when data are missing at random (MAR), and one does need a decent sample size. In addition, it isn't straightforward to specify moderator models such as those that moderate paths from genotype to phenotype (multi-group approaches can be used as a partial solution). You can find a few WLS examples in the inst/models/passing directory of the repository, i.e., https://github.com/OpenMx/OpenMx/tree/master/inst/models/passing -- those with WLS in their names are ContinuousOnlyWLSTest.R, IntroSEM-OneFactorCov.R, MultipleGroupWLS.R, and SaturatedWLSTest.R. I hope that this helps - do let us know how you get on!

AdminRobK
mxTryHardOrdinal()

I notice that you have a line commented out in your attached script, #simpModFit <- mxTryHard( simpMod , intervals=F ). There is a wrapper to mxTryHard() called mxTryHardOrdinal(), which has its default arguments specifically oriented toward analyses of ordinal data.

As you probably know, mxTryHard() randomly perturbs start values between attempted model fits. Your script is parameterized in terms of tetrachoric correlations and thresholds, so randomly perturbed start values are likely to cause the tetrachoric correlation matrix to be non-positive-definite at the start of a fit attempt, or even to have off-diagonals outside the interval (-1,1). You could place lbounds of zero and ubounds of 0.99 on the tetrachoric correlations. I think the lower bound of zero is reasonable for your purposes, since it sounds like your tetrachoric correlations are interpretable as familial resemblance for disease risk.
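
For concreteness, a bounded version of the correlation matrix might look something like the line below (a sketch, not taken from your script): a "Stand" matrix has a fixed unit diagonal, so only the six tetrachorics are free, and the bounds keep them in [0, 0.99].

mxMatrix(type="Stand", nrow=4, ncol=4, free=TRUE, values=0.2,
         lbound=0, ubound=0.99, name="eC")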

Better yet, re-parameterize the correlation matrix so that the relevant parameters can take values on the whole real line, and use an MxConstraint to ensure a unit diagonal. Something like this:

### Very simple model
simpMod <- mxModel(
    'SimpMod',
    ### Expected covariance matrix (variance==1 due to binary variables)
    mxMatrix(type="Lower",nrow=4,free=T,values=c(1,0,0,0,1,0,0,1,0,1),
                     name="L"),
    mxMatrix(type="Unit",nrow=4,ncol=1,name="ONE"),
    mxAlgebra(L%*%t(L),name="eC"),
    mxConstraint(diag2vec(eC) == ONE),
    ### Expected Means 
    mxMatrix(type='Full',nrow=1,ncol=4,free=F,values=0,name='eM'),
    ### Expected thresholds
    mxMatrix(type='Full',nrow=1,ncol=4,free=T,values=qnorm(colMeans(1*(datS==1)),lower.tail=F),name='eT'),#,labels=c("t1","t2","t3","t4")),
    ### Data and likelihood
    mxModel(
        'Datmod',
        mxData(datS , type='raw'),
        mxExpectationNormal(means='SimpMod.eM',covariance='SimpMod.eC',thresholds='SimpMod.eT',threshnames=varNames,dimnames=varNames),
        mxFitFunctionML()
    ),
    ### Add the likelihoods (only one)
    mxFitFunctionMultigroup( c('Datmod.fitfunction') )
)
simpModFit <- mxTryHardOrdinal( simpMod  )

I tried this on my Linux laptop just now with SLSQP, and it was able to achieve a fitfunction value of 61877.31, with no status warning, and the MxConstraint satisfied well within feasibility tolerance. NPSOL did not fare so well--its best solution had Status Red, and was not even in the feasible space.

Edit: Actually, include this in the MxModel as well:

mxMatrix(type="Zero",nrow=4,ncol=4,name="ZERO"),
mxConstraint(eC > ZERO),

AdminRobK
start values

Start values from best fit:
1, 0.748939348388329, 0.475524572263707, 0.147689759804613, 0.664647948772721, -0.324824935293478, 0.163502258576246, 0.819050224907451, 0.970724573486396, 0.0972730693883954, 3.08196654269581, 3.21380410569994, 3.08578293382576, 3.19111179271254

AdminRobK
warning: effortpost ahead

iloo, thank you for starting this thread. And yes, your post was clear. My prior replies in this thread were only meant to illustrate how you could make an MxModel more amenable to randomized start values with mxTryHardOrdinal(). I feel like this thread hasn't been resolved satisfactorily, so I wanted to address that.

The example model you posted is indeed quite a simple model, but it poses a surprisingly difficult optimization problem. Since November, I've been trying a number of strategies to get your model to reach a solution meeting criteria for a local minimum. I have concluded that the difficulty is likely an algorithmic limitation. Specifically, it appears to be inherently difficult to use finite-differences gradient-based optimization with likelihood functions that are defined in terms of the multivariate-normal (and multivariate-t!) probability integral, applied to low-prevalence dichotomous variables.

There is numerical inaccuracy, i.e. systematic error, in the algorithms that evaluate the probability integral, and the magnitude of the error, at least relative to the magnitude of the probability, is larger in the tails of the distribution. It's possible for the error to overwhelm the reliable component when taking derivatives by finite-differences.
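
To illustrate the point (this is not the code I used, just a toy example with the 'mvtnorm' package): for a bivariate tail probability at low prevalence, the reported integration error bound can be of the same order as the change in the probability produced by a small finite-difference step in a threshold, which is exactly when numerical gradients of the log-likelihood stop being trustworthy.

library(mvtnorm)

R  <- matrix(c(1, 0.5, 0.5, 1), 2, 2)    # tetrachoric correlation of 0.5
th <- qnorm(0.005, lower.tail=FALSE)     # thresholds for 0.5% prevalence

p0 <- pmvnorm(lower=c(th, th), upper=c(Inf, Inf), corr=R)
p1 <- pmvnorm(lower=c(th + 1e-4, th), upper=c(Inf, Inf), corr=R)  # tiny step in one threshold

as.numeric(p1 - p0)   # change due to the finite-difference step
attr(p0, "error")     # reported absolute error bound of the integration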

So, here's what I've tried, to no avail...

  • I tried systematically varying mxOption "Gradient step size" with CSOLNP and SLSQP, and mxOption "Function precision" with NPSOL, and attempting to fit the model, via mxTryHardOrdinal(), at each different value of the option.
  • To put the free parameters on the same scale, I defined the tetrachorics in terms of a logistic mapping from (-Inf,Inf) to (0,1) (again, I think constraining the correlations to be non-negative is reasonable for this application). Perhaps not surprisingly, doing this resulted in difficulty keeping the correlation matrix positive-definite.
  • I wrote an MxFitFunctionR to fit the model using two different alternative algorithms for the probability integral, namely, the "Miwa" and "Genz-Bretz" algorithms, via pmvnorm() from the 'mvtnorm' package. Genz-Bretz is a stochastic algorithm, and allows the user to specify an upper limit ('abseps') of the estimated numerical error of the probability. At its default abseps, Genz-Bretz performed no better than OpenMx's probability-integral algorithm, SADMVN; at an abseps better suited to this problem, Genz-Bretz would fail to satisfy it. The Miwa algorithm, even at a large value of its 'steps' parameter, fared no better (and just got slower).
  • Using an R frontend to SADMVN from the 'mnormt' package, I wrote an MxFitFunctionR, and compared its fit values to those of OpenMx's internal ML fitfunction. The fit values matched at the start values, but diverged during optimization (perhaps because floating-point addition isn't associative?).
  • I used an R function similar to that MxFitFunctionR to fit the model, using optimizers outside OpenMx. I also wrote an R function to, in effect, serve as a wrapper to OpenMx's ML fitfunction. The derivative-based optimizers I tried--the "BFGS" method of optim() (from the base-R 'stats' package), SLSQP (via package 'nloptr'), and SOLNP (via package 'Rsolnp')--had as much difficulty as OpenMx. I also tried derivative-free optimizers. Two such optimizers--method "SANN" and "Nelder-Mead" from optim()--did not reach solutions with positive-definite Hessians, and a third--BOBYQA, from nloptr--reached a minimum not low enough. I do note, however, that Nelder-Mead could reach a lower fitfunction value with one restart than OpenMx typically reaches after 11 attempts of mxTryHard().
  • SADMVN returns an upper bound on the absolute error in its probability result. I used this bound in another MxFitFunctionR to round each log-probability (loglikelihood) to a given number of reliable significant figures, using a common heuristic based on the number of significant figures in the argument to log(). This didn't help. I also wrote a similar R function that provided the fit value in the same manner, and also, when calculating gradients, tried to use the smallest possible 'abseps' that SADMVN could meet, and the smallest possible gradient interval that produced a nonzero gradient element. Since MxFitFunctionR doesn't support user-supplied derivatives, I tried optimizing this function with BFGS and with SLSQP (via the 'nloptr' package). The solutions did not appear to be a local minimum.
  • I wrote yet another R function using SADMVN that calculates the gradient by (1) evaluating the fitfunction at six points around the current value of the free parameter in question, (2) fitting, by least-squares, a quadratic polynomial to the total of seven points, and (3) using the first derivative of that polynomial as a gradient element (see the sketch after this list). BFGS and SLSQP still couldn't reach something that looked like a minimum.
  • I reasoned that, since the multivariate-normal integral is being evaluated in the tails of the distribution, perhaps it would be easier to use a distribution that has more weight in its tails. I wrote an MxFitFunctionR to fit the model using the multivariate Student's t distribution, instead of the multivariate normal, using an R frontend to SADMVT from the 'mnormt' package. The degrees of freedom of the multivariate t had to be fixed, in order to fix the scale of the latent continua. Even at values implying a good deal of kurtosis (between 4 and 10), this approach didn't really help matters.
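
For anyone curious, the least-squares quadratic gradient idea is easy to sketch in plain R (this is a simplified stand-in, not the actual fitfunction code; 'f' is any objective and 'x' the free-parameter vector):

quadGradElement <- function(f, x, i, h=1e-3) {
  offsets <- h * c(-3, -2, -1, 0, 1, 2, 3)     # current point plus six points around it
  fvals <- sapply(offsets, function(d) { xd <- x; xd[i] <- xd[i] + d; f(xd) })
  qfit <- lm(fvals ~ offsets + I(offsets^2))   # least-squares quadratic through the seven points
  unname(coef(qfit)["offsets"])                # slope of the quadratic at offset 0
}

## Quick check against a known gradient: d/dx1 of sum(x^2) at x = c(1, 2) is 2
quadGradElement(function(x) sum(x^2), c(1, 2), i=1)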

From these efforts, I conclude that the issue is unlikely to be a shortcoming in OpenMx's gradient-descent optimizers, its finite-difference gradient evaluation, or its multivariate-normal integration algorithm. Instead, the numerical error in evaluating the multivariate-normal probability integral out in the distribution's tails is sufficiently large that finite-difference differentiation is unreliable. Thus, for this model, OpenMx will never be sure it has found a local minimum. This issue primarily underscores a need for OpenMx to incorporate derivative-free optimizers. It would also be nice to have a different integration algorithm, either one that is faster (to facilitate brute force with mxTryHardOrdinal()) or, obviously, one that is more accurate - if such an algorithm exists.

For workarounds, I suggest the following:
1. Use WLS. I believe the only change you would need to make would be to replace mxData() with mxDataWLS(), and replace mxFitFunctionML() with mxFitFunctionWLS() (see the sketch after this list). For a saturated model like this one, it should give results very close to what you get from the 'polycor' package.
2. If you use ML, set up your model to be amenable to randomized start values. For instance, incorporate precautions to keep the correlation matrix positive-definite, and to keep the thresholds in order. If that involves any MxConstraints, use SLSQP as the optimizer. Then, brute-force the problem with mxTryHardOrdinal(), with argument 'extraTries' anywhere from 100 to 500, and let it run overnight. Accept the best result as the MLE, but ignore the standard errors.
3. Radically re-parameterize your model. For example, I've attached a script that uses an MxFitFunctionR to model the four binary disease variables as correlated Bernoulli trials, specifically,

p(x1,x2,x3,x4) = p(x1)p(x2 | x1)p(x3 | x2,x1)p(x4 | x3,x2,x1)

via probit regression. This approach could be extended to include interactions among the conditioned-upon variables, and other covariates, in the probit regressions. It could probably also be implemented using an MxFitFunctionRow. It could probably also be adapted to handle missing data, by creating a different group for each missing-data pattern and re-defining the fitfunction accordingly, but multiple imputation might be a better way to deal with missing data with this approach.
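
To make workaround 1 concrete, the substitution applied to the data group of the model posted earlier might look roughly like this (a sketch, assuming the 'simpMod', 'datS', and 'varNames' objects from that script):

datGroupWLS <- mxModel(
    'Datmod',
    mxDataWLS(datS),                      # replaces mxData(datS, type='raw')
    mxExpectationNormal(means='SimpMod.eM', covariance='SimpMod.eC',
                        thresholds='SimpMod.eT',
                        threshnames=varNames, dimnames=varNames),
    mxFitFunctionWLS()                    # replaces mxFitFunctionML()
)
simpModWLS <- mxModel(simpMod, datGroupWLS)   # swap in the WLS data group
simpModWLSFit <- mxRun(simpModWLS)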

AdminRobK
P.S. the attachment

File attachments: (the script referred to in the previous post)

iloo
Thanks

Hey,
Thanks a lot for all the effort you've put in!

  1. I have run with WLS, which works very well with regard to finding the fit of the model given the parameters :) ! However, the non-ML model fitting makes the precision of derived estimates (functions of model parameters) a bit trickier to get. I've solved this by finding analytical standard errors using the delta method... But a numerical solution would be nice, i.e. numerical standard errors for mxAlgebra objects - does such a solution exist?

  2. Sounds reasonable, and similar to my previous solutions, but using a better approach (mxTryHardOrdinal) and more tries.

  3. I will look at your code to see if I understand it :).

AdminRobK
WLS
1. I have run with WLS, which works very well with regard to finding the fit of the model given the parameters :) ! However, the non-ML model fitting makes the precision of derived estimates (functions of model parameters) a bit trickier to get. I've solved this by finding analytical standard errors using the delta method... But a numerical solution would be nice, i.e. numerical standard errors for mxAlgebra objects - does such a solution exist?

WLS will be more completely implemented in version 2.7. v2.7 will also include mxSE(), which can calculate SEs for arbitrary elements of MxAlgebras. We expect to announce the release of v2.7 very soon.
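
Once v2.7 is out, usage should look something like the lines below (a sketch using the fitted model and the 'eC' algebra name from the script earlier in this thread; check ?mxSE for the exact interface when it is released):

mxSE("eC", simpModFit)        # SEs for all elements of the expected correlation matrix
mxSE("eC[2,1]", simpModFit)   # SE for a single derived element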

AdminRobK
Nelder-Mead implemented in OpenMx

I just wanted to post a follow-up to say that, as of v2.7.9, OpenMx has a built-in derivative-free optimizer. It's a flexible, options-rich implementation of the Nelder-Mead algorithm that I wrote from scratch in the spring. Unfortunately, it hasn't turned out to be a magic bullet that makes optimization with ordinal data work "all the time," but it is sometimes able to reach solutions with status code zero in cases where none of the three gradient-descent optimizers are able to do so (as in one script in our nightly test suite).

Using our Nelder-Mead optimizer requires a custom compute plan, for instance,

plan <- omxDefaultComputePlan()
plan$steps$GD <- mxComputeNelderMead()

(with non-default arguments to mxComputeNelderMead() if wanted), and include plan in your mxModel() statement.
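
Continuing the snippet above, including the plan in a model might look like this (using the 'simpMod' model from earlier in the thread purely as an example):

simpModNM    <- mxModel(simpMod, plan)   # attach the custom compute plan
simpModFitNM <- mxRun(simpModNM)
simpModFitNM$output$status$code          # 0 means a clean solution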

AdminRobK
follow-up

Just a follow-up... Since my previous post, I've gained deeper insight into the Nelder-Mead algorithm as well as into this optimization problem. I'm attaching a script that first fits the model using CSOLNP, via mxTryHardOrdinal() with 30 extra tries and a smaller-than-default scale parameter for the distribution of the random perturbations. It then fits the model again, this time using a deliberately thought-out MxComputeNelderMead step in place of gradient-based optimization; the RNG seed and the arguments to mxTryHardOrdinal() are the same as with CSOLNP. Nelder-Mead does indeed reach a smaller fitfunction value than CSOLNP (60879.58 vs. 61179.33), but nonetheless cannot overcome status red.

Edit:

> mxVersion()
OpenMx version: 2.12.2.346 [GIT v2.12.2-346-gbeb41824-dirty]
R version: R version 3.5.2 (2018-12-20)
Platform: x86_64-pc-linux-gnu
Default optimizer: CSOLNP
NPSOL-enabled?: Yes
OpenMP-enabled?: Yes

AdminNeale
Integration precision

I would guess that the main issue here is the numerical precision of the integration. Even a bivariate normal integral way out in the tails may be nearly impossible to estimate accurately enough to overcome code red. Even simple ML estimation of the polychoric would be difficult, I think. I suppose we could increase the prevalence in the simulation and see if it works any better.