Program gets stuck

dtofighi
Program gets stuck
Attachments:
My simulation file (binary data, 1.93 KB)
sessionInfo file (plain text, 582 bytes)

Hello,

I am running an 'embarrassingly parallel' simulation study using your excellent OpenMx software. Each replication generates data and fits an unconstrained and a constrained mediation model, where the constraint is a nonlinear function of the model parameters. The problem is that the program (optimizer) appears to get stuck: when I check the CPU usage, it has dropped back to a minimum, as if the program had terminated. To replicate the problem, I am attaching my code (with a parallel random generator seed) as well as the sessionInfo() output of the R environment on my computer. I would greatly appreciate your response.

jpritikin
works for me

I attached the output. I'm not sure, but maybe there are some bugs in how R manages processes under Windows.

AdminRobK
I tried your script

I tried your script on my 32-bit Windows system with 4 logical CPUs, and my 64-bit Linux/GNU system with 2 logical CPUs. Your script used approximately the expected CPU load on both machines, but I saw no signs of progress even after waiting a while. So, I interrupted the script in RStudio and killed the processes it had spawned. I tried again with rep reduced from 200 to 20. The script ran very quickly on both systems.

How long should I expect it to take to run 200 replications, anyhow?

Under Windows, I'm running:

OpenMx version: 2.7.11 [GIT v2.7.11-dirty]
R version: R version 3.4.0 (2017-04-21)
Platform: i386-w64-mingw32
Default optimiser: CSOLNP

Under Linux/GNU:

OpenMx version: 2.7.11.59 [GIT v2.7.11-59-g5d1b3b3-dirty]
R version: R version 3.4.0 (2017-04-21)
Platform: x86_64-pc-linux-gnu
Default optimiser: CSOLNP

If the issue is with OpenMx and not with doParallel, the only suggestions that come to my mind are to try using a different optimizer, or to provide each path coefficient with a nontrivial upper and lower bound with the lbound and ubound arguments to mxPath(). In fact, since your MxModel uses an MxConstraint, using SLSQP instead of CSOLNP is advisable in any event.
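The two suggestions above might look roughly like the following. This is only a sketch: the variable names ("x", "m", "y"), the labels ("a", "b"), the start values, and the bounds of -5 and 5 are illustrative assumptions, not taken from the attached script.

```r
library(OpenMx)

# Switch the default optimizer for all models fitted in this R session.
mxOption(NULL, "Default optimizer", "SLSQP")

# Hypothetical mediation paths x -> m -> y, each with a nontrivial
# lower and upper bound (the values -5 and 5 are placeholders).
medPaths <- mxPath(
  from   = c("x", "m"),
  to     = c("m", "y"),
  arrows = 1,
  free   = TRUE,
  values = c(0.3, 0.3),
  labels = c("a", "b"),
  lbound = -5,
  ubound = 5
)
```

Sensible bounds depend on the scale of the simulated data, so the placeholders above would need to be adapted.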

AdminRobK
I tried again

I tried again with 'rep' set to 200, and with mxOption(NULL,"Default optimizer","SLSQP") after OpenMx is loaded. The script runs OK under Linux, but seems to hang under Windows, with only one of the three child processes still using the CPU. I can't tell if it's "stuck," or if that process's share of the models to fit is just taking a long time to optimize.

I would not be surprised if doParallel has some Windows-specific bugs, as Joshua suggested.

dtofighi
re-runs

Hello All,

I would like to thank you for all your advice. I modified my code per your suggestions, including trying the SLSQP optimizer. I ran the code using both optimizers on Mac, W10, and Ubuntu. For W10, I used doSNOW instead of doParallel because it allows me to use a progress bar in an interactive session. The results of all my runs were the same: the program did not stop when all the calculations were done. This can clearly be seen when running R on W10 because of the progress bar.
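For readers unfamiliar with the doSNOW progress bar, a minimal sketch of the usual pattern follows. It is not the attached script: the loop body is a placeholder, and n_rep and the worker count of 2 are assumptions. In a real run the body would generate data and fit the OpenMx models, with .packages = "OpenMx" passed to foreach().

```r
library(doSNOW)  # also attaches foreach and snow

n_rep <- 20                                   # assumed number of replications
cl <- snow::makeCluster(2, type = "SOCK")     # assumed worker count
registerDoSNOW(cl)

# Text progress bar, updated each time a task's result comes back.
pb <- txtProgressBar(max = n_rep, style = 3)
opts <- list(progress = function(n) setTxtProgressBar(pb, n))

results <- foreach(i = seq_len(n_rep), .combine = rbind,
                   .options.snow = opts) %dopar% {
  c(rep = i, value = i^2)                     # placeholder for one replication
}

close(pb)
snow::stopCluster(cl)                         # shut the workers down explicitly
```

Calling stopCluster() explicitly at the end makes it easier to tell whether the workers themselves finished or the master session is still waiting on one of them.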

I attached my Windows 10 scripts as well as the ones for Mac. Mac and Ubuntu scripts were the same. I am also attaching all my sessionInfo files.

jpritikin
w10 only?

Just to clarify, does this failure mode appear on all platforms or only W10?

dtofighi

It appears on all platforms.

jpritikin
works for me

I tried problemSLSQPMac.R on Debian Linux and it seems to run fine in less than 5 minutes. It creates output file resultsSLSQMac.csv with 200 lines plus a header line. I am not able to reproduce the hang.

dtofighi
Thanks for all your comments.

Thanks for all your comments. With constrained optimization in my simulation, I get different results from the two optimizers. On average, the results from CSOLNP are much better than those from SLSQP.

Below, I am pasting the output from one of the simulation runs in which CSOLNP got stuck:

[0] MxComputeGradientDescent: engine CSOLNP (ID 1) #P=7 gradient=central tol=6.3e-012 constraints=1
[0] resultForTT

[0] 0.005102
[0] resultForTT

[0] 0.000012
[0] resultForTT

[0] 0.000011
[0] resultForTT

[0] 0.000043
[0] MxComputeGradientDescent: engine CSOLNP done, iter=1288 inform=10
[0] MxComputeGradientDescent: engine CSOLNP (ID 1) #P=7 gradient=central tol=6.3e-012 constraints=1
[0] resultForTT

[0] 0.013429
[0] resultForTT

[0] -0.000013
[0] resultForTT

[0] 0.000625
[0] resultForTT

[0] -0.016426
[0] resultForTT

[0] -0.000001

AdminRobK
a few more comments

"Thanks for all your comments. With constrained optimization in my simulation, I get different results from the two optimizers. On average, the results from CSOLNP are much better than those from SLSQP."

The different optimizers have their strengths and weaknesses, but I'm VERY surprised that you're getting better results with CSOLNP than with SLSQP. By what criteria are they better, and are you sure? Our testing has consistently indicated that SLSQP is the best of the three gradient-descent optimizers at handling nonlinear constraints. CSOLNP's biggest strength is in analyses involving ordinal data, for which, relative to the other two, it can sometimes reach a lower fitfunction value and/or reach the minimum in fewer function evaluations.

[0] MxComputeGradientDescent: engine CSOLNP done, iter=1288 inform=10

Status code 10 means that the start values were infeasible. If the start values violate constraints, the optimizer is supposed to try to find a feasible point before beginning its algorithm in full swing. The fact that the attempt ended with status code 10 indicates either (1) that the optimizer was not able to find an initial feasible point, or (2) that the start values are completely outside the parameter space, i.e. the fitfunction evaluates to NaN or Inf or something like that at the start values.
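In a simulation loop it can help to record the status code of every fit so that problem replications can be flagged afterwards. The helper below is a hypothetical sketch: the function name is made up, and treating codes 0 and 1 as acceptable is an assumption made for illustration. For a model returned by mxRun(), the code is stored in fit$output$status$code.

```r
# Hypothetical helper: turn an OpenMx optimizer status code into a label.
# Treating 0 and 1 as acceptable is an assumption for this sketch.
classify_status <- function(code) {
  if (is.na(code)) {
    "missing"
  } else if (code %in% c(0, 1)) {
    "ok"
  } else if (code == 10) {
    "infeasible start values"
  } else {
    "other problem"
  }
}

# Example with the code reported in the log above.
classify_status(10)
```

With a fitted model, the call would be classify_status(fit$output$status$code), stored alongside the parameter estimates for that replication.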

Have you tried using NPSOL as the optimizer? It's a proprietary library, so you would need to install OpenMx from our repository, rather than from CRAN.

On a completely different topic...since you're doing a simulation involving an MxConstraint, you can probably speed up optimization of the constrained model if you provide an analytic Jacobian for the constraint function. Only NPSOL and SLSQP can use one. If you're interested, I could post some suggestions on how to modify your script for that purpose.
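As a rough idea of what an analytic Jacobian looks like, here is a toy sketch that is unrelated to the attached script. It minimizes a simple algebra fit function subject to the hypothetical constraint a*b = 1, and passes the Jacobian of a*b with respect to (a, b), which is the row vector (b, a), via the jac argument of mxConstraint(). All names, values, and the constraint itself are assumptions; the Jacobian's column names are set to the free-parameter labels.

```r
library(OpenMx)
mxOption(NULL, "Default optimizer", "SLSQP")   # SLSQP can use analytic Jacobians

toy <- mxModel(
  "jacSketch",
  mxMatrix("Full", 1, 1, free = TRUE, values = 1, labels = "a", name = "A"),
  mxMatrix("Full", 1, 1, free = TRUE, values = 1, labels = "b", name = "B"),
  mxMatrix("Full", 1, 1, free = FALSE, values = 1, name = "One"),
  # Toy objective: squared distance from the point (1, 2).
  mxAlgebra((A - 1)^2 + (B - 2)^2, name = "obj"),
  # Constraint function a*b and its Jacobian [d/da, d/db] = [b, a].
  mxAlgebra(A * B, name = "ab"),
  mxAlgebra(cbind(B, A), name = "abJac",
            dimnames = list(NULL, c("a", "b"))),
  mxConstraint(ab == One, name = "prodIsOne", jac = "abJac"),
  mxFitFunctionAlgebra("obj")
)

fit <- mxRun(toy)
est <- fit$output$estimate   # named vector of estimates for "a" and "b"
```

For a mediation model, the same idea would apply with the constraint written in terms of the labeled path coefficients and one Jacobian column per free parameter that appears in the constraint.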