I am trying to parallelize OpenMx on a computing cluster at my university. I'm using Rmpi, and I keep getting the same error:
Error in { : task 18 failed - "job.num is at least 2."
Calls: %dopar% ->
Execution halted
mpirun has exited due to process rank 0 with PID 1077 on
node compute-0-11.local exiting improperly. There are two reasons this could occur:
- this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.
- this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"
This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
Googling led me to this source file: https://github.com/snoweye/Rmpi_PROF/blob/master/R/Rparutilities.R. Evidently the error "job.num is at least 2." is raised in the function mpi.parLapply when mpi.comm.size(comm) - 1 < 2, i.e. when the communicator contains fewer than three processes. mpi.parLapply is in turn called by the function omxLapply if Rmpi is loaded.
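To make the condition concrete, here is a minimal base-R sketch of the guard as I understand it from that file. The helper check_job_num and its argument comm_size are hypothetical names I made up for illustration; comm_size stands in for whatever mpi.comm.size(comm) returns, and the comment about the master process is my assumption about how Rmpi counts processes.

```r
# Hypothetical helper mirroring the guard described above; it is not part of
# Rmpi. comm_size stands in for the value returned by mpi.comm.size(comm).
check_job_num <- function(comm_size) {
  # Assumption: one process in the communicator is the master, so the
  # remaining comm_size - 1 processes are available as workers.
  job_num <- comm_size - 1
  if (job_num < 2) {
    stop("job.num is at least 2.")
  }
  job_num
}

# A communicator of size 4 passes the guard.
ok <- check_job_num(4)

# A communicator of size 2 trips the guard with the same message I am seeing.
msg <- tryCatch(check_job_num(2), error = function(e) conditionMessage(e))
```

If this reading is right, printing mpi.comm.size(1) from inside the failing task should show a value below 3 there, even if the job as a whole was launched with more processes.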
Does anyone know why this is happening? I've tried both stopping OpenMx from parallelizing on its own and letting OpenMx handle the parallelization instead of another package, and neither works. What am I doing wrong?