Parallelization of the objective function for definition variables

MaximilianStefan Joined: 12/04/2018
We are currently trying to benchmark our SEM software against OpenMx for a SEM with unique definition variables per person.
With ML estimation, each person has a unique model-implied covariance matrix that has to be inverted separately, so I assumed that parallelizing the objective function would improve performance drastically. However, changing the number of threads does not change the run time, so I assume I did something wrong. I followed the instructions on this wiki page:
https://openmx.ssri.psu.edu/wiki/speed-parallel-running-and-efficiency


omxDetectCores() # returns 8
getOption('mxOptions')$"Number of Threads" # returns 2
mxOption(model= yourModel, key="Number of Threads", value= (omxDetectCores() - 1)) #does not change time to fit the model, regardless of the value I pass

A minimal working example is attached.

Also, the same wiki page says:

> Streamlining estimation
>
> If you have a large complex model, it may take hours to run by default. You can often speed up your model by turning off computation of elements that are only needed on a final evaluation, like the Hessian and standard errors.

How can I do that?

We would also be thankful for any further suggestions on how to improve the performance.

Best,
Maximilian

Replied on Wed, 01/20/2021 - 08:39
Leo Joined: 01/09/2020

Just some troubleshooting:
- Does parallelization work in general? Try `umx_check_parallel()` in the umx package.
- If not, try another operating system, e.g. Linux or macOS.
Replied on Wed, 01/20/2021 - 09:23
jpritikin Joined: 05/23/2012

Did you try running your model with

mxOption(key="Parallel diagnostics", value="Yes")

?

Replied on Wed, 01/20/2021 - 10:47
AdminRobK Joined: 01/24/2014

What is your `mxVersion()` output? Specifically, are you running your script under MS Windows? Neither CRAN nor we build multithreaded Windows binaries of OpenMx.

Also, this line:

mxOption(model= growthCurveModel, key="Number of Threads",
value= (omxDetectCores() - 1))
#does not change time to fit the model,
#regardless of the value I pass

If you provide an MxModel object for argument `model`, then `mxOption()` returns an MxModel object with the appropriate option set or cleared, as the case may be (see the man page for `mxOption()`). Your script needs to store the output of that line's call to `mxOption()` in an object. Or even more simply, just set the option globally by providing `NULL` for argument `model`.
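To make the difference concrete, here is a minimal sketch of both approaches. The empty model `demo` and the variable `nThreads` are placeholders introduced for illustration and are not part of the original script; on a single-threaded build the option is stored but will not speed anything up.

```r
library(OpenMx)

# Placeholder model (assumption); substitute your own MxModel,
# e.g. growthCurveModel from the attached script.
demo <- mxModel("demo")

# Leave one core free; fall back to 1 if the core count cannot be detected.
nThreads <- max(1L, omxDetectCores() - 1L, na.rm = TRUE)

# Approach 1: per-model setting. mxOption() returns a modified copy of
# the model, so the result has to be assigned back to take effect.
demo <- mxOption(demo, key = "Number of Threads", value = nThreads)
demo@options[["Number of Threads"]]

# Approach 2: global setting. With model = NULL the session-wide
# option list is changed directly and no assignment is needed.
mxOption(NULL, key = "Number of Threads", value = nThreads)
getOption("mxOptions")[["Number of Threads"]]
```

The global form affects every model run later in the same R session, which is usually what you want when benchmarking.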

Replied on Thu, 01/21/2021 - 10:06
Leo Joined: 01/09/2020

I did not realize that as I mostly use Linux. Is there a way to get multi-threaded performance on Windows? I only found this thread:
https://openmx.ssri.psu.edu/wiki/speed-parallel-running-and-efficiency
Replied on Thu, 01/21/2021 - 10:22
jpritikin Joined: 05/23/2012

> Is there a way to get multi-threaded performance on Windows?

Not yet. We're waiting for [gcc](https://gcc.gnu.org/) support.

Replied on Fri, 01/22/2021 - 11:40
MaximilianStefan Joined: 12/04/2018

Thanks for the very fast and helpful suggestions. I am using Windows, so this should explain why it is not working - I will try it on Linux next week and ask again if I still can't get it to work. (Also the comment about semantics by AdminRobK is helpful; I somehow just assumed mxOption changes the option in place for the model passed)

Regarding what is written about the Hessian and standard error computation: Does this just refer to setting the mxOptions "Calculate Hessian" and "Standard Errors" or is there more to it?

We would really like to get the best performance OpenMx can deliver, but we are by no means experienced users, so if somebody has further ideas about which options to try to speed up OpenMx (for this kind of model or in general), we would be thankful.
I thought setting "RAM Max Depth" to 1 could help in this case (because our longest directed path is of length one), but it drastically decreased performance. Maybe this is because all parameters in the A matrix are either fixed or definition variables?
In terms of optimizer choice we are just going to try the three options, but in terms of further optimizer settings we don't know if we could take an educated guess about what could improve performance.

Best,

Maximilian

Replied on Fri, 01/22/2021 - 13:14
jpritikin Joined: 05/23/2012

> I thought setting "RAM Max Depth" to 1 could help in this case (because our longest directed path is of length one), but it drastically decreased performance.

The default setting should give you optimal performance. The main reason you might reduce "RAM Max Depth" is if you knew a priori that you had cycles of regressions.

> Maybe this is because all parameters in the A matrix are either fixed or definition variables?

The only thing that matters is whether A matrix entries are zero or not. It doesn't matter why they are non-zero.

Replied on Fri, 01/22/2021 - 15:57
AdminRobK Joined: 01/24/2014

> (Also the comment about semantics by AdminRobK is helpful; I somehow just assumed mxOption changes the option in place for the model passed)

Unfortunately, R is just not designed for that kind of "in-place" modification of a user-created object. The user can, however, modify the R workspace's options list without use of the assignment operator. That's what happens if you provide `NULL` for argument `model` in a call to `mxOption()`, and that's how I usually set mxOptions.

> Regarding what is written about the Hessian and standard error computation: Does this just refer to setting the mxOptions "Calculate Hessian" and "Standard Errors" or is there more to it?

If I understand the question, then no, setting those two options is all there is to it.
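For reference, a minimal sketch of setting those two options globally, using the `NULL` form of `mxOption()` described above. The value `"No"` is what I would expect to disable each step; check the `mxOption()` man page for your installed version.

```r
library(OpenMx)

# Skip the post-optimization Hessian and standard-error steps for every
# model run later in this session (global setting, model = NULL).
mxOption(NULL, key = "Calculate Hessian", value = "No")
mxOption(NULL, key = "Standard Errors", value = "No")

# Confirm what subsequent mxRun() calls will use.
getOption("mxOptions")[c("Calculate Hessian", "Standard Errors")]
```

Set both options back to `"Yes"` before a final run if you need standard errors in the summary.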

> In terms of optimizer choice we are just going to try the three options, but in terms of further optimizer settings we don't know if we could take an educated guess about what could improve performance.

I don't have any suggestions for tuning optimizer settings other than what's already been discussed in this thread. I will remark that, in my experience, whenever the 3 gradient-based optimizers all converge to the same solution, NPSOL does so with fewer objective-function evaluations.
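If you want to check this on your own model, a rough sketch of such a comparison is below. It uses the `demoOneFactor` data shipped with OpenMx as a stand-in (an assumption; it is not the growth curve model from this thread), and the names `toyModel`, `optimizers`, and `results` are made up for illustration. NPSOL is only tried when the installed build includes it.

```r
library(OpenMx)

# Stand-in model (assumption): one-factor RAM model on the bundled demo data.
data(demoOneFactor)
manifests <- names(demoOneFactor)
toyModel <- mxModel("toyOneFactor", type = "RAM",
  manifestVars = manifests, latentVars = "G",
  mxPath(from = "G", to = manifests, values = 0.5),
  mxPath(from = manifests, arrows = 2, values = 1),
  mxPath(from = "G", arrows = 2, free = FALSE, values = 1),
  mxData(cov(demoOneFactor), type = "cov", numObs = 500))

# NPSOL is not part of every build, so only include it when available.
optimizers <- c("SLSQP", "CSOLNP")
if (imxHasNPSOL()) optimizers <- c(optimizers, "NPSOL")

# Remember the current default so it can be restored afterwards.
oldOptimizer <- getOption("mxOptions")[["Default optimizer"]]

# Fit the same model once per optimizer and record fit value and effort.
results <- do.call(rbind, lapply(optimizers, function(opt) {
  mxOption(NULL, key = "Default optimizer", value = opt)
  fit <- mxRun(toyModel, silent = TRUE)
  data.frame(optimizer = opt,
             minimum = fit$output$minimum,
             evaluations = fit$output$evaluations)
}))

mxOption(NULL, key = "Default optimizer", value = oldOptimizer)
print(results)
```

Comparing the `minimum` column first tells you whether the optimizers agree on the solution; only then is the `evaluations` column a fair measure of effort.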

> Is there a way to get multi-threaded performance on Windows?

If your Windows system is configured to build R packages from source, you could modify src/Makevars.win.in in your clone of the OpenMx source repository in order to enable OpenMP. With the correct configuration, you will successfully install an OpenMP-enabled Windows build of OpenMx. However, it will be nearly useless to you: trying to do anything multithreaded with it will eventually crash R. Something like 5 years ago, I tried to figure out why it crashes. I stepped through some multithreaded loop code in the GNU Debugger, and watched as the code overran the bounds of an array before my very eyes. That was when the R developer toolchain for Windows used gcc 4.9; at the time, we assumed the crashing was a compiler bug. But with the release of R 4.0, the toolchain adopted gcc 8. Unfortunately, it is still not possible to compile a multithreaded Windows build of OpenMx that is actually thread-safe (I tried over the summer).

Edit: the current toolchain uses gcc 8, not 9.