So I'm following up on some of my factor score work and happened across an old question I had regarding holder (parent) models for embarrassingly many submodels.
I'm trying to run the same model on 500 (or some other large n) datasets with the independence flag set to TRUE. I'm doing this by creating one model per dataset, building a list of mxModels, and then putting this list into a single mxModel for optimization. However, I'm spending most of my time in the fourth line of this code:
singScore <- transformFactorScores(spRes, 1, "mu", "sigma", "epsilon")
singScore@independent <- TRUE
singSM <- replicate(un, singScore)
singParScore <- mxModel("Singletons", singSM)
singRes <- mxRun(singParScore)
Lines 1-3 together take less than 0.40 seconds, and the optimization of all 500 models (line 5) takes 40-50 seconds. The fourth line, however, takes 3 minutes. If I instead split the fourth and fifth lines into 5 separate holder models of 100 submodels each, total time drops to about 70 seconds for everything, and to roughly 60 seconds with 10 holders of 50 submodels each.
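For reference, the batched variant I'm timing looks roughly like this. This is a sketch, assuming the `singScore` and `un` objects from the snippet above; the batch size and the `results`/`groups` names are just illustrative, and I'm relying on OpenMx tolerating identically replicated submodels the same way the single-holder version does:

```r
# Split the un submodels into holders of `batch` submodels each,
# then build and run each (independent) holder separately.
batch <- 100
groups <- split(seq_len(un), ceiling(seq_len(un) / batch))

results <- lapply(seq_along(groups), function(g) {
  subs <- replicate(length(groups[[g]]), singScore, simplify = FALSE)
  holder <- mxModel(paste0("Singletons", g), subs)  # small holder, cheap to build
  mxRun(holder)
})
```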
I believe this has to do with how S4 builds objects and interacts with apply statements. When I tell mxModel to add these 500 submodels, my understanding of S4 is that they are added one at a time, essentially creating a holder with 1 submodel, then a holder with 2, and so on, so the total copying cost grows quadratically with the number of submodels.
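The same copy-on-grow pattern can be shown in base R without OpenMx at all; this is a generic illustration of the suspected cost, not a claim about OpenMx internals. Growing a container one element at a time copies the accumulated contents at every step (O(n^2) total work), while building the full list in one shot is linear:

```r
n <- 500
payload <- numeric(1000)  # stand-in for one submodel

# One-at-a-time growth: each c() copies everything added so far.
holder <- list()
for (i in seq_len(n)) holder <- c(holder, list(payload))

# One-shot construction: the list is built once, no repeated copying.
holder2 <- replicate(n, payload, simplify = FALSE)

stopifnot(length(holder) == n, identical(holder, holder2))
```

Wrapping each branch in system.time() makes the gap obvious as n grows, which matches the holder-size timings above.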
My questions are:
-is there a better way to benefit from parallelism for this approach?
-is there a smart way to determine exactly how to balance this S4 slowdown against parallelism?
-here's the feature request: is it a worthwhile endeavor to allow mxRun to optimize and return a list of models? is an mxList worth the function crawl?