Error in runHelper... xn out of range

Posted on Wed, 07/15/2020 - 01:39

khusmann Joined: Aug 11, 2019

Hello,

I'm running a relatively complex model with binary outcomes (but well under the max 20 ordinal vars). I'm fixing mean=0, var=1 and letting thresholds vary. Things periodically fail with:

Error in runHelper(model, frontendStart, intervals, silent, suppressWarnings, : xn out of range

Starting values seem to trigger it: even when mxTryHardOrdinal is jiggling values close to the optimum, some tries fail with this error. Increasing model complexity (e.g., doing a multiple-groups analysis) also triggers it, and so, sometimes, does decreasing mvnRelEps.

Any ideas of what might be going on or how to debug this further?

Thanks for the help!

Replied on Wed, 07/15/2020 - 08:32

jpritikin Joined: May 23, 2012

bad thresholds

This happens when the thresholds are too far away from the mean. Do you have any binary indicators which have an extreme proportion of responses (almost all 1 or 0)?

Replied on Wed, 07/15/2020 - 08:58

jpritikin Joined: May 23, 2012

better diag

I made some changes to improve the diagnostic output. Can you build from source 6bf6d3a8587 and try again?

Replied on Wed, 07/15/2020 - 13:56

khusmann Joined: Aug 11, 2019

build error

When I install_github I get this build error:

** libs
make: *** No rule to make target 'omxSymbolTable.h', needed by 'Compute.o'. Stop.

I checked out the repo manually and did a "make cran-install" and it seems to be compiling now -- is there a way to do a cran-install from install_github?

Replied on Wed, 07/15/2020 - 14:14

khusmann Joined: Aug 11, 2019

diagnostic output

Here's the output I get:
[0] xn = matrix(c( # 2x1
 -inf, nan), byrow=TRUE, nrow=2, ncol=1)
[1] xn = matrix(c( # 2x1
 nan, inf), byrow=TRUE, nrow=2, ncol=1)
[1] lower = matrix(c( # 5x1
 nan, -inf, -inf, -inf, nan), byrow=TRUE, nrow=5, ncol=1)
[0] lower = matrix(c( # 5x1
 -inf, -inf, -inf, -inf, -inf), byrow=TRUE, nrow=5, ncol=1)
[1] upper = matrix(c( # 5x1
 inf, nan, nan, nan, inf), byrow=TRUE, nrow=5, ncol=1)
[0] upper = matrix(c( # 5x1
 nan, nan, nan, nan, nan), byrow=TRUE, nrow=5, ncol=1)

Replied on Wed, 07/15/2020 - 15:41

jpritikin Joined: May 23, 2012

curious

I wonder how those NaNs got there. Gotta think about that.

As a workaround, you can try mxFitFunctionML(jointConditionOn = "continuous").
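A minimal sketch of applying this workaround to a model that already exists, assuming the model is set up for an ML fit function. The helper name `use_continuous_conditioning` and the `myModel` placeholder are made up for illustration; `mxModel`, `mxFitFunctionML`, and its `jointConditionOn` argument are OpenMx's own.

```r
# Hypothetical helper: swap in an ML fit function with
# jointConditionOn = "continuous" on an existing MxModel.
# Adding a fit function to a model replaces the one it already has.
use_continuous_conditioning <- function(model) {
  OpenMx::mxModel(
    model,
    OpenMx::mxFitFunctionML(jointConditionOn = "continuous")
  )
}

# Usage (not run; `myModel` is a placeholder for your own model):
# myModel <- use_continuous_conditioning(myModel)
# fit <- OpenMx::mxRun(myModel)
```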

Replied on Wed, 07/15/2020 - 16:03

jpritikin Joined: May 23, 2012

variances?

Did you set a lower bound of 1e-3 on your ordinal variances?

Replied on Wed, 07/15/2020 - 16:07

jpritikin Joined: May 23, 2012

try again

Can you try again with e6b94c0e02f3 ?

Replied on Wed, 07/15/2020 - 16:41

khusmann Joined: Aug 11, 2019

it's working!

> Did you set a lower bound of 1e-3 on your ordinal variances?

All my ordinal vars have fixed variance = 1 and mean = 0; the thresholds are free.

> Can you try again with e6b94c0e02f3 ?

Awesome, this seems to be working... it's been running for over 10 minutes now, and it usually dies after 2 minutes. I'll let you know if it makes it to the end...

Thanks for your help!!

Replied on Wed, 07/22/2020 - 17:59

khusmann Joined: Aug 11, 2019

Can confirm

Just wanted to confirm that this is working for the large multigroup models I was trying to run. Thanks again!

Replied on Thu, 07/30/2020 - 19:51

khusmann Joined: Aug 11, 2019

uh oh

Is there any way these changes could result in a segfault? I'm getting this error occasionally when I'm running:
*** caught segfault ***
address (nil), cause 'memory not mapped'

Traceback:
 1: runHelper(model, frontendStart, intervals, silent, suppressWarnings, unsafe, checkpoint, useSocket, onlyFrontend, useOptimizer, beginMessage)
 2: mxRun(model = model, suppressWarnings = T, unsafe = T, silent = T, intervals = intervals, beginMessage = T)
 3: runWithCounter(model, numdone, silent, intervals = F)
 4: doTryCatch(return(expr), name, parentenv, handler)
 5: tryCatchOne(expr, names, parentenv, handlers[[1L]])
 6: tryCatchList(expr, classes, parentenv, handlers)
 7: tryCatch(expr, error = function(e) {
        call <- conditionCall(e)
        if (!is.null(call)) {
            if (identical(call[[1L]], quote(doTryCatch))) call <- sys.call(-4L)
            dcall <- deparse(call)[1L]
            prefix <- paste("Error in", dcall, ": ")
            LONG <- 75L
            sm <- strsplit(conditionMessage(e), "\n")[[1L]]
            w <- 14L + nchar(dcall, type = "w") + nchar(sm[1L], type = "w")
            if (is.na(w)) w <- 14L + nchar(dcall, type = "b") + nchar(sm[1L], type = "b")
            if (w > LONG) prefix <- paste0(prefix, "\n ")
        } else prefix <- "Error : "
        msg <- paste0(prefix, conditionMessage(e), "\n")
        .Internal(seterrmessage(msg[1L]))
        if (!silent && isTRUE(getOption("show.error.messages"))) {
            cat(msg, file = outFile)
            .Internal(printDeferredWarnings())
        }
        invisible(structure(msg, class = "try-error", condition = e))
    })
 8: try(runWithCounter(model, numdone, silent, intervals = F))
 9: withCallingHandlers(expr, warning = function(w) invokeRestart("muffleWarning"))
10: suppressWarnings(try(runWithCounter(model, numdone, silent, intervals = F)))
11: mxTryHard(model = model, greenOK = greenOK, checkHess = checkHess, finetuneGradient = finetuneGradient, exhaustive = exhaustive, OKstatuscodes = OKstatuscodes, wtgcsv = wtgcsv, ...)
12: mxTryHardOrdinal(model, intervals = T, showInits = T)
An irrecoverable exception occurred. R is aborting now ...

Let me know if there's any other debug info I can send on this!

Replied on Thu, 07/30/2020 - 20:34

jpritikin Joined: May 23, 2012

yikes!

Does this happen with mxFitFunctionML(jointConditionOn = "continuous") too?

Replied on Mon, 08/03/2020 - 12:30

khusmann Joined: Aug 11, 2019

parallel problems

I'll give jointConditionOn a shot in a bit; the cluster I'm working on is down right now for maintenance. One thing I figured out though, is that it works just fine if I limit it to a single thread...

I'm wondering now if it might have to do with the way the cluster will suspend low priority jobs and then resume when resources become available again; I'm wondering if OpenMx might not respond well to this when working with multiple threads. Once the server gets back up I'll see if I can run it in a way that I can be sure it won't get suspended and let you know. (I'll try the "continuous" option as well)
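For anyone wanting to reproduce the single-thread run mentioned above, a minimal sketch using OpenMx's "Number of Threads" option. Passing NULL as the model sets the default for the session rather than for one model. The `n_threads` variable and the `requireNamespace` guard are only there so the snippet runs harmlessly where OpenMx is not installed.

```r
# Number of threads OpenMx should use; 1 disables parallel evaluation.
n_threads <- 1L

# Skip quietly if OpenMx is not installed on this machine.
if (requireNamespace("OpenMx", quietly = TRUE)) {
  # NULL as the model argument sets the global default for the session.
  OpenMx::mxOption(NULL, "Number of Threads", n_threads)
}
```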

Replied on Mon, 08/03/2020 - 13:36

jpritikin Joined: May 23, 2012

threads

> if it might have to do with the way the cluster will suspend low priority jobs and then resume when resources become available again

No, if it obtains a SEGV then it's an OpenMx bug. If you can provide a gdb stack trace then it might help track the problem down.

Replied on Mon, 08/03/2020 - 18:29

khusmann Joined: Aug 11, 2019

gdb stack trace

Here's the stack trace -- let me know if this is enough or I should recompile OpenMx with debugging (is that enabled in the ./configure script?)

...
[New Thread 0x2aaabb963700 (LWP 16596)]
[New Thread 0x2aaabb15f700 (LWP 16597)]
[New Thread 0x2aaabaf5e700 (LWP 16598)]
[New Thread 0x2aaabbd65700 (LWP 16599)]
[New Thread 0x2aaabb762700 (LWP 16600)]
[New Thread 0x2aaabb561700 (LWP 16601)]
[New Thread 0x2aaabb360700 (LWP 16602)]
[New Thread 0x2aaabbb64700 (LWP 16603)]
[Thread 0x2aaabbb64700 (LWP 16603) exited]
[Thread 0x2aaabb360700 (LWP 16602) exited]
[Thread 0x2aaabaf5e700 (LWP 16598) exited]
[Thread 0x2aaabb561700 (LWP 16601) exited]
[Thread 0x2aaabb762700 (LWP 16600) exited]
[Thread 0x2aaabbd65700 (LWP 16599) exited]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x2aaabb963700 (LWP 16596)]
0x00002aaab81a9474 in phi_ () from /storage/work/k/kdh38/eclsk-dcs/.snakemake/conda/2d6218a5/lib/R/library/OpenMx/libs/OpenMx.so
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.212.el6_10.3.x86_64 sssd-client-1.13.3-60.el6_10.2.x86_64
(gdb)
(gdb) bt
#0 0x00002aaab81a9474 in phi_ () from /storage/work/k/kdh38/eclsk-dcs/.snakemake/conda/2d6218a5/lib/R/library/OpenMx/libs/OpenMx.so
#1 0x00002aaab81a9658 in limits_ () from /storage/work/k/kdh38/eclsk-dcs/.snakemake/conda/2d6218a5/lib/R/library/OpenMx/libs/OpenMx.so
#2 0x00002aaab81acc89 in master.0.mvnfnc_ () from /storage/work/k/kdh38/eclsk-dcs/.snakemake/conda/2d6218a5/lib/R/library/OpenMx/libs/OpenMx.so
#3 0x0000000000000000 in ?? ()

I'll try "continuous" next to see if it crashes too.

Replied on Tue, 08/04/2020 - 11:38

khusmann Joined: Aug 11, 2019

valgrind

Ok, now running it with valgrind (with debug symbols) and getting this (even when using only a single thread):

...
==104811== Invalid write of size 8
==104811==    at 0x1EC12AD5: ??? (in /storage/work/k/kdh38/eclsk-dcs/.snakemake/conda/2d6218a5/lib/R/library/OpenMx/libs/OpenMx.so)
==104811==    by 0x270000022E: ???
==104811==  Address 0x7fec08b80 is on thread 1's stack
==104811==
==104811== Invalid write of size 8
==104811==    at 0x1EC12ADA: ??? (in /storage/work/k/kdh38/eclsk-dcs/.snakemake/conda/2d6218a5/lib/R/library/OpenMx/libs/OpenMx.so)
==104811==    by 0x270000022E: ???
==104811==  Address 0x7fec08b68 is on thread 1's stack
==104811==
==104811== Invalid write of size 8
==104811==    at 0x1EC12AE8: ??? (in /storage/work/k/kdh38/eclsk-dcs/.snakemake/conda/2d6218a5/lib/R/library/OpenMx/libs/OpenMx.so)
==104811==    by 0x270000022E: ???
==104811==  Address 0x7fec08b60 is on thread 1's stack
==104811==
==104811== Invalid read of size 4
==104811==    at 0x1EC114A1: ??? (in /storage/work/k/kdh38/eclsk-dcs/.snakemake/conda/2d6218a5/lib/R/library/OpenMx/libs/OpenMx.so)
==104811==    by 0x270000022E: ???
==104811==  Address 0x7febf91fc is on thread 1's stack
==104811==
==104811== Invalid read of size 8
==104811==    at 0x1EC11500: ??? (in /storage/work/k/kdh38/eclsk-dcs/.snakemake/conda/2d6218a5/lib/R/library/OpenMx/libs/OpenMx.so)
==104811==    by 0x270000022E: ???
==104811==  Address 0x7fec08b80 is on thread 1's stack
==104811==
==104811== Invalid read of size 8
==104811==    at 0x1EC11505: ??? (in /storage/work/k/kdh38/eclsk-dcs/.snakemake/conda/2d6218a5/lib/R/library/OpenMx/libs/OpenMx.so)
==104811==    by 0x270000022E: ???
==104811==  Address 0x7fec08b50 is on thread 1's stack
==104811==
==104811== Invalid write of size 8
==104811==    at 0x1EC1150D: ??? (in /storage/work/k/kdh38/eclsk-dcs/.snakemake/conda/2d6218a5/lib/R/library/OpenMx/libs/OpenMx.so)
==104811==    by 0x270000022E: ???
==104811==  Address 0x7fec08bc8 is on thread 1's stack
==104811==
==104811== Invalid write of size 8
==104811==    at 0x1EC11516: ??? (in /storage/work/k/kdh38/eclsk-dcs/.snakemake/conda/2d6218a5/lib/R/library/OpenMx/libs/OpenMx.so)
==104811==    by 0x270000022E: ???
==104811==  Address 0x7fec08bb0 is on thread 1's stack
...
==104811== Invalid write of size 4
==104811==    at 0x1EC16040: ??? (in /storage/work/k/kdh38/eclsk-dcs/.snakemake/conda/2d6218a5/lib/R/library/OpenMx/libs/OpenMx.so)
==104811==    by 0x3FE3E3C19C7D4003: ???
==104811==    by 0x3F00D0619426B3E0: ???
==104811==    by 0x3EB2FBF51BD20191: ???
==104811==    by 0x3E930B23957D69FE: ???
==104811==    by 0x3ED66A84782075BE: ???
==104811==    by 0x3ED095BFE1A40C42: ???
==104811==    by 0x3E9ACE8B5731D3C1: ???
==104811==    by 0x3EDC1B33D305BFEE: ???
==104811==    by 0x3EB8689EEF212500: ???
==104811==    by 0x3ED8F95C5F95FCB9: ???
==104811==    by 0x3ED08EBC6EC100EE: ???
==104811==  Address 0x7febf91f8 is on thread 1's stack
==104811==
==104811== Invalid write of size 8
==104811==    at 0x1EC13AB8: ??? (in /storage/work/k/kdh38/eclsk-dcs/.snakemake/conda/2d6218a5/lib/R/library/OpenMx/libs/OpenMx.so)
==104811==    by 0xBFA9B4BB3ADE24DC: ???
==104811==    by 0x1: ???
==104811==    by 0x7FEBF8DBF: ???
==104811==    by 0x7: ???
==104811==    by 0x1EE880C3: ???
==104811==    by 0x1EC16427: ??? (in /storage/work/k/kdh38/eclsk-dcs/.snakemake/conda/2d6218a5/lib/R/library/OpenMx/libs/OpenMx.so)
==104811==    by 0x7FEBF920F: ???
==104811==    by 0x7FEBF9217: ???
==104811==  Address 0x7febf9218 is on thread 1's stack
==104811==
==104811== Invalid read of size 4
==104811==    at 0x1EC168CE: ??? (in /storage/work/k/kdh38/eclsk-dcs/.snakemake/conda/2d6218a5/lib/R/library/OpenMx/libs/OpenMx.so)
==104811==    by 0xBFE3D3E0E12CDA09: ???
==104811==    by 0x7FEC089FF: ???
==104811==    by 0x7FEBFD047: ???
==104811==    by 0x7FEC00E3F: ???
==104811==    by 0x4: ???
==104811==    by 0x7FEC08A2F: ???
==104811==    by 0xB: ???
==104811==    by 0x7FEC08B7F: ???
==104811==    by 0x7FEBF921F: ???
==104811==    by 0x7FEBF8F8F: ???
==104811==    by 0x2: ???
==104811==  Address 0x7febf91f8 is on thread 1's stack
...

It reports over 1000 errors before even reaching "Begin Initial Fit Attempt". Maybe an off-by-one error somewhere?

I don't get errors on simple models... Maybe it's something to do with ordinal or multigroup code? I'll have to work sometime later to narrow it down to a minimal repeatable example.

Replied on Tue, 08/04/2020 - 14:06

jpritikin Joined: May 23, 2012

errors

Gah! I can't recall which version you're trying. Maybe you can update to the HEAD of master?

Replied on Tue, 08/04/2020 - 15:34

khusmann Joined: Aug 11, 2019

updated

Just updated to HEAD of master and I still get the memory errors (invalid reads/writes) under valgrind. I'll start an issue on GitHub once I narrow it down to a MWE.

Replied on Tue, 08/04/2020 - 14:08

jpritikin Joined: May 23, 2012

github

Also, since you're not running a release, maybe we should open a GitHub issue and troubleshoot there. These forums are mostly for modeling questions. Software problems are more suited to GitHub.

Replied on Mon, 08/03/2020 - 18:37

khusmann Joined: Aug 11, 2019

fitfunctions

Is it possible to change the fitfunction of an already built model? I'm using umxSuperModel to build my multigroup model, which uses mxFitFunctionMultigroup under the hood. Can I take the supermodel and change the fitfunction after it's built, or do I need to stop using umxSuperModel and build the multigroup model manually?

Replied on Tue, 08/04/2020 - 17:01

tbates Joined: Jul 31, 2009

Add a fitfunction to an existing model to overwrite the old one

Using umxSuperModel is fine.

To swap the fit function for a model, just add it to the model, and it will replace the existing one. So

m2 = mxModel(m1, mxFitFunctionWLS())
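For a multigroup supermodel, the top-level fit function is the multigroup one, so adding a new fit function at the top level would presumably replace that rather than the per-group fit functions. Below is a hedged sketch of applying the same replace-by-adding idea to each group instead. The helper name `set_group_fitfunction` and the `superModel` placeholder are hypothetical, and this has not been checked against umxSuperModel output specifically.

```r
# Hypothetical helper: give every submodel (group) of a multigroup model
# the same new fit function, leaving the top-level fit function alone.
set_group_fitfunction <- function(super, fitfunction) {
  for (group_name in names(super@submodels)) {
    # Adding a fit function to a model replaces the one it already has.
    group <- OpenMx::mxModel(super@submodels[[group_name]], fitfunction)
    # Adding a submodel under an existing name replaces that submodel.
    super <- OpenMx::mxModel(super, group)
  }
  super
}

# Usage (not run; `superModel` is a placeholder for your own model):
# superModel <- set_group_fitfunction(
#   superModel,
#   OpenMx::mxFitFunctionML(jointConditionOn = "continuous")
# )
```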

Replied on Thu, 08/06/2020 - 16:59

khusmann Joined: Aug 11, 2019

Thanks!

Thanks Tim!

Replied on Wed, 07/15/2020 - 16:09

jpritikin Joined: May 23, 2012

devtools

> is there a way to do a cran-install from install_github?

Sorry, no.

Replied on Wed, 07/15/2020 - 13:47

khusmann Joined: Aug 11, 2019

bad thresholds

yup, definitely have extreme responses:

(# true / # false)
Indicator 1: 2051 / 11147
Indicator 2: 2182 / 9264
Indicator 3: 2917 / 7715
Indicator 4: 2432 / 7411
Indicator 5: 2136 / 7063

any way to handle these?
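To judge how far these thresholds sit from the mean, here is a rough base-R sketch that converts the counts above into implied thresholds. It assumes FALSE is the lower category and looks at each indicator marginally (mean 0, variance 1), ignoring the rest of the model, so treat the numbers as a sanity check only.

```r
# Counts copied from the table above: TRUE and FALSE responses per indicator.
n_true  <- c(2051, 2182, 2917, 2432, 2136)
n_false <- c(11147, 9264, 7715, 7411, 7063)

# Proportion of FALSE responses per indicator.
p_false <- n_false / (n_true + n_false)

# With mean 0 and variance 1, P(response is FALSE) = pnorm(threshold),
# so the implied threshold is the matching standard-normal quantile.
thresholds <- qnorm(p_false)

print(round(data.frame(p_true = 1 - p_false, threshold = thresholds), 2))
```

For these counts the implied thresholds come out at roughly 0.6 to 1.0 standard deviations above the mean.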

Replied on Wed, 07/15/2020 - 15:38

not too extreme