Error in runHelper... xn out of range
Hello,
I'm running a relatively complex model with binary outcomes (but well under the max 20 ordinal vars). I'm fixing mean=0, var=1 and letting thresholds vary. Things periodically fail with:
error in runHelper(model, frontendStart, intervals, silent, suppressWarnings, : xn out of range
Starting values seem to trigger this: even when mxTryHardOrdinal is jiggling values close to the optimum, some tries fail with this error. Increasing model complexity (running a multiple-groups analysis) and, sometimes, decreasing mvnRelEps also trigger it.
Any ideas of what might be going on or how to debug this further?
Thanks for the help!
bad thresholds
This happens when the thresholds are too far away from the mean. Do you have any binary indicators which have an extreme proportion of responses (almost all 1 or 0)?
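A quick way to answer that question is to tabulate each binary indicator. The sketch below uses base R only; the helper name `flag_extreme_binary`, the 2% cutoff, and the simulated data frame are illustrative assumptions, not part of OpenMx.

```r
# Hypothetical helper: report the proportion of 1s in each binary column and
# flag columns where either category is rarer than `cutoff`.
flag_extreme_binary <- function(df, cutoff = 0.02) {
  prop_one <- vapply(df, function(x) mean(x == 1, na.rm = TRUE), numeric(1))
  data.frame(
    indicator = names(df),
    prop_one  = unname(prop_one),
    extreme   = unname(prop_one < cutoff | prop_one > 1 - cutoff)
  )
}

# Simulated example data (assumed, not taken from the thread).
set.seed(1)
demo <- data.frame(
  y1 = rbinom(1000, 1, 0.20),   # moderately skewed
  y2 = rbinom(1000, 1, 0.001)   # almost all 0
)
extreme_report <- flag_extreme_binary(demo)
print(extreme_report)
```

Indicators flagged as extreme are the ones whose thresholds are likely to sit far out in the tail of the underlying normal distribution.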
In reply to bad thresholds by jpritikin
better diag
I made some changes to improve the diagnostic output. Can you build from source 6bf6d3a8587 and try again?
In reply to better diag by jpritikin
build error
When I run install_github I get this build error:
** libs
make: *** No rule to make target 'omxSymbolTable.h', needed by 'Compute.o'. Stop.
I checked out the repo manually and did a "make cran-install" and it seems to be compiling now -- is there a way to do a cran-install from install_github?
In reply to build error by khusmann
diagnostic output
Here's the output I get:
[0] xn = matrix(c( # 2x1
-inf
, nan), byrow=TRUE, nrow=2, ncol=1)
[1] xn = matrix(c( # 2x1
nan
, inf), byrow=TRUE, nrow=2, ncol=1)
[1] lower = matrix(c( # 5x1
nan
, -inf
, -inf
, -inf
, nan), byrow=TRUE, nrow=5, ncol=1)
[0] lower = matrix(c( # 5x1
-inf
, -inf
, -inf
, -inf
, -inf), byrow=TRUE, nrow=5, ncol=1)
[1] upper = matrix(c( # 5x1
inf
, nan
, nan
, nan
, inf), byrow=TRUE, nrow=5, ncol=1)
[0] upper = matrix(c( # 5x1
nan
, nan
, nan
, nan
, nan), byrow=TRUE, nrow=5, ncol=1)
In reply to diagnostic output by khusmann
curious
I wonder how those NaNs got there. Gotta think about that.
As a workaround, you can try
mxFitFunctionML(jointConditionOn = "continuous")
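A minimal sketch of applying that workaround to an existing model, assuming OpenMx is installed. Adding a fit function to a model through `mxModel()` replaces the one already there. The helper name `use_continuous_conditioning` and the empty placeholder model are illustrative only.

```r
# Hypothetical helper: return a copy of `model` whose ML fit function
# is created with jointConditionOn = "continuous".
use_continuous_conditioning <- function(model) {
  OpenMx::mxModel(
    model,
    OpenMx::mxFitFunctionML(jointConditionOn = "continuous")
  )
}

# Usage on a placeholder model (no data or expectation; illustration only).
if (requireNamespace("OpenMx", quietly = TRUE)) {
  m1 <- OpenMx::mxModel("demo", OpenMx::mxFitFunctionML())
  m2 <- use_continuous_conditioning(m1)
  print(class(m2$fitfunction))
}
```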
In reply to diagnostic output by khusmann
variances?
Did you set a lower bound of 1e-3 on your ordinal variances?
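For reference, two common ways to handle the variance of an ordinal indicator in path notation are sketched below: free with a small positive lower bound, or fixed at 1 with the mean fixed at 0 and the thresholds free. The variable names `y1` and `y2` are placeholders, and the code assumes OpenMx is installed.

```r
variance_lbound <- 1e-3          # the bound asked about above
ordinal_vars <- c("y1", "y2")    # placeholder indicator names

if (requireNamespace("OpenMx", quietly = TRUE)) {
  library(OpenMx)

  # Option A: free variances, bounded away from zero.
  free_variances <- mxPath(from = ordinal_vars, arrows = 2,
                           free = TRUE, values = 1, lbound = variance_lbound)

  # Option B: variances fixed at 1 and means fixed at 0, thresholds free.
  fixed_variances <- mxPath(from = ordinal_vars, arrows = 2,
                            free = FALSE, values = 1)
  fixed_means     <- mxPath(from = "one", to = ordinal_vars,
                            free = FALSE, values = 0)
  free_thresholds <- mxThreshold(vars = ordinal_vars, nThresh = 1,
                                 free = TRUE, values = 0)
}
```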
In reply to diagnostic output by khusmann
try again
Can you try again with e6b94c0e02f3 ?
In reply to try again by jpritikin
it's working!
> Did you set a lower bound of 1e-3 on your ordinal variances?
All my ordinal variables have variance fixed at 1 and mean fixed at 0; the thresholds are free.
> Can you try again with e6b94c0e02f3 ?
Awesome, this seems to be working... it's been running over 10mins now and usually dies after 2min. I'll let you know if it makes it to the end...
Thanks for your help!!
In reply to it's working! by khusmann
Can confirm
Just wanted to confirm that this is working for the large multigroup models I was trying to run. Thanks again!
In reply to Can confirm by khusmann
uh oh
Is there any way these changes could result in a segfault? I'm getting this error occasionally when I'm running:
*** caught segfault ***
address (nil), cause 'memory not mapped'
Traceback:
1: runHelper(model, frontendStart, intervals, silent, suppressWarnings, unsafe, checkpoint, useSocket, onlyFrontend, useOptimizer, beginMessage)
2: mxRun(model = model, suppressWarnings = T, unsafe = T, silent = T, intervals = intervals, beginMessage = T)
3: runWithCounter(model, numdone, silent, intervals = F)
4: doTryCatch(return(expr), name, parentenv, handler)
5: tryCatchOne(expr, names, parentenv, handlers[[1L]])
6: tryCatchList(expr, classes, parentenv, handlers)
7: tryCatch(expr, error = function(e) { call <- conditionCall(e) if (!is.null(call)) { if (identical(call[[1L]], quote(doTryCatch))) call <- sys.call(-4L) dcall <- deparse(call)[1L] prefix <- paste("Error in", dcall, ": ") LONG <- 75L sm <- strsplit(conditionMessage(e), "\n")[[1L]] w <- 14L + nchar(dcall, type = "w") + nchar(sm[1L], type = "w") if (is.na(w)) w <- 14L + nchar(dcall, type = "b") + nchar(sm[1L], type = "b") if (w > LONG) prefix <- paste0(prefix, "\n ") } else prefix <- "Error : " msg <- paste0(prefix, conditionMessage(e), "\n") .Internal(seterrmessage(msg[1L])) if (!silent && isTRUE(getOption("show.error.messages"))) { cat(msg, file = outFile) .Internal(printDeferredWarnings()) } invisible(structure(msg, class = "try-error", condition = e))})
8: try(runWithCounter(model, numdone, silent, intervals = F))
9: withCallingHandlers(expr, warning = function(w) invokeRestart("muffleWarning"))
10: suppressWarnings(try(runWithCounter(model, numdone, silent, intervals = F)))
11: mxTryHard(model = model, greenOK = greenOK, checkHess = checkHess, finetuneGradient = finetuneGradient, exhaustive = exhaustive, OKstatuscodes = OKstatuscodes, wtgcsv = wtgcsv, ...)
12: mxTryHardOrdinal(model, intervals = T, showInits = T)
An irrecoverable exception occurred. R is aborting now ...
Let me know if there's any other debug info I can send on this!
In reply to uh oh by khusmann
yikes!
Does this happen with mxFitFunctionML(jointConditionOn = "continuous") too?
In reply to yikes! by jpritikin
parallel problems
I'll give jointConditionOn a shot in a bit; the cluster I'm working on is down right now for maintenance. One thing I figured out though, is that it works just fine if I limit it to a single thread...
I'm wondering now if it might have to do with the way the cluster will suspend low priority jobs and then resume when resources become available again; I'm wondering if OpenMx might not respond well to this when working with multiple threads. Once the server gets back up I'll see if I can run it in a way that I can be sure it won't get suspended and let you know. (I'll try the "continuous" option as well)
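For anyone reproducing the single-thread run, the thread count can be set through `mxOption()` with the "Number of Threads" key. This is a sketch that assumes OpenMx is installed; the empty model named "demo" is a placeholder.

```r
n_threads <- 1L   # run single-threaded while debugging

if (requireNamespace("OpenMx", quietly = TRUE)) {
  library(OpenMx)

  # Set the global default for models run later in this session.
  mxOption(NULL, "Number of Threads", n_threads)

  # Or set it on one model only; mxOption() returns the modified model.
  m <- mxModel("demo")   # placeholder model
  m <- mxOption(m, "Number of Threads", n_threads)
}
```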
In reply to parallel problems by khusmann
threads
> if it might have to do with the way the cluster will suspend low priority jobs and then resume when resources become available again
No. If it segfaults, then it's an OpenMx bug. If you can provide a gdb stack trace, it might help track the problem down.
In reply to threads by jpritikin
gdb stack trace
Here's the stack trace. Let me know if this is enough or if I should recompile OpenMx with debugging (is that enabled in the ./configure script?)
...
[New Thread 0x2aaabb963700 (LWP 16596)]
[New Thread 0x2aaabb15f700 (LWP 16597)]
[New Thread 0x2aaabaf5e700 (LWP 16598)]
[New Thread 0x2aaabbd65700 (LWP 16599)]
[New Thread 0x2aaabb762700 (LWP 16600)]
[New Thread 0x2aaabb561700 (LWP 16601)]
[New Thread 0x2aaabb360700 (LWP 16602)]
[New Thread 0x2aaabbb64700 (LWP 16603)]
[Thread 0x2aaabbb64700 (LWP 16603) exited]
[Thread 0x2aaabb360700 (LWP 16602) exited]
[Thread 0x2aaabaf5e700 (LWP 16598) exited]
[Thread 0x2aaabb561700 (LWP 16601) exited]
[Thread 0x2aaabb762700 (LWP 16600) exited]
[Thread 0x2aaabbd65700 (LWP 16599) exited]
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x2aaabb963700 (LWP 16596)]
0x00002aaab81a9474 in phi_ () from /storage/work/k/kdh38/eclsk-dcs/.snakemake/conda/2d6218a5/lib/R/library/OpenMx/libs/OpenMx.so
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.212.el6_10.3.x86_64 sssd-client-1.13.3-60.el6_10.2.x86_64
(gdb)
(gdb) bt
#0 0x00002aaab81a9474 in phi_ () from /storage/work/k/kdh38/eclsk-dcs/.snakemake/conda/2d6218a5/lib/R/library/OpenMx/libs/OpenMx.so
#1 0x00002aaab81a9658 in limits_ () from /storage/work/k/kdh38/eclsk-dcs/.snakemake/conda/2d6218a5/lib/R/library/OpenMx/libs/OpenMx.so
#2 0x00002aaab81acc89 in master.0.mvnfnc_ () from /storage/work/k/kdh38/eclsk-dcs/.snakemake/conda/2d6218a5/lib/R/library/OpenMx/libs/OpenMx.so
#3 0x0000000000000000 in ?? ()
I'll try "continuous" next to see if it crashes too.
In reply to gdb stack trace by khusmann
valgrind
Ok, now running it with valgrind (with debug symbols) and getting this (even when using only a single thread):
...
==104811== Invalid write of size 8
==104811== at 0x1EC12AD5: ??? (in /storage/work/k/kdh38/eclsk-dcs/.snakemake/conda/2d6218a5/lib/R/library/OpenMx/libs/OpenMx.so)
==104811== by 0x270000022E: ???
==104811== Address 0x7fec08b80 is on thread 1's stack
==104811==
==104811== Invalid write of size 8
==104811== at 0x1EC12ADA: ??? (in /storage/work/k/kdh38/eclsk-dcs/.snakemake/conda/2d6218a5/lib/R/library/OpenMx/libs/OpenMx.so)
==104811== by 0x270000022E: ???
==104811== Address 0x7fec08b68 is on thread 1's stack
==104811==
==104811== Invalid write of size 8
==104811== at 0x1EC12AE8: ??? (in /storage/work/k/kdh38/eclsk-dcs/.snakemake/conda/2d6218a5/lib/R/library/OpenMx/libs/OpenMx.so)
==104811== by 0x270000022E: ???
==104811== Address 0x7fec08b60 is on thread 1's stack
==104811==
==104811== Invalid read of size 4
==104811== at 0x1EC114A1: ??? (in /storage/work/k/kdh38/eclsk-dcs/.snakemake/conda/2d6218a5/lib/R/library/OpenMx/libs/OpenMx.so)
==104811== by 0x270000022E: ???
==104811== Address 0x7febf91fc is on thread 1's stack
==104811==
==104811== Invalid read of size 8
==104811== at 0x1EC11500: ??? (in /storage/work/k/kdh38/eclsk-dcs/.snakemake/conda/2d6218a5/lib/R/library/OpenMx/libs/OpenMx.so)
==104811== by 0x270000022E: ???
==104811== Address 0x7fec08b80 is on thread 1's stack
==104811==
==104811== Invalid read of size 8
==104811== at 0x1EC11505: ??? (in /storage/work/k/kdh38/eclsk-dcs/.snakemake/conda/2d6218a5/lib/R/library/OpenMx/libs/OpenMx.so)
==104811== by 0x270000022E: ???
==104811== Address 0x7fec08b50 is on thread 1's stack
==104811==
==104811== Invalid write of size 8
==104811== at 0x1EC1150D: ??? (in /storage/work/k/kdh38/eclsk-dcs/.snakemake/conda/2d6218a5/lib/R/library/OpenMx/libs/OpenMx.so)
==104811== by 0x270000022E: ???
==104811== Address 0x7fec08bc8 is on thread 1's stack
==104811==
==104811== Invalid write of size 8
==104811== at 0x1EC11516: ??? (in /storage/work/k/kdh38/eclsk-dcs/.snakemake/conda/2d6218a5/lib/R/library/OpenMx/libs/OpenMx.so)
==104811== by 0x270000022E: ???
==104811== Address 0x7fec08bb0 is on thread 1's stack
...
==104811== Invalid write of size 4
==104811== at 0x1EC16040: ??? (in /storage/work/k/kdh38/eclsk-dcs/.snakemake/conda/2d6218a5/lib/R/library/OpenMx/libs/OpenMx.so)
==104811== by 0x3FE3E3C19C7D4003: ???
==104811== by 0x3F00D0619426B3E0: ???
==104811== by 0x3EB2FBF51BD20191: ???
==104811== by 0x3E930B23957D69FE: ???
==104811== by 0x3ED66A84782075BE: ???
==104811== by 0x3ED095BFE1A40C42: ???
==104811== by 0x3E9ACE8B5731D3C1: ???
==104811== by 0x3EDC1B33D305BFEE: ???
==104811== by 0x3EB8689EEF212500: ???
==104811== by 0x3ED8F95C5F95FCB9: ???
==104811== by 0x3ED08EBC6EC100EE: ???
==104811== Address 0x7febf91f8 is on thread 1's stack
==104811==
==104811== Invalid write of size 8
==104811== at 0x1EC13AB8: ??? (in /storage/work/k/kdh38/eclsk-dcs/.snakemake/conda/2d6218a5/lib/R/library/OpenMx/libs/OpenMx.so)
==104811== by 0xBFA9B4BB3ADE24DC: ???
==104811== by 0x1: ???
==104811== by 0x7FEBF8DBF: ???
==104811== by 0x7: ???
==104811== by 0x1EE880C3: ???
==104811== by 0x1EC16427: ??? (in /storage/work/k/kdh38/eclsk-dcs/.snakemake/conda/2d6218a5/lib/R/library/OpenMx/libs/OpenMx.so)
==104811== by 0x7FEBF920F: ???
==104811== by 0x7FEBF9217: ???
==104811== Address 0x7febf9218 is on thread 1's stack
==104811==
==104811== Invalid read of size 4
==104811== at 0x1EC168CE: ??? (in /storage/work/k/kdh38/eclsk-dcs/.snakemake/conda/2d6218a5/lib/R/library/OpenMx/libs/OpenMx.so)
==104811== by 0xBFE3D3E0E12CDA09: ???
==104811== by 0x7FEC089FF: ???
==104811== by 0x7FEBFD047: ???
==104811== by 0x7FEC00E3F: ???
==104811== by 0x4: ???
==104811== by 0x7FEC08A2F: ???
==104811== by 0xB: ???
==104811== by 0x7FEC08B7F: ???
==104811== by 0x7FEBF921F: ???
==104811== by 0x7FEBF8F8F: ???
==104811== by 0x2: ???
==104811== Address 0x7febf91f8 is on thread 1's stack
...
It reports over 1000 errors before even reaching "Begin Initial Fit Attempt". Maybe an off-by-one error somewhere?
I don't get errors on simple models... Maybe it's something to do with ordinal or multigroup code? I'll have to work sometime later to narrow it down to a minimal repeatable example.
In reply to valgrind by khusmann
errors
Gah! I can't recall which version you're trying. Maybe you can update to the HEAD of master?
In reply to errors by jpritikin
updated
Just updated to HEAD of master and I still get the memory errors. I'll open an issue on GitHub once I narrow it down to an MWE.
In reply to valgrind by khusmann
github
Also, since you're not running a release, maybe we should open a GitHub issue and troubleshoot there. These forums are mostly for modeling questions. Software problems are more suited to GitHub.
In reply to yikes! by jpritikin
fitfunctions
Is it possible to change the fit function of an already built model? I'm using umxSuperModel to build my multigroup model, which uses mxFitFunctionMultigroup under the hood. Can I take the supermodel and change the fit function after it's built, or do I need to stop using umxSuperModel and build the multigroup model manually?
In reply to fitfunctions by khusmann
Add a fitfunction to an existing model to overwrite the old one
Using umxSuperModel is fine.
To swap the fit function for a model, just add the new one to the model, and it will replace the existing one. For example:
m2 = mxModel(m1, mxFitFunctionWLS())
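One caveat for a multigroup container: the top-level model typically holds `mxFitFunctionMultigroup()`, while each group submodel holds its own fit function, so swapping at the top level would replace the multigroup fit function rather than the per-group ones. The sketch below replaces the fit function inside every group and leaves the container's fit function alone. It assumes OpenMx is installed and that re-adding a submodel under the same name replaces the original; `swap_group_fit` and the two placeholder groups are hypothetical, and whether a model built by umxSuperModel follows this layout should be checked before relying on it.

```r
# Hypothetical helper: give every submodel of `super` the fit function `fit`.
swap_group_fit <- function(super, fit) {
  for (group in super@submodels) {
    # Rebuild the group with the new fit function, then put it back
    # into the container under the same name.
    super <- OpenMx::mxModel(super, OpenMx::mxModel(group, fit))
  }
  super
}

# Usage on placeholder groups (no data or expectations; illustration only).
if (requireNamespace("OpenMx", quietly = TRUE)) {
  library(OpenMx)
  g1 <- mxModel("g1", mxFitFunctionML())
  g2 <- mxModel("g2", mxFitFunctionML())
  super <- mxModel("super", g1, g2,
                   mxFitFunctionMultigroup(c("g1", "g2")))
  super2 <- swap_group_fit(
    super, mxFitFunctionML(jointConditionOn = "continuous"))
  print(names(super2@submodels))
}
```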
In reply to Add a fitfunction to an existing model to overwrite the old one by tbates
Thanks!
Thanks Tim!
In reply to build error by khusmann
devtools
> is there a way to do a cran-install from install_github?
Sorry, no.
In reply to bad thresholds by jpritikin
bad thresholds
Yup, I definitely have extreme responses:
(# true / # false)
Indicator 1: 2051 / 11147
Indicator 2: 2182 / 9264
Indicator 3: 2917 / 7715
Indicator 4: 2432 / 7411
Indicator 5: 2136 / 7063
Is there any way to handle these?
In reply to bad thresholds by khusmann
not too extreme
Those data don't seem all that extreme. Seems like the model should work.
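To put rough numbers on that, the counts posted above can be converted to the thresholds they imply on a standard-normal liability scale (mean 0, variance 1). This is a base-R sketch of the marginal calculation only; it ignores everything else in the model.

```r
# Counts posted in the thread, as (# true / # false).
n_true  <- c(2051, 2182, 2917, 2432, 2136)
n_false <- c(11147, 9264, 7715, 7411, 7063)

# Proportion of "true" responses per indicator.
p_true <- n_true / (n_true + n_false)

# With a standard-normal liability, the single threshold separating
# "false" from "true" sits at the normal quantile of the "false" proportion.
implied_threshold <- qnorm(1 - p_true)

print(round(data.frame(p_true, implied_threshold), 3))
```

The proportions of "true" responses run from about 16% to 27%, which places every implied threshold between roughly 0.6 and 1.0 standard deviations above the mean, nowhere near the tails.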