Error in runHelper... xn out of range
khusmann
Joined: 08/11/2019
Hello,
I'm running a relatively complex model with binary outcomes (but well under the max 20 ordinal vars). I'm fixing mean=0, var=1 and letting thresholds vary. Things periodically fail with:
error in runHelper(model, frontendStart, intervals, silent, suppressWarnings, : xn out of range
Starting values seem to trigger this: even when mxTryHardOrdinal is jiggling values close to the optimum, some tries fail with this error. Increasing model complexity (for example, running a multiple-group analysis) and sometimes decreasing mvnRelEps also trigger it.
Any ideas of what might be going on or how to debug this further?
Thanks for the help!
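One way to narrow down which settings trigger the error is to refit the same model at a few mvnRelEps values and record whether mxRun raises an error. The sketch below assumes OpenMx is installed; `try_eps` and the default `eps_values` are hypothetical names and values chosen for illustration, not recommendations from this thread.

```r
library(OpenMx)

# Hypothetical helper (not part of OpenMx): refit `model` once per mvnRelEps
# value and return either the fitted model or the error message.
try_eps <- function(model, eps_values = c(1e-2, 5e-3, 1e-3)) {
  results <- lapply(eps_values, function(eps) {
    # mxOption() returns a copy of the model with the option set.
    m <- mxOption(model, "mvnRelEps", eps)
    # mxRun() signals an ordinary R error, so tryCatch() can capture it.
    tryCatch(mxRun(m, silent = TRUE),
             error = function(e) conditionMessage(e))
  })
  names(results) <- format(eps_values)
  results
}
```

Entries that come back as character strings are the runs that failed, so something like `Filter(is.character, try_eps(model))`, with `model` being your own MxModel, lists the failing settings and their messages.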
bad thresholds
In reply to bad thresholds by jpritikin
better diag
In reply to better diag by jpritikin
build error
** libs
make: *** No rule to make target 'omxSymbolTable.h', needed by 'Compute.o'. Stop.
I checked out the repo manually and did a "make cran-install" and it seems to be compiling now -- is there a way to do a cran-install from install_github?
In reply to build error by khusmann
diagnostic output
[0] xn = matrix(c( # 2x1
-inf
, nan), byrow=TRUE, nrow=2, ncol=1)
[1] xn = matrix(c( # 2x1
nan
, inf), byrow=TRUE, nrow=2, ncol=1)
[1] lower = matrix(c( # 5x1
nan
, -inf
, -inf
, -inf
, nan), byrow=TRUE, nrow=5, ncol=1)
[0] lower = matrix(c( # 5x1
-inf
, -inf
, -inf
, -inf
, -inf), byrow=TRUE, nrow=5, ncol=1)
[1] upper = matrix(c( # 5x1
inf
, nan
, nan
, nan
, inf), byrow=TRUE, nrow=5, ncol=1)
[0] upper = matrix(c( # 5x1
nan
, nan
, nan
, nan
, nan), byrow=TRUE, nrow=5, ncol=1)
In reply to diagnostic output by khusmann
curious
As a workaround, you can try
mxFitFunctionML(jointConditionOn = "continuous")
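A minimal sketch of applying this workaround to an existing model, assuming OpenMx is installed. `use_continuous_conditioning` is a hypothetical helper name. In a multigroup model the ML fit functions presumably sit in the group submodels rather than the container, so the helper would have to be applied to each group there.

```r
library(OpenMx)

# Hypothetical helper (not part of OpenMx): return a copy of `model` whose
# fit function is ML with jointConditionOn = "continuous".
# Adding a fit function to a model replaces the one already there.
use_continuous_conditioning <- function(model) {
  mxModel(model, mxFitFunctionML(jointConditionOn = "continuous"))
}
```

Usage would be `model2 <- use_continuous_conditioning(model)`, followed by the usual mxRun or mxTryHardOrdinal call on `model2`.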
In reply to diagnostic output by khusmann
variances?
In reply to diagnostic output by khusmann
try again
In reply to try again by jpritikin
it's working!
All my ordinal variables have variance fixed at 1 and mean fixed at 0; the thresholds are free.
> Can you try again with e6b94c0e02f3 ?
Awesome, this seems to be working. It's been running for over 10 minutes now, and it usually dies after 2 minutes. I'll let you know if it makes it to the end.
Thanks for your help!!
In reply to it's working! by khusmann
Can confirm
In reply to Can confirm by khusmann
uh oh
*** caught segfault ***
address (nil), cause 'memory not mapped'
Traceback:
1: runHelper(model, frontendStart, intervals, silent, suppressWarnings, unsafe, checkpoint, useSocket, onlyFrontend, useOptimizer, beginMessage)
2: mxRun(model = model, suppressWarnings = T, unsafe = T, silent = T, intervals = intervals, beginMessage = T)
3: runWithCounter(model, numdone, silent, intervals = F)
4: doTryCatch(return(expr), name, parentenv, handler)
5: tryCatchOne(expr, names, parentenv, handlers[[1L]])
6: tryCatchList(expr, classes, parentenv, handlers)
7: tryCatch(expr, error = function(e) { call <- conditionCall(e) if (!is.null(call)) { if (identical(call[[1L]], quote(doTryCatch))) call <- sys.call(-4L) dcall <- deparse(call)[1L] prefix <- paste("Error in", dcall, ": ") LONG <- 75L sm <- strsplit(conditionMessage(e), "\n")[[1L]] w <- 14L + nchar(dcall, type = "w") + nchar(sm[1L], type = "w") if (is.na(w)) w <- 14L + nchar(dcall, type = "b") + nchar(sm[1L], type = "b") if (w > LONG) prefix <- paste0(prefix, "\n ") } else prefix <- "Error : " msg <- paste0(prefix, conditionMessage(e), "\n") .Internal(seterrmessage(msg[1L])) if (!silent && isTRUE(getOption("show.error.messages"))) { cat(msg, file = outFile) .Internal(printDeferredWarnings()) } invisible(structure(msg, class = "try-error", condition = e))})
8: try(runWithCounter(model, numdone, silent, intervals = F))
9: withCallingHandlers(expr, warning = function(w) invokeRestart("muffleWarning"))
10: suppressWarnings(try(runWithCounter(model, numdone, silent, intervals = F)))
11: mxTryHard(model = model, greenOK = greenOK, checkHess = checkHess, finetuneGradient = finetuneGradient, exhaustive = exhaustive, OKstatuscodes = OKstatuscodes, wtgcsv = wtgcsv, ...)
12: mxTryHardOrdinal(model, intervals = T, showInits = T)
An irrecoverable exception occurred. R is aborting now ...
Let me know if there's any other debug info I can send on this!
In reply to uh oh by khusmann
yikes!
In reply to yikes! by jpritikin
parallel problems
I'm now wondering whether this has to do with the way the cluster suspends low-priority jobs and resumes them when resources become available again; OpenMx might not respond well to that when working with multiple threads. Once the server is back up, I'll try to run it in a way that guarantees it won't get suspended and let you know. (I'll try the "continuous" option as well.)
In reply to parallel problems by khusmann
threads
No. If it gets a SEGV, then it's an OpenMx bug. If you can provide a gdb stack trace, it might help track the problem down.
In reply to threads by jpritikin
gdb stack trace
...
[New Thread 0x2aaabb963700 (LWP 16596)]
[New Thread 0x2aaabb15f700 (LWP 16597)]
[New Thread 0x2aaabaf5e700 (LWP 16598)]
[New Thread 0x2aaabbd65700 (LWP 16599)]
[New Thread 0x2aaabb762700 (LWP 16600)]
[New Thread 0x2aaabb561700 (LWP 16601)]
[New Thread 0x2aaabb360700 (LWP 16602)]
[New Thread 0x2aaabbb64700 (LWP 16603)]
[Thread 0x2aaabbb64700 (LWP 16603) exited]
[Thread 0x2aaabb360700 (LWP 16602) exited]
[Thread 0x2aaabaf5e700 (LWP 16598) exited]
[Thread 0x2aaabb561700 (LWP 16601) exited]
[Thread 0x2aaabb762700 (LWP 16600) exited]
[Thread 0x2aaabbd65700 (LWP 16599) exited]
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x2aaabb963700 (LWP 16596)]
0x00002aaab81a9474 in phi_ () from /storage/work/k/kdh38/eclsk-dcs/.snakemake/conda/2d6218a5/lib/R/library/OpenMx/libs/OpenMx.so
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.212.el6_10.3.x86_64 sssd-client-1.13.3-60.el6_10.2.x86_64
(gdb)
(gdb) bt
#0 0x00002aaab81a9474 in phi_ () from /storage/work/k/kdh38/eclsk-dcs/.snakemake/conda/2d6218a5/lib/R/library/OpenMx/libs/OpenMx.so
#1 0x00002aaab81a9658 in limits_ () from /storage/work/k/kdh38/eclsk-dcs/.snakemake/conda/2d6218a5/lib/R/library/OpenMx/libs/OpenMx.so
#2 0x00002aaab81acc89 in master.0.mvnfnc_ () from /storage/work/k/kdh38/eclsk-dcs/.snakemake/conda/2d6218a5/lib/R/library/OpenMx/libs/OpenMx.so
#3 0x0000000000000000 in ?? ()
I'll try "continuous" next to see if it crashes too.
In reply to gdb stack trace by khusmann
valgrind
...
==104811== Invalid write of size 8
==104811== at 0x1EC12AD5: ??? (in /storage/work/k/kdh38/eclsk-dcs/.snakemake/conda/2d6218a5/lib/R/library/OpenMx/libs/OpenMx.so)
==104811== by 0x270000022E: ???
==104811== Address 0x7fec08b80 is on thread 1's stack
==104811==
==104811== Invalid write of size 8
==104811== at 0x1EC12ADA: ??? (in /storage/work/k/kdh38/eclsk-dcs/.snakemake/conda/2d6218a5/lib/R/library/OpenMx/libs/OpenMx.so)
==104811== by 0x270000022E: ???
==104811== Address 0x7fec08b68 is on thread 1's stack
==104811==
==104811== Invalid write of size 8
==104811== at 0x1EC12AE8: ??? (in /storage/work/k/kdh38/eclsk-dcs/.snakemake/conda/2d6218a5/lib/R/library/OpenMx/libs/OpenMx.so)
==104811== by 0x270000022E: ???
==104811== Address 0x7fec08b60 is on thread 1's stack
==104811==
==104811== Invalid read of size 4
==104811== at 0x1EC114A1: ??? (in /storage/work/k/kdh38/eclsk-dcs/.snakemake/conda/2d6218a5/lib/R/library/OpenMx/libs/OpenMx.so)
==104811== by 0x270000022E: ???
==104811== Address 0x7febf91fc is on thread 1's stack
==104811==
==104811== Invalid read of size 8
==104811== at 0x1EC11500: ??? (in /storage/work/k/kdh38/eclsk-dcs/.snakemake/conda/2d6218a5/lib/R/library/OpenMx/libs/OpenMx.so)
==104811== by 0x270000022E: ???
==104811== Address 0x7fec08b80 is on thread 1's stack
==104811==
==104811== Invalid read of size 8
==104811== at 0x1EC11505: ??? (in /storage/work/k/kdh38/eclsk-dcs/.snakemake/conda/2d6218a5/lib/R/library/OpenMx/libs/OpenMx.so)
==104811== by 0x270000022E: ???
==104811== Address 0x7fec08b50 is on thread 1's stack
==104811==
==104811== Invalid write of size 8
==104811== at 0x1EC1150D: ??? (in /storage/work/k/kdh38/eclsk-dcs/.snakemake/conda/2d6218a5/lib/R/library/OpenMx/libs/OpenMx.so)
==104811== by 0x270000022E: ???
==104811== Address 0x7fec08bc8 is on thread 1's stack
==104811==
==104811== Invalid write of size 8
==104811== at 0x1EC11516: ??? (in /storage/work/k/kdh38/eclsk-dcs/.snakemake/conda/2d6218a5/lib/R/library/OpenMx/libs/OpenMx.so)
==104811== by 0x270000022E: ???
==104811== Address 0x7fec08bb0 is on thread 1's stack
...
==104811== Invalid write of size 4
==104811== at 0x1EC16040: ??? (in /storage/work/k/kdh38/eclsk-dcs/.snakemake/conda/2d6218a5/lib/R/library/OpenMx/libs/OpenMx.so)
==104811== by 0x3FE3E3C19C7D4003: ???
==104811== by 0x3F00D0619426B3E0: ???
==104811== by 0x3EB2FBF51BD20191: ???
==104811== by 0x3E930B23957D69FE: ???
==104811== by 0x3ED66A84782075BE: ???
==104811== by 0x3ED095BFE1A40C42: ???
==104811== by 0x3E9ACE8B5731D3C1: ???
==104811== by 0x3EDC1B33D305BFEE: ???
==104811== by 0x3EB8689EEF212500: ???
==104811== by 0x3ED8F95C5F95FCB9: ???
==104811== by 0x3ED08EBC6EC100EE: ???
==104811== Address 0x7febf91f8 is on thread 1's stack
==104811==
==104811== Invalid write of size 8
==104811== at 0x1EC13AB8: ??? (in /storage/work/k/kdh38/eclsk-dcs/.snakemake/conda/2d6218a5/lib/R/library/OpenMx/libs/OpenMx.so)
==104811== by 0xBFA9B4BB3ADE24DC: ???
==104811== by 0x1: ???
==104811== by 0x7FEBF8DBF: ???
==104811== by 0x7: ???
==104811== by 0x1EE880C3: ???
==104811== by 0x1EC16427: ??? (in /storage/work/k/kdh38/eclsk-dcs/.snakemake/conda/2d6218a5/lib/R/library/OpenMx/libs/OpenMx.so)
==104811== by 0x7FEBF920F: ???
==104811== by 0x7FEBF9217: ???
==104811== Address 0x7febf9218 is on thread 1's stack
==104811==
==104811== Invalid read of size 4
==104811== at 0x1EC168CE: ??? (in /storage/work/k/kdh38/eclsk-dcs/.snakemake/conda/2d6218a5/lib/R/library/OpenMx/libs/OpenMx.so)
==104811== by 0xBFE3D3E0E12CDA09: ???
==104811== by 0x7FEC089FF: ???
==104811== by 0x7FEBFD047: ???
==104811== by 0x7FEC00E3F: ???
==104811== by 0x4: ???
==104811== by 0x7FEC08A2F: ???
==104811== by 0xB: ???
==104811== by 0x7FEC08B7F: ???
==104811== by 0x7FEBF921F: ???
==104811== by 0x7FEBF8F8F: ???
==104811== by 0x2: ???
==104811== Address 0x7febf91f8 is on thread 1's stack
...
It reports over 1000 errors before even reaching "Begin Initial Fit Attempt". Maybe an off-by-one error somewhere?
I don't get errors on simple models... Maybe it's something to do with ordinal or multigroup code? I'll have to work sometime later to narrow it down to a minimal repeatable example.
In reply to valgrind by khusmann
errors
In reply to errors by jpritikin
updated
In reply to valgrind by khusmann
github
In reply to yikes! by jpritikin
fitfunctions
In reply to fitfunctions by khusmann
Add a fitfunction to an existing model to overwrite the old one
To swap the fit function for a model, just add the new one to the model, and it will replace the existing one. For example:
m2 = mxModel(m1, mxFitFunctionWLS())
In reply to Add a fitfunction to an existing model to overwrite the old one by tbates
Thanks!
In reply to build error by khusmann
devtools
Sorry, no.
In reply to bad thresholds by jpritikin
bad thresholds
(# true / # false)
Indicator 1: 2051 / 11147
Indicator 2: 2182 / 9264
Indicator 3: 2917 / 7715
Indicator 4: 2432 / 7411
Indicator 5: 2136 / 7063
Is there any way to handle these?
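A quick way to judge how extreme these splits are: with the latent mean fixed at 0 and the variance at 1, the threshold implied by each split is the standard normal quantile of the proportion falling below it. The sketch uses base R only and assumes FALSE is coded as the lower category; the variable names are illustrative.

```r
# Counts copied from the list above: TRUE and FALSE responses per indicator.
n_true  <- c(2051, 2182, 2917, 2432, 2136)
n_false <- c(11147, 9264, 7715, 7411, 7063)

# Proportion of FALSE responses, i.e. the probability mass below the threshold
# (assumption: FALSE is the lower category).
p_false <- n_false / (n_true + n_false)

# Under a standard normal latent variable, the implied threshold is qnorm(p).
implied_threshold <- qnorm(p_false)

print(round(data.frame(indicator = 1:5, p_false, implied_threshold), 3))
```

For these counts the implied thresholds fall between about 0.6 and 1.0, well inside the range where the normal integral is numerically comfortable.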
In reply to bad thresholds by khusmann
not too extreme