Error in runHelper... xn out of range

khusmann
Joined: 08/11/2019 - 19:19
Error in runHelper... xn out of range

Hello,

I'm running a relatively complex model with binary outcomes (but well under the maximum of 20 ordinal variables). I'm fixing each mean at 0 and each variance at 1 and leaving the thresholds free. Runs periodically fail with:

error in runHelper(model, frontendStart, intervals, silent, suppressWarnings,  : xn out of range

Starting values seem to trigger this: even when mxTryHardOrdinal is jiggling values close to the optimum, some tries fail with this error. Increasing model complexity (e.g., a multiple-group analysis) also triggers it, and so, sometimes, does decreasing mvnRelEps.

Any ideas of what might be going on or how to debug this further?

Thanks for the help!

jpritikin
Joined: 05/24/2012 - 00:35
bad thresholds

This happens when the thresholds are too far away from the mean. Do you have any binary indicators which have an extreme proportion of responses (almost all 1 or 0)?

jpritikin
Joined: 05/24/2012 - 00:35
better diag

I made some changes to improve the diagnostic output. Can you build from source 6bf6d3a8587 and try again?

khusmann
Joined: 08/11/2019 - 19:19
build error

When I run install_github, I get this build error:

** libs
make: *** No rule to make target 'omxSymbolTable.h', needed by 'Compute.o'.  Stop.

I checked out the repo manually and did a "make cran-install" and it seems to be compiling now -- is there a way to do a cran-install from install_github?

khusmann
Joined: 08/11/2019 - 19:19
diagnostic output

Here's the output I get:

[0] xn =  matrix(c(    # 2x1
 -inf
, nan), byrow=TRUE, nrow=2, ncol=1)
[1] xn =  matrix(c(    # 2x1
 nan
, inf), byrow=TRUE, nrow=2, ncol=1)
[1] lower =  matrix(c(    # 5x1
 nan
, -inf
, -inf
, -inf
, nan), byrow=TRUE, nrow=5, ncol=1)
[0] lower =  matrix(c(    # 5x1
 -inf
, -inf
, -inf
, -inf
, -inf), byrow=TRUE, nrow=5, ncol=1)
[1] upper =  matrix(c(    # 5x1
 inf
, nan
, nan
, nan
, inf), byrow=TRUE, nrow=5, ncol=1)
[0] upper =  matrix(c(    # 5x1
 nan
, nan
, nan
, nan
, nan), byrow=TRUE, nrow=5, ncol=1)
jpritikin
Joined: 05/24/2012 - 00:35
curious

I wonder how those NaNs got there. Gotta think about that.

As a workaround, you can try mxFitFunctionML(jointConditionOn = "continuous").
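
A minimal sketch of applying this option, assuming `model` is an existing MxModel that uses an ML fit function. The helper name `useContinuousConditioning` is made up for illustration. Adding a fit function to a model with mxModel() replaces the one already there:

```r
library(OpenMx)

# Hypothetical helper: return a copy of 'model' whose fit function is
# mxFitFunctionML() with jointConditionOn set to "continuous".
useContinuousConditioning <- function(model) {
  mxModel(model, mxFitFunctionML(jointConditionOn = "continuous"))
}

# Usage (assumes 'model' was built elsewhere):
# fit <- mxRun(useContinuousConditioning(model))
```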

jpritikin
Joined: 05/24/2012 - 00:35
variances?

Did you set a lower bound of 1e-3 on your ordinal variances?

jpritikin
Joined: 05/24/2012 - 00:35
try again

Can you try again with e6b94c0e02f3 ?

khusmann
Joined: 08/11/2019 - 19:19
it's working!

> Did you set a lower bound of 1e-3 on your ordinal variances?

All my ordinal variables have variance fixed at 1 and mean fixed at 0; the thresholds are free.

> Can you try again with e6b94c0e02f3 ?

Awesome, this seems to be working... it's been running for over 10 minutes now, and it usually dies after 2 minutes. I'll let you know if it makes it to the end...

Thanks for your help!!

khusmann
Joined: 08/11/2019 - 19:19
Can confirm

Just wanted to confirm that this is working for the large multigroup models I was trying to run. Thanks again!

khusmann
Joined: 08/11/2019 - 19:19
uh oh

Is there any way these changes could result in a segfault? I'm getting this error occasionally when I'm running:

 
  *** caught segfault ***                                                                                                                                                   
 address (nil), cause 'memory not mapped'                                                                                                                                   
 
 Traceback:                                                                                                                                                                 
  1: runHelper(model, frontendStart, intervals, silent, suppressWarnings,     unsafe, checkpoint, useSocket, onlyFrontend, useOptimizer,     beginMessage)                  
  2: mxRun(model = model, suppressWarnings = T, unsafe = T, silent = T,     intervals = intervals, beginMessage = T)                                                        
  3: runWithCounter(model, numdone, silent, intervals = F)                                                                                                                  
  4: doTryCatch(return(expr), name, parentenv, handler)                                                                                                                     
  5: tryCatchOne(expr, names, parentenv, handlers[[1L]])                                                                                                                    
  6: tryCatchList(expr, classes, parentenv, handlers)                                                                                                                       
  7: tryCatch(expr, error = function(e) {    call <- conditionCall(e)    if (!is.null(call)) {        if (identical(call[[1L]], quote(doTryCatch)))             call <- sys.call(-4L)        dcall <- deparse(call)[1L]        prefix <- paste("Error in", dcall, ": ")        LONG <- 75L        sm <- strsplit(conditionMessage(e), "\n")[[1L]]        w <- 14L + nchar(dcall, type = "w") + nchar(sm[1L], type = "w")        if (is.na(w))             w <- 14L + nchar(dcall, type = "b") + nchar(sm[1L], type = "b")        if (w > LONG)             prefix <- paste0(prefix, "\n  ")    }    else prefix <- "Error : "    msg <- paste0(prefix, conditionMessage(e), "\n")    .Internal(seterrmessage(msg[1L]))    if (!silent && isTRUE(getOption("show.error.messages"))) {        cat(msg, file = outFile)        .Internal(printDeferredWarnings())    }    invisible(structure(msg, class = "try-error", condition = e))})
  8: try(runWithCounter(model, numdone, silent, intervals = F))                                                                                                             
  9: withCallingHandlers(expr, warning = function(w) invokeRestart("muffleWarning"))                                                                                        
 10: suppressWarnings(try(runWithCounter(model, numdone, silent, intervals = F)))                                                                                           
 11: mxTryHard(model = model, greenOK = greenOK, checkHess = checkHess,     finetuneGradient = finetuneGradient, exhaustive = exhaustive,     OKstatuscodes = OKstatuscodes, wtgcsv = wtgcsv, ...)
 12: mxTryHardOrdinal(model, intervals = T, showInits = T)                                                                                                                  
 An irrecoverable exception occurred. R is aborting now ... 

Let me know if there's any other debug info I can send on this!

jpritikin
Joined: 05/24/2012 - 00:35
yikes!

Does this happen with mxFitFunctionML(jointConditionOn = "continuous") too?

khusmann
Joined: 08/11/2019 - 19:19
parallel problems

I'll give jointConditionOn a shot in a bit; the cluster I'm working on is down right now for maintenance. One thing I figured out, though, is that it works just fine if I limit it to a single thread...
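
For reference, one way to restrict OpenMx to a single thread is the global "Number of Threads" option. This is only a sketch; the post does not say how the thread limit was actually applied.

```r
library(OpenMx)

# Set the global default so that models run afterwards use a single thread.
mxOption(NULL, "Number of Threads", 1)
```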

I'm wondering now if it might have to do with the way the cluster will suspend low priority jobs and then resume when resources become available again; OpenMx might not respond well to this when working with multiple threads. Once the server is back up, I'll see if I can run it in a way that guarantees it won't get suspended, and I'll let you know. (I'll try the "continuous" option as well.)

jpritikin
Joined: 05/24/2012 - 00:35
threads

> if it might have to do with the way the cluster will suspend low priority jobs and then resume when resources become available again

No, if it gets a SIGSEGV, then it's an OpenMx bug. If you can provide a gdb stack trace, it might help track the problem down.

khusmann
Joined: 08/11/2019 - 19:19
gdb stack trace

Here's the stack trace -- let me know if this is enough or whether I should recompile OpenMx with debugging symbols (is that enabled in the ./configure script?)

...
[New Thread 0x2aaabb963700 (LWP 16596)]
[New Thread 0x2aaabb15f700 (LWP 16597)]
[New Thread 0x2aaabaf5e700 (LWP 16598)]
[New Thread 0x2aaabbd65700 (LWP 16599)]
[New Thread 0x2aaabb762700 (LWP 16600)]
[New Thread 0x2aaabb561700 (LWP 16601)]
[New Thread 0x2aaabb360700 (LWP 16602)]
[New Thread 0x2aaabbb64700 (LWP 16603)]
[Thread 0x2aaabbb64700 (LWP 16603) exited]
[Thread 0x2aaabb360700 (LWP 16602) exited]
[Thread 0x2aaabaf5e700 (LWP 16598) exited]
[Thread 0x2aaabb561700 (LWP 16601) exited]
[Thread 0x2aaabb762700 (LWP 16600) exited]
[Thread 0x2aaabbd65700 (LWP 16599) exited]
 
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x2aaabb963700 (LWP 16596)]
0x00002aaab81a9474 in phi_ () from /storage/work/k/kdh38/eclsk-dcs/.snakemake/conda/2d6218a5/lib/R/library/OpenMx/libs/OpenMx.so
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.212.el6_10.3.x86_64 sssd-client-1.13.3-60.el6_10.2.x86_64
(gdb)
(gdb) bt
#0  0x00002aaab81a9474 in phi_ () from /storage/work/k/kdh38/eclsk-dcs/.snakemake/conda/2d6218a5/lib/R/library/OpenMx/libs/OpenMx.so
#1  0x00002aaab81a9658 in limits_ () from /storage/work/k/kdh38/eclsk-dcs/.snakemake/conda/2d6218a5/lib/R/library/OpenMx/libs/OpenMx.so
#2  0x00002aaab81acc89 in master.0.mvnfnc_ () from /storage/work/k/kdh38/eclsk-dcs/.snakemake/conda/2d6218a5/lib/R/library/OpenMx/libs/OpenMx.so
#3  0x0000000000000000 in ?? ()

I'll try "continuous" next to see if it crashes too.

khusmann
Joined: 08/11/2019 - 19:19
valgrind

Ok, now running it with valgrind (with debug symbols) and getting this (even when using only a single thread):

...
==104811== Invalid write of size 8
==104811==    at 0x1EC12AD5: ??? (in /storage/work/k/kdh38/eclsk-dcs/.snakemake/conda/2d6218a5/lib/R/library/OpenMx/libs/OpenMx.so)
==104811==    by 0x270000022E: ???
==104811==  Address 0x7fec08b80 is on thread 1's stack
==104811==
==104811== Invalid write of size 8
==104811==    at 0x1EC12ADA: ??? (in /storage/work/k/kdh38/eclsk-dcs/.snakemake/conda/2d6218a5/lib/R/library/OpenMx/libs/OpenMx.so)
==104811==    by 0x270000022E: ???
==104811==  Address 0x7fec08b68 is on thread 1's stack
==104811==
==104811== Invalid write of size 8
==104811==    at 0x1EC12AE8: ??? (in /storage/work/k/kdh38/eclsk-dcs/.snakemake/conda/2d6218a5/lib/R/library/OpenMx/libs/OpenMx.so)
==104811==    by 0x270000022E: ???
==104811==  Address 0x7fec08b60 is on thread 1's stack
==104811==
==104811== Invalid read of size 4
==104811==    at 0x1EC114A1: ??? (in /storage/work/k/kdh38/eclsk-dcs/.snakemake/conda/2d6218a5/lib/R/library/OpenMx/libs/OpenMx.so)
==104811==    by 0x270000022E: ???
==104811==  Address 0x7febf91fc is on thread 1's stack
==104811==
==104811== Invalid read of size 8
==104811==    at 0x1EC11500: ??? (in /storage/work/k/kdh38/eclsk-dcs/.snakemake/conda/2d6218a5/lib/R/library/OpenMx/libs/OpenMx.so)
==104811==    by 0x270000022E: ???
==104811==  Address 0x7fec08b80 is on thread 1's stack
==104811==
==104811== Invalid read of size 8
==104811==    at 0x1EC11505: ??? (in /storage/work/k/kdh38/eclsk-dcs/.snakemake/conda/2d6218a5/lib/R/library/OpenMx/libs/OpenMx.so)
==104811==    by 0x270000022E: ???
==104811==  Address 0x7fec08b50 is on thread 1's stack
==104811==
==104811== Invalid write of size 8
==104811==    at 0x1EC1150D: ??? (in /storage/work/k/kdh38/eclsk-dcs/.snakemake/conda/2d6218a5/lib/R/library/OpenMx/libs/OpenMx.so)
==104811==    by 0x270000022E: ???
==104811==  Address 0x7fec08bc8 is on thread 1's stack
==104811==
==104811== Invalid write of size 8
==104811==    at 0x1EC11516: ??? (in /storage/work/k/kdh38/eclsk-dcs/.snakemake/conda/2d6218a5/lib/R/library/OpenMx/libs/OpenMx.so)
==104811==    by 0x270000022E: ???
==104811==  Address 0x7fec08bb0 is on thread 1's stack
...
==104811== Invalid write of size 4
==104811==    at 0x1EC16040: ??? (in /storage/work/k/kdh38/eclsk-dcs/.snakemake/conda/2d6218a5/lib/R/library/OpenMx/libs/OpenMx.so)
==104811==    by 0x3FE3E3C19C7D4003: ???
==104811==    by 0x3F00D0619426B3E0: ???
==104811==    by 0x3EB2FBF51BD20191: ???
==104811==    by 0x3E930B23957D69FE: ???
==104811==    by 0x3ED66A84782075BE: ???
==104811==    by 0x3ED095BFE1A40C42: ???
==104811==    by 0x3E9ACE8B5731D3C1: ???
==104811==    by 0x3EDC1B33D305BFEE: ???
==104811==    by 0x3EB8689EEF212500: ???
==104811==    by 0x3ED8F95C5F95FCB9: ???
==104811==    by 0x3ED08EBC6EC100EE: ???
==104811==  Address 0x7febf91f8 is on thread 1's stack
==104811==
==104811== Invalid write of size 8
==104811==    at 0x1EC13AB8: ??? (in /storage/work/k/kdh38/eclsk-dcs/.snakemake/conda/2d6218a5/lib/R/library/OpenMx/libs/OpenMx.so)
==104811==    by 0xBFA9B4BB3ADE24DC: ???
==104811==    by 0x1: ???
==104811==    by 0x7FEBF8DBF: ???
==104811==    by 0x7: ???
==104811==    by 0x1EE880C3: ???
==104811==    by 0x1EC16427: ??? (in /storage/work/k/kdh38/eclsk-dcs/.snakemake/conda/2d6218a5/lib/R/library/OpenMx/libs/OpenMx.so)
==104811==    by 0x7FEBF920F: ???
==104811==    by 0x7FEBF9217: ???
==104811==  Address 0x7febf9218 is on thread 1's stack
==104811==
==104811== Invalid read of size 4
==104811==    at 0x1EC168CE: ??? (in /storage/work/k/kdh38/eclsk-dcs/.snakemake/conda/2d6218a5/lib/R/library/OpenMx/libs/OpenMx.so)
==104811==    by 0xBFE3D3E0E12CDA09: ???
==104811==    by 0x7FEC089FF: ???
==104811==    by 0x7FEBFD047: ???
==104811==    by 0x7FEC00E3F: ???
==104811==    by 0x4: ???
==104811==    by 0x7FEC08A2F: ???
==104811==    by 0xB: ???
==104811==    by 0x7FEC08B7F: ???
==104811==    by 0x7FEBF921F: ???
==104811==    by 0x7FEBF8F8F: ???
==104811==    by 0x2: ???
==104811==  Address 0x7febf91f8 is on thread 1's stack
...

It reports over 1000 errors before even reaching "Begin Initial Fit Attempt". Maybe an off-by-one error somewhere?

I don't get errors on simple models... Maybe it has something to do with the ordinal or multigroup code? I'll have to work on narrowing it down to a minimal reproducible example sometime later.

jpritikin
Joined: 05/24/2012 - 00:35
errors

Gah! I can't recall which version you're trying. Maybe you can update to the HEAD of master?

khusmann
Joined: 08/11/2019 - 19:19
updated

Just updated to HEAD of master and still get the memory errors. I'll open an issue on GitHub once I narrow it down to an MWE.

jpritikin
Joined: 05/24/2012 - 00:35
github

Also, since you're not running a release, maybe we should open a GitHub issue and troubleshoot there. These forums are mostly for modeling questions. Software problems are more suited to GitHub.

khusmann
Joined: 08/11/2019 - 19:19
fitfunctions

Is it possible to change the fit function of an already built model? I'm using umxSuperModel to build my multigroup model, which uses mxFitFunctionMultigroup under the hood. Can I take the supermodel and change the fit function after it's built, or do I need to stop using umxSuperModel and build the multigroup model manually?

tbates
Joined: 07/31/2009 - 14:25
Add a fitfunction to an existing model to overwrite the old one

Using umxSuperModel is fine.

To swap the fit function for a model, just add the new one to the model, and it will replace the existing one. For example:

m2 = mxModel(m1, mxFitFunctionWLS())
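
For a multigroup model, the container holds the mxFitFunctionMultigroup, while each group is a submodel with its own fit function. The sketch below replaces the fit function of every group and leaves the container's multigroup fit function alone. The helper name `setGroupFitFunctions` is hypothetical, and it assumes every submodel should receive the same fit function:

```r
library(OpenMx)

# Hypothetical helper: replace the fit function in each submodel of 'super'.
setGroupFitFunctions <- function(super, fitfunction) {
  for (groupName in names(super@submodels)) {
    # Adding a fit function to a model replaces its existing one.
    group <- mxModel(super@submodels[[groupName]], fitfunction)
    # Adding a submodel under an existing name replaces the old submodel.
    super <- mxModel(super, group)
  }
  super
}

# Usage (assumes 'm1' is the supermodel):
# m2 <- setGroupFitFunctions(m1, mxFitFunctionML(jointConditionOn = "continuous"))
```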
khusmann
Joined: 08/11/2019 - 19:19
Thanks!

Thanks Tim!

jpritikin
Joined: 05/24/2012 - 00:35
devtools

> is there a way to do a cran-install from install_github?

Sorry, no.

khusmann
Joined: 08/11/2019 - 19:19
bad thresholds

yup, definitely have extreme responses:

(# true / # false)
Indicator 1: 2051 / 11147
Indicator 2: 2182 / 9264
Indicator 3: 2917 / 7715
Indicator 4: 2432 / 7411
Indicator 5: 2136 / 7063

any way to handle these?

jpritikin
Joined: 05/24/2012 - 00:35
not too extreme

Those data don't seem all that extreme. Seems like the model should work.
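
A quick check supports this. Assume each indicator is standard normal on the latent scale (mean 0, variance 1, as in the model described in the first post) and that the single threshold separates "false" from "true". The threshold implied by the observed counts is then the normal quantile of the proportion of "false" responses. This is a marginal calculation that ignores the rest of the model:

```r
# Counts copied from the post above (# true / # false).
nTrue  <- c(2051, 2182, 2917, 2432, 2136)
nFalse <- c(11147, 9264, 7715, 7411, 7063)

# Proportion of "false" responses per indicator.
pFalse <- nFalse / (nTrue + nFalse)

# Implied threshold on a standard normal latent scale.
thresholds <- qnorm(pFalse)
print(round(thresholds, 2))
```

All five implied thresholds fall between roughly 0.6 and 1.0 standard deviations above the mean, which is nowhere near the tails of the distribution.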