NPSOL and Linux: BLAS/LAPACK routine ' DGEQR' error code -6

janneadolf
Joined: 02/06/2015 - 11:29

I am having problems using NPSOL on a Linux grid computing system.
After installing OpenMx via getOpenMx.R and setting NPSOL as the default optimiser, I run an OpenMx demo script (e.g., OneFactorPathDemo.R) and get the following error message:

Error in runHelper(model, frontendStart, intervals, silent, suppressWarnings, : BLAS/LAPACK routine ' DGEQR' gave error code -6

Things are fine when I use the other optimizers, and NPSOL also worked in the past.

After checking OpenMx/npsol directory, the person in charge of the grid generated the hypothesis that the latest OpenMx version might not include NPSOL for linux users.
Apparently, only Windows and OS X feature in there. Is this the case?

OpenMx version: 2.8.3 [GIT v2.8.3]
R version: R version 3.3.3 (2017-03-06)
Platform: x86_64-pc-linux-gnu (Debian GNU/Linux 9.3 (stable))
Default optimiser: NPSOL
NPSOL-enabled?: Yes
OpenMP-enabled?: Yes

jpritikin
Joined: 05/24/2012 - 00:35
gcc version?

NPSOL is not yet supported with gcc 7.x and newer.

janneadolf
Joined: 02/06/2015 - 11:29
The gcc version is 6.3

jpritikin
Joined: 05/24/2012 - 00:35
Looks okay?

gcc 6.3 should be fine and your build says "NPSOL-enabled?: Yes" so it seems like NPSOL is enabled. I'm not sure what to suggest.

AdminRobK
Joined: 01/24/2014 - 12:15
Log from compiler, etc.?

Have you inspected the log that gets printed to the terminal during installation of the package, to see if it contains any irregularities (warnings, for example)?

janneadolf
Joined: 02/06/2015 - 11:29
log file

Yes, there are some warnings, potentially also related to NPSOL. I cannot interpret them easily, so I'll attach the file. It would be helpful if you could take a look! (The test.R script that was run just contains a working bit from the original OpenMx demo code.)

AdminRobK
Joined: 01/24/2014 - 12:15
How are you installing?

First off, just so you know, when building an NPSOL-enabled package binary under Linux/GNU, the script pulls the NPSOL library, libnpsol.a, from the OpenMx Development Team's virginia.edu server. That's why you only see NPSOL for Windows and MacOS in the source repository.

I don't see anything in the log you posted that looks like an obvious sign of something wrong. How are you installing the package? I'm guessing you've locally cloned the source repository from GitHub, and are using 'make install' at the command shell?

mkrause
Joined: 01/22/2018 - 12:01
Hi there!

"After checking OpenMx/npsol directory, the person in charge of the grid generated the hypothesis ..."

I'm this person. :)

"I don't see anything in the log you posted that looks like an obvious sign of something wrong. How are you installing the package? I'm guessing you've locally cloned the source repository from GitHub, and are using 'make install' at the command shell?"

We were building with source('http://openmx.psyc.virginia.edu/getOpenMx.R') .

I tried to create a reproducible environment that generates the error and I think I found the source of the problem. It seems that the error occurs in combination with libopenblas, which is the default BLAS implementation on our cluster. When using the default libblas3 everything seems to work, but I'm reluctant to just ditch libopenblas because of the performance gains.
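To check which BLAS a system-wide library link currently resolves to, something like the following minimal sketch may help. It assumes a Debian-style layout in which /usr/lib/libblas.so.3 is a symlink managed by the alternatives system; that default path, the function names, and the substring heuristics are all assumptions made for illustration, not part of OpenMx or R.

```python
import os

# Substrings that commonly appear in the resolved path of each BLAS flavour.
# These are heuristics chosen for this sketch, not an official naming scheme.
KNOWN_FLAVOURS = {
    "openblas": "OpenBLAS",
    "atlas": "ATLAS",
    "mkl": "Intel MKL",
}


def classify_blas(resolved_path):
    """Guess the BLAS flavour from a fully resolved library path."""
    lowered = resolved_path.lower()
    for needle, name in KNOWN_FLAVOURS.items():
        if needle in lowered:
            return name
    return "reference BLAS or unknown"


def active_blas(link_path="/usr/lib/libblas.so.3"):
    """Follow the symlink chain of link_path and classify its target.

    The default path is an assumption about a Debian-style system;
    pass the path that applies to your own installation.
    """
    resolved = os.path.realpath(link_path)
    return resolved, classify_blas(resolved)
```

Calling active_blas() returns the resolved path together with the guessed flavour. The same call with the LAPACK link (for example a hypothetical /usr/lib/liblapack.so.3) would show whether LAPACK comes from the same provider.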

Since we are currently preparing to deploy Singularity containers (Debian packages are available from NeuroDebian), I thought it would be a nice idea to put the erroneous environment in such a container recipe (attached). Note that this recipe will clobber the current directory with R packages.

It should be as simple as:

$ sudo singularity build openmx-issue-4323.simg Recipe.txt
$ singularity run openmx-issue-4323.simg

This also works well with vagrant on macOS.

I know that the container approach would also be the solution to the problem at hand. We could just push an oldstable Debian with libopenblas and be done with it, but we're not quite there yet.

What do you suggest we do? Can we easily rebuild against libblas3 and would that hurt performance a lot?
Or is that maybe an indication of a bug in NPSOL that should be investigated further?

mkrause
Joined: 01/22/2018 - 12:01
Recipe doesn't work, sorry

Ugh, of course I made a mistake in the Recipe file.

Here is a container that actually works: https://www.singularity-hub.org/containers/1471

You can download and start the compilation and test in one go:

singularity run shub://octomike/singularity-containers:openmx-4323

jpritikin
Joined: 05/24/2012 - 00:35
singularity

That's pretty cool that you can reproduce the problem using singularity. However, we still can't determine what to fix because NPSOL is proprietary and source code is only available with a non-disclosure agreement. None of our developers want to try to debug NPSOL. So you have two choices: either link against regular BLAS and use NPSOL, or link against OpenBLAS and don't use NPSOL. We're having trouble supporting NPSOL in any case (it fails with gcc-7), so you might want to move your work to our open-source optimizers.

AdminNeale
Joined: 03/01/2013 - 14:09
Hoping gfortran 7.2 Npsol library is on its way

I think I've managed to build an NPSOL library under gcc 7.2 - hope to test over the weekend. I like the idea of using libOpenBlas also - performance gains are always nice. For local builds we plan to experiment with the Intel compilers for the same reason.

jpritikin
Joined: 05/24/2012 - 00:35
wrong link order?

I have a new idea about the possible cause. Up until a few minutes ago, we linked libraries in this order: $(FLIBS) $(BLAS_LIBS) $(LAPACK_LIBS) $(NPSOL_LDFLAGS). I have corrected the order to: $(NPSOL_LDFLAGS) $(NLOPT_LDFLAGS) $(LAPACK_LIBS) $(BLAS_LIBS) $(FLIBS). I wonder if this will solve your problem?
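As background on why the order can matter at all: a traditional Unix linker scans static archives once, from left to right, and pulls a member out of an archive only if it defines a symbol that is undefined at that moment. The following is a toy simulation of that rule, not a model of the actual OpenMx build. All symbol names are illustrative, shared libraries and modern linker options behave differently, and whether this mechanism explains the DGEQR error is not established in this thread.

```python
def link(objects, archives):
    """Simulate a single left-to-right pass over static archives.

    objects:  list of (defined, undefined) symbol sets that are always linked.
    archives: ordered list of archives; each archive is a list of
              (defined, undefined) members. A member is pulled in only if
              it defines a symbol that is currently undefined.
    Returns the set of symbols still unresolved after the pass.
    """
    defined, undefined = set(), set()

    def add(member):
        member_defined, member_undefined = member
        defined.update(member_defined)
        undefined.update(member_undefined)
        undefined.difference_update(defined)

    for obj in objects:
        add(obj)

    for archive in archives:
        remaining = list(archive)
        progress = True
        while progress:  # rescan this archive until nothing new is pulled
            progress = False
            for member in list(remaining):
                if member[0] & undefined:
                    add(member)
                    remaining.remove(member)
                    progress = True
    return undefined


# Hypothetical symbols: the main object calls the optimizer, which calls BLAS.
main_object = ({"omx_entry"}, {"npsol_"})
npsol_archive = [({"npsol_"}, {"dgemm_"})]
blas_archive = [({"dgemm_"}, set())]

# BLAS before NPSOL: dgemm_ is not yet needed when BLAS is scanned.
old_order = link([main_object], [blas_archive, npsol_archive])  # {'dgemm_'}
# NPSOL before BLAS: each dependency follows the code that needs it.
new_order = link([main_object], [npsol_archive, blas_archive])  # set()
```

In this toy model, listing a library after the code that depends on it is what resolves every symbol, which matches the direction of the corrected order above.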

AdminRobK
Joined: 01/24/2014 - 12:15
Interesting. I'll be sure to bring up this issue with the other developers at our next meeting.

"Can we easily rebuild against libblas3 and would that hurt performance a lot?"

I suspect it might not be so easy, at least if you've compiled R against libopenblas. Per the R Installation and Administration Manual,

"Note that under Unix (but not under Windows) if R is compiled against a non-default BLAS and --enable-BLAS-shlib is not used (it is the default on all platforms except AIX), then all BLAS-using packages must also be. So if R is re-built to use an enhanced BLAS then packages such as quantreg will need to be re-installed; they may be under other circumstances."

As for the performance cost, I am not entirely sure. Note that quite a bit of OpenMx's numerical linear algebra is done in its compiled C++ backend, using the Eigen library (which is not a BLAS implementation). Something you could try is to install OpenMx from CRAN (which doesn't distribute NPSOL), once with libblas3 and again with libopenblas, and compare benchmarks run under both installations. Indeed, if NPSOL isn't critical for your users' purposes, it seems the CRAN build would work OK on your system.

BTW, are you using the same LAPACK with libblas3 and libopenblas? I ask because DGEQR (the cause of the error with NPSOL) is a LAPACK subroutine.
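For reading the error message itself: LAPACK routines report an illegal argument by position, so a code of -i means the i-th argument of the named routine had an illegal value. The small helper below is a sketch that extracts the routine name and that position from a message shaped like the one in the first post. The function name and the regular expression are made up for this illustration, and the helper does not say which argument that is, since that depends on the routine and library version that was actually called.

```python
import re

# Matches messages shaped like the one quoted in the first post of this thread.
_PATTERN = re.compile(
    r"BLAS/LAPACK routine '\s*(\w+)\s*' gave error code (-?\d+)"
)


def decode_lapack_error(message):
    """Return (routine, argument_position) or None if the text does not match.

    argument_position is the 1-based index of the argument flagged as
    illegal when the code is negative, and None otherwise.
    """
    match = _PATTERN.search(message)
    if match is None:
        return None
    routine = match.group(1)
    code = int(match.group(2))
    position = -code if code < 0 else None
    return routine, position
```

Applied to the message from the first post, this yields the routine DGEQR and argument position 6.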

AdminRobK
Joined: 01/24/2014 - 12:15
NPSOL

Joshua, other OpenMx developers, and I talked about this issue at our meeting today. The consensus is essentially what Joshua and I have already posted. NPSOL is proprietary, and therefore we are not keen to try to debug it. In fact, because of its incompatibility with gcc version 7, we are considering the possibility of deprecating it. If you want an NPSOL-enabled build of OpenMx, you'll need to compile it against a BLAS that works with NPSOL. However, we rather doubt you will find a huge difference in OpenMx's performance depending on which BLAS you use, since most of the linear algebra done in OpenMx's backend is done via Eigen, not BLAS.

BTW, are you using the same LAPACK with libblas3 and libopenblas? I ask because DGEQR (the cause of the error with NPSOL) is a LAPACK subroutine.

P.S. We do appreciate your effort at creating a reproducible example of the issue!
