You are here

The OpenMx website will be down for maintenance from 9 AM EDT on Tuesday, September 17th, and is expected to return by the end of the day on Wednesday, September 18th. During this period, the backend will be updated and the website will get a refreshed look.

System is exactly singular

6 posts / 0 new
Last post
Julia's picture
Offline
Joined: 03/29/2012 - 09:40
System is exactly singular

Hi all,

I am running a trivariate Cholesky ADE model with one binary and two continuous variables. As profile-based confidence intervals took more than 24h to run and a lot of them were missing I decided to try bootstrapping. I ran mxBootstrap with 1000 replicates to get 95% CI and all looked fine except CI's for dominant genetic correlation rD. I get the following error:

> mxEval(rD, boot$MZ)
[,1] [,2] [,3]
[1,] 1.00000000 -0.56438885 -0.57273008
[2,] -0.56438885 1.00000000 -0.32630007
[3,] -0.57273008 -0.32630007 1.00000000

> mxBootstrapEval(rD, boot, bq=c(0.025,0.975))
Error: The following error occurred while evaluating the subexpression 'solve(sqrt(ADE.I * ADE.D))' during the evaluation of 'rD' in model 'ADE' : Lapack routine dgesv: system is exactly singular: U[1,1] = 0

rD was defined as following:

corD      <- mxAlgebra( expression=solve(sqrt(I*D))%&%D, name ="rD" )

If I am not mistaken the reason for that is the fact that the first element of D was estimated as zero. But I don't know how to get around this problem. Running 1000 replicates for this bootstrap took more than 17 hours, so I would really like to avoid rerunning it again, but I guess this is not possible?

Thank you in advance!
Julia

AdminRobK's picture
Offline
Joined: 01/24/2014 - 12:15
diagnostic & possible workaround
If I am not mistaken the reason for that is the fact that the first element of D was estimated as zero.

Maybe. I would say that the reason is that at least one of the diagonal elements of D was estimated as zero (or possibly that one of the off-diagonal elements of D was estimated as equal to its two corresponding diagonal elements). If you do

omxBootstrapEval(D, boot)

you can see the estimates of D's elements from each bootstrap replication, and verify whether some of the diagonal elements are being estimated at zero. If I correctly understand how omxBootstrapEval() works, then each row of its output corresponds to a bootstrap replication, and the elements of each row of output are the elements of D, in column-major order.

As a workaround, you could try

rDboot <- omxBootstrapEval(cov2cor(D),boot)
apply(rDboot, 2, quantile, probs=c(0.025,0.975), type=1, na.rm=TRUE)

to get ordinary bootstrap quantile intervals.

Perhaps mxBootstrapEval() needs to be made more robust to this sort of thing.

Julia's picture
Offline
Joined: 03/29/2012 - 09:40
Indeed

Thank you, Rob! There were indeed diagonal elements (first elements) in D which were estimated as zeros. In five replications.

> dummy = omxBootstrapEval(D, boot)
> which(dummy[,1]==0)
[1]  77  87 125 972 976
> which(dummy[,5]==0)
integer(0)
> which(dummy[,9]==0)
integer(0)

When I ran what you sugested next, I get the following:

> rDboot <- omxBootstrapEval(cov2cor(D),boot)
Warning messages:
1: In cov2cor(c(0, 0, 0, 0, 0.0436062325150783, 0.0158601213564099,  :
  diag(.) had 0 or NA entries; non-finite result is doubtful
2: In cov2cor(c(0, 0, 0, 0, 0.0602353527164806, 0.0225896913128754,  :
  diag(.) had 0 or NA entries; non-finite result is doubtful
3: In cov2cor(c(0, 0, 0, 0, 0.0105604458655485, 0.00794684550788506,  :
  diag(.) had 0 or NA entries; non-finite result is doubtful
4: In cov2cor(c(0, 0, 0, 0, 0.0433256522219743, 0.0104454428727477,  :
  diag(.) had 0 or NA entries; non-finite result is doubtful
5: In cov2cor(c(0, 0, 0, 0, 0.00102029470219143, 0.00657858226215936,  :
  diag(.) had 0 or NA entries; non-finite result is doubtful
> apply(rDboot, 2, quantile, probs=c(0.025,0.975), type=1, na.rm=TRUE)
      [,1]       [,2]       [,3]       [,4] [,5]       [,6]       [,7]       [,8] [,9]
2.5%     1 -0.9995622 -0.9999974 -0.9995622    1 -0.9999944 -0.9999974 -0.9999944    1
97.5%    1  0.9999901  0.4048525  0.9999901    1  0.9950811  0.4048525  0.9950811    1

I am not sure how reliable those CI's are. Would that be possible/ok to remove those replications where the first element of D got estimated as zero from the CI estimation?

AdminRobK's picture
Offline
Joined: 01/24/2014 - 12:15
rDboot
I am not sure how reliable those CI's are.

I don't know how much D variance you're estimating in each of the three traits, but if it's not very much, then you can get legitimate confidence intervals for the dominance-genetic correlations that span most of [-1,1].

Would that be possible/ok to remove those replications where the first element of D got estimated as zero from the CI estimation?

Since it's only 5 out of 1000 replications, I doubt it will make a difference if you drop them. In effect, the na.rm=TRUE is supposed to already be dropping the NAs from the columns of rDboot.

Julia's picture
Offline
Joined: 03/29/2012 - 09:40
Thanks a lot!

Thanks a lot, Rob! The D variance is indeed small.

I think I will try to increase the number of replications to see whether confidence intervals will get a bit more narrow. Is there any reference which say how many of them it is good to have?

AdminRobK's picture
Offline
Joined: 01/24/2014 - 12:15
B
Is there any reference which say how many of them it is good to have?

I think sometime when I was in grad school I read a publication by Bradley Efron that had rules-of-thumb for setting B, the number of bootstrap replication, but I've never been able to find it. I'll refer you to the following two references, which contain discussion about how to select a value of B to achieve a desired level of Monte-Carlo precision:

  • Efron, B., & Tibshirani, R. J. (1994). An Introduction to the Bootstrap. Boca Raton: Taylor & Francis Group.
  • Shao, J., & Tu, D. (1995). The Jackknife and Bootstrap. New York: Springer-Verlag New York, Inc.

    As discussed in those two books, B can be calculated as a function of sample size, the desired coverage probability of the interval, and certain properties of the true data-generating distribution or of the statistic's true sampling distribution.