Hello, I often got NA in confidence interval when fitting univariate or bivariate SEM and changing optimizers didn't help. Here is the code of a bivariate ACE model, and the output when I use 'SLSQP' optimizer . I wonder how can I aviod NA when estimating CI of parameters and algebra. Thank you so much!
vars <-c('tea_5g','smk_num_4g') covars <- c("age","region_type") nv <- 2 # number of variables per twin ncv <- 2 # number of covariates ntv <- nv*2 # number of variables per pair nth <- 3 # number of max thresholds = the number of category - 1 selVars <-paste(vars,c(rep(1,nv),rep(2,nv)),sep="") #c('tea2g1', 'smk_num_4g1', 'tea2g2', 'smk_num_4g2' )# covVars <- paste(covars,c(rep(1,ncv),rep(2,ncv)),sep="") mzData <- subset(tea4g_smk4g_nodrink,zygosity1==0,c(selVars,covVars)) dzData <- subset(tea4g_smk4g_nodrink,zygosity1==1,c(selVars,covVars)) sum(is.na(mzData)) sum(is.na(dzData)) mzData <- na.omit(mzData) dzData <- na.omit(dzData) dim(mzData);dim(dzData) # Generate Descriptive Statistics mzData[,selVars] <-mxFactor(mzData[,selVars], levels=c(0:nth) ) dzData[,selVars] <-mxFactor(dzData[,selVars], levels=c(0:nth) ) # Set Starting Values svb <- 0.1 # sv for beta,get from tryhardordinal svLTh <- -1.5 # start value for first threshold svITh <- 1 # start value for increments svTh <- matrix(rep(c(svLTh,(rep(svITh,nth-1)))),nrow=nth,ncol=nv) # start value for thresholds lbTh <- matrix(rep(c(-3,(rep(0.001,nth-1))),nv),nrow=nth,ncol=nv) # lower bounds for thresholds svCor <- .5 # start value for correlations # Create Labels labTh <- function(lab,vars,nth) { paste(paste("t",1:nth,lab,sep=""),rep(vars,each=nth),sep="") } labdata <- cbind(paste("data.",covars[1],c(1,2),sep = ""),paste("data.",covars[2],c(1,2),sep = "")) #c("data.age1","data.age2","data.region_type1","data.region_type2") labeta <-cbind(paste("B",covars,"_",vars[1],sep = ""),paste("B",covars,"_",vars[2],sep = "")) lathmz <- c(labTh("MZ","tea_5g",nth),labTh("Z","smk_num_4g",nth),labTh("MZ","tea_5g",nth),labTh("Z","smk_num_4g",nth)) lathdz <- c(labTh("DZ","tea_5g",nth),labTh("Z","smk_num_4g",nth),labTh("DZ","tea_5g",nth),labTh("Z","smk_num_4g",nth)) ### Some parameters included in all "submodels": baseACE <- mxModel('Base', mxMatrix( type="Full", nrow=ncv, ncol=nv, free=T, values=svb,labels=labeta, name="Beta"), mxMatrix( type="Zero", nrow=1, ncol=nv, name="meanG" ), # or type="Zero" # Create Matrices for Variance Components mxMatrix( type="Symm", nrow=nv, ncol=nv, free=TRUE, values=c(2,0.1,2), name="A" ), # label=laLower("A",nv), values=valDiag(svPa,nv),# mxMatrix( type="Symm", nrow=nv, ncol=nv, free=TRUE, values=c(2,0.1,2), name="C" ), mxMatrix( type="Symm", nrow=nv, ncol=nv, free=c(F,T,F), values=c(1, 0.1, 1), name="E" ),# fix E to avoid warning,without warning, E can be freely estimated mxAlgebra( expression=A+C+E, name="V" ), mxMatrix( type="Iden", nrow=nv, ncol=nv, name="I"), mxAlgebra( expression=solve(sqrt(I*V)), name="iSD"), mxAlgebra( expression=solve(sqrt(I*V))%&%V, name ="Rph"), mxAlgebra( expression=solve(sqrt(I*A))%&%A, name ="Ra" ), #cov2cor() mxAlgebra( expression=solve(sqrt(I*C))%&%C, name ="Rc" ), mxAlgebra( expression=solve(sqrt(I*E))%&%E, name ="Re" ), # Algebra to compute standardized variance components mxAlgebra( expression=A/V, name="h2"), mxAlgebra( expression=C/V, name="c2"), mxAlgebra( expression=E/V, name="e2"), # the proportion of ace contributing to the overlap mxAlgebra( expression=sqrt(h2[1,1])*sqrt(h2[2,2])*Ra[1,2]/Rph[1,2],name = "pa"), mxAlgebra( expression=sqrt(c2[1,1])*sqrt(c2[2,2])*Rc[1,2]/Rph[1,2],name = "pc"), mxAlgebra( expression=sqrt(e2[1,1])*sqrt(e2[2,2])*Re[1,2]/Rph[1,2],name = "pe") ) modelMZ <- mxModel( "MZ", mxMatrix( type="Full", nrow=2, ncol=ncv, free=F, label=labdata, name="MZDefVars"), mxAlgebra( expression= cbind(Base.meanG + MZDefVars[1,] %*% Base.Beta,Base.meanG + MZDefVars[2,] %*% Base.Beta), name="expMeanMZ" ), mxMatrix( type="Full", nrow=nth, ncol=ntv, free=T, values=svTh, labels=lathmz, name="ThMZ"), mxMatrix( type="Lower", nrow=nth, ncol=nth, free=FALSE, values=1, name="inc" ), mxAlgebra( expression= inc %*% ThMZ, name="threMZ" ), mxAlgebra( expression= rbind( cbind(Base.A+Base.C+Base.E , Base.A+Base.C), cbind(Base.A+Base.C , Base.A+Base.C+Base.E)), name="expCovMZ" ), mxData( observed=mzData, type="raw" ), mxExpectationNormal( covariance="expCovMZ", means="expMeanMZ", dimnames=selVars, thresholds="threMZ" ), mxFitFunctionML() ) modelDZ <- mxModel( "DZ", mxMatrix( type="Full", nrow=2, ncol=ncv, free=F, label=labdata, name="DZDefVars"), mxAlgebra( expression= cbind(Base.meanG + DZDefVars[1,] %*% Base.Beta,Base.meanG + DZDefVars[2,] %*% Base.Beta), name="expMeanDZ" ), mxMatrix( type="Full", nrow=nth, ncol=ntv, free=T, values=svTh,labels=lathdz, name="ThDZ"), mxMatrix( type="Lower", nrow=nth, ncol=nth, free=FALSE, values=1, name="inc" ), mxAlgebra( expression= inc %*% ThDZ, name="threDZ" ), mxAlgebra( expression= rbind( cbind(Base.A+Base.C+Base.E , 0.5%x%Base.A+Base.C), cbind(0.5%x%Base.A+Base.C , Base.A+Base.C+Base.E)), name="expCovDZ" ), mxData( observed=dzData, type="raw" ), mxExpectationNormal( covariance="expCovDZ", means="expMeanDZ", dimnames=selVars, thresholds="threDZ" ), mxFitFunctionML() ) Conf1 <- mxCI (c ('Base.h2[1,1]','Base.c2[1,1]','Base.e2[1,1]','Base.h2[2,2]','Base.c2[2,2]','Base.e2[2,2]')) Conf2 <- mxCI (c ('Base.Ra[2,1]', 'Base.Rc[2,1]', 'Base.Re[2,1]','Base.Rph[2,1]') ) Conf3 <- mxCI (c ("Base.pa","Base.pc","Base.pe") ) AceModel <- mxModel( "ACE", baseACE,modelMZ, modelDZ, Conf1, Conf2,Conf3, mxFitFunctionMultigroup(c('MZ.fitfunction','DZ.fitfunction'))) # ------------------------------------------------------------------------------ # 4) RUN AceModel AceFit <- mxTryHardOrdinal(AceModel, intervals=T,extraTries = 15,OKstatuscodes=c(0,1,5)) (AceSumm <- summary(AceFit,verbose = T))
First off, is your installation of OpenMx NPSOL-enabled? You can tell from the output of
mxVersion()
. If it is, and you haven't tried switching optimizers to NSPOL, then try that first.Another possibility is to use
mxBootstrap()
, and then usemxBootstrapEval()
andmxBootstrapEvalByName()
to get confidence intervals for elements of MxAlgebras.I expect bootstrapping will work OK if you use enough replications, but if not, you could make a custom compute plan to use a derivative-free optimizer for the confidence-limit search:
If all else fails, you could use
mxSE()
to get standard errors for your MxAlgebra elements for which you still don't have valid CIs, and form Wald confidence intervals for them. But, only do that as a last resort.I have already tried NSPOL and CI still contained NA.
I am sorry it is my first time to use
mxBootstrap()
, and I don't know why this error was turned out after running this code:testBoot <- mxBootstrap(AceFit,10)
and the error was:
Error in strsplit(e1, imxSeparatorChar, fixed = TRUE) :
non-character argument
Can you tell what was wrong?
Thanks!
You may be encountering a bug. Try running
traceback()
right after you get the error message and report what it says.Unfortunately this is not helpful. I already figured out the call stack. The question is how to reproduce the problem. See my post below.
diana, what is your
mxVersion()
output?> mxVersion()
OpenMx version: 2.12.2 [GIT v2.12.2]
R version: R version 3.5.0 (2018-04-23)
Platform: x86_64-w64-mingw32
Default optimizer: NPSOL
NPSOL-enabled?: Yes
OpenMP-enabled?: No
The latest release of OpenMx is v2.14.11. Try updating, and see if the issue persists afterward.
I updated OpenMx to the latest version but it didn't work.
> mxVersion()
OpenMx version: 2.14.11 [GIT v2.14.11-dirty]
R version: R version 3.5.1 (2018-07-02)
Platform: x86_64-w64-mingw32
Default optimizer: NPSOL
NPSOL-enabled?: Yes
OpenMP-enabled?: No
> testBoot <- mxBootstrap(AceFit,10)
Error in strsplit(e1, imxSeparatorChar, fixed = TRUE) :
non-character argument
> traceback()
6: strsplit(e1, imxSeparatorChar, fixed = TRUE)
5: unlist(strsplit(e1, imxSeparatorChar, fixed = TRUE))
4: FUN(X[[i]], ...)
3: vapply(data, function(e1) {
path <- unlist(strsplit(e1, imxSeparatorChar, fixed = TRUE))
if (length(path) == 1) {
e1 <- paste(path, "data", sep = imxSeparatorChar)
}
e1
}, "")
2: mxComputeBootstrap(data, model@compute)
1: mxBootstrap(AceFit, 10)
Please provide a minimal working example
Thank you very much. I am trying to sample the data and make sure the data can reproduce the problem.
Hello! I am so sorry for asking questions without a minimal working example. My data generated by
mxGenerateData()
was uploaded and the code was paseted above.Thank you very much for your help!
I second Joshua's request for a minimal working example.
mxGenerateData()
may be useful for the purpose.Thank you so much! The data I uploaded was generated by the code:
The code entailed to reproduce the problem is above pasted and you can skip the step of generating the mzData and dzData.
So, all in all, the problem I have is the NA in the CI for elements of MxAlgebras, and the error of running
testBoot <- mxBootstrap(AceFit,10)
Thank you very much for your help !
Ah, finally. This bug is fixed in v2.14.11-24-gfd7b726f5. I think you can get your code working without an upgrade if you do,
Thanks for your help.
Thank you for your debugging, but there still exists an error.
results:
Running ACE with 20 parameters
MxComputeGradientDescent(NPSOL) evaluations 52732 fit 56009.5 change -8.307e-005Error in runHelper(model, frontendStart, intervals, silent, suppressWarnings, :
Can only compute CIs once
Yes, if you're going to use this work-around then you'll need to omit mxCI from the bootstrapped model.
1) When I use mxCI to compute CI, Base.Ra is 0.79(NA,NA)
2) When I use
Base.Ra is 0.79(0.18,NA)
When I use
Base.Ra is 0.79(0.18,NA)
3) When I use
result is :
SE 2.5% 97.5%
[1,] 1.790181e-16 1.0000000 1.00000
[2,] 2.512318e-01 0.5509924 1.20444
[3,] 2.512318e-01 0.5509924 1.20444
[4,] 1.489520e-16 1.0000000 1.00000
How can I get the value and CI of Ra?
4)When I use
an error and two warnings exist:
Error: The following error occurred while evaluating the subexpression 'solve(sqrt(Base.I * Base.A))' during the evaluation of 'Base.Ra' in model 'ACE' : Lapack routine dgesv: system is exactly singular: U[1,1] = 0
In addition: Warning messages:
1: Only 38% of the bootstrap replications converged acceptably. Accuracy is much less than the 50 replications requested. Examine table(model$compute$output$raw$statusCode)
2: In sqrt(c(-0.137782917394753, 0, 0, 0.97874966037632)) : NaNs produced
I wonder why the error and warnings exist after I change the number of resampling replications, is there anything wrong?
Is it ok to use the result calculated by replying resample 10 times?
In your script, 'Base.Ra' is an MxAlgebra that evaluates to a 2x2 matrix.
mxBootstrapEvalByName()
is outputting the standard errors and confidence limits for the 4 elements of that matrix. To see the elements themselves, just domxEvalByName("Base.Ra", AceFit, T)
ormxEval(Ra, AceFit$Base, T)
.More replications means more calls to
mxRun()
, and therefore more opportunities for errors or warnings to be raised. That's all there is to it.Absolutely not. If you want 95% CIs, you need to be using at minimum 1000 replications.
What does the corresponding row of the CI details table say about that upper confidence limit?
Thanks for your reply!
The CI details are in the picture I uploaded.
It seems that those two methods both fails, and I have to use
mxSE()
to calculate CI, is that right?Any luck with bootstrapping? If not, then it still looks like you have valid CIs for everything you want except
Base.Ra[2,1]
, right? So, you'd only need to get SEs for that algebra element.Something I forgot to mention is that, if you're going to calculate CIs from SEs, they ought to be robust SEs. I suggest the following syntax:
The second line will output the SE of the genetic correlation.
BTW, which method did you use to get the results shown in your attachment, CIdetails.png ?
1)The code getting the results shown in that attachment is as follows:
And I want to make sure if the CI of the Base.Ra is value±1.96*SE, is that right?
2) There exists an error when I run
imxRobustSE()
right after running the AceModel.> AceFitSE <- imxRobustSE(AceFit)
Error in if (grep(pattern = model@submodels[[i]]$name, x = contributingModelNames)) { :
argument is of length zero
I also don't know why(T_T)
OK, a Nelder-Mead success story.
Correct (assuming you want a 95% CI).
That's another bug. But you can work around it by changing the fitfunction object in your MxModel, like this:
I'm sorry, do you mean run
AceFit$fitfunction$groups <- c("MZ","DZ")
right after running
AceFit <- mxTryHardOrdinal(AceModel, intervals=T,extraTries = 15,OKstatuscodes=c(0,1,5),exhaustive=F)
?If so, running
AceFit <- imxRobustSE(AceFit)
still reports the same error as before.In that case, when you make
AceModel
, domxFitFunctionMultigroup(c('MZ','DZ'))
instead. And if that doesn't help, then I must not understand the nature of this bug.It turns out I did misunderstand the nature of the bug. I pushed a commit that prevents the specific error you're seeing, but it still doesn't make
imxRobustSE()
work with your model. I think one possible workaround might be to put your "Base" MxModel inside your MZ and DZ MxModels. But, I'm not sure it's worth the trouble for just one confidence interval. You could just proceed directly tomxSE("Base.Ra[2,1]", AceFit)
.Edit: if you try the workaround, you will see warnings about "submodels of submodels". In your case those warnings are ignorable, since "Base" contains no data of its own.
I can execute
summary(mxBootstrap(twinACEFit, 10))
against univACEP. So it would be helpful if you could post exact instructions to reproduce the problem.There are two things i observed with ML based CIs in openmx: first is the number of NAs you get depend on the starting values, even if your model is fitted, the CI estimation depends on the starting values you given. Second: the problem occurs more i think, when there is a possibility of values around zero, or either a negative or positive value can appaear in the algebra you would like to compute the CI for. Maybe it has somthing to do with possible singularities in an algebra, i dont know. For example in my cholesky models i get NA-s for the cross correlation oftne times, which is H/V or U/V etc, which has singularities where the elements of V can be zeros, so estimation around zeros could be problematic? I am not sure, since i dont know how the numeric algorithms work at all, just some guesses, and observations. One advice: try to set the starting values as the exact estimates, sometimes it can reduce, or even eliminate NAs in the CIs.
For the mods: if i said somthing misleading, or plain wrong, feel free to delete my comment, i just wanted to state some observations, maybe they can be helpful, but maybe i am in the wrong.
Thanks for your suggestion! I will try to change start values and see if it works.
ReddyM, I'm not sure I follow your post. Could you describe the specific procedure you've used to "set the starting values to the exact estimates" that leads to different results for the confidence limits? Does your procedure throw warnings about the model being changed since it was last run?
I run the model with some values for the starting values for the free parameters, and after it ran, i rerun the model with different starting values, it did not throw warnings, i reran it from the beginning. I did this because i wanted to see if CIs depend on the choise of starting values or not (not the actual values, that would be strange). The computed limits are the same , but this problem with NAs is not, for some starting values i got more NAs for limits, and for some starting values i got less NAs, i thought maybe if i choose my starting values really close to my estimates it will minimize the number of NAs. But to tell you the truth i didn't really continue with this experimetn, i just used bootstrapping after, so i have no idea how NAs and starting values have this relation. When i did this it kind of looked like the number of NAs are generally less, when i choose the starting values closer to the estimates of the free paramters, but i am not sure.
Classic Mx - which didn’t have this problem afaik - would start CI searches at the solution, which should not vary between runs if the model is identified. If the model uses ordinal FIML there may be slight differences in the solutions due to limited numerical precision. Does CI search start at the solution?
Yes, AFAIK. Which is why I was and remain confused by ReddyM's comment.
I thought it starts at the starting values. That is how i justified this behaviour, but if it starts at the solution, than i have no idea, why this happened... sorry for the confusion.
ReddyM, I believe I owe you an apology. In a different thread, another user shows that confidence-interval results can indeed be influenced by the choice of start values for the primary optimization, even though the different start values lead to the same point estimates at the solution, out to 6 decimal places. I reproduced that user's report and talked it over with the other developers, and we agreed that the confidence-limit search really can be sensitive to differences in the point estimates beyond 7 significant figures. So, you were indeed correct that using different start values can make the difference between a confidence limit being reported as
NA
or not.