Confidence Intervals

a9mike
Confidence Intervals
Attachment: refined CSS Model 2.R (4.8 KB)

I'm not sure of the best way to get confidence intervals for my estimates. I'm fitting a bifactor model on a large dataset (HRS) with lots of missing data, and the mxCI intervals are taking impossibly long (days) to compute.

Anyone have suggestions? Would bootstrapping be faster? If so, what would that script look like? (I've never bootstrapped before)

bwiernik
Definitely bootstrap

Bifactor models typically have issues with likelihood-based confidence intervals, especially if one of your group factors shouldn't really be there (i.e., if it's entirely defined by the specific variance of one or two items because the items that load on the group factor really only load onto the general factor).

You're right that you should use bootstrapping. If you're not familiar with bootstrapping, the idea is that you create a sampling distribution for a statistic by repeatedly drawing samples with replacement from your data and then computing the statistic for each redrawn sample.
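For example, here is the basic idea applied to a simple statistic like a correlation (a toy illustration only; the data and names are made up):

set.seed(1)
x <- rnorm(100); y <- 0.5*x + rnorm(100)           # toy data
bootR <- numeric(1000)
for(i in 1:1000) {
  idx <- sample.int(100, size=100, replace=TRUE)   # resample rows with replacement
  bootR[i] <- cor(x[idx], y[idx])                  # recompute the statistic on the resampled data
}
quantile(bootR, probs=c(.025, .975))               # 95% percentile bootstrap interval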

You have two major choices when it comes to bootstrapping. If you have raw data, you can use empirical bootstrapping, where you draw samples from your actual data and run models on them. As an alternative, if you are willing to accept distributional assumptions, you can combine bootstrapping with Monte Carlo simulation and generate simulated samples based on the expected or observed covariance matrix from your model. This is your only option if you don't have raw data, and it has also received quite a bit of support in simulation studies.
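The parametric version just swaps the resampling step for simulation from a multivariate normal distribution, e.g. (a sketch, where expCov and N stand in for your covariance matrix and sample size):

library(MASS)                                                   # for mvrnorm
simSamp <- mvrnorm(n=N, mu=rep(0, nrow(expCov)), Sigma=expCov)  # one simulated raw sample
simCov  <- var(simSamp)                                         # covariance matrix to refit the model on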

Here is a set of functions to generate bootstrapped confidence intervals for your model.

First, this is a helper function to extract the expected covariance matrix from an mxModel:

umxCov <- function(model, latent=TRUE, manifest=TRUE){
  mA <- mxEval(A, model)          # asymmetric (one-headed) paths
  mS <- mxEval(S, model)          # symmetric (two-headed) paths: variances and covariances
  mI <- diag(1, nrow(mA))
  mE <- solve(mI - mA)
  mCov <- mE %*% mS %*% t(mE)     # the model-implied covariance matrix
  mV <- NULL
  if(latent)   mV <- model@latentVars
  if(manifest) mV <- c(mV, model@manifestVars)
  return(mCov[mV, mV])            # return only the selected variables
}
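For example (factorFit here is just a placeholder for your own optimized RAM-type mxModel):

expCov <- umxCov(factorFit, latent=FALSE)   # expected covariances among the manifest variables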

Second, here is the actual bootstrap function. This function has these parameters:
- model is an optimized mxModel
- samp is the raw data matrix used to estimate model
- type is the kind of bootstrap you want to run. "par.expected" and "par.observed" use parametric Monte Carlo bootstrapping based on your expected and observed covariance matrices, respectively. "empirical" uses empirical bootstrapping based on samp.
- std specifies whether you want CIs for unstandardized or standardized parameters
- rep is the number of bootstrap samples to compute.
- conf is the confidence level (e.g., 95 for 95% intervals)
- dat specifies whether you want to store the bootstrapped data in the output (useful if you want to do multiple different analyses, such as mediation analyses)

umxCIpboot = function(model, samp=NULL, type=c("par.expected", "par.observed", "empirical"), std=TRUE, rep=1000, conf=95, dat=FALSE) {
  type = match.arg(type) # default to "par.expected" if no type is specified
  if(type=="par.expected") exp = umxCov(model, latent=FALSE)
  if(type=="par.observed") {
    if(model$data@type=="raw") {
      exp = var(mxEval(data, model))
    } else {
      if(model$data@type=="sscp") {
        exp = mxEval(data, model)/(model$data@numObs-1)
      } else {
        exp = mxEval(data, model)
      }
    }
  }
  N = round(model@data@numObs)
  pard = t(data.frame("mod"=summary(model)$parameters[,5+2*std],row.names=summary(model)$parameters[,1]))
  pb = txtProgressBar(min=0,max=rep,label="Computing confidence intervals",style=3)
  #####
  if(type=="empirical") {
    if(length(samp)==0) {
      if(model$data@type=="raw") samp = mxEval(data,model) else stop("No raw data supplied for empirical bootstrap.")
    }
    for(i in 1:rep){
      bsample.i = sample.int(N,size=N,replace=TRUE)
      bsample = var(samp[bsample.i,])
      mod = mxRun(mxModel(model,mxData(observed=bsample,type="cov",numObs=N)),silent=TRUE)
      pard = rbind(pard,summary(mod)$parameters[,5+2*std])
      rownames(pard)[nrow(pard)]=i
      setTxtProgressBar(pb,i)
    }
  }
  else {
    for(i in 1:rep){
      bsample = var(MASS::mvrnorm(N,rep(0,nrow(exp)),exp)) # simulate a sample from the expected/observed covariance matrix (requires the MASS package)
      mod = mxRun(mxModel(model,mxData(observed=bsample,type="cov",numObs=N)),silent=TRUE)
      pard = rbind(pard,summary(mod)$parameters[,5+2*std])
      rownames(pard)[nrow(pard)]=i
      setTxtProgressBar(pb,i)
    }
  }
  low=(1-conf/100)/2
  upp=((1-conf/100)/2)+(conf/100)
  LL=apply(pard,2,FUN=quantile,probs=low) # lower limit of confidence interval
  UL=apply(pard,2,FUN=quantile,probs=upp) # upper limit of confidence interval
  LL4=round(LL,4)
  UL4=round(UL,4)
  ci = cbind(LL4, UL4)
  colnames(ci) = c(paste((low*100),"%",sep=""),paste((upp*100),"%",sep=""))
  p   = summary(model)$parameters[,c(1,2,3,4,c(5:6+2*std))]
  if(dat) return(list("Type"=type,"bootdat"=data.frame(pard),"CI"=cbind(p,ci)))
  return(list("CI"=cbind(p,ci)))
}
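Usage would look something like this (factorFit and rawData are just placeholders for your own fitted model and raw data matrix):

# Parametric bootstrap from the model-expected covariance matrix
ciExp <- umxCIpboot(factorFit, type="par.expected", std=TRUE, rep=1000, conf=95)
ciExp$CI

# Empirical bootstrap, resampling rows of the raw data
ciEmp <- umxCIpboot(factorFit, samp=rawData, type="empirical", rep=1000)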
tbates
added to library(umx)

That's a nice pair of functions!

I've added them to the umx library with some modifications and crediting this conversation and your name. Comments on all umx package functions welcome.

library(devtools)
install_github("tbates/umx")
library(umx)
?umxCI_boot #shows usage example
?umxGetExpectedCov
martonandko
Trouble getting CI for twin ACE model

I'm trying to get bootstrapped CIs for the estimated variance components (A, C, E) from Neale's posted twin ACE script (http://openmx.psyc.virginia.edu/sites/default/files/UnivariateTwinAnalysis_MatrixRaw-3.R) using the umxCI_boot function, but I get the following error message:

umxCI_boot(model = twinACEModel, type = "par.observed")
Error in umxCI_boot(model = twinACEModel, type = "par.observed") :
trying to get slot "type" from an object of a basic class ("NULL") with no slots

What am I doing wrong? What is the difference between the type = "par.observed" and type = "par.expected" options?

mhunter
Update

Make sure you have the most recent versions of umx and OpenMx.

source("http://openmx.psyc.virginia.edu/getOpenMx.R")
library(OpenMx)
 
library(devtools)
install_github("tbates/umx")
library(umx)
martonandko
Same problem

Thank you for your quick reply on both of my questions! Unfortunately, I still get the same error message as before after updating OpenMx and umx. I'm attaching the model and the fitted object (I just changed the file extension to .txt so I could upload it). Could someone have a look at it and see what the problem might be?

bwiernik
Where's mxData?

The umxCI_boot function is looking for an MxData object in the model from which to run the bootstraps. Neither of the two models you included in your file has anything in its data slot. You can see this by running:

model$data

Without a data object, the bootstrap function can't simulate samples or resample data points to do the bootstrap. How did you estimate twinACEFit without an mxData statement?

RobK
Data are in submodels

Note that while the "top-level" model, twinACEFit (or twinACEModel), contains no MxData object, its submodels 'MZ' and 'DZ' do.
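For example (assuming the fitted object is named twinACEFit, as in the posted script), you can see the data with:

twinACEFit$MZ$data   # MxData object holding the MZ raw data
twinACEFit$DZ$data   # MxData object holding the DZ raw data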

bwiernik
That's the issue

I see. I don't do behavioral genetics, so I haven't had much need to use submodels. When I wrote the umxCI_boot function, I didn't have submodels in mind. The function looks for the parameters and data objects for the top-level model. At present, it doesn't work with mxModel objects where the data and parameters are stored in different submodels. I can't make any promises as to if or when I will have the time to expand its functionality, though I know tbates always welcomes patches and pull requests for the umx package. https://github.com/tbates/umx

martonandko
Calculate CIs using a for loop?

Thank you all for looking into the problem! Basically, could I just run my model in a for loop, resampling my original sample with replacement to replace the raw data (say 2000 times), calculate the path coefficients and standardized variance components each time, and then take the 2.5% and 97.5% percentiles of those distributions to get the CIs? Is there an easy way of replacing only the raw data of the submodels? I can't get it to work with the omxSetParameters command; I have to redefine the whole model in every cycle.

neale
Care with naive bootstrap

In, e.g., a factor model, there can be some invariance to sign: a factor model in which all loadings are positive gives the same fit as one in which all loadings are negative. You would want to, e.g., constrain the first loading to be positive to avoid very wide (incorrect) estimated bootstrap CIs.
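Something like this should do it (a sketch; "l1" stands in for whatever label your first loading carries):

model <- omxSetParameters(model, labels="l1", lbound=0)   # keep the first loading non-negative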

To replace data, I would use something like

newBootStrapSampleMxData <- mxData(newStuff)
mymodel <- mxModel(mymodel, newBootStrapSampleMxData)
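For a model that keeps its data in submodels, as in the twin script, you can do the same thing for each submodel inside the loop. A minimal sketch, assuming the fitted object is twinACEFit with submodels 'MZ' and 'DZ', and mzData and dzData are your raw data frames:

nMZ <- nrow(mzData)
nDZ <- nrow(dzData)
estList <- vector("list", 2000)
for(i in 1:2000) {
  bootMZ <- mzData[sample.int(nMZ, nMZ, replace=TRUE), ]   # resample MZ pairs
  bootDZ <- dzData[sample.int(nDZ, nDZ, replace=TRUE), ]   # resample DZ pairs
  bootModel <- mxModel(twinACEFit,
                       mxModel(twinACEFit$MZ, mxData(bootMZ, type="raw")),
                       mxModel(twinACEFit$DZ, mxData(bootDZ, type="raw")))
  fit <- mxRun(bootModel, silent=TRUE)
  estList[[i]] <- omxGetParameters(fit)                    # or compute your standardized components here
}
estMat <- do.call(rbind, estList)
apply(estMat, 2, quantile, probs=c(.025, .975))            # 95% percentile CIs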
martonandko
Some bootstrap samples don't converge, omit them?

Thanks a lot for straightening things out! For univariate twin ACE/ADE models, calculating the standardized path coefficients (A, C/D, E) for each sample would resolve this problem, since they are always positive, right?
In the case of small samples, where bootstrap samples might produce distorted sample distributions and the model might not converge, is it reasonable to include in the bootstrap calculations only those samples where the model converged (where the Mx status is Green)?

neale
Check identification

I assume you have not been able to obtain conventional standard errors? If they look reasonable, they can be used to derive confidence intervals. If these are not available, then take a look at identification. Although the model is likely identified in principle, it is possible that it suffers from empirical under-identification because two factors are essentially identical in structure. A signature of this would be paths estimated at or close to zero. Possibly, fixing one or two of them to zero would clear up the standard error calculation and alleviate the need to estimate likelihood-based or bootstrap CIs.
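If your OpenMx version has it, mxCheckIdentification() can help with this part (a sketch; twinACEFit stands in for your fitted model):

idCheck <- mxCheckIdentification(twinACEFit)
idCheck$status                      # TRUE if the model is locally identified
idCheck$non_identified_parameters   # parameters implicated in any under-identification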

With factor models, there is a risk that factor loadings from a factor could all flip sign, and provide identical fit. This type of under-identification could mess up all three types of confidence interval estimation, but I think you are immune to that potential problem here because you have fixed a factor loading to 1 and estimated factor variances instead.