Confidence Intervals
Posted on

Attachment | Size |
---|---|
refined CSS Model 2.R | 4.8 KB |
Forums
I'm not sure the best way to get confidence intervals for my estimates. I'm doing a bifactor model with a large dataset (HRS) with lots of missing data, and the mxCI are taking impossibly long (days long).
Anyone have suggestions? Would bootstrapping be faster? If so, what would that script look like? (I've never bootstrapped before)
Definitely bootstrap
You're right that you should use bootstrapping. If you're not familiar with bootstrapping, the idea is that you create a sampling distribution for a statistic by repeatedly drawing samples with replacement from your data and then computing the statistic for each redrawn sample.
You have two major choices when it comes to bootstrapping. If you have raw data, you can use empirical bootstrapping, where you draw samples from your actual data and run models on them. As an alternative, if you are willing to accept distributional assumptions, you can combine bootstrapping with Monte Carlo simulation and generate simulated samples based on the expected or observed covariance matrix from your model. This is your only option if you don't have raw data, and it has also received quite a bit of support in simulation studies.
Here is a set of functions to generate bootstrapped confidence intervals for your model.
First, this is a helper function to extract the expected covariance matrix from an mxModel:
umxCov <- function(model,latent=TRUE, manifest=TRUE){
mA <- mxEval(A,model)
mS <- mxEval(S,model)
mI <- diag(1, nrow(mA))
mE <- solve(mI - mA)
mCov <- (mE) %*% mS %*% t(mE) # The model-implied covariance matrix
mV <- NULL
if(latent) mV <- model@latentVars
if(manifest) mV <- c(mV,model@manifestVars)
return(mCov[mV, mV]) # return only the selected variables
}
Second, here is the actual bootstrap function. This function has these parameters:
- model is an optimized mxModel
- samp is the raw data matrix used to estimate model
- type is the kind of bootstrap you want to run. "par.expected" and "par.observed" use parametric Monte Carlo bootstrapping based on your expected and observed covariance matrices, respective. "empirical" uses empirical bootstrapping based on samp.
- std specifies whether you want CIs for unstandardized or standardized parameters
- rep is the number of bootstrap samples to compute.
- conf is the confidence value
- dat specifies whether you want to store the bootstrapped data in the output (useful if you want to do multiple different analyses, such as mediation analyses)
umxCIpboot = function(model, samp=NULL, type=c("par.expected", "par.observed", "empirical"), std=TRUE, rep=1000, conf=95, dat=FALSE) {
if(type=="par.expected") exp = umxCov(model,latent=FALSE)
if(type=="par.observed") {
if(model$data@type=="raw") {
exp = var(mxEval(data,model))
} else { if(model$data@type=="sscp") {
exp = mxEval(data,model)/(model$data@numObs-1)
} else {
exp = mxEval(data,model)
}
}
}
}
N = round(model@data@numObs)
pard = t(data.frame("mod"=summary(model)$parameters[,5+2*std],row.names=summary(model)$parameters[,1]))
pb = txtProgressBar(min=0,max=rep,label="Computing confidence intervals",style=3)
#####
if(type=="empirical") {
if(length(samp)==0) {
if(model$data@type=="raw") samp = mxEval(data,model) else stop("No raw data supplied for empirical bootstrap.")
}
for(i in 1:rep){
bsample.i = sample.int(N,size=N,replace=TRUE)
bsample = var(samp[bsample.i,])
mod = mxRun(mxModel(model,mxData(observed=bsample,type="cov",numObs=N)),silent=TRUE)
pard = rbind(pard,summary(mod)$parameters[,5+2*std])
rownames(pard)[nrow(pard)]=i
setTxtProgressBar(pb,i)
}
}
else {
for(i in 1:rep){
bsample = var(mvrnorm(N,rep(0,nrow(exp)),exp))
mod = mxRun(mxModel(model,mxData(observed=bsample,type="cov",numObs=N)),silent=TRUE)
pard = rbind(pard,summary(mod)$parameters[,5+2*std])
rownames(pard)[nrow(pard)]=i
setTxtProgressBar(pb,i)
}
}
low=(1-conf/100)/2
upp=((1-conf/100)/2)+(conf/100)
LL=apply(pard,2,FUN=quantile,probs=low) #lower limit of confidence interval
UL=apply(pard,2,FUN=quantile,probs=upp) #upper quantile for confidence interval
LL4=round(LL,4)
UL4=round(UL,4)
ci = cbind(LL4, UL4)
colnames(ci) = c(paste((low*100),"%",sep=""),paste((upp*100),"%",sep=""))
p = summary(model)$parameters[,c(1,2,3,4,c(5:6+2*std))]
if(dat) return(list("Type"=type,"bootdat"=data.frame(pard),"CI"=cbind(p,ci)))
return(list("CI"=cbind(p,ci)))
}
Log in or register to post comments
In reply to Definitely bootstrap by bwiernik
added to library(umx)
I've added them to the umx library with some modifications and crediting this conversation and your name. Comments on all umx package functions welcome.
library(devtools)
install_github("tbates/umx")
library(umx)
?umxCI_boot #shows usage example
?umxGetExpectedCov
Log in or register to post comments
In reply to added to library(umx) by tbates
Trouble getting CI for twin ACE model
umxCI_boot(model = twinACEModel, type = "par.observed")
Error in umxCI_boot(model = twinACEModel, type = "par.observed") :
trying to get slot "type" from an object of a basic class ("NULL") with no slots
What am I doing wrong? What is the difference between the type = "par.observed" and type = "par.expected" options?
Log in or register to post comments
In reply to Trouble getting CI for twin ACE model by martonandko
Update
source("http://openmx.psyc.virginia.edu/getOpenMx.R")
library(OpenMx)
library(devtools)
install_github("tbates/umx")
library(umx)
Log in or register to post comments
In reply to Update by mhunter
Same problem
Log in or register to post comments
In reply to Same problem by martonandko
Where's mxData?
model$data
Without a data object, the bootstrap function can't simulate samples or resample data points to do the bootstrap. How did you estimate twinACEFit without an mxData statement?
Log in or register to post comments
In reply to Where's mxData? by bwiernik
Data are in submodels
twinACEFit
(ortwinACEModel
) contains no MxData object, its submodels 'MZ' and 'DZ' do.Log in or register to post comments
In reply to Data are in submodels by RobK
That's the issue
Log in or register to post comments
In reply to That's the issue by bwiernik
Calculate CI-s using for cycle?
Log in or register to post comments
In reply to Calculate CI-s using for cycle? by martonandko
Care with naive bootstrap
To replace data, I would use something like
newBootStrapSampleMxData <- mxData(newStuff)
mymodel <- mxModel(mymodel, newBootStrapSampleMxData)
Log in or register to post comments
In reply to Care with naive bootstrap by neale
Some bootstrap samples don't converge, omit them?
In case of small samples where bootstrap samples might result in a distorted sample distributions and thus the model might not converge, is it reasonable to only include bootstrap samples into the bootstrap calculations where the model was able to converge, where the Mx status is Green?
Log in or register to post comments
Check identification
With factor models, there is a risk that factor loadings from a factor could all flip sign, and provide identical fit. This type of under-identification could mess up all three types of confidence interval estimation, but I think you are immune to that potential problem here because you have fixed a factor loading to 1 and estimated factor variances instead.
Log in or register to post comments