bootstrap coverage probability and the bootstrap replication

Posted on
Veronica_echo Joined: 02/23/2018
Hi everyone,

I am conducting a simulation study of a growth model and would like to evaluate the coverage probability (CP) of its bootstrap confidence intervals. I kept the number of simulation replications at 1000 and set the number of bootstrap replications to 1000 and 2000, respectively. The results seem weird: the CPs with 1000 bootstrap replications (all between 0.93 and 0.97) were much better than those with 2000 bootstrap replications (some CPs were quite low, say 0.86). Any advice about this issue? Should I increase the number of bootstrap replications to a larger number, say 5000? Thank you in advance.

Replied on Mon, 05/07/2018 - 15:36
AdminRobK Joined: 01/24/2014

If your target coverage probability is 0.95 (as your post suggests), then increasing the number of bootstrap replications to 5000 makes sense.
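
As a rough yardstick for reading the coverage numbers in this thread, the Monte Carlo standard error of an estimated CP can be sketched in R as follows. This is an illustration added here, not part of the original exchange: the helper `coverage_mcse` and the simulated `covered` vector are hypothetical, and the calculation assumes independent simulation replications and a normal approximation.

```r
# Hypothetical helper: Monte Carlo standard error of an estimated coverage
# probability `cp` based on `n_rep` independent simulation replications.
coverage_mcse <- function(cp, n_rep) {
  sqrt(cp * (1 - cp) / n_rep)
}

n_rep <- 1000     # number of simulation replications, as in the post
nominal <- 0.95   # target coverage probability

# Approximate 95% band for the estimated CP when true coverage equals nominal
mcse <- coverage_mcse(nominal, n_rep)
band <- nominal + c(-1, 1) * qnorm(0.975) * mcse
print(round(band, 3))

# Estimating CP from a logical vector (TRUE = interval covered the true value).
# `covered` is simulated here purely for illustration.
set.seed(1)
covered <- rbinom(n_rep, size = 1, prob = nominal) == 1
cp_hat <- mean(covered)
```

Under these assumptions the band is roughly 0.936 to 0.964, so CPs between 0.93 and 0.97 are close to what simulation noise alone would produce, whereas a CP of 0.86 is far outside it.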
Replied on Tue, 05/08/2018 - 22:21
Veronica_echo Joined: 02/23/2018

In reply to AdminRobK

Thanks for your kind advice! I am trying it now. I have one more question about the simulation study. While collecting 1,000 effective replications (i.e., no errors, no warnings), I got 283 errors (details are in the attachment) and 14 warnings (code 6). Could you suggest some possible solutions to this issue? I am using the "true" values of the parameters (the ones I set to generate the data) as initial values when fitting the model, so I assume nothing can be done for that part. I am using the default optimizer; do I need to try the other two? Any advice would be appreciated!
Replied on Wed, 05/09/2018 - 11:28
AdminRobK Joined: 01/24/2014

In reply to Veronica_echo

While collecting 1,000 effective replications (i.e., no errors, no warnings), I got 283 errors (details are in the attachment) and 14 warnings (code 6). Could you suggest some possible solutions to this issue?

If you're using mxRun() to initially (i.e., before any bootstrapping or jackknifing) run the model in each replication, you could replace mxRun() with mxTryHard() or one of its wrappers. If you do, you'll probably want to read the man page for mxTryHard().
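
A minimal sketch of that replacement is shown below. The wrapper `fit_with_retries`, its argument names, and the choice of 10 extra tries are hypothetical; only `mxTryHard()` and its `extraTries` argument come from OpenMx. The `try_hard` argument exists solely so the wrapper can be exercised without OpenMx installed, and treating only status code 0 as a success is an assumption chosen to match the "no errors, no warnings" criterion.

```r
# Hypothetical wrapper around OpenMx::mxTryHard(). `model` is assumed to be an
# MxModel that has already been built for the current replication.
fit_with_retries <- function(model, extra_tries = 10,
                             try_hard = OpenMx::mxTryHard) {
  # Catch a thrown error so one bad replication does not stop the simulation.
  fit <- tryCatch(
    try_hard(model, extraTries = extra_tries),
    error = function(e) NULL
  )
  # Also handle a returned "try-error" object, in case every attempt failed.
  if (is.null(fit) || inherits(fit, "try-error")) {
    return(list(fit = NULL, ok = FALSE))
  }
  # Status code 0 means the optimizer reported no problem.
  code <- fit$output$status$code
  list(fit = fit, ok = isTRUE(code == 0))
}
```

The returned `ok` flag can then decide whether the replication counts as "effective".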

Are you running your simulation in a 'for' loop? If so, you could instead run it in a 'while' loop, e.g.,

i <- 1
while (i <= 1000) {
  # Do the simulation work for one replication here.
  # Somewhere in this body, set the logical variable `modelRanWell` to TRUE
  # if there were no errors or warnings, and to FALSE otherwise.
  if (modelRanWell) {
    i <- i + 1
  }
}

This keeps the loop running until it has given you 1000 "effective" replications.
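
A self-contained version of this pattern might look like the sketch below. The function `run_one_replication` is a hypothetical stand-in that fails or warns at random purely to exercise the loop; in a real study it would simulate a dataset and fit the model. Counting any warning as a failure is an assumption that mirrors the "no errors, no warnings" criterion.

```r
set.seed(2018)

# Hypothetical stand-in for "simulate data and fit the model" in one
# replication. It errors or warns at random only to exercise the loop.
run_one_replication <- function() {
  u <- runif(1)
  if (u < 0.20) stop("simulated fitting error")
  if (u < 0.25) warning("simulated optimizer warning")
  rnorm(1)  # placeholder for the estimates you would keep
}

n_target <- 1000
results <- vector("list", n_target)
n_attempts <- 0
i <- 1
while (i <= n_target) {
  n_attempts <- n_attempts + 1
  # Treat any error or warning as a failed replication (returns NULL).
  out <- tryCatch(
    run_one_replication(),
    error = function(e) NULL,
    warning = function(w) NULL
  )
  modelRanWell <- !is.null(out)
  if (modelRanWell) {
    results[[i]] <- out
    i <- i + 1
  }
}
n_discarded <- n_attempts - n_target  # replications that errored or warned
```

Keeping `n_attempts` alongside the results makes it easy to report how many replications were discarded.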

I am using the "true" values of the parameters (the ones I set to generate the data) as initial values when fitting the model, so I assume nothing can be done for that part.

I don't know what sort of model you're using to generate the data and fit to it. But the error message in your attachment makes it sound as though the model-expected covariance matrix was non-positive-definite (non-PD) at the start values. Using mxTryHard() should help in cases where the start values are poor for the current dataset. You could also calculate empirical start values from the dataset in each replication, but I can't make any specific suggestions there without knowing more about the model you're fitting.

I am using the default optimizer; do I need to try the other two?

None of the 3 main optimizers can get off the ground if the covariance matrix is non-PD at the start values, and only 14 warnings in 1000 replications is pretty good. It sounds as though CSOLNP, which is the on-load default optimizer, is working well for you. Using mxTryHard(), if you're not already, should really cut down on the number of errors and warnings you get.

Replied on Wed, 05/09/2018 - 23:29
Veronica_echo Joined: 02/23/2018

In reply to AdminRobK

Thanks for your kind and prompt advice. I am going to use mxTryHard() instead of mxRun() to make multiple tries. For the simulation, I am using a repeat loop with the try() function; I guess it is similar to the while loop. I am fitting a growth curve model; would empirical initial values be helpful? Thank you very much!
Replied on Thu, 05/10/2018 - 10:30
AdminRobK Joined: 01/24/2014

In reply to Veronica_echo

For the simulation, I am using a repeat loop with the try() function; I guess it is similar to the while loop.

OK. I guess your script just breaks out of the loop eventually, when some criterion is satisfied?

I am fitting a growth curve model; would empirical initial values be helpful?

I bet you could get really good start values via lmer(), from package 'lme4'.
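
One way this could look for a linear growth curve is sketched below. Several things here are assumptions rather than details from the thread: the model is taken to be linear with a random intercept and slope, the "true" values are made up, and `nlme::lme()` is swapped in for `lme4::lmer()` only because nlme ships with standard R installations. Full maximum likelihood (`method = "ML"`) is likewise an illustrative choice.

```r
library(nlme)  # swapped in for lme4::lmer(); see the note above

set.seed(123)
n_id <- 200
t_j <- 0:4   # scheduled, equally spaced measurement occasions

# Illustrative "true" values (assumptions, not taken from the thread)
beta <- c(10, 2)                       # mean intercept and mean slope
G <- matrix(c(4, 0.5, 0.5, 1), 2, 2)   # covariance of random intercept/slope
sigma_e <- 1                           # residual standard deviation

# Individual measurement times: t_j jittered by unif(-0.45, 0.45)
dat <- expand.grid(occasion = seq_along(t_j), id = seq_len(n_id))
dat$time <- t_j[dat$occasion] + runif(nrow(dat), -0.45, 0.45)
b <- matrix(rnorm(n_id * 2), n_id, 2) %*% chol(G)  # random effects per person
dat$y <- (beta[1] + b[dat$id, 1]) + (beta[2] + b[dat$id, 2]) * dat$time +
  rnorm(nrow(dat), sd = sigma_e)
dat$id <- factor(dat$id)

fit <- lme(y ~ time, random = ~ time | id, data = dat, method = "ML")

# Collect estimates that could serve as start values for the growth model
start_values <- list(
  means = unname(fixef(fit)),                           # latent means
  latent_cov = matrix(as.numeric(getVarCov(fit)), 2, 2), # latent covariances
  resid_var = fit$sigma^2                               # residual variance
)
```

The elements of `start_values` would then be mapped onto the corresponding free parameters of the model before fitting it in each replication.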

I'm a bit surprised that fitting a growth-curve model at its true parameter values, to data generated under that model, would lead to a non-PD covariance matrix.

Replied on Thu, 05/10/2018 - 19:27
Veronica_echo Joined: 02/23/2018

In reply to AdminRobK

Yes, when the number of effective replications reaches 1000, it breaks out of the loop.

My growth model has definition variables; might that be an explanation for the non-PD issue? If so, could I get fewer errors by narrowing the range of the definition variables (current setting: scaled, equally spaced time points, with $dv \sim \mathrm{unif}(t_{j}-0.45, t_{j}+0.45)$)?

When I use lme4::lmer() or nlme::nlme(), I guess I should use "reml" instead of "ml"?

Replied on Mon, 05/14/2018 - 15:35
AdminRobK Joined: 01/24/2014

In reply to Veronica_echo

When I use lme4::lmer() or nlme::nlme(), I guess I should use "reml" instead of "ml"?

"ml" is actually closer to what OpenMx does than "reml", but it shouldn't matter much for your purposes, since you're just trying to get start values.

My growth model has definition variables; might that be an explanation for the non-PD issue?

I'm not sure, though I doubt it.
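
One way to probe the question directly is to compute the model-implied covariance matrix at the start values for many draws of the definition variables and check whether it is positive-definite. The sketch below assumes a linear growth model with independent, homogeneous residuals; the helpers `implied_cov` and `is_pd` and the numeric values are hypothetical, and a nonlinear model or a different residual structure would need its own version of the check.

```r
# Model-implied covariance for one individual in a linear growth model:
# Sigma_i = Lambda_i %*% Psi %*% t(Lambda_i) + theta * I
implied_cov <- function(times, Psi, theta) {
  Lambda <- cbind(1, times)  # loadings: intercept, and slope from the times
  Lambda %*% Psi %*% t(Lambda) + theta * diag(length(times))
}

# A symmetric matrix is positive-definite if all eigenvalues are positive.
is_pd <- function(S, tol = 1e-8) {
  all(eigen(S, symmetric = TRUE, only.values = TRUE)$values > tol)
}

set.seed(42)
t_j <- 0:4
Psi <- matrix(c(4, 0.5, 0.5, 1), 2, 2)  # illustrative latent covariance
theta <- 1                              # illustrative residual variance

# Check 1000 random draws of individual measurement times
pd_flags <- replicate(1000, {
  times <- t_j + runif(length(t_j), -0.45, 0.45)
  is_pd(implied_cov(times, Psi, theta))
})
all(pd_flags)
```

Under these assumptions the result does not depend on the measurement times: whenever `Psi` is positive-semidefinite and `theta` is positive, every eigenvalue of the implied matrix is at least `theta`.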