One of my PhD students recently received reviewer comments on a submitted paper in which she presented 9 univariate ACE models on separate outcome measures. One of the reviewers asked: “Please indicate how correction for multiple comparisons was handled for the genetic modeling. Given that several brain regions were being analyzed, what statistical threshold was used?”

First of all, is any such correction really needed for 9 outcomes?

Second, how would we go about performing such a correction in OpenMx? The output I'm used to seeing reported in papers typically includes only the p-value for selecting the best-fitting model, plus the path loadings obtained from that model, but no p-values for those path loadings.

Thanks.

Let me make sure I understand the situation. Your student reported results of 9 monophenotype ACE models, for some metric on 9 brain regions, correct? How many different models did she fit for each phenotype? If she fit more than one model per trait, how did she select the "best" model?

To further RobK's comment, was there a fixed set of models to be fit for each phenotype, or did the number and type of models vary depending on what was found from the modeling itself?

Possibly, false discovery rate correction could be used. On the whole, though, with genetic model estimates I don't think these procedures are very helpful. By analogy, it is as though one has decided that 20,000 feet is a significantly high part of planet Earth, so one draws a map that has a few contours around the Himalayas and the Andes, but is blank white otherwise. For the purposes of global navigation, the map is largely useless. It answers the question of whether there are significantly high bits of the Earth, but whether there are significantly heritable regions is rarely the question in structural MRI studies. For practical purposes, it is typically better to have a global map (errors and all) than a map of just the big mountains.
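If you do decide an FDR correction is warranted, the Benjamini–Hochberg procedure is the standard choice for a small batch of tests like this. A minimal sketch in Python (the nine p-values below are invented for illustration; in practice they would come from your model-fitting output, and R's built-in `p.adjust(p, method = "BH")` does the same thing):

```python
# Benjamini-Hochberg false discovery rate adjustment -- a minimal sketch.
# The nine p-values are made up for illustration only.

def bh_adjust(pvals):
    """Return BH-adjusted p-values (q-values), preserving input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    prev = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    # of the adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        q = min(prev, pvals[i] * m / rank)
        adjusted[i] = q
        prev = q
    return adjusted

# Hypothetical p-values, one per brain region:
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212]
qvals = bh_adjust(pvals)
for p, q in zip(pvals, qvals):
    print(f"p = {p:.3f}  ->  q = {q:.3f}")
```

A region is then declared "significant" at FDR level 0.05 if its adjusted value (q-value) falls below 0.05, rather than comparing the raw p-values to 0.05.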

She compared monophenotype ACE vs. AE vs. CE vs. E models for each of the 9 brain regions, selecting the 'best' model based on log-likelihood and AIC.

A reviewer can understandably become uneasy if a manuscript reports a large number of hypothesis tests without any Type I error correction. I wonder if the manuscript presents some results as statistical hypothesis tests when doing so isn't necessary, or even accurate. Assuming you are only going to report the results of a single "best" model for each brain region, I would suggest the following. First, for each region, compare the AICs of all four candidate models to one another once all have been fitted (and not in a pairwise manner), and select the one with the smallest AIC. Do not frame this model selection in terms of statistical "significance." Model selection by AIC is not a statistical hypothesis test in any sense; instead, it approximates cross-validation, for a particular loss metric and given certain assumptions. Second, from each best model, report confidence intervals only for parameters of substantive interest to the study. You could acknowledge in the text that the confidence intervals will each have a marginal coverage probability of 95%, whereas their joint coverage probability will be lower than that.
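The selection step above can be sketched as follows. The log-likelihoods and parameter counts are invented placeholders; in OpenMx you would read these off the fitted models (e.g. via `summary()` or `mxCompare()`) rather than typing them in:

```python
# Selecting among ACE / AE / CE / E fits by AIC -- a minimal sketch.
# Log-likelihoods and parameter counts below are hypothetical.

def aic(loglik, n_params):
    """Akaike information criterion: 2k - 2*lnL (smaller is better)."""
    return 2 * n_params - 2 * loglik

# model name -> (maximized log-likelihood, number of free variance components)
fits = {
    "ACE": (-1523.4, 3),
    "AE":  (-1523.9, 2),
    "CE":  (-1530.1, 2),
    "E":   (-1548.7, 1),
}

aics = {name: aic(ll, k) for name, (ll, k) in fits.items()}
best = min(aics, key=aics.get)
for name, value in sorted(aics.items(), key=lambda kv: kv[1]):
    print(f"{name:>3}: AIC = {value:.1f}")
print("Selected model:", best)
```

Note that all four AICs are compared at once and the minimum is taken; no p-value or significance threshold enters into the selection.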

I have two questions:

1. Are you saying there is a way to estimate the models entirely without specifying the N or SE?