Hi all,

I have a quick question to ask about model selection.

Let’s say I am interested in estimating the ACE components for years of music training. For the full model I got the following estimates: A = 51% (CI: 18%-89%), C = 33.6% (CI: 0%-65%), E = 15.4% (CI: 10%-24%).

The fit statistics of the submodels are as follows:

  base    comparison ep minus2LL  df  AIC      diffLL    diffdf p
1 HomoAce <NA>       6  1022.900  193 636.8998 NA        NA     NA
2 HomoAce AE         5  1024.610  194 636.6104 1.710576  1      1.909107e-01
3 HomoAce CE         5  1033.580  194 645.5804 10.680594 1      1.082653e-03
4 HomoAce E          4  1109.442  195 719.4423 86.542486 2      1.612642e-19

As the AE model’s AIC is slightly lower than the full model’s, does the rule of thumb say to choose AE as the best-fitting model? The estimates for the AE model are: A = 84.5% (CI: 77%-90%), E = 15.5% (CI: 10%-23%).

I am a bit unsure about choosing the AE model because, intuitively, it doesn’t seem to make sense for the shared-environment component to be unimportant for years of music training. Are there other factors that I should take into consideration for model selection?

Thanks and best regards,

Yi Ting Tan

It's always a good idea to let prior data and theory inform the set of candidate models under consideration. If models that lack a shared-environment component aren't plausible, then exclude them from the candidate set, and don't fit them in the first place. However, in twin analyses of a single phenotype, trying models that "drop C" is so commonplace that you would have to provide a pretty compelling rationale for not trying any.

Based on the output in your post, the AE model has the smallest AIC (636.61 versus 636.90 for the full ACE model, a difference of only about 0.29), and thus would be considered the "best" model of the four under consideration (conditional on sample size). You have a pretty small sample, right? It may be that you lack power to distinguish the shared-environmental component from zero, which would be consistent with the wide confidence interval for C in the full model (0 to 0.65).
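
If it helps to see where those numbers come from, here is a minimal sketch in Python (standard library only) that recomputes the AIC column and the likelihood-ratio p-values from the minus2LL and df values in your table. It assumes AIC is computed as minus2LL minus twice the degrees of freedom, which is what reproduces the AIC column in your output; the function and variable names are made up for illustration and are not part of any package.

```python
import math

# Rows copied from the comparison table in the question:
# (model label, -2 log-likelihood, degrees of freedom).
# "ACE" is the full model that the table lists as HomoAce.
fits = [
    ("ACE", 1022.900, 193),
    ("AE", 1024.610, 194),
    ("CE", 1033.580, 194),
    ("E", 1109.442, 195),
]

def aic_from_fit(minus2ll, df):
    # Assumption: AIC = -2LL - 2*df, which matches the table's AIC column.
    return minus2ll - 2 * df

def chi2_upper_tail(x, k):
    # Closed-form upper-tail chi-square probabilities; only k = 1 and k = 2
    # are needed for the comparisons in this table.
    if k == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if k == 2:
        return math.exp(-x / 2.0)
    raise ValueError("only 1 or 2 degrees of freedom are handled here")

# AIC for every model, and the label of the model with the smallest AIC.
aics = {name: aic_from_fit(m2ll, df) for name, m2ll, df in fits}
best = min(aics, key=aics.get)

# Likelihood-ratio test of each submodel against the full ACE model.
_, base_m2ll, base_df = fits[0]
lrt_p = {}
for name, m2ll, df in fits[1:]:
    lrt_p[name] = chi2_upper_tail(m2ll - base_m2ll, df - base_df)

for name in aics:
    delta = aics[name] - aics[best]
    p = lrt_p.get(name)
    p_text = "n/a" if p is None else f"{p:.3g}"
    print(f"{name:>3}  AIC = {aics[name]:.3f}  delta = {delta:.3f}  LRT p = {p_text}")
```

The delta column makes the point above concrete: ACE sits only about 0.29 AIC units above AE, whereas CE and E are much further away, so the table separates "drop A" from "keep A" far more clearly than it separates ACE from AE.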