Hello,

I have a relatively large dataset (500 same-sex twin pairs) in which the raw MZ and DZ correlations of the observed phenotypes suggest shared-environment effects (e.g., MZ = .39 vs. DZ = .29). Moreover, if I enter only these correlations into a script in the old Mx program, I get significant shared-environment effects.

Nevertheless, when I built the script in OpenMx and did model comparisons, the AE model was not significantly worse than the ACE model, suggesting that the AE model should be adopted rather than the ACE model. Similarly, the CE model was not significantly worse than the ACE model. Only the E model was significantly worse.

I would like to ask two questions on this matter:

1) In general, when E is worse than ACE, but CE and AE are not worse than ACE, which model is the best model and what should I conclude from such results?

2) What could be the statistical or theoretical reasons for not finding shared-environment effects with the OpenMx model-comparison approach, even though the raw data indicate such effects?

Thank you very much

If you are going to select only a single "best" model, then my advice is to select the one with the smallest AIC, and to draw your conclusions from its point estimates and confidence intervals.
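As a rough illustration of that selection rule, here is a minimal Python sketch that computes AIC as -2 log-likelihood plus twice the number of free parameters and picks the model with the smallest value. The fit values and parameter counts are made-up numbers chosen to mimic the pattern described in the question; they are not taken from the poster's data or from OpenMx output.

```python
def aic(minus2_log_lik, n_free_params):
    """AIC = -2 log-likelihood + 2 * (number of free parameters)."""
    return minus2_log_lik + 2 * n_free_params

# Hypothetical fit results (made-up numbers, NOT real output):
# model name -> (-2 log-likelihood, number of free parameters)
fits = {
    "ACE": (4000.0, 4),
    "AE": (4001.5, 3),
    "CE": (4002.5, 3),
    "E": (4060.0, 2),
}

aics = {name: aic(m2ll, k) for name, (m2ll, k) in fits.items()}
best = min(aics, key=aics.get)  # model with the smallest AIC

for name, value in aics.items():
    print(name, value)
print("smallest AIC:", best)
```

With these invented numbers, neither AE nor CE would be rejected against ACE by a likelihood-ratio test, yet AIC still ranks the candidates, which is why it can serve as a tie-breaker.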

As for your second question, there are a number of possible reasons, and many of them have to do with how MZ and DZ correlations are typically calculated directly from data. One likely culprit is correlations calculated without adjustment for sex and age. If age has an effect on the phenotype, and if most twins in the sample were assessed at the same or at close to the same age as their co-twin, then the shared-environmental component will be inflated by the variance attributable to age. Likewise, if sex has an effect on the phenotype, and if there are few or no opposite-sex DZ twin pairs in the sample, then the shared-environmental component will be inflated by the variance attributable to sex. In contrast, it is standard practice to adjust for age and sex when fitting twin models in OpenMx.
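The age argument can be checked with simple arithmetic. The Python sketch below assumes a hypothetical phenotype that, once adjusted for age, follows a pure AE model, and then adds age variance that co-twins share completely. The variance values are invented for illustration, and the quantity 2*rDZ - rMZ is used only as a rough Falconer-style indicator of C, not as what OpenMx estimates.

```python
def twin_correlation(twin_cov, total_var, age_var=0.0):
    """Twin correlation when co-twins have the same age: the age variance
    adds to both the twin covariance and the total phenotypic variance."""
    return (twin_cov + age_var) / (total_var + age_var)

def falconer_c(r_mz, r_dz):
    """Rough Falconer-style estimate of the shared-environment proportion."""
    return 2 * r_dz - r_mz

# Hypothetical age-adjusted phenotype under a pure AE model:
# A = 0.30, C = 0, E = 0.70, so cov(MZ) = A and cov(DZ) = A / 2.
a_var, total_var = 0.30, 1.0
age_var = 0.20  # assumed variance attributable to age (made-up value)

c_adjusted = falconer_c(twin_correlation(a_var, total_var),
                        twin_correlation(a_var / 2, total_var))
c_unadjusted = falconer_c(twin_correlation(a_var, total_var, age_var),
                          twin_correlation(a_var / 2, total_var, age_var))

print("C estimate, age-adjusted:  ", round(c_adjusted, 3))
print("C estimate, age-unadjusted:", round(c_unadjusted, 3))
```

Under these assumptions the unadjusted correlations suggest a nonzero C even though none was built into the phenotype.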

Another possible reason is that, when most people directly calculate MZ and DZ correlations, what they end up calculating is an interclass correlation (which is what a Pearson correlation between two different variables will always be), and the value of an interclass correlation will change if you reverse which twin is "twin #1" and which is "twin #2" in some of the pairs. In practice, the ordering of twins in a pair is almost always arbitrary, so the interclass correlation is not the ideal statistic. Instead, the intraclass correlation (https://en.wikipedia.org/wiki/Intraclass_correlation) is preferred. Twin models, as fitted in OpenMx, estimate the same phenotypic variance for twin #1 and twin #2, and therefore their results do not depend upon twin ordering either.
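To see the dependence on twin ordering concretely, here is a small Python sketch with made-up scores for five pairs. It compares the ordinary Pearson (interclass) correlation with a double-entry correlation, in which every pair is entered in both orders. Double entry is only one of several intraclass-correlation estimators and is used here as an assumption for illustration.

```python
from statistics import mean

def pearson(x, y):
    """Ordinary Pearson (interclass) correlation between two lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def double_entry_icc(x, y):
    """Pearson correlation after entering each pair in both orders."""
    return pearson(x + y, y + x)

# Made-up scores for five twin pairs
twin1 = [10.0, 12.0, 9.0, 15.0, 11.0]
twin2 = [11.0, 10.0, 8.0, 13.0, 14.0]

# Swap the arbitrary "twin #1"/"twin #2" labels within the last two pairs
swapped1 = twin1[:3] + twin2[3:]
swapped2 = twin2[:3] + twin1[3:]

print("interclass:  ", pearson(twin1, twin2), pearson(swapped1, swapped2))
print("double-entry:", double_entry_icc(twin1, twin2),
      double_entry_icc(swapped1, swapped2))
```

The interclass value shifts when the labels are swapped, while the double-entry value stays the same apart from floating-point rounding.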

Yet another possible reason is missing data, at least if the directly calculated MZ and DZ correlations are computed from a listwise-complete subset of each group (which throws away information).

Finally, bear in mind that twin models in OpenMx model the phenotypic mean and raw variance-covariance parameters, not correlations. Correlations can lead to different conclusions. Imagine if you calculated the MZ and DZ covariance matrix in R, through the `cov()` function, applied to each group's data. Suppose further that the MZ and DZ twins had the same covariance, but the MZ twins had smaller variances. If you were to calculate MZ and DZ correlations with the `cor()` function instead, the MZ correlation would be greater than the DZ correlation, even though the covariances were equal.
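A quick numerical version of that thought experiment, sketched in plain Python with made-up variances and covariances (the same arithmetic underlies R's `cov()` and `cor()`):

```python
def correlation(cov, var1, var2):
    """Correlation implied by a covariance and the two variances."""
    return cov / (var1 * var2) ** 0.5

# Hypothetical values: identical twin covariance in both groups,
# but a smaller phenotypic variance among the MZ twins.
shared_cov = 0.30
mz_var = 0.80
dz_var = 1.00

r_mz = correlation(shared_cov, mz_var, mz_var)
r_dz = correlation(shared_cov, dz_var, dz_var)

print("MZ correlation:", round(r_mz, 3))
print("DZ correlation:", round(r_dz, 3))
```

Here the MZ correlation comes out larger than the DZ correlation purely because of the difference in variances, which a model fitted to covariances would not mistake for a difference in twin resemblance.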