Hi OpenMx community!

I have a probably straightforward question. I'm running simple univariate ACE models with covariates. My dependent variable has a non-normal distribution. Should I normalize my data, for example using the R function scale? Should I also normalize my covariates? My estimates change only slightly and my chi-square values are basically the same, but my CFI and TLI values change a little. Should this be of concern?

Thanks for your help!

scale() will give your dependent variable a mean of zero and SD of 1.

However, it will be just as non-normally distributed as it was before...

If your data are badly skewed, or in the wrong measurement unit (perhaps they should be log() or sqrt() transformed), you need to transform them.

If they have more than just a bit of skew or kurtosis, you may need to treat them as an ordinal outcome.

A hist() of the data would help.
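To illustrate the point about scale(), here is a minimal sketch. It uses a simulated log-normal variable as a stand-in for a skewed phenotype (the data and the `skewness()` helper are made up for illustration, not taken from the original poster's analysis). The mean and SD change after scaling, but the skewness does not.

```r
set.seed(1)

# Hypothetical right-skewed phenotype (log-normal), standing in for real data
y <- rlnorm(500)

# Sample skewness: mean of the cubed standardized values
skewness <- function(x) {
  z <- (x - mean(x)) / sd(x)
  mean(z^3)
}

# scale() returns a one-column matrix, so convert back to a plain vector
y_scaled <- as.numeric(scale(y))

skew_raw <- skewness(y)
skew_scaled <- skewness(y_scaled)

# Mean and SD are now 0 and 1, but the skewness is unchanged
print(c(mean = mean(y_scaled), sd = sd(y_scaled)))
print(c(raw = skew_raw, scaled = skew_scaled))

# Visual check: same shape, different axis (only drawn in an interactive session)
if (interactive()) {
  op <- par(mfrow = c(1, 2))
  hist(y, main = "Raw")
  hist(y_scaled, main = "After scale()")
  par(op)
}
```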

T. Bates is exactly right about `scale()`, which linearly transforms the variable to have zero mean and unit variance--no linear transformation will change the shape of the distribution.

Are you incorporating the covariates into the model as "definition variables?" That is, does your MxModel have at least one MxMatrix that has labels starting with `'data.'`? If so (and you probably are, if you're adapting an example script), then you don't need to worry about the distributions of the covariates, because you are modeling the distribution of the phenotype conditional on the observed values of the covariates, rather than the joint distribution of the phenotype and covariates.

With that in mind, note that the distributional assumption* is that the phenotype is normally distributed, conditional on covariates. So, what you'd really want to look at would be graphs of the residuals from a regression of the phenotype onto those covariates. I daresay that simply using the `lm()` function in R is good enough for this purpose.

(*To be pedantically correct, the distributional assumption is that, conditional on covariates, the phenotypes of Twin #1 and Twin #2 are jointly bivariate normal. This is why I endorse replacing the phrase "univariate ACE model" with "monophenotype ACE model" or "single-trait ACE model.")
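A rough sketch of the residual check described above might look like the following. Everything here is simulated: the data frame `dat`, the covariates `age` and `sex`, and the column `pheno` are hypothetical names, not anything from the original poster's data. Note that this only looks at the marginal distribution of the residuals, one individual per row; it does not check bivariate normality within pairs.

```r
set.seed(2)
n <- 200

# Hypothetical data: one row per individual, with made-up covariates
dat <- data.frame(
  age = runif(n, 18, 60),
  sex = rbinom(n, 1, 0.5)
)
dat$pheno <- 10 + 0.05 * dat$age + 0.5 * dat$sex + rnorm(n)

# Regress the phenotype on the covariates and keep the residuals
fit <- lm(pheno ~ age + sex, data = dat)
res <- resid(fit)

# Shapiro-Wilk test on the residuals rather than on the raw phenotype
sw <- shapiro.test(res)
print(sw)

# Graphical checks (only drawn in an interactive session)
if (interactive()) {
  hist(res, main = "Residuals from lm()")
  qqnorm(res)
  qqline(res)
}
```

With real twin data, the same residuals could be reshaped to one row per pair and inspected separately for Twin #1 and Twin #2.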

Thank you all for your help!

Thanks for clarifying the use of the `scale()` function. Basically then, linear transformations don’t alter the distribution, thus the estimates and the fit statistics are basically the same, though the –2LL values are completely different.

Is there an established approach to assess multivariate normality for twin models? Using the tests in the MVN package, my data for the whole population showed:

Should I be concerned that they all showed non-normal distribution (see also attached 3D plot)? Should the subsets of MZ and DZ data be each MVN, or is it enough for the whole population?

On the univariate level the distribution doesn’t seem that bad (see attached hist.), based on SW-test:

Could I swap the order of the twins within some pairs to decrease the skew of “Column2”, which might help improve the multivariate normality?

If I apply some nonlinear transformation to the data, do the estimated A, C, and E values also hold for the original trait, not just the transformed one?

Yes, I’m incorporating the covariates into the model as definition variables. Fortunately, the residual plots seem reasonable. If they didn't, I would have to try to normalize them as well, right?

Again, try using `lm()` to obtain residuals, and analyze the residuals using the tools available in package 'MVN'--unless I misunderstand, and that's what you already did here(?). Keep in mind that regressing out the covariates with `lm()` is only approximately the same as adjusting for them as definition variables in OpenMx (so I guess the ideal thing to do would be to fit the MxModel and extract residuals from it). In any event, though, if the covariates don't have large effects, then the graphs etc. of the residuals probably won't be much different from those of the unresidualized phenotype.

How large is your sample? If it's pretty large, then even modest departures from normality can appear quite statistically significant.

You mean "sample" instead of "population," right? Anyhow, I think you'll want to evaluate MZ and DZ separately. Unless your phenotype has a heritability of zero, the distribution of MZ and DZ data pooled together would be a mixture of two bivariate distributions having different covariances. Thus, even if the phenotype were bivariate normal in each zygosity group, it might not look bivariate normal when the groups are pooled together.
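The mixture argument can be seen in a small simulation. This is only a sketch under made-up assumptions: twin correlations of 0.8 (MZ) and 0.4 (DZ), unit variances, and 5000 pairs per group. The helpers `sim_pairs()` and `excess_kurtosis()` are hypothetical names defined here. If a pair is bivariate normal, the within-pair difference is normal, so its excess kurtosis should be near zero; in the pooled sample the differences are a mixture of two normals with different variances, which is heavier-tailed.

```r
set.seed(3)

# Simulate n twin pairs with unit variances and twin correlation r,
# using a shared component plus twin-specific components
sim_pairs <- function(n, r) {
  shared <- rnorm(n)
  cbind(
    t1 = sqrt(r) * shared + sqrt(1 - r) * rnorm(n),
    t2 = sqrt(r) * shared + sqrt(1 - r) * rnorm(n)
  )
}

# Excess kurtosis: about 0 for a normal distribution
excess_kurtosis <- function(x) {
  z <- (x - mean(x)) / sd(x)
  mean(z^4) - 3
}

n <- 5000
mz <- sim_pairs(n, r = 0.8)  # assumed MZ correlation
dz <- sim_pairs(n, r = 0.4)  # assumed DZ correlation
pooled <- rbind(mz, dz)

# Within-pair differences, by zygosity group and pooled
k_mz <- excess_kurtosis(mz[, "t1"] - mz[, "t2"])
k_dz <- excess_kurtosis(dz[, "t1"] - dz[, "t2"])
k_pooled <- excess_kurtosis(pooled[, "t1"] - pooled[, "t2"])

print(round(c(MZ = k_mz, DZ = k_dz, pooled = k_pooled), 2))
```

The pooled value should come out clearly larger than either within-group value, even though both groups were generated as bivariate normal.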

No! If you nonlinearly transform the phenotype and analyze it in an MxModel, your parameter estimates will only apply to the transformed phenotype, not the untransformed phenotype. Statistics like twin correlations can and will change from pre- to post-nonlinear-transformation.
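As a quick illustration with simulated data (assumed values: a twin correlation of 0.8 on the log scale, 5000 pairs; none of this comes from the original poster's data), the Pearson twin correlation differs between the raw and log-transformed phenotype, while a linear transformation such as `scale()` leaves it unchanged. The rank-based Spearman correlation is shown only for contrast, since it is unaffected by monotone transformations.

```r
set.seed(4)
n <- 5000
r <- 0.8  # assumed twin correlation on the log scale

# Bivariate normal on the log scale, so the raw phenotype is log-normal
shared <- rnorm(n)
log_t1 <- sqrt(r) * shared + sqrt(1 - r) * rnorm(n)
log_t2 <- sqrt(r) * shared + sqrt(1 - r) * rnorm(n)
raw_t1 <- exp(log_t1)
raw_t2 <- exp(log_t2)

# Pearson twin correlation before and after a nonlinear (log) transformation
r_raw <- cor(raw_t1, raw_t2)
r_log <- cor(log(raw_t1), log(raw_t2))

# A linear transformation (scale) does not change the Pearson correlation
r_lin <- cor(as.numeric(scale(raw_t1)), as.numeric(scale(raw_t2)))

# Spearman correlation depends only on ranks, so log() leaves it unchanged
s_raw <- cor(raw_t1, raw_t2, method = "spearman")
s_log <- cor(log(raw_t1), log(raw_t2), method = "spearman")

print(round(c(pearson_raw = r_raw, pearson_log = r_log, pearson_scaled = r_lin), 3))
print(round(c(spearman_raw = s_raw, spearman_log = s_log), 3))
```

Because ACE estimates are driven by the MZ and DZ covariances, a change in the twin correlations after transformation carries through to the variance components.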

I feel it's worth mentioning that SEM is generally pretty robust to violations of normality. I don't have references for this off the top of my head, but some quick internet searching finds plenty of info.

http://www.dilipmutum.com/2011/07/normality-issues-in-sem.html

Parameter estimates from mildly non-normal data are often fine. However, standard errors and chi-square statistics are often biased in this case.

Thank you Michael and Robert for your comments!

I actually used the MVN package to assess the multivariate normality of my raw data (around 80 twin pairs each in MZ and DZ), because I thought the dependent variables have to have a bivariate normal distribution (multivariate in the case of multivariate twin analysis). Should the residuals too, when covariates are included?

Thanks for pointing out that the MZ and DZ groups should be handled separately. The normality tests for the raw data are now better, and based on Michael's comment, I'm glad slight non-normality shouldn't be a problem.

If the raw phenotype looks OK, then the residuals probably will too. I guess it wouldn't hurt to check them if you're still concerned.