Normalize data?
Cindy.s
Joined: 07/04/2015
Hi OpenMx community!
I have a probably straightforward question. I'm running simple univariate ACE models with covariates. My dependent variable has a non-normal distribution. Should I normalize my data, for example using the R function scale()? Should I normalize my covariates as well? My estimates change only slightly and my chi-square values are basically the same, but my CFI and TLI values change a little bit. Should this be of concern?
Thanks for your help!
scale does not normalize data
scale() centers a variable and rescales it to unit variance. However, the variable will be just as non-normally distributed as it was before.
If your data are badly skewed, or in the wrong measurement unit (perhaps they should be log() or sqrt() transformed), you need to transform them.
If they have more than just a bit of skew or kurtosis, you may need to treat them as an ordinal outcome.
A hist() of the data would help
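A minimal sketch of the point above, using simulated data rather than the poster's. The helper skewness_hat() is made up for this illustration. It suggests that scale() leaves the skewness as it was, while a log transform changes it:

```r
set.seed(1)
# Simulated right-skewed variable (log-normal); stands in for a skewed phenotype
x <- rlnorm(500, meanlog = 0, sdlog = 0.8)

# Hypothetical helper: sample skewness (third standardized moment)
skewness_hat <- function(v) {
  z <- (v - mean(v)) / sd(v)
  mean(z^3)
}

skew_raw    <- skewness_hat(x)
skew_scaled <- skewness_hat(as.numeric(scale(x)))  # linear transformation
skew_logged <- skewness_hat(log(x))                # nonlinear transformation

print(c(raw = skew_raw, scaled = skew_scaled, logged = skew_logged))

# hist(x); hist(as.numeric(scale(x))); hist(log(x))  # uncomment to compare shapes
```

The raw and scaled values should agree up to rounding error, because standardizing inside skewness_hat() undoes any linear rescaling.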
distributional assumption
scale() linearly transforms the variable to have zero mean and unit variance, and no linear transformation will change the shape of the distribution.
Are you incorporating the covariates into the model as "definition variables"? That is, does your MxModel have at least one MxMatrix that has labels starting with 'data.'? If so (and you probably are, if you're adapting an example script), then you don't need to worry about the distributions of the covariates, because you are modeling the distribution of the phenotype conditional on the observed values of the covariates, rather than the joint distribution of the phenotype and covariates.
With that in mind, note that the distributional assumption* is that the phenotype is normally distributed, conditional on covariates. So, what you'd really want to look at would be graphs of the residuals from a regression of the phenotype onto those covariates. I daresay that simply using the lm() function in R is good enough for this purpose.
(*To be pedantically correct, the distributional assumption is that, conditional on covariates, the phenotypes of Twin #1 and Twin #2 are jointly bivariate normal. This is why I endorse replacing the phrase "univariate ACE model" with "monophenotype ACE model" or "single-trait ACE model.")
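A rough sketch of that residual check, on simulated data. The variable names (pheno, age, sex) and the effect sizes are assumptions made up for the example, not taken from the poster's dataset:

```r
set.seed(2)
n <- 200
# Simulated covariates and phenotype; all names and values are hypothetical
age   <- runif(n, 20, 60)
sex   <- rbinom(n, 1, 0.5)
pheno <- 10 + 0.3 * age + 2 * sex + rnorm(n, sd = 3)
dat   <- data.frame(pheno, age, sex)

# Regress the phenotype onto the covariates and keep the residuals
fit <- lm(pheno ~ age + sex, data = dat)
res <- residuals(fit)

# Inspect the residuals rather than the raw phenotype
sw <- shapiro.test(res)
print(sw)

# hist(res); qqnorm(res); qqline(res)  # uncomment for the graphs
```

With twin data in wide format, one would presumably repeat this for each twin's phenotype column (or stack the twins first) and then look at the two residual columns jointly.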
In reply to distributional assumption by AdminRobK
Should MVN-tests be used to assess normality?
Thanks for clarifying the use of the scale() function. Basically then, linear transformations don't alter the distribution, thus the estimates and the fit statistics are basically the same, though the -2LL values are completely different.
Is there an established approach to assess multivariate normality for twin models? Using the tests in the MVN package on the whole population, I got:
Henze-Zirkler's Multivariate Normality Test
---------------------------------------------
p-value : 0.01049907
Mardia's Multivariate Normality Test
---------------------------------------
p.value.skew : 0.01001787
p.value.kurt : 0.00282801
Royston's Multivariate Normality Test
---------------------------------------------
p-value : 0.03322675
Should I be concerned that they all showed non-normal distribution (see also attached 3D plot)? Should the subsets of MZ and DZ data be each MVN, or is it enough for the whole population?
On the univariate level the distribution doesn't seem that bad (see attached histogram), based on the Shapiro-Wilk test:
Shapiro-Wilk's Normality Test
  Variable  Statistic  p-value  Normality
1 Column1   0.9910     0.7551   YES
2 Column2   0.9661     0.0123   NO
Could I swap the order of the twins within some pairs to decrease the skew of "Column2", and perhaps improve the multivariate normality?
If I apply a nonlinear transformation to the data, do the estimated A, C, and E values also hold for the original trait, not just the transformed one?
Yes, I'm incorporating the covariates into the model as definition variables. Fortunately the residual plots seem reasonable. If they didn't, I would have to try to normalize them as well, right?
In reply to Should MVN-tests be used to assess normality? by Cindy.s
Again, try using lm() to
Again, try using lm() to obtain residuals, and analyze the residuals using the tools available in package 'MVN', unless I misunderstand and that's what you already did here(?). Keep in mind that regressing out the covariates with lm() is only approximately the same as adjusting for them as definition variables in OpenMx (so I guess the ideal thing to do would be to fit the MxModel and extract residuals from it). In any event, though, if the covariates don't have large effects, then the graphs etc. of the residuals probably won't be much different from those of the unresidualized phenotype.
How large is your sample? If it's pretty large, then even modest departures from normality can appear quite statistically significant.
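To illustrate the sample-size remark, here is a small simulated sketch. The gamma distribution and the sample sizes are arbitrary choices for the example. The same mildly skewed distribution is tested at two sample sizes, and the large sample will typically give a far smaller p-value:

```r
set.seed(3)
# Mildly skewed simulated data: a gamma with shape 20 has skewness of about 0.45
x_large <- rgamma(5000, shape = 20)  # 5000 is the largest n shapiro.test() accepts
x_small <- x_large[1:50]             # a small subsample from the same distribution

p_large <- shapiro.test(x_large)$p.value
p_small <- shapiro.test(x_small)$p.value

print(c(n50 = p_small, n5000 = p_large))
```

The departure from normality is identical in both cases; only the power to detect it differs.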
You mean "sample" instead of "population," right? Anyhow, I think you'll want to evaluate MZ and DZ separately. Unless your phenotype has a heritability of zero, the distribution of MZ and DZ data pooled together would be a mixture of two bivariate distributions having different covariances. Thus, even if the phenotype were bivariate normal in each zygosity group, it might not look bivariate normal when the groups are pooled together.
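A simulated sketch of the mixture argument. The twin correlations of 0.8 (MZ) and 0.4 (DZ) are assumed values, and sim_pairs() and mardia_kurtosis() are helpers made up for this example. Each group is bivariate normal by construction, yet the pooled data should show excess multivariate kurtosis:

```r
set.seed(4)
# Hypothetical helper: draw n standard bivariate-normal twin pairs with correlation r
sim_pairs <- function(n, r) {
  t1 <- rnorm(n)
  t2 <- r * t1 + sqrt(1 - r^2) * rnorm(n)
  cbind(twin1 = t1, twin2 = t2)
}

# Hypothetical helper: Mardia's multivariate kurtosis
# (its expected value is p * (p + 2) = 8 for bivariate normal data)
mardia_kurtosis <- function(x) {
  d2 <- mahalanobis(x, colMeans(x), cov(x))  # squared Mahalanobis distances
  mean(d2^2)
}

n  <- 50000                  # deliberately large so the pattern is stable
mz <- sim_pairs(n, r = 0.8)  # assumed MZ correlation
dz <- sim_pairs(n, r = 0.4)  # assumed DZ correlation
pooled <- rbind(mz, dz)

kurt <- c(MZ     = mardia_kurtosis(mz),
          DZ     = mardia_kurtosis(dz),
          pooled = mardia_kurtosis(pooled))
print(round(kurt, 2))
```

The sample size here is far larger than a typical twin study, purely so that the difference is not drowned out by sampling noise.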
No! If you nonlinearly transform the phenotype and analyze it in an MxModel, your parameter estimates will only apply to the transformed phenotype, not the untransformed phenotype. Statistics like twin correlations can and will change from pre- to post-nonlinear-transformation.
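A small simulation along these lines, with an assumed twin correlation of 0.5 on the log scale and made-up variable names. It suggests that a linear transformation leaves the twin correlation alone while a log transformation does not:

```r
set.seed(5)
n <- 200000
r <- 0.5  # assumed twin correlation on the log scale

# Log-scale scores for twin 1 and twin 2, then a skewed "raw" phenotype
l1 <- rnorm(n)
l2 <- r * l1 + sqrt(1 - r^2) * rnorm(n)
y1 <- exp(l1)
y2 <- exp(l2)

cor_raw    <- cor(y1, y2)                    # untransformed phenotype
cor_linear <- cor(3 * y1 + 10, 3 * y2 + 10)  # linear transformation
cor_log    <- cor(log(y1), log(y2))          # nonlinear transformation

print(round(c(raw = cor_raw, linear = cor_linear, log = cor_log), 3))
```

Since the A, C, and E estimates are driven by the MZ and DZ covariances, a change in the twin correlations carries through to the variance components.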
In reply to Again, try using lm() to by AdminRobK
Robustness to non-normality
http://www.dilipmutum.com/2011/07/normality-issues-in-sem.html
Parameter estimates from mildly non-normal data are often fine. However, standard errors and chi-square statistics are often biased in this case.
In reply to Robustness to non-normality by mhunter
Observed phenotype and covariates have to have bivariate normal?
I actually used the MVN package to assess the multivariate normality of my raw data (around 80 twin pairs each in MZ and DZ), because I thought the dependent variables have to have a bivariate normal distribution (multivariate in the case of multivariate twin analysis). Should the residuals also be bivariate normal when there are covariates?
Thanks for pointing out that the MZ and DZ groups should be handled separately. The normality tests for the raw data are now better, and based on Michael's comment, I'm glad slight non-normality shouldn't be a problem.
In reply to Observed phenotype and covariates have to have bivariate normal? by Cindy.s
residuals probably OK
If the raw phenotype looks OK, then the residuals probably will too. I guess it wouldn't hurt to check them if you're still concerned.