variable with inherently skewed distribution
deManzano
Joined: 04/09/2015
Hi everyone,
I would like to perform classical twin modeling on a variable whose distribution is inherently and strongly positively skewed. I asked around a bit, and the suggestion I got was to cut my original continuous data into bins, treat the variable as ordinal, and use a threshold liability model. However, this approach still assumes an underlying normal distribution. Is there any way to generalize the modelling in OpenMx (e.g., as in generalized linear regression)? Do you have any other suggestions or recommendations?
Cheers,
Örjan
non-normal phenotype
That approach is commonly used in behavior genetics, but I am not a fan. First of all, it deliberately trades away an interval- or possibly even ratio-scale variable for an ordinal one (see Stevens, 1946). Secondly, the decisions about how many levels the ordinal variable should have, and where the cutpoints should be set, are often (perhaps usually) arbitrary, at least to some extent.
Even if the observed phenotype is non-normal, it's not necessarily unreasonable to assume an underlying normal continuum. The observed non-normality may simply be the result of how the underlying trait is measured. By analogy with psychometrics, the distribution of observed test scores is a function of the true distribution of the latent trait and the measurement properties of the instrument. If variance in the underlying trait is the result of a large number of small genetic and environmental influences, combining additively, then it's reasonable to assume the trait is approximately normal in distribution, by virtue of the Central Limit Theorem.
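A small Python simulation can make the measurement argument concrete. It assumes, purely for illustration, a latent trait formed by summing 200 small, independent uniform influences, and a hypothetical instrument whose observed score is the exponential of the trait. The latent trait comes out close to symmetric, the observed score is strongly right-skewed, and in this idealized case a log transformation recovers the latent scale exactly.

```python
import math
import random


def skewness(xs):
    """Moment estimator of skewness, g1 = m3 / m2**1.5 (no small-sample correction)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


def latent_trait(rng, n_effects=200):
    """Sum of many small, independent, additive influences.

    Each influence is uniform on [-0.1, 0.1] (deliberately non-normal), yet the
    sum is close to normal by the Central Limit Theorem.
    """
    return sum(rng.uniform(-0.1, 0.1) for _ in range(n_effects))


def observed_score(latent):
    """Hypothetical instrument that stretches the upper end of the trait."""
    return math.exp(latent)


rng = random.Random(2015)  # fixed seed for a reproducible run
latent = [latent_trait(rng) for _ in range(3000)]
observed = [observed_score(z) for z in latent]
recovered = [math.log(y) for y in observed]  # log undoes this particular instrument

print(f"latent skewness:   {skewness(latent):+.2f}")
print(f"observed skewness: {skewness(observed):+.2f}")
print(f"log(obs) skewness: {skewness(recovered):+.2f}")
```

Real instruments will rarely be this cooperative; the point is only that a skewed observed score is compatible with a normal latent trait.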
Consider the following as alternatives to ordinalizing:
1. Apply a suitable transformation to the variable, such as a logarithmic or square-root transformation, and make it clear in your paper/talk that you analyzed the transformed, not raw, phenotype. This is especially defensible if the transformation has theoretical justification (e.g., the cube root of body mass as approximately normal). One disadvantage arises if the raw phenotype has easily interpreted units: the results of your analysis of the transformed phenotype may not be so simple to interpret. Also, bear in mind that the assumption of monophenotype FIML twin analysis is that the twin pairs are independent realizations of a bivariate-normal random vector. Just because the transformation improves the univariate marginal normality of the trait for twins A and B does not necessarily mean it has improved its joint bivariate normality.
2. Leave the trait as-is, analyze it in the usual way to obtain point estimates, but use non- or semi-parametric methods for your inferences. The skewness is unlikely to bias the point estimates, but severe skewness can greatly bias normal-theory parametric inference. As of v2.7.11, OpenMx has a built-in nonparametric bootstrapping feature. There's also imxRobustSE(), but it doesn't work with multigroup MxModels (though it is possible to use a single-group MxModel for twin analysis, through clever use of definition variables).
3. Fit your twin models with a different bivariate distribution that adequately models the skewness. Unfortunately, OpenMx lacks a simple, built-in way to do this. But it could possibly be kludged with, say, mxFitFunctionRow() and mxFitFunctionAlgebra(), or mxFitFunctionR().
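To illustrate the idea behind option 2, here is a minimal pair-resampling bootstrap in plain Python (not the OpenMx bootstrapping feature itself). It assumes a single group of MZ-like pairs simulated as exponentiated bivariate-normal liabilities with correlation 0.6, and it uses the twin Pearson correlation as the statistic; both are illustrative choices. The key design point is that whole pairs are resampled, because pairs, not individuals, are the independent units.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_pairs(rng, n_pairs, rho):
    """Hypothetical pairs: bivariate-normal liabilities, exponentiated to induce skew."""
    pairs = []
    for _ in range(n_pairs):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        pairs.append((math.exp(z1), math.exp(z2)))
    return pairs


def twin_correlation(pairs):
    """Correlation between twin A and twin B scores."""
    return pearson([a for a, _ in pairs], [b for _, b in pairs])


def pair_bootstrap_ci(pairs, stat, n_boot, rng, level=0.95):
    """Percentile CI from resampling whole twin pairs with replacement."""
    n = len(pairs)
    estimates = []
    for _ in range(n_boot):
        sample = [pairs[rng.randrange(n)] for _ in range(n)]
        estimates.append(stat(sample))
    estimates.sort()
    k = int((1.0 - level) / 2.0 * n_boot)  # number of replicates cut from each tail
    return estimates[k], estimates[n_boot - 1 - k]


rng = random.Random(42)
pairs = simulate_pairs(rng, n_pairs=300, rho=0.6)
estimate = twin_correlation(pairs)
lo, hi = pair_bootstrap_ci(pairs, twin_correlation, n_boot=1000, rng=rng)
print(f"twin correlation {estimate:.3f}, 95% bootstrap CI [{lo:.3f}, {hi:.3f}]")
```

In an actual twin analysis you would resample within each zygosity group and refit the full model at every replicate, rather than recomputing a single correlation.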
In reply to non-normal phenotype by AdminRobK
non-normal phenotype
In reply to non-normal phenotype by deManzano
a few other remarks
Right. If there's a pile-up of scores at the floor or ceiling of a scale, no transformation in the world is going to help.
How about cutting the data into the largest number of bins such that every cell in the resulting contingency table has a nonzero frequency?
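A brute-force version of that suggestion, sketched in plain Python. The assumptions are mine, not from the thread: cutpoints sit at equal-frequency quantiles of the scores pooled over twins A and B (so both twins share the same thresholds), at most 10 bins are tried, and the data are simulated skewed pairs. Other cutpoint placements might admit more bins, and with MZ and DZ groups you would want every cell nonzero in each group's table.

```python
import math
import random
from bisect import bisect_right


def quantile_cutpoints(values, k):
    """Return k - 1 cutpoints placed at equal-frequency quantiles of `values`."""
    s = sorted(values)
    n = len(s)
    return [s[(i * n) // k] for i in range(1, k)]


def contingency_table(pairs, cuts):
    """Cross-tabulate twin A (rows) by twin B (columns) using shared cutpoints."""
    k = len(cuts) + 1
    table = [[0] * k for _ in range(k)]
    for a, b in pairs:
        # bisect_right counts the cutpoints <= x, giving a category in 0..k-1.
        table[bisect_right(cuts, a)][bisect_right(cuts, b)] += 1
    return table


def max_bins_without_empty_cells(pairs, k_max=10):
    """Largest k <= k_max whose k x k twin table has no zero cells, or None.

    Searches downward from k_max because the property is not guaranteed to be
    monotone in k, so the first success from the top is the largest feasible k.
    """
    pooled = [x for pair in pairs for x in pair]
    for k in range(k_max, 1, -1):
        cuts = quantile_cutpoints(pooled, k)
        table = contingency_table(pairs, cuts)
        if all(cell > 0 for row in table for cell in row):
            return k, cuts, table
    return None


def simulate_pairs(rng, n_pairs, rho):
    """Hypothetical skewed pairs: exponentiated bivariate-normal liabilities."""
    pairs = []
    for _ in range(n_pairs):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        pairs.append((math.exp(z1), math.exp(z2)))
    return pairs


rng = random.Random(1946)
pairs = simulate_pairs(rng, n_pairs=300, rho=0.6)
result = max_bins_without_empty_cells(pairs, k_max=10)
if result is not None:
    k, cuts, table = result
    print(f"{k} bins; cutpoints: {[round(c, 2) for c in cuts]}")
    for row in table:
        print(row)
```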
Also, bear in mind that it's common practice to adjust for age and sex in twin analyses. If you treat the phenotype as continuous, the bivariate normal distribution might be a better approximation once you condition on age and sex.
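A rough illustration of the covariate point in plain Python. It assumes a hypothetical sample that is 80% one sex, a large sex difference (3 residual SDs), and a small linear age effect, all invented for the demo. The marginal phenotype is then positively skewed even though its distribution conditional on age and sex is normal by construction, and regressing out age and sex removes the skew. In OpenMx, covariates are often handled in the model for the means (for example via definition variables) rather than by residualizing first; the two-step version here is just the simplest way to see the effect.

```python
import random


def skewness(xs):
    """Moment estimator of skewness, g1 = m3 / m2**1.5."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


def ols_residuals(y, X):
    """Residuals of y regressed on the columns of X (X includes an intercept column).

    Solves the normal equations (X'X) b = X'y by Gauss-Jordan elimination with
    partial pivoting; adequate for a few well-scaled covariates, not a general solver.
    """
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * c for a, c in zip(A[r], A[col])]
    b = [A[i][p] / A[i][i] for i in range(p)]
    return [yi - sum(bj * xj for bj, xj in zip(b, row)) for row, yi in zip(X, y)]


rng = random.Random(7)
n = 3000
sex = [1.0 if rng.random() < 0.2 else 0.0 for _ in range(n)]  # hypothetical 80/20 split
age = [rng.uniform(20.0, 60.0) for _ in range(n)]
# Phenotype = large sex effect + small age effect + normal noise (all invented).
y = [3.0 * s + 0.05 * a + rng.gauss(0.0, 1.0) for s, a in zip(sex, age)]
X = [[1.0, s, a] for s, a in zip(sex, age)]  # intercept, sex, age
resid = ols_residuals(y, X)

print(f"raw skewness:      {skewness(y):+.2f}")
print(f"residual skewness: {skewness(resid):+.2f}")
```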