Hi everyone,
I have some doubts about missing data and incomplete twin pairs, and I do not know what the best option is.
For example, if I am fitting a bivariate model with variable 1 and variable 2, what should I do with a twin pair in which one of the twins is missing variable 1? Also, what should I do if a twin pair has no information for variable 1 but does have it for variable 2?
Finally, what should I do with unpaired twins? Should I remove them from my data set? I would like to know how the program treats missing data and unpaired twins, and what the correct option is.
Thank you so much
Don't remove incomplete data rows from the analysis. Data rows that contain missing phenotype scores--even those containing singleton twins--still contain information about unknown parameters.
If you're analyzing raw data with an MxFitFunctionML object, OpenMx does full-information maximum-likelihood estimation (FIML), which is not biased by missing data unless the missing-data mechanism is NMAR (Not Missing At Random).
For reference, I refer you to:
Rubin, D. B. (1976). Inference and missing data. Biometrika, 63(3), 581-592.
Schafer, J. L., & Graham, J. W. (2002). Missing data: Our view of the state of the art. Psychological Methods, 7(2), 147-177.
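To make the point about incomplete rows concrete, here is a minimal sketch in plain Python (it is not OpenMx code) of how a casewise, full-information likelihood can use whatever is observed in each row. It assumes continuous, multivariate-normal phenotypes; for ordinal or binary variables the likelihood involves thresholds and is not shown here. The function names, the column layout (variable 1 and variable 2 for twin 1, then for twin 2), and all numbers are invented for illustration.

```python
import math

def cholesky(a):
    """Lower-triangular Cholesky factor of a positive-definite matrix (list of lists)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L

def row_loglik(row, mean, cov):
    """Multivariate-normal log-likelihood of one row, using only its observed entries.

    Missing values are coded as None. The mean vector and covariance matrix
    are subset to the observed positions before the density is evaluated.
    """
    idx = [i for i, x in enumerate(row) if x is not None]
    if not idx:
        return 0.0  # a fully missing row carries no information
    resid = [row[i] - mean[i] for i in idx]
    sub = [[cov[i][j] for j in idx] for i in idx]
    L = cholesky(sub)
    # Forward substitution: solve L z = resid, so that z'z is the quadratic form
    z = []
    for i in range(len(idx)):
        s = sum(L[i][k] * z[k] for k in range(i))
        z.append((resid[i] - s) / L[i][i])
    logdet = 2.0 * sum(math.log(L[i][i]) for i in range(len(idx)))
    quad = sum(v * v for v in z)
    return -0.5 * (len(idx) * math.log(2.0 * math.pi) + logdet + quad)

# Hypothetical model-implied mean and covariance for [v1_t1, v2_t1, v1_t2, v2_t2]
mean = [0.0, 0.0, 0.0, 0.0]
cov = [[1.0, 0.3, 0.4, 0.2],
       [0.3, 1.0, 0.2, 0.4],
       [0.4, 0.2, 1.0, 0.3],
       [0.2, 0.4, 0.3, 1.0]]

data = [
    [0.5, -0.2, 0.1, 0.4],   # complete pair
    [None, 0.3, -0.6, 0.2],  # twin 1 is missing variable 1
    [1.1, 0.7, None, None],  # singleton: twin 2 not observed at all
]

total = sum(row_loglik(r, mean, cov) for r in data)
print(total)
```

Every row contributes a term to the total, including the singleton, which still informs the means, variances, and within-person covariance. Dropping such rows would discard those terms.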
Thank you so much Rob, fantastic answer as always. I used to think the same. However, I had a problem with a bivariate saturated model (1 ordinal variable, 1 binary variable) and I decided to try it without the unpaired twins.
Here are the results with all twins:
Here are the results without unpaired twins:
So I thought that maybe the unpaired twins could be the problem, and I wanted to be sure. However, if the unpaired twins are not the problem, I don't know what I should do.
Here is the full saturated script, just in case something is wrong in it. But I guess the problem is in the data.
Please let me know if you need anything else, and a million thanks in advance.
In your comparison table I don't see major problems. The covariates are hugely significant, the twin 1 vs. twin 2 means are not significantly different (.342), but there is a marginally significant difference between the MZ and DZ means (.018). Depending on how large your sample is, this may not be unreasonable. A very small real effect in a very large sample may turn out significant. The only reason it seems not to be significant in the analysis without unpaired twins is probably just that you are throwing about 10% of your data (sample size) away.
Before I go further, let me allow you to expand on your concern. BTW, I agree with Rob that FIML with everyone you've got is the way to go.
Thank you so much for these really helpful comments, I really appreciate it. You are right, my sample is big: 965 twin pairs and 218 unpaired twins.
Since the covariates (sex and age) are hugely significant, should I think about a sex-limitation model? I don't know when it is suitable to fit that model. I want to be sure that I am fitting the correct model.
Thank you so much guys again.
Large sex differences in means do not necessarily imply that variance components will differ. These are different models. Ideally, we should start with a hypothesis about how things work, but an exploratory analysis of GxSex might seem worthwhile since at least one statistic differs. It would also be useful to know about the variances in males vs. females. If the variances differ, it is reasonable to ask why: is there more genetic variation in men than in women, for example? Note that in multivariate sex-limitation modeling there are advantages to a correlated factors model over a Cholesky, because the latter can mask measurement non-invariance between the sexes and has uncomfortable statistical properties (e.g., changing the order of the variables changes the fit of the model, because it is not saturated across sex, even though it is within each one). See Neale, M. C., Røysamb, E., & Jacobson, K. (2006). Multivariate genetic analysis of sex limitation and G x E interaction. Twin Research and Human Genetics, 9(4), 481-489, for details.
Thank you so much, I really appreciate it. Just one more thing: I am working with a dichotomous variable, so I think that if the mean differs, the variance will necessarily differ too. Am I wrong?
My means are 0.14 for males and 0.28 for females. Should I do an exploratory GxSex model?
Thank you so much again!
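The arithmetic behind this question can be sketched as follows. For a 0/1 variable with mean p, the observed variance is p(1 - p), so it is tied to the mean. The sketch also shows the threshold each prevalence would correspond to on a standard-normal liability scale, assuming a liability-threshold model with latent mean 0 and variance 1. That model is an assumption made here for illustration, and the function names are hypothetical. Whether the variance components differ by sex is a separate question that this arithmetic does not answer.

```python
from statistics import NormalDist

def bernoulli_variance(p):
    """Observed variance of a 0/1 variable whose mean (prevalence) is p."""
    return p * (1.0 - p)

def liability_threshold(p):
    """Threshold on a standard-normal liability scale above which a case is 'affected'.

    Assumes latent liability ~ N(0, 1), so that P(liability > threshold) = p.
    """
    return NormalDist().inv_cdf(1.0 - p)

# Means reported in the post above
prevalence = {"males": 0.14, "females": 0.28}

for sex, p in prevalence.items():
    print(sex,
          "observed variance:", round(bernoulli_variance(p), 4),
          "liability threshold:", round(liability_threshold(p), 3))
```

The observed variances come out as 0.1204 and 0.2016, so they do differ. Under the assumed liability model the same difference in prevalence appears as a shift in the threshold, while the latent variance stays fixed at 1.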