Bivariate ACE model with moderator
I am currently working on a project on how school achievement moderates intelligence among young adults. My sample consists of 4,084 German twins, aged 17 or 23-24, who have information on both GPA and intelligence. However, the twins' GPAs come from various types of secondary schools, so to check whether I need to use school type as a moderator in my GxE analyses, I first want to see what happens if I run separate bivariate variance decomposition analyses for the three school types while constraining their parameters to be equal.
In other words, I would like to set up a three-group script that runs the bivariate variance decomposition analyses for all three school types at once. I have tried to modify Dr. Maes’ twoACEc.R script, which uses the Cholesky parameterization to estimate a bivariate ACE model. When I simply modify it to include my two variables, GPA and IQ, it produces reasonable estimates (similar to the ones I obtain using the umxACEv function). However, when I try to take school type into account by building six mxModels (3 school types × MZ/DZ), combining them in mxFitFunctionMultigroup, and running the final model, some of my path coefficients are above |1| and some of my genetic and environmental correlations are negative. Can you help me figure out what the problem is in the attached script "GPA-moderation of intelligence (example).R"?
I use the standardized residuals of my two variables of interest as data input.
All the best,
Emilie
Standardized?
It's not really possible to help you without the data file, which I expect you cannot share due to privacy issues. However, it would be possible for you to share a similar, randomly generated data file by using the mxGenerateData() function. If you do that and the fake data(!) also show the problem, we could look at that.
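A minimal sketch of how that could be done, assuming `fitTwoACE` is your fitted mxModel and `twinData` is your original data frame (both names are placeholders for whatever your script uses):

```r
# Sketch: generate a shareable fake data set from a fitted OpenMx model.
# mxGenerateData() simulates data from the model-implied distribution,
# so it shows the same structure without exposing the real observations.
library(OpenMx)

fakeData <- mxGenerateData(fitTwoACE, nrows = nrow(twinData))
write.csv(fakeData, "TL_simulation.csv", row.names = FALSE)
```

If refitting the model to `fakeData` reproduces the odd estimates, the fake file can be posted safely.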
My initial thoughts are:
1. You are probably not looking at standardized solutions, because with a Cholesky parameterization the standardized correlations cannot end up outside the range of -1 to +1.
2. It is probably a Bad Idea to use a moderator which may have been caused by the variable you are looking at. In the present case, it seems highly likely that school achievement is partly caused by intelligence. The Purcell moderation model can accrue substantial biases when this assumption (that the moderator is not caused by the moderated trait) is violated.
3. Sometimes negative genetic and environmental covariances are to be expected, even when the within-person observed correlation is positive. In your case, this may be occurring because N is too small for at least one of the subgroups you have defined. Note, however, that a within-person correlation of .1 might come about from a .2 contribution from environmental sources and a -.1 contribution from genetic sources. There is no need for the different sources of covariance to have the same sign.
Hope this helps!
Same problems with simulated data
I have now tried to generate a new, simulated data set ("TL_simulation.csv"). However, it seems that it is my script that is wrong, because I still experience the same kind of problems with the simulated data file (see the attached script "GPA-moderation of intelligence (example).R"). With regard to your first point, I am pretty sure that I am looking at the standardized output, but of course something might have gone wrong when I tried to standardize my estimates, such that they are not truly standardized. I am not completely sure that I understand your second point.
Best regards,
Emilie
Use SE's for now?
I see the issue with Confidence Intervals being printed as NA. As the summary notes, with verbose=T one can see that there are active bound constraints at the solutions of the searches for the CIs. In my opinion, printing NA in these cases is not helpful, but I am not the OpenMx dictator and others disagree. If you calculate estimate +/- 1.96*se then you will probably get reasonable agreement with confidence intervals, although some may hit or exceed boundary values (e.g., 1 for a correlation). It would be interesting to see how far the not-printed CIs are from these statistics. If we could get a look at them at all...
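The estimate ± 1.96*SE calculation can be sketched as follows, assuming `fit` is your fitted mxModel (the column names come from `summary()`'s parameter table):

```r
# Sketch: Wald-type confidence intervals from estimates and standard errors,
# as a stopgap while the likelihood-based CIs print as NA.
est <- summary(fit)$parameters   # data frame with Estimate and Std.Error columns

waldCI <- data.frame(
  name     = est$name,
  lbound   = est$Estimate - 1.96 * est$Std.Error,
  estimate = est$Estimate,
  ubound   = est$Estimate + 1.96 * est$Std.Error
)
waldCI
```

Bear in mind these symmetric intervals can spill past boundary values (e.g., beyond 1 for a correlation), which is exactly where they and the likelihood-based CIs will disagree most.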
I recommend that you remove all the lbounds! With your example script, it returned all the likelihood-based CIs very nicely:
confidence intervals:
lbound estimate ubound note
twoACE.US_s1[1,7] 2.7622823e-12 2.3787388e-02 0.380695606
twoACE.US_s1[1,9] 2.3730239e-01 5.2548829e-01 0.627789101
twoACE.US_s1[1,11] 3.2578623e-01 4.5072432e-01 0.557189131
twoACE.US_s1[2,7] -1.2648491e+00 -2.3501974e-01 1.284443876
twoACE.US_s1[2,9] -6.3544096e-01 5.8246561e-01 1.367774302
twoACE.US_s1[2,11] 8.9633394e-02 6.5255412e-01 1.288291329
twoACE.US_s1[2,8] 7.9002471e-14 1.3759429e-01 0.523668335
twoACE.US_s1[2,10] 2.2800910e-02 3.4215950e-01 0.526293004
twoACE.US_s1[2,12] 3.9194936e-01 5.2024622e-01 0.658898738
twoACE.US_s2[1,7] 5.3758214e-12 2.4702608e-03 0.119588156
twoACE.US_s2[1,9] 3.6653599e-01 4.7754030e-01 0.533532667
twoACE.US_s2[1,11] 4.6456539e-01 5.1998944e-01 0.579442713
twoACE.US_s2[2,7] -1.0970761e+00 -4.7811645e-02 0.451436129
twoACE.US_s2[2,9] 4.5927513e-01 9.5784097e-01 1.940133555
twoACE.US_s2[2,11] -3.4826444e-01 8.9970672e-02 0.488314704
twoACE.US_s2[2,8] 6.0666861e-14 1.0244595e-02 0.183611120
twoACE.US_s2[2,10] 3.5710450e-01 5.0418675e-01 0.560270183
twoACE.US_s2[2,12] 4.2489259e-01 4.8556865e-01 0.540932612
twoACE.US_s3[1,7] 2.4819866e-13 2.4819866e-13 0.139318811 !!!
twoACE.US_s3[1,9] 2.5369645e-01 3.8204249e-01 0.447059059
twoACE.US_s3[1,11] 5.5114529e-01 6.1795751e-01 0.687954103
twoACE.US_s3[2,7] -2.3862687e-01 1.6512006e-12 0.630901884
twoACE.US_s3[2,9] 7.3582592e-01 1.2716111e+00 1.723964955
twoACE.US_s3[2,11] -7.3253015e-01 -2.7161107e-01 0.020392368
twoACE.US_s3[2,8] 6.3346161e-13 6.3346161e-13 0.116730799 !!!
twoACE.US_s3[2,10] 4.2533366e-01 5.2451781e-01 0.574430672
twoACE.US_s3[2,12] 4.2009295e-01 4.7548219e-01 0.530138798
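Clearing the bounds can be sketched like this, assuming `twoACEModel` is the multigroup model built in your script (substitute whatever name you used):

```r
# Sketch: remove all lower bounds on free parameters, then re-run with CIs.
# omxGetParameters() returns the named vector of free parameters;
# setting lbound = NA on those labels clears the box constraints.
twoACEModel <- omxSetParameters(twoACEModel,
                                labels = names(omxGetParameters(twoACEModel)),
                                lbound = NA)
fitNoBounds <- mxRun(twoACEModel, intervals = TRUE)
```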
I'll bring this case up at the developers' meeting and ask them how happy everyone would be if their CIs were reported as follows. At the least, we ought to advise eliminating some of the bounds (aka box constraints), IMO.
confidence intervals:
lbound estimate ubound note
twoACE.US_s1[1,7] NA 2.3783917e-02 NA !!!
twoACE.US_s1[1,9] NA 5.2548942e-01 NA !!!
twoACE.US_s1[1,11] NA 4.5072666e-01 0.5571892 !!!
twoACE.US_s1[2,7] NA -2.3503545e-01 NA !!!
twoACE.US_s1[2,9] NA 5.8247679e-01 NA !!!
twoACE.US_s1[2,11] NA 6.5255865e-01 NA !!!
twoACE.US_s1[2,8] NA 1.3759220e-01 NA !!!
twoACE.US_s1[2,10] NA 3.4216103e-01 NA !!!
twoACE.US_s1[2,12] NA 5.2024677e-01 NA !!!
twoACE.US_s2[1,7] NA 2.4695322e-03 NA !!!
twoACE.US_s2[1,9] NA 4.7754056e-01 NA !!!
twoACE.US_s2[1,11] NA 5.1998991e-01 NA !!!
twoACE.US_s2[2,7] NA -4.7800651e-02 NA !!!
twoACE.US_s2[2,9] NA 9.5783528e-01 NA !!!
twoACE.US_s2[2,11] NA 8.9965372e-02 NA !!!
twoACE.US_s2[2,8] NA 1.0242924e-02 NA !!!
twoACE.US_s2[2,10] NA 5.0418810e-01 NA !!!
twoACE.US_s2[2,12] NA 4.8556898e-01 NA !!!
twoACE.US_s3[1,7] NA 2.0478024e-08 NA !!!
twoACE.US_s3[1,9] NA 3.8204245e-01 NA !!!
twoACE.US_s3[1,11] 0.55114536 6.1795753e-01 NA !!!
twoACE.US_s3[2,7] NA 1.0128763e-07 NA !!!
twoACE.US_s3[2,9] NA 1.2716104e+00 NA !!!
twoACE.US_s3[2,11] NA -2.7161051e-01 NA !!!
twoACE.US_s3[2,8] NA 1.5286060e-08 NA !!!
twoACE.US_s3[2,10] NA 5.2451759e-01 NA !!!
twoACE.US_s3[2,12] 0.42009278 4.7548240e-01 NA !!!
In reply to Use SE's for now? by AdminNeale
Path coefficients exceed |1|
Removing the lbounds solved the issue with the confidence intervals.
However, my main problem is that some of the estimated path coefficients exceed |1|. Moreover, the genetic correlations for school type 1 and school type 2 are negative, which does not make sense as we are looking at the genetic influences on GPA and intelligence. Therefore, I think I have messed up something when I tried to modify Dr. Maes’ twoACEc.R script to include three bivariate ACE models, one for each type of secondary school, in one mxFitFunctionMultigroup. Can you help me figure out what the problem is?
Try with each of the groups separately
If you try the three samples separately, you may get a good sense of what to expect when they are analyzed jointly.
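A sketch of fitting each school type on its own, assuming `twinData` has a `schoolType` column and `buildTwoACE()` is a hypothetical wrapper around the model-building code in the twoACEc.R-based script (neither name comes from your attachment):

```r
# Sketch: fit the same bivariate ACE model to each school type separately,
# so each group's estimates can be inspected before fitting them jointly.
fitsByType <- lapply(1:3, function(st) {
  sub <- subset(twinData, schoolType == st)
  mxRun(buildTwoACE(sub))
})
lapply(fitsByType, summary)
```

If one group's separate fit already shows path coefficients beyond |1|, the problem is in that group's data or sample size, not in how the multigroup model was assembled.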
Note that an estimate of a path coefficient is just that: an estimate. Sometimes the model is configured such that correlations greater than 1 can be seen for some of the model parameters. If you were to split the indicators of a one-factor model to make a correlated two-factor model, and allow the factors to correlate, you could see estimates of the factor correlation exceed 1. You might see this about half the time if the one-factor model were the true model in the population, because the method would estimate the factor correlation (which is 1.0 in the population), and sampling error would make the estimate a bit bigger or smaller.
Now, when you look at what the model predicts for the correlations/covariances (check out mxGetExpected) - even though it may have a factor correlation of 1.8 or something silly - the predicted correlations among the measures should look fairly sane, with no predicted correlations greater than 1.0 in absolute value.
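Checking the model-implied matrices can be sketched as follows, assuming `fit` is your fitted multigroup model and `"MZ1"` stands in for one of your six submodel names:

```r
# Sketch: inspect the model-implied covariance matrix for one group and
# convert it to correlations; these predicted correlations should stay
# within [-1, 1] even when individual path estimates look wild.
expCov <- mxGetExpected(fit, "covariance", subname = "MZ1")
cov2cor(expCov)
```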
When we partition a phenotypic correlation, some of the constituent correlations may be of opposite sign. This isn't that unusual, although it isn't random, because we usually apply multivariate BG methods to partition known, substantial phenotypic correlations. However, if you imagine applying the method to pairs of variables whose within-person correlation is zero, you would likely get some positive covariance from A, C or E, with some negative covariance generated by one or more of the remaining sources. The model fitting will essentially 'try' to get all the variances and covariances right; how well it does is measured by various goodness-of-fit statistics such as those reported by summary().
HTH!
Thank you!
maybe try verifying your