Attachment | Size
---|---
GPA-moderation of intelligence (example).R | 11.05 KB

Hi all,

I am currently working on a project on how school achievement moderates intelligence among young adults. My sample consists of 4,084 German twins aged 17 and 23-24, respectively, who have information on both GPA and intelligence. However, the twins' GPAs come from three different types of secondary school, so to check whether I need to use school type as a moderator in my GxE analyses, I first want to see what happens if I run separate bivariate variance decomposition analyses for the three school types with their parameters constrained to be equal.

In other words, I would like to set up a multi-group script that runs the bivariate variance decomposition analyses for all three school types at once. I have tried to modify Dr. Maes' twoACEc.R script, which uses the Cholesky parameterization to estimate a bivariate ACE model. When I simply adapt it to my two variables, GPA and IQ, it produces reasonable estimates (similar to the ones I obtain using the umxACEv function). However, when I try to take school type into account by building six mxModels (3 school types x MZ/DZ), adding them all to the multigroup fit function, and running the resulting model, some of my path coefficients exceed |1| and some of my genetic and environmental correlations are negative. Can you help me figure out what the problem is in the attached script "GPA-moderation of intelligence (example).R"?
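For reference, the multi-group assembly I am attempting looks roughly like the toy below. Here six saturated covariance groups stand in for my six ACE submodels; all names, sample sizes and values are made up, but the mxFitFunctionMultigroup structure is the same:

```r
library(OpenMx)

set.seed(1)
# One saturated two-variable group, standing in for an MZ or DZ ACE submodel
mkGroup <- function(name, n = 100, r = 0.4) {
  obsCov <- matrix(c(1, r, r, 1), 2, 2,
                   dimnames = list(c("gpa", "iq"), c("gpa", "iq")))
  mxModel(name,
    mxData(obsCov, type = "cov", numObs = n),
    mxMatrix("Symm", nrow = 2, ncol = 2, free = TRUE, values = c(1, .3, 1),
             labels = paste0(name, c("Vg", "Cgi", "Vi")), name = "expCov"),
    mxExpectationNormal(covariance = "expCov", dimnames = c("gpa", "iq")),
    mxFitFunctionML())
}
# 3 school types x MZ/DZ = six groups, fitted jointly in one container model
groupNames <- c("MZ1", "DZ1", "MZ2", "DZ2", "MZ3", "DZ3")
subModels  <- lapply(groupNames, mkGroup)
multi <- mxModel("all", subModels, mxFitFunctionMultigroup(groupNames))
fit <- mxRun(multi)
```

In my real script each group of course carries its own Cholesky A, C and E matrices instead of a saturated expected covariance.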

I use the standardized residuals of my two variables of interest as data input.

All the best,

Emilie

Hi

It's not really possible to help you without the data file, which I expect you cannot share due to privacy issues. However, it would be possible for you to share a similar, randomly generated data file by using the mxGenerateData() function. If you do that and the fake data(!) also show the problem, we could look at that.
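For example, mxGenerateData() will also accept a plain data frame and return simulated data with the same means and covariance structure but none of the original observations (the column names below are made up):

```r
library(OpenMx)

set.seed(42)
# Stand-in for your real file: two correlated, standardized scores
real <- data.frame(gpa = rnorm(200), iq = rnorm(200))
# Simulate a same-shape data set from the saturated (means + covariances) model
fake <- mxGenerateData(real, nrows = nrow(real))
write.csv(fake, "TL_simulation.csv", row.names = FALSE)
```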

My initial thoughts are:

1. You are probably not looking at standardized solutions, because with a Cholesky parameterization the standardized correlations cannot end up outside the range of -1 to +1.

2. It is probably a Bad Idea to use a moderator which may have been caused by the variable you are looking at. In the present case, it seems highly likely that school achievement is partly caused by intelligence. The Purcell moderation model can accrue substantial bias when the moderator is itself influenced by the phenotype being moderated.

3. Sometimes negative genetic and environmental covariances are to be expected, even when the within-person observed correlation is positive. In your case, this may be occurring because N is too small for at least one of the subgroups you have defined. However, note that a within-person correlation of .1 might come about through a contribution of .2 from environmental sources and -.1 from genetic sources; there is no need for the different sources of covariance to have the same sign.
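To put numbers on points 1 and 3 (base R, with invented path values): standardizing the Cholesky solution keeps every correlation inside [-1, 1], and a positive phenotypic correlation can still decompose into sources of opposite sign.

```r
# Lower-triangular Cholesky paths for A, C and E (made-up values)
aL <- matrix(c(0.6, -0.2, 0, 0.5), 2, 2)
cL <- matrix(c(0.4,  0.5, 0, 0.4), 2, 2)
eL <- matrix(c(0.5,  0.3, 0, 0.5), 2, 2)
A <- aL %*% t(aL); C <- cL %*% t(cL); E <- eL %*% t(eL)  # covariance components
V <- A + C + E                # total phenotypic covariance
rP <- cov2cor(V)[2, 1]        # phenotypic correlation: positive (~ .26)
rA <- cov2cor(A)[2, 1]        # genetic correlation: negative (~ -.37), yet in [-1, 1]
round(c(rP = rP, rA = rA), 3)
```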

Hope this helps!

Dear Dr. Neale,

I have now tried to generate a new, simulated data set ("TL_simulation.csv"). However, it seems that it is my script that is at fault, because I still experience the same kind of problems with the simulated data file (see the attached script "GPA-moderation of intelligence (example).R"). With regard to your first point, I am pretty sure that I am looking at the standardized output, but of course something might have gone wrong when I tried to standardize my estimates, so that they are not truly standardized. I am not completely sure that I understand your second point; could you elaborate?

Best regards,

Emilie

Hi

I see the issue with confidence intervals being printed as NA. As the summary notes, with verbose=TRUE one can see that there are active bound constraints at the solutions of the searches for the CIs. In my opinion, printing NA in these cases is not helpful, but I am not the OpenMx dictator and others disagree. If you calculate estimate +/- 1.96*SE, you will probably get reasonable agreement with the confidence intervals, although some may hit or exceed boundary values (e.g., 1 for a correlation). It would be interesting to see how far the unprinted CIs are from these statistics, if we could get a look at them at all.

I recommend that you remove all the lbounds! With your example script, doing so returned all the likelihood-based CIs very nicely.

I'll bring this case up at the developers' meeting and ask how happy everyone would be with reporting CIs in this situation. At the least, we ought to advise eliminating some of the bounds (aka box constraints), IMO.
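For reference, the bounds can be cleared in one call rather than edited out of the script by hand. A toy sketch (the matrix and label below are made up, not from your script):

```r
library(OpenMx)

# A one-parameter toy model with a lower bound, standing in for your ACE model
m <- mxModel("tiny",
  mxMatrix("Full", nrow = 1, ncol = 1, free = TRUE, values = 0.5,
           lbound = 0, labels = "a11", name = "a"))
# Clear the lower bound on every free parameter (NA means unbounded)
m2 <- omxSetParameters(m, labels = names(omxGetParameters(m)), lbound = NA)
m2$a$lbound  # now NA
```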

Hi again,

Removing the lbounds solved the issue with the confidence intervals.

However, my main problem is that some of the estimated path coefficients exceed |1|. Moreover, the genetic correlations for school type 1 and school type 2 are negative, which does not make sense as we are looking at the genetic influences on GPA and intelligence. Therefore, I think I have messed up something when I tried to modify Dr. Maes’ twoACEc.R script to include three bivariate ACE models, one for each type of secondary school, in one mxFitFunctionMultigroup. Can you help me figure out what the problem is?

Hi

If you try the three samples separately, you may get a good sense of what to expect when they are analyzed jointly.

Note that an estimate of a path coefficient is just that - an estimate. Sometimes the model may be configured so that correlations greater than 1 are seen for some of the model parameters. If you were to split the indicators of a one-factor model to make a correlated two-factor model, and allow the factors to correlate, you could see estimates of the factor correlation exceed 1. You might see this about half the time if the one-factor model were the true model, because the method would estimate the factor correlation (which is 1.0 in the population) and sampling error would make the estimate a bit bigger or smaller.

Now, when you look at what the model predicts for the correlations/covariances (check out mxGetExpected) - even though it may have a factor correlation of 1.8 or something silly - the predicted correlations among the measures should look fairly sane, with no predicted correlations greater than 1.0 in absolute value.
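As a minimal, self-contained illustration of mxGetExpected (a saturated one-variable model; nothing here comes from your script):

```r
library(OpenMx)

set.seed(7)
d <- data.frame(x = rnorm(500, sd = 2))
m <- mxModel("sat",
  mxData(d, type = "raw"),
  mxMatrix("Symm", 1, 1, free = TRUE, values = 1, labels = "vx", name = "S"),
  mxMatrix("Full", 1, 1, free = TRUE, values = 0, labels = "mx", name = "M"),
  mxExpectationNormal(covariance = "S", means = "M", dimnames = "x"),
  mxFitFunctionML())
fit <- mxRun(m)
mxGetExpected(fit, "covariance")  # model-implied variance, close to var(d$x)
```

In a multivariate model the same call returns the full model-implied covariance matrix, which you can pass through cov2cor() to check that no implied correlation exceeds 1 in absolute value.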

When we partition a phenotypic correlation, some of the constituent correlations may be of opposite sign. This isn't that unusual, although it isn't random either, because we usually apply multivariate BG methods to partition known, substantial phenotypic correlations. However, if you imagine applying the method to pairs of variables whose within-person correlation is zero, you would likely get some positive covariance from A, C or E, with some negative covariance generated by one or more of the remaining sources. The model fitting will essentially 'try' to get all the variances and covariances right; how well it does is measured by various goodness-of-fit statistics, such as those reported by summary().

HTH!

Thank you so much for your assistance! I do not know why I had not thought of running the bivariate ACE models for the three samples separately, but it turned out that the parameter estimates were indeed similar to the ones I obtain when I use my multi-group model to estimate separate parameters for the three school types at once.

Maybe try verifying your results with umx. I'm not well versed in research on GPA and cognitive ability, but I would investigate a negative genetic correlation further to rule out errors. Control for age and sex and see whether that makes a difference. Also, check the MZ/DZ balance within each separate model (for each school type) and how many twin pairs you have available.
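A sketch of such a umx cross-check, using the twinData set that ships with umx rather than your variables (height and weight stand in for GPA and IQ):

```r
library(umx)

data(twinData)  # Australian twin data bundled with umx (ht1/ht2, wt1/wt2)
mz <- subset(twinData, zygosity %in% c("MZFF", "MZMM"))
dz <- subset(twinData, zygosity %in% c("DZFF", "DZMM"))
# Bivariate ACE model, variance-based parameterization (umxACEv)
m <- umxACEv(selDVs = c("ht", "wt"), sep = "", mzData = mz, dzData = dz)
umxSummary(m, std = TRUE)  # standardized solution for comparison
```

Comparing the standardized umxACEv output against your hand-built multigroup script, group by group, should localize any discrepancy quickly.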