How to fix the genetic correlation larger than 1

Posted on
No user picture. diana Joined: 04/17/2019
Hi everyone, I am fitting a bivariate ACE model and I am very confused about the Ra (genetic correlation of two traits) that is larger than 1. Since Ra is a correlation coefficient, it should range from -1 to 1, am I right? I don't know how to explain this result.

In the ACE model, Ra=1.37 ,Rc=-0.04, Re=0.22.
Then I fitted ADE model, Ra=0.07, Rd=NA, Re=0.22.

The results of the goodness-of-fit test between saturated model and ACE or ADE are as follows.

base comparison ep minus2LL df AIC diffLL diffdf p
1 Sat 24 7734.750 7728 -7721.250 NA NA NA
2 Sat ACE 15 7749.792 7737 -7724.208 15.04198 9 0.08978923
3 Sat ADE 15 7749.741 7737 -7724.259 14.88488 9 0.09414712

Thus, I wonder why Ra can be larger than 1?
And in this case, should I report Ra of ACE model (if so, how to explain it), or report Ra of ADE model?

I have enclosed the a sample data and my code for your review.

Thank you for your time.

Replied on Tue, 12/03/2019 - 15:50
Picture of user. AdminRobK Joined: 01/24/2014

The explanation is very simple. The matrices named 'A', 'C', 'D', and 'E' in your script are merely symmetric matrices, and there is nothing in place to ensure that they are valid covariance matrices, i.e. symmetric *and* positive (semi-)definite. Since they aren't necessarily covariance matrices, standardizing them doesn't necessarily give you correlation matrices, and thus their off-diagonal elements aren't restricted to [-1,1].

Have you tried fitting an AE model?

Replied on Thu, 12/05/2019 - 08:44
No user picture. diana Joined: 04/17/2019

In reply to by AdminRobK

1. I fitted a model which contained C for trait 1 and C for trait 2 but excluded C for correlation between trait 1 and trait 2,and the Ra is still larger than 1. The AE model cannot be accepted as it deteriorated the fit of the model compared with ACE model, although the Ra=0.22.

2. I wrote my script according to [the demo in this page](https://static1.squarespace.com/static/58b2481a9f7456906a3b9600/t/5bd10731e79c70a4980526ea/1540425521921/twoACEvb.pdf), whether was there something wrong in my script or does this way (using A,C and E rather than path coeffcients) cannot estimate genetic and environmental correlations at all? Is using path coefficients a, c and e the only way to get correlation matrices for Ra Rc Re?

If using A, C and E matrices cannot generate correlation matrices, then I am confused what's the function of such methods based on variance components, since in my mind bivariate twin model is performed to get genetic and environmental correlations between two traits.

3. When fitting submodels, I tend to drop C for trait 1, C for trait 2 and C correlation between trait 1 and trait 2 step by step. I want to know is this OK? Or what's your preference when fitting submodels?
#C for p1 dropped AE-ACE
dropcp1 <- omxSetParameters(AceModel,labels=c("Base.C[1,1]","Base.C[1,2]"),free=F,values=0,name="dropcp1")
dropcp1Fit <- mxTryHardOrdinal(dropcp1, intervals=F,extraTries = 15,OKstatuscodes=c(0,1,5),exhaustive=F)

#C for p2 dropped ACE-AE
dropcp2 <- omxSetParameters(AceModel,labels=c("Base.C[2,2]","Base.C[1,2]"),free=F,values=0,name="dropcp2") #not sure#
dropcp2Fit <- mxTryHardOrdinal(dropcp2, intervals=F,extraTries = 15,OKstatuscodes=c(0,1,5),exhaustive=F)

#C for p1p2 overlap dropped
dropc12 <- omxSetParameters(AceModel,labels="Base.C[1,2]",free=F,values=0,name="dropc12")
dropc12Fit <- mxTryHardOrdinal(dropc12, intervals=T,extraTries = 15,OKstatuscodes=c(0,1,5),exhaustive=F)

I look forward to your reply. Thank you so much!

Sincerely

Replied on Thu, 12/05/2019 - 16:04
Picture of user. AdminRobK Joined: 01/24/2014

In reply to by diana

I wrote my script according to the demo in this page, whether was there something wrong in my script

I didn't look at your script very closely, but the odd genetic correlations you're getting aren't likely to be the result of a problem with your script. Rather, the demo script you linked *deliberately* allows the *A*, *C*, *D*, and *E* matrices to be non-positive-definite. The rationale for that is discussed in a recent paper, as well as elsewhere on these forums.

or does this way (using A,C and E rather than path coeffcients) cannot estimate genetic and environmental correlations at all? Is using path coefficients a, c and e the only way to get correlation matrices for Ra Rc Re?

If using A, C and E matrices cannot generate correlation matrices, then I am confused what's the function of such methods based on variance components, since in my mind bivariate twin model is performed to get genetic and environmental correlations between two traits.

The direct-symmetric parameterization (which is what your script is using) does allow for estimation of 'Ra', 'Rc', and 'Re', provided that the estimated *A*, *C*, and *E* matrices are positive-definite. In your dataset, the MLE of *A* in the ACE model (and probably also *D* in the ADE model, by the look of it) is not positive-definite, rendering the "genetic correlation" much less interpretable as such. You could, of course, use a different parameterization that ensures positive-definiteness, and thus more interpretable results, but doing so introduces statistical complications you need to be aware of, and I'm willing to bet it will worsen model fit as well.

When fitting submodels, I tend to drop C for trait 1, C for trait 2 and C correlation between trait 1 and trait 2 step by step. I want to know is this OK? Or what's your preference when fitting submodels?

Decide *a priori* which models you want to fit, then fit them, and compare their AICs.

Replied on Tue, 12/03/2019 - 17:29
Picture of user. tbates Joined: 07/31/2009

Rob's comment is spot on: the script you're using is building direct variance models, where A C and E are constructed directly (these are added to generate the expected covariance among the variables in and that's what's tested against the observed covariances in the data)

The nomenclature will get a bit confusing, as in conventional ACE models, what is modelled is little a c and e. These contribute a lower triangle of paths down onto the observed variables, and big A (and C and E) are created as the matrix product of these lower triangular matrices, e.g. mxAlgebra(name = "A", a %*% a)

So I'd call these "ACEv" models (or something similar).

Note, see umxACEv and umxACE for examples where the script is wrapped up into a function.