Hello,
I am a little bit confused about how to calculate the phenotypic correlation between variable 1 and variable 2, in a bivariate model with one binary variable and one continuous variable.
I mean if I use this: mxEval(cov2cor(V), fitADE, T). I get a value of 0.200 whereas if I calculate the correlation in SPSS (also in R) I get a value of 0.150.
I do not know if I am doing something wrong. Should the results be the same right?
I have tried with different methods, with and without covariables etc… But I get the same value.
Can I get this value from the saturated model?
Thank you so much in advance.
What you're calculating from OpenMx is a model-expected biserial correlation, which is the correlation between the continuous variable and the normally distributed continuum underlying the binary variable. In contrast, directly calculating the correlation using the default method in most statistical software will give you a point-biserial correlation, which is simply the Pearson correlation between a continuous variable and a binary variable numerically coded with zeroes and ones. In general, the two values won't be the same.
Thank you so much.
I see, it is the difference between biserial correlation and point-biserial correlation.
Also, Open Mx uses tetrachoric for the correlation between two binary variables and polychoric for ordinal variables right?
What result should I report? I mean what is more correct I know the difference is not too big but I would like to know what is the correct correlation.
Essentially, yes. In completely saturated model and with no missing data, the values you get from OpenMx for all four of those types of correlation should match what you get when calculating those correlations a different way. But in general, that won't be true if you're fitting a non-saturated model with OpenMx.
I think reporting the biserial correlation makes more sense, since it reflects what your OpenMx analyses assume about a normally distributed continuum underlying the binary trait. If you want to report it merely as a descriptive statistic, then report it from the saturated model. If it's somehow relevant to your substantive conclusions, then you might want to report it from the "best" model.
Thank you!
And how can I get the result from the saturated model?
I mean I can get the correlations with:
mxEval(cov2cor(DZ.covDZ), fit, T) and mxEval(cov2cor(MZ.covMZ), fit, T)
But I do not know how to get the phenotypic correlation for the whole sample at the same time in the saturated model.
Thank you so much in advance.
Oh, I see. Then I guess I would report the correlation as calculated from the highest-dimensional (most parameters) model that constrains MZ and DZ twin means and variances.
Perfect!!
Thank you so much Rob!
Hello again,
I am a little bit confused since I am trying to get the phenotypic correlation from the saturated model but I am not sure if I am doing it correctly.
I am trying to get the phenotypic correlation from the highest-dimensional model as you told me but unless I equate all the means, variances and covariances. I get different phenotypic correlation for MZ and DZ. I mean when I run these lines:
mxEval(cov2cor(DZ.covDZ), fitEMVZ, T)
mxEval(cov2cor(MZ.covMZ), fitEMVZ, T)
I get different values (for MZ and DZ) for the phenotypic correlations. The values are similar but not the same
Here my script:
But I do not know if equate all the covariances is correct. What parameters should I equate in models EMTVO and EMVZ to do it correctly? And to get the phenotypic correlation for the whole sample?
By the way, should in the univariate models the correlations for MZ (also DZ) should be the same across the models ACE AE and saturated if there is no missing data? I mean in the univariate model I usually get a value for rMZ (for example 0.70) and in the ACE a value slightly different (0.65) I do not know why the value is different form one model to another.
Thank you so much and sorry for the inconveniences
I have seen that probably my script was wrong since the ordinal variable is dichotomous and I have fixed its variance.
I have changed the script (I am not sure if correctly). But I still have different values for MZ y DZ.
Here the script modified:
Should I equate more parameters? Am I doing something wrong?
Thanks
Hi Juan
First, I think you should either constrain the variance of your ordinal phenotype to 1, or fix the residual E to some non-zero value. The second approach is a bit kinder to the optimizer, but the first yields standardized estimates, which is likely the metric that you prefer (correlation not covariance).
Second, a saturated model is usually one where every statistic - mean/threshold/variance/covariance is estimated with a different free parameter. In your case, this model is 'too saturated' because it will yield a different estimated phenotypic correlation for twin 1 and twin 2, and for MZ and DZ twins (4 different phenotypic correlations altogether). What you seem to need is a sort of semi-saturated model where all the variances, covariances, means and thresholds are equated between twin 1 and twin 2, and across MZ and DZ. The covariance across twins is not relevant here, unless you want to control for their non-independence so as to get a better estimate of the standard error of the correlation. If the SE is not of interest, then simply set up 2 columns of data - the continuous and ordinal measures - and obtain the polychoric correlation by estimating the saturated model (and then standardizing as you have with cov2cor).
Controlling for non-independence effects on the standard error of the correlation could be done several ways. You might set up the same block of phenotypic covariances for all 4 blocks (T1, T2, MZ and DZ), but let the off-diagonal blocks differ between MZ and DZ pairs. I'm thinking something like
rbind(cbind(P,M),cbind(M,P)) for the MZs and rbind(cbind(P,D),cbind(D,P)) for the DZs, where the same parameter labels are in place for the P in both groups. Does this make sense to you?
HTH
Mike
Hi Michael,
Thank you so much for your response. Your information is really helpful for me. I know what I have to do now.
However, I am not sure if I know how to do it. I am trying to constrain the variance of the ordinal phenotype to 1. I had thought to do it with mxconstraint but covMZ_4_4, covMZ_8_8, covDZ_4_4, covDZ_8_8 are not present in my model.
I have also tried it with these lines:
matUnv <- mxMatrix( type="Unit", nrow=nvo, ncol=1, name="Unv1" )
matOc <- mxMatrix( type="Full", nrow=1, ncol=nv, free=FALSE, values=oc, name="Oc" )
var1 <- mxConstraint( expression=diag2vec(Oc%&%covMZ)==Unv1, name="Var1" )
var2 <- mxConstraint( expression=diag2vec(Oc%&%covDZ)==Unv1, name="Var2" )
But that does not work either.
I have also changed these lines to do it (fixing the threshold to 0) but I am not sure if it is correct:
thinMZ <- mxMatrix( type="Full", nrow=1, ncol=ntvo, free=FALSE, values=0, name="thinMZ", labels=c("tauMZ1","tauMZ2") )
thinDZ <- mxMatrix( type="Full", nrow=1, ncol=ntvo, free=FALSE, values=0, name="thinDZ", labels=c("tauDZ1","tauDZ2") )
How could I fix the variance for the ordinal phenotype?
Also, I do not know how to set up 2 columns of data and put it in to the model. Should I create 2 different matrices?
Thank you so much again. I have been trying to solve it but I am not able to do it (sorry)
Best wishes.