Bivariate Cholesky with Covariates - Best Practice?
Posted on

Forums
I'm doing a bivariate Cholesky decomposition between two variables that were measured at two different time points. I know that it's normally best practice to adjust variables for age and sex effects, and I see many different threads on this board that have discussed how to add covariates to an ACE model. Two questions:
1. For my purposes, should I regress out sex and age at time point 1 for the first variable and then regress out sex and age at time point 2 for the second variable?
2. Is there any problem with using residualized variables rather than building the adjustment into the OpenMX script?
Many thanks for any insights.
correct each variable for its own age at measurement
> and then regress out sex and age at time point 2 for the second variable?
Yes. You are correct to control for the age at measurement for each of the time points, to avoid confounding measurement separation with causation.
> 2. Is there any problem with using residualized variables rather than
> building the adjustment into the OpenMX script?
The main difference is that you lose some information about the means in the OpenMx model: however, you already captured that in the regression, so no practical difference. Either way, if there are missing variables, you will lose data rows, so no difference there either.
The alternative to including these in the means model is to include the covariates as a block in the expected covariance structure. In this case, so the cell for variance for var1 would include a contribution of variance due to age_time1 etc.
The `umx` package has a function that handles this (`umxACEcov`), but it assumes the covariates apply to all variables, so you'd need to customize it.
Log in or register to post comments
In reply to correct each variable for its own age at measurement by tbates
Thank you -- very helpful!
Log in or register to post comments
Agree with time-specific covs
Second, residuals first approach is good if the data are continuous, like height or weight. It may be ok if the data are factor or scale scores, but here there is a risk of sweeping unpleasant measurement issues under the rug, to the detriment of the type 1 error rate.
I do not see how residuals for ordinal data can be usefully pre-computed. One might tackle it through, e.g., multiple imputation, but it's likely quite messy and would (to my mind) require simulation to establish that bias, variance and type 1 error rates are not compromised compared to other methods such as including the (ordinal) regressions on the covariates in the model.
Log in or register to post comments
In reply to Agree with time-specific covs by neale
Thanks -- in this case I'm
Log in or register to post comments
Best practice is to build adjustment into script
I doubt it makes much difference in most cases, but best practice is unambiguously to incorporate the regression into the OpenMx script. Since you're estimating biometrical variance components, you're interested in the decomposition of variance in the part of the phenotypes that cannot be linearly predicted from nuisance covariates. The thing is, in general, doing OLS regression before running residuals through OpenMx, versus incorporating the regression into the MxModel, will give slightly different regression coefficients, meaning that you will be biometrically decomposing slightly different "residuals." OLS regression is not optimal in a maximum-likelihood sense when the residuals are correlated, and you know your observations are clustered within twin pairs, so using it to adjust the phenotypes before the main analysis therefore cannot be considered "best practice."
Using some kind of generalized least-squares regression (instead of OLS) is more defensible, but why go to the trouble when it's not that hard to just use the covariates in OpenMx?
Log in or register to post comments
In reply to Best practice is to build adjustment into script by AdminRobK
Good points
The other thing that occurs to me is that the differences aren't really comparable to the idea that forgetting about clustering doesn't change the parameter estimates - merely their standard errors. I'm not sure how consistent this is with the preceding discussion though...
Log in or register to post comments
In reply to Best practice is to build adjustment into script by AdminRobK
Thanks for the word -- I'm
Log in or register to post comments
In reply to Thanks for the word -- I'm by JuliaJoplin
Add them all
Log in or register to post comments
In reply to Best practice is to build adjustment into script by AdminRobK
Starting with Two Covariates & Bivariate
Ultimately, I want to run a scalar sex difference model (standard variance components are the same for both genders). For covariates, I want just ageatt1 and sex for the first variable, and ageatt2 and sex for the second variable. This is probably the most complex thing I've done with openMX and my experience is pretty limited, so I thought I'd start by just trying to see if I could adapt a script using only 2 covariates in the bivariate case.
I used this older script that seems to be well-used on this board:
http://openmx.psyc.virginia.edu/sites/default/files/UnivariateTwinAnalysis_MatrixRaw-3.R
require(OpenMx)
require(psych)
setwd("C:/Users/user/Desktop/R")
# BIVARIATE
mydata <- read.delim("Choleskytest.dat", header=T, sep="\t", na.strings=".")
attach(mydata)
mydata$caster1 <- mydata$CAST
mydata$caster2 <- mydata$CASTb
mydata$ogo1 <- mydata$ORGAN
mydata$ogo2 <- mydata$ORGANb
# Select Variables for Analysis
Vars <- c('caster','ogo')
nv <- 2 # number of variables
ntv <- nv*2 # number of total variables
selVars <- c("caster1", "ogo1", "caster2", "ogo2")
mzpData <- subset(mydata, ZYGOSITY=="0", selVars)
dzpData <- subset(mydata, ZYGOSITY=="1", selVars)
ageT1MZ <- as.vector(subset(mydata, ZYGOSITY=="0", AGE1))
ageT2MZ <- as.vector(subset(mydata, ZYGOSITY=="0", AGE1b))
ageT1DZ <- as.vector(subset(mydata, ZYGOSITY=="1", AGE1))
ageT2DZ <- as.vector(subset(mydata, ZYGOSITY=="1", AGE1b))
sexT1MZ <- as.vector(subset(mydata, ZYGOSITY=="0", SEX))
sexT2MZ <- as.vector(subset(mydata, ZYGOSITY=="0", SEXb))
sexT1DZ <- as.vector(subset(mydata, ZYGOSITY=="1", SEX))
sexT2DZ <- as.vector(subset(mydata, ZYGOSITY=="1", SEXb))
mzData <- data.frame(mzpData, ageT1MZ, ageT2MZ, sexT1MZ, sexT2MZ)
dzData <- data.frame(dzpData, ageT1DZ, ageT2DZ, sexT1DZ, sexT2DZ)
bivACEModel <- mxModel("bivACE",
mxModel("ACE",
# Matrices a, c, and e to store a, c, and e path coefficients
mxMatrix( type="Lower", nrow=nv, ncol=nv, free=TRUE, values=.8, labels=c("a11","a21","a22"), name="a" ),
mxMatrix( type="Lower", nrow=nv, ncol=nv, free=TRUE, values=.8, labels=c("c11","c21","c22"),name="c" ),
mxMatrix( type="Lower", nrow=nv, ncol=nv, free=TRUE, values=.8, labels=c("e11","e21","e22"),name="e" ),
# Matrices a, c, and e to store a, c, and e path coefficients
mxAlgebra( expression=a %*% t(a), name="A" ),
mxAlgebra( expression=c %*% t(c), name="C" ),
mxAlgebra( expression=e %*% t(e), name="E" ),
# Declare a matrix for the definition variable regression parameters, called beta
mxMatrix( type="Full", nrow=1, ncol=2, free=TRUE, values= 0, label=c("betaAge", "betaSex"), name="beta"),
# Matrices A, C, and E compute variance components
mxAlgebra( expression=A+C+E, name="V" ),
mxMatrix( type="Iden", nrow=nv, ncol=nv, name="I"),
mxAlgebra( expression=solve(sqrt(I*V)), name="iSD"),
mxAlgebra(iSD %*% a, name="sta"),
mxAlgebra(iSD %*% c, name="stc"),
mxAlgebra(iSD %*% e, name="ste"),
mxAlgebra(A/V, name="StandardizedA"),
mxAlgebra(C/V, name="StandardizedC"),
mxAlgebra(E/V, name="StandardizedE"),
mxAlgebra(solve(sqrt(I*A)) %&% A, name="CorA"),
mxAlgebra(solve(sqrt(I*C)) %&% C, name="CorC"),
mxAlgebra(solve(sqrt(I*E)) %&% E, name="CorE"),
mxAlgebra(solve(sqrt(I*V)) %&% V, name="CorP"),
## Note that the rest of the mxModel statements do not change for bivariate/multivariate case
# Matrix & Algebra for expected means vector
mxMatrix( type="Full", nrow=1, ncol=nv, free=TRUE, values= 20, name="Mean" ),
mxAlgebra( expression= cbind(Mean,Mean), name="expMean"),
# Algebra for expected variance/covariance matrix in MZ
mxAlgebra( expression= rbind ( cbind(A+C+E , A+C),
cbind(A+C , A+C+E)), name="expCovMZ" ),
# Algebra for expected variance/covariance matrix in DZ, note use of 0.5, converted to 1*1 matrix
mxAlgebra( expression= rbind ( cbind(A+C+E , 0.5%x%A+C),
cbind(0.5%x%A+C , A+C+E)), name="expCovDZ" )
),
mxModel("MZ",
mxData( observed=mzData, type="raw" ),
# Algebra for making the means a function of the definition variables age and sex
mxMatrix( type="Full", nrow=2, ncol=2, free=F, label=c("data.ageT1MZ","data.sexT1MZ","data.ageT2MZ","data.sexT2MZ"), name="MZDefVars"),
mxAlgebra( expression=ACE.expMean + ACE.beta %*% MZDefVars, name="expMeanMZ"),
mxFIMLObjective( covariance="ACE.expCovMZ", means="expMeanMZ", dimnames=selVars ) ),
mxModel("DZ", mxData( observed=dzData, type="raw" ),
mxMatrix( type="Full", nrow=2, ncol=2, free=F, label=c("data.ageT1DZ","data.sexT1DZ","data.ageT2DZ","data.sexT2DZ"), name="DZDefVars"),
mxAlgebra( expression=ACE.expMean + ACE.beta %*% DZDefVars, name="expMeanDZ"),
mxFIMLObjective( covariance="ACE.expCovDZ", means="expMeanDZ", dimnames=selVars ) ),
mxAlgebra( expression=MZ.objective + DZ.objective, name="minus2sumloglikelihood" ),
mxAlgebraObjective("minus2sumloglikelihood"),
mxCI(c('ACE.sta', 'ACE.stc', 'ACE.ste')),
mxCI(c('ACE.StandardizedA', 'ACE.StandardizedC', 'ACE.StandardizedE')),
mxCI(c('ACE.CorA', 'ACE.CorC', 'ACE.CorE', 'ACE.CorP'))
)
When I run it, I get the error:
Error: The definition variable 'MZ.data.ageT1MZ' in matrix 'MZ.MZDefVars' refers to a data set that does not contain a column with name 'ageT1MZ'
I've looked at this for so long - am I missing an obvious problem with how I've set up my data or made the modification to the bivariate case?
Log in or register to post comments
In reply to Starting with Two Covariates & Bivariate by JuliaJoplin
Data column names?
colnames(mzData)
orhead(mzData)
at the R prompt). This matters because everything after"data."
in a definition-variable label has to be a column name of the raw dataset.You could try defining the dataframe differently, say,
mzData <- data.frame(mzpData, ageT1MZ=ageT1MZ, ageT2MZ=ageT2MZ, sexT1MZ=sexT1MZ, sexT2MZ=sexT2MZ)
Log in or register to post comments
In reply to Data column names? by AdminRobK
Ah, thanks! I will try this
It would be great to try to make the other adjustments I make when I use it. Obviously, to include the third covariate for age, I need to add a third variable with names like "secondageT1MZ, secondageT2MZ" etc.
So I would add the third variable in here, changing ncol to 3
# Declare a matrix for the definition variable regression parameters, called beta
mxMatrix( type="Full", nrow=1, ncol=2, free=TRUE, values= 0, label=c("betaAge", "betaSex"), name="beta"),
And then add it down here, making the matrix nrow = 3, ncol = 2? The part I'm missing is how I should make the first age variable only apply to the first variable and the second age variable only apply to the second variable.
mxModel("MZ",
mxData( observed=mzData, type="raw" ),
# Algebra for making the means a function of the definition variables age and sex
mxMatrix( type="Full", nrow=2, ncol=2, free=F, label=c("data.ageT1MZ","data.sexT1MZ","data.ageT2MZ","data.sexT2MZ"), name="MZDefVars"),
mxAlgebra( expression=ACE.expMean + ACE.beta %*% MZDefVars, name="expMeanMZ"),
mxFIMLObjective( covariance="ACE.expCovMZ", means="expMeanMZ", dimnames=selVars ) ),
mxModel("DZ", mxData( observed=dzData, type="raw" ),
mxMatrix( type="Full", nrow=2, ncol=2, free=F, label=c("data.ageT1DZ","data.sexT1DZ","data.ageT2DZ","data.sexT2DZ"), name="DZDefVars"),
mxAlgebra( expression=ACE.expMean + ACE.beta %*% DZDefVars, name="expMeanDZ"),
mxFIMLObjective( covariance="ACE.expCovDZ", means="expMeanDZ", dimnames=selVars ) ),
Log in or register to post comments
In reply to Ah, thanks! I will try this by JuliaJoplin
If I were you, I would make
cbind()
ing together the expressions for each data column's model-expected mean.A completely different alternative would be to set up the model using RAM path specification. Adjusting for covariates would be much simpler if you just treat them as manifest variables, like the phenotypes, and declare the requisite single-headed arrows from each covariate to each phenotype. It's also a better way of dealing with missing data on the covariates. Of course, specifying the Cholesky parameterization can be rather tedious with MxPaths, but then you don't HAVE to set up the model for the covariance structure that way--any valid parameterization would do--but the Cholesky helps to ensure the model-expected covariance matrix stays in the parameter space.
Log in or register to post comments
In reply to If I were you, I would make by AdminRobK
Really appreciate
# Declare a matrix for the definition variable regression parameters, called beta
mxMatrix( type="Full", nrow=1, ncol=2, free=TRUE, values= 0, label=c("bage_caster", "bage_ogo"), name="betaage"),
mxMatrix( type="Full", nrow=1, ncol=2, free=TRUE, values= 0, label=c("bsex_caster", "bsex_ogo"), name="betasex"),
mxMatrix( type="Full", nrow=1, ncol=2, free=TRUE, values= 0, label=c("bage2_caster", "bage2_ogo"), name="betaage2"),
and then the final parts where I include the definition variables. Here's an example of how it looks for DZ female twins.
mxModel("DZf", mxData( observed=dzfData, type="raw" ),
mxMatrix( type="Full", nrow=1, ncol=2, free=FALSE, label=c("data.ageT1", "data.ageT2"), name="obs_age" ),
mxMatrix( type="Full", nrow=1, ncol=2, free=FALSE, label=c("data.age2T1", "data.age2T2"), name="obs_age2" ),
mxMatrix( type="Full", nrow=1, ncol=2, free=FALSE, label=c("data.sexT1", "data.sexT2"), name="obs_sex" ),
mxAlgebra( expression= ACE.expMeanF + ACE.betasex %x% obs_sex + ACE.betaage %x% obs_age + ACE.betaage2 %x% obs_age2, name="expMeanF"),
mxFIMLObjective( covariance="ACE.expCovDZf", means="expMeanF", dimnames=selVars ) ),
How does this look? Then, to make it so that only age applies to the first variable and age2 applies to the second variable, I do:
bivACE2Model <- omxSetParameters( bivACEModel , labels=c("bage_ogo", "bage2_caster"), free=FALSE, values=0 )
bivACEFit <- mxRun(bivACE2Model)
bivACESumm <- summary(bivACEFit)
Log in or register to post comments
In reply to Really appreciate by JuliaJoplin
continued
11 bage_caster ACE.betaage 1 1 -0.0001455675 1.165522e-04
12 bsex_caster ACE.betasex 1 1 -8.6169894850 NA
13 bsex_ogo ACE.betasex 1 2 -8.6098029796 NA
14 bage2_ogo ACE.betaage2 1 2 -0.0001212564 7.210192e-05
15 meanM ACE.Mm 1 1 0.2195347776 2.927471e-02
16 meanF ACE.Mf 1 1 8.4248059740 NA
1) Any ideas why it says NA for the standard error for betasex and meanF? All my twins are same sex twins, if that matters.
2) I know that it is important to adjust for age and sex in these models. Is it standard practice to adjust for an age*sex interaction as well? Is it it important to also adjust for age^squared, or is that only important if there's a lot of variability in the age?
3) Does the warning message about mxFIMLObjective actually mean anything about my results? I know warnings are different than errors and that the new code is a slightly different process, but I wasn't sure if it would actually impact my findings.
Log in or register to post comments
In reply to continued by JuliaJoplin
Sorry about the spam filter.
I'm looking at your syntax for creating the model-expected means vector. You're Krönecker-multiplying a 1x2 matrix by a 1x2 matrix. That will give you a 1x4 result. But if I understand you correctly, you have two phenotypes, assessed at two timepoints, in twins. Therefore, the mean vector should have 2x2x2=8 elements. But I also notice your
selVars
object contains only four variable names, when it seems to me it should contain 8. Who's confused here--me or you?Edit: I think I'm the one confused. You have two phenotypes, but one phenotype was assessed at a different time from the other, right? If so, that would make a 1x4 result right.
The model is unidentified; you have more parameters than you need. If you're estimating separate intercepts by sex, then there's no need to also include a regression coefficient for sex. You've already implicitly adjusted for sex by estimating separate intercepts.
I don't see that done on a routine basis, but it's worth considering if you have a priori reason to believe that higher-order terms are of concern.
Nope.
Log in or register to post comments
In reply to Sorry about the spam filter. by AdminRobK
Thank you! I'll probably
And yes, you're right -- it was just two phenotypes, each assessed at a different time.
Log in or register to post comments
In reply to Sorry about the spam filter. by AdminRobK
Interesting finding?
For my residualized variable, the MZ and DZ twin correlations are each .06 lower. Not a huge amount, but it actually surprised me it would be that much.
I also wondered - at the univariate level, should there be any difference between what I get when I use this version of the syntax:
# Declare a matrix for the definition variable regression parameters, called beta
mxMatrix( type="Full", nrow=1, ncol=1, free=TRUE, values= 0, label=c("bage_caster"), name="betaage"),
mxMatrix( type="Full", nrow=1, ncol=1, free=TRUE, values= 0, label=c("bsex_caster"), name="betasex"),
with:
mxModel("MZm",
mxData(observed=mzData , type="raw"),
mxMatrix( type="Full", nrow=1, ncol=2, free=FALSE, label=c("data.ageT1", "data.ageT2"), name="obs_age" ),
mxMatrix( type="Full", nrow=1, ncol=2, free=FALSE, label=c("data.sexT1", "data.sexT2"), name="obs_sex" ),
mxAlgebra( expression= ACE.expMean + ACE.betasex %x% obs_sex + ACE.betaage %x% obs_age, name="expMean"),
mxFIMLObjective( covariance="ACE.expCovMZm", means="expMean", dimnames=selVars ) ),
compared to this other version:
with this:# Declare a matrix for the definition variable regression parameters, called beta
mxMatrix( type="Full", nrow=1, ncol=2, free=TRUE, values= 0, label=c("bage", "betasex"), name="beta"),
mxModel("MZm",
mxData(observed=mzData , type="raw"),
mxMatrix( type="Full", nrow=2, ncol=2, free=FALSE, label=c("data.ageT1", "data.ageT2", "data.sexT1", "data.sexT2"), name="DefVars" ),
mxAlgebra( expression= ACE.expMean + ACE.beta %*%DefVars, name="expMean"),
mxFIMLObjective( covariance="ACE.expCovMZm", means="expMean", dimnames=selVars ) ),
The estimates are very very close but not identical.
Log in or register to post comments
In reply to Interesting finding? by JuliaJoplin
Very very close definition
Log in or register to post comments
In reply to Interesting finding? by JuliaJoplin
by row...?
FALSE
, and you don't use thebyrow
argument tomxMatrix()
anywhere in your syntax. So if the global option is set toFALSE
, your syntaxmxMatrix( type="Full", nrow=2, ncol=2, free=FALSE, label=c("data.ageT1", "data.ageT2", "data.sexT1", "data.sexT2"), name="DefVars" )
is going to give you an MxMatrix the labels matrix of which looks like
[,1] [,2]
[1,] "data.ageT1" "data.sexT1"
[2,] "data.ageT2" "data.sexT2"
And thus, the matrix product
ACE.beta %*%DefVars
would result incbind(bage*data.ageT1 + betasex*data.ageT2, bage*data.sexT1 + betasex*datasexT2)
which isn't what you want.
Log in or register to post comments
In reply to by row...? by AdminRobK
That was it!
It's still really interesting to me how the residualized variables end up with a fairly different result (albeit residualized through the OLS method that doesn't correct for correlated errors). In contrast, the results for the model that incorporates covariates are almost exactly the same as a model that does not. Definitely would love to read a paper on this, if you know of any.
Log in or register to post comments
In reply to That was it! by JuliaJoplin
In contrast, the results for
Really? So the regression coefficients for age and sex are nearly zero?
I don't know of a paper on the matter that is relevant to twin analyses per se, but you might find any text on feasible generalized least-squares regression to be informative. Anyhow, look at it this way. We can fully decompose the variance of a phenotype into a component due to A, component due to C, a component due to E, a component due to age, and a component due to sex. The latter two are equal to the squared regression coefficient times the variance of the covariate. Now, the OLS estimator of the coefficients is the MLE when the residuals are stochastically independent and normally distributed with constant variance. If you use OLS to partial out the two covariates, you are implicitly obtaining MLEs of the coefficients, and thus of the variance attributable to the predictors, from a model in which the residual covariance structure is misspecified (OLS "thinks" that all variance not attributable to covariates is due to E). You're then using residuals calculated from such a model in a different model, with a different covariance structure, for the purpose of estimating the components due to A, C, and E. The resulting MLEs of the biometric variance components are conditional on estimates of the effects of the covariates obtained from a different--and for any trait with nonzero familial variance, WRONG--model.
In contrast, when you estimate the regression coefficients in the same MxModel as the biometric variance components, you are getting estimates that maximize the joint likelihood of all unknown parameters. That is a much more theoretically justifiable way to proceed.
Again, I'd be surprised if using one approach or the other makes a large difference in someone's results, but there can be no question which is "best practice". The "regress out covariates and biometrically analyze the residuals" method is a relic from the 1980s, and in routine twin analyses, there's no good reason to use it in TYOOL 2016.
Log in or register to post comments
In reply to In contrast, the results for by AdminRobK
Thanks
Another question has come up related to this model. In addition to running the bivariate model, I want to run univariate ACE models that correct for age and sex, and prior to that, I want to run saturated models that test assumptions about constraining means, variances to be equal across twin order and zygosity, and all that jazz. I'm using some syntax that's heavily based on the code that's found right here:
http://www.vipbg.vcu.edu/media/course/HGEN619_2015/twinSatAdeCon.R
I think the only difference I have compared to what's in this syntax is here (since I also want to control for gender):
# Matrices for Covariates and linear Regression Coefficients
BAGE <- mxMatrix( type="Full", nrow=1, ncol=1, free=TRUE, values= 0, label=c("bage"), name="beage")
BSEX <- mxMatrix( type="Full", nrow=1, ncol=1, free=TRUE, values= 0, label=c("bsex"), name="besex")
OBSAGE <- mxMatrix( type="Full", nrow=1, ncol=2, free=FALSE, label=c("data.ageTw1", "data.ageTw2"), name="TwAge" )
OBSEX <- mxMatrix( type="Full", nrow=1, ncol=2, free=FALSE, label=c("data.sexTw1", "data.sexTw2"), name="TwSex" )
and then here:
expMeanMZ <- mxAlgebra( expression= meanMZ + besex %x%TwSex + beage %x%TwAge, name="expMeanMZ" )
expMeanDZ <- mxAlgebra( expression= meanDZ + besex %x%TwSex + beage %x%TwAge, name="expMeanDZ" )
Following the steps here http://www.vipbg.vcu.edu/media/course/HGEN619_2015/twinSatAdeCon.R, when I constrain means and variances to be equal across twin order and zygosity and then I do a call for meanMZ, meanDZ, expMeanMZ, and expMeanDZ (e.g., eqMVarsZygFit$MZ.expMeanMZ$result), I see that I have successfully constrained meanMZ to be the same as meanDZ, but (because of the age/gender corection), expMeanMZ is slightly different than expMeanDZ. Then, in the ACE model (similar to the syntax for the bivariate I listed above), I only have a term for one mean (expMean).
Am I missing something conceptually, or is the method I've followed for assumption testing sufficient? I'm just confused that if I was doing this without the age-gender correction, expMeanMZ would = expMeanDZ, and that would match the ACE model assumption to only specify expMean.
Log in or register to post comments
In reply to Thanks by JuliaJoplin
definition variables
expMeanMZ
andexpMeanDZ
both are defined in terms of "definition variables," (i.e. age and sex), which, depending on how precisely age is measured, could potentially be different for every row of your dataset. Per the documentation formxAlgebra()
, the$result
of an algebra that depends upon definition variables will be returned frommxRun()
as computed using the definition-variable values from the first row of the dataset. That's why, generally,expMeanMZ
andexpMeanDZ
won't be equal.Anyhow, you're using the same regression coefficients for age and sex in both zygosity groups, and you've constrained the intercepts (
meanMZ
andmeanDZ
) equal, right? If so, then you've equated the parameters that define the model for the phenotypic mean in both groups, which suffices to test the assumption of "equal means" for MZ and DZ twins.Log in or register to post comments
In reply to definition variables by AdminRobK
Got it
One oddity I'm seeing that is confusing to me -
When I run the saturated model and subsequent submodels (this happens to be output where means and variances have been constrained to be equal across twin order and zygosity, and then variances are constrained to be equal across sex, but I'm getting similar results in the saturated model), I get something like this:
free parameters:
name matrix row col Estimate Std.Error A lbound ubound
1 bage MZm.betaage 1 1 0.3641081 0.03600628
2 bsex MZm.betasex 1 1 -0.2402078 0.04923547
3 mZ MZm.meanMZ 1 1 -6.3533460 0.64029086
4 vZ MZm.expCovMZm var1 var1 0.9287792 0.03110761 1e-04
5 c21MZ MZm.expCovMZm var1 var2 0.6671274 0.03217772 0
6 c21DZ DZm.expCovDZm var1 var2 0.5074411 0.03779345 0
It doesn't make any sense that the estimate for MZm.meanMZ should be -6.35 -- this variable has been standardized.
The other oddity is that when I request confidence intervals in this model, it says that this:
-469.0193463 -370.3375915 -271.5625057
is the range for
MZm.expMeanMZ[1,2]
I've checked the data and I don't get any errors when I run it. Any ideas?
Log in or register to post comments
In reply to Got it by JuliaJoplin
The other oddity is that when
Again, the algebras you've named
expMeanMZ
andexpMeanDZ
depend upon definition variables. You are requesting a CI for a quantity that may possibly be identified by only a single twin pair. CIs for free parameters, likebeage
ormeanDZ
will be much more useful and meaningful.Keep in mind that
MZm.meanMZ
is a regression intercept. It's the predicted value of the phenotype when the predictors equal zero. It's not necessarily easily interpretable, especially if it's been linearly transformed (i.e., transformed from its natural metric). If you want, you could obtain OLS coefficients for the phenotype regressed onto age (and sex) and compare; only worry if there's a huge discrepancy between those results and what you get from OpenMx.BTW, I want to thank you for submitting your questions publicly!! I have a feeling that quite a few users will find this thread informative!
Edit: I reworded my description of an intercept.
Log in or register to post comments
In reply to The other oddity is that when by AdminRobK
On the contrary, thank you
Log in or register to post comments
In reply to That was it! by JuliaJoplin
McGue, M., & Bouchard (1984) on adjusting for age and sex
McGue, M., & Bouchard, T. J., Jr. (1984). Adjustment of twin data for the effects of age and sex. Behavior Genetics, 14, 325-343. doi:10.1007/BF01080045
Log in or register to post comments
In reply to McGue, M., & Bouchard (1984) on adjusting for age and sex by tbates
Another source
Neale, M.C. & Martin, N.G. (1989) The effects of age, sex and genotype on self-reports drunkenness following a challenge dose of alcohol. Behavior Genetics 19: 63-78.
and
Vlietinck, R., Derom, R., Neale, M.C., Maes, H., Van Loon, H., Van Maele, G. Derom, C. & Thiery, M. (1989) Genetic and environmental variation in the birthweight of twins. Behavior Genetics 19: 151-161.
Log in or register to post comments
Where to model covariates: Before, During, or Alongside
As Rob said, the results are similar, and best practice is unambiguously to incorporate the regression into the OpenMx script.
That said, residualizing can be done simply and very flexibly using lm, even controlling for the family structure with
lmer
.The most important thing is to model the means correctly. In my experience, people are more likely to use the correct means model when they do this as an earlier step: This is because writing
wt ~ age + sex + age^2 + age^3
is easier to implement, modify (and debug) than is implementing the same equation in an mxModel*, and therefore more likely to get done. You also have the benefit of having the residual data that you model variances of available outside the model for plotting etc.umx_residualize
is a beginning at implementing this (lm only at present, but can detect wide data, and residualize the whole data set with a single set of regression weights (awful to know that sometimes people will make the mistake of residualizing twin 1 separate from twin 2 or MZs separate from DZs...)The answer longer term, IMHO, is to allow users to specify the means model as a formula in a twin model.
Note 1: If you have any missing data, you might also want to explore modelling the covariates in the variance model. This exploits the power of FIML: Where your
lm
(ordata.definitionVariable
) approach will delete all subjects (and therefore effectively all families) where any single covariate is missing, the variance modelling approach will retain all information from your precious family data, where lm and def var approaches can't.umxACEcov
implements this (see below)Note 2For ordinal variables, you have no choice but to model the means or include the covariates in the model (residualized sex doesn't make a lot of sense :-) ).
Note 2Relevant helpers in the
umx
package are:umx_residualize
(works on wide data and supports formula interface) andumxACEcov
(treat this as beta-level code: Feedback welcome, as it is hoped that by year end all umx twin models will support both styles of covariate modeling).* In OpenMx, each change to the covariate model will require re-sizing and re-labeling matrices to hold the beta weights and data.parameters, as well as re-writing the algebras to compute the means. At run time, the raw data model has to update the betas along with all the other free parameters, making things slower. Moving between phenotypes, and incorporating more complex means models into your data can be error prone.
Log in or register to post comments
In reply to Where to model covariates: Before, During, or Alongside by tbates
Comment on tbates' note 1
That is true, unless the data.definitionVariable is dummy coded - say as 999999 - for those persons in the family (i.e., row of the data) who lack both definition variables and phenotypes. In this case, the whole record would not be deleted and the FIML advantage retained. However, this approach is more demanding of the user.
A very clever OpenMx version might detect this situation automatically, dummy code the definition variable and not delete the record just because the covariates were missing for a variable that itself was missing. However, it is difficult to do this in the general case because the algebras would need to be understood, or in some regular constrained form. One algorithm might be to examine rows with definition variables, take some random start values for all the parameters, and evaluate the likelihood with a variety of random values for the definition variables. In the case of no change in likelihood when the missing definition variables are set to very different values, then almost surely there would be no problem in retaining the row of data with missing definition variables. This type of pre-processing might end up being computationally intensive, when the user could have done it themselves relatively simply with [ ] operators. Anyone got a generic function for this? Something like
dummyCovariateWhereDependentVarAlsoMissing(mydata, dependentVars, definitionVars)
This could ease the issue for both ordinal and continuous cases, but it would require user insight to the data and the model and would be subject to user error. The algorithmic approach could return manipulated dataset (which would be extractable from model$data).
Log in or register to post comments
In reply to Comment on tbates' note 1 by neale
umxPadAndPruneForDefVars
umxPadAndPruneForDefVars
moves in that direction...The approach was to replace is.na(defVar) with an out-of-range constant when all DVs are missing, plus an option to insert mean(defVar) where there was some data. The latter clearly often not valid, but useful to compare parameter estimates between larger and smaller analyses. But I never used this in practice.
I ended up believing that the best approach to preserve data would be to split the data up into models according to missingness on the definition variable list. Then automatically (life's too short) build as much of the means algebra as possible in each case.
Then run them all in under a multi-group container, with parameter estimates equated across the sub-models. This seemed best to me.
Log in or register to post comments
In reply to Where to model covariates: Before, During, or Alongside by tbates
Very interesting and helpful
Log in or register to post comments
In reply to Where to model covariates: Before, During, or Alongside by tbates
Covariates in the covariance structure
We should start teaching that approach in our courses, workshops, etc.
Log in or register to post comments