You are here

Residuals in ACE model with covariates

7 posts / 0 new
Last post
Aurina's picture
Offline
Joined: 01/05/2016 - 01:12
Residuals in ACE model with covariates
AttachmentSize
Binary Data test_height.R5.55 KB

Hello all,

I am really new in twin modeling as well as in R, therefore I would be very grateful if someone could advise me on a few issues regarding the script.

I am trying to run an ACE model with covariates (sex and age) on some twin height data. It would be very helpful to know whether the script (attached: test_height.R) does not have any major flaws and does what is expected of it. And one more thing, I would like to plot the residuals after performing the regression on the data. Could you please help me to find the most straight forward way to do that?

Thanks a lot for your help!
Best Regards,

Aurina

AdminRobK's picture
Offline
Joined: 01/24/2014 - 12:15
It's difficult to evaluate a

It's difficult to evaluate a script without any data available to actually run it. Nonetheless, after quickly looking over it, it appears OK. Does it run, and if so, do its results look reasonable?

To obtain the regression residuals, you could try something like

mzresids <- matrix(NA,nrow=nrow(mzData),ncol=2)
for(i in 1:nrow(mzData)){
    mzresids[i,] <- as.vector(mzData[i,c("twin1","twin2")]) - as.vector(mxEval(expression=expMeanMZ,model=twinACEFit.MZ,compute=T,defvar.row=i))
}

but I'm not sure if both uses of as.vector() are necessary. Then, mzresids would be a matrix of residuals for the MZ twins. You would then do similarly for the DZ twins.

Aurina's picture
Offline
Joined: 01/05/2016 - 01:12
Thanks for your help a lot! I

Thanks for your help a lot!

I was trying to calculate something similar before, however there was the same error message:
when I am running the lines you recommended (as in the script attached), there is an error message regarding the model name:

Error in generateLabelsHelper(model, data.frame()) :
object 'twinACEFit.MZ' not found

However when I change model to
model=twinACEFit (all other lines the same)
the error message is regarding the expression:

Error in eval(expr, envir, enclos) : object 'expMeanMZ' not found

Maybe you know what went wrong in here? In general I have some difficulties understanding how to name and call for variables if they are not assigned as in here (estMean <- mxEval(mean, twinACEFit)) and just given names inside the model. Could you please help me with that, because all the variables I am trying to access from inside the model are "not found".

I couldn't attach the data file here, but please find the output (Model_output.png) and the model summary (Model_summary.png) screenshots attached. Do they look reasonable to you?

Thanks a lot for your time!

AdminRobK's picture
Offline
Joined: 01/24/2014 - 12:15
R symbols versus "Mxnames"

Oops, I meant twinACEFit$MZ, not twinACEFit.MZ.

In general I have some difficulties understanding how to name and call for variables if they are not assigned as in here (estMean <- mxEval(mean, twinACEFit)) and just given names inside the model. Could you please help me with that, because all the variables I am trying to access from inside the model are "not found".

I guess the most important thing to understand is that an MxModel's namespace is not the same thing as R's global environment. R's global environment is its "outermost frame;" it stores all the R objects you create via assignment. An MxModel's namespace comprises what I will call the "Mxnames" of its constituent parts. This namespace is not understood by R, and is only understood by OpenMx. For instance, on line 40 of your script, you create your MxModel. It is an R object, of S4 class 'MxModel', with R symbol 'twinACE'. It happens that you also gave it the Mxname 'twinACE', which matches its R symbol.

In Mxnames, the dot means "member of", a la C/C++. For instance, you use it in line 77 of your script to refer to the MxAlgebra with Mxname 'expCovDZ', which is a member of the MxModel with Mxname 'twinACE'. R does not interpret it that way; a period is just another character in an R symbol. In R, the $ is used to access elements of lists and columns of dataframes. With S4 classes, the behavior of $ can be defined as part of the class, and MxModels are defined so that $ allows the user to access objects within an MxModel by Mxname.

The reason why you need to use twinACEFit$MZ instead of twinACEFit.MZ in the mxEval() statement is that mxEval() needs an MxModel object as the value for argument model. Using twinACEFit$MZ provides it with the MxModel with Mxname 'MZ' contained within the MxModel with R symbol 'twinACEFit'. Using twinACEFit.MZ doesn't work, because there is no object with symbol 'twinACEFit.MZ' in R's global environment.

The reason why you get an error when using model=twinACEFit is that there is no object with Mxname 'expMeanMZ' contained in the matrices, algebras, etc. of twinACEFit. Instead, the algebra with Mxname 'expMeanMZ' is in a submodel, with Mxname 'MZ', which in turn is contained in twinACEFit.

That's the best explanation I can give. Unfortunately, it's not intuitive unless you know OpenMx really well, and even then you can still get confused (like I did).

Aurina's picture
Offline
Joined: 01/05/2016 - 01:12
A very good explanation -

A very good explanation - thanks so much! Very helpful!
However I've tried to run the script with twinACEFit$MZ and got an error that object twinACE.beta was not found. Then I thought that maybe it is looking for twinACE.beta matrix in a MZ submodel only (I am not sure if that's the case) and tried another way (the whole script attached: Test_Height_residuals.R):

mzresids  <- matrix(NA,nrow=nrow(mzData),ncol=2)
for(i in 1:nrow(mzData)){
  mzresids[i,] <- as.matrix(mzData[i,c("twin1","twin2")]) - as.matrix(mxEval(expression=twinACE.expMean + twinACE.beta %*% MZ.MZDefVars, model = twinACEFit, compute=T,defvar.row=i))
}

Now the model is twinACEFit and all names are called from its perspective. Here I've also changed as.vector to as.matrix, because there was an error

Error in mzresids[i, ] <- as.vector(mzData[i, c("twin1", "twin2")]) -  : 
  incorrect number of subscripts on matrix

and it looks that it solved the problem. The result attached in Twin_Height_residuals.png
Is it a correct way to do it?
I'am still not sure, why it didn't work before with expression being expMeanDZ and model twinACEFit$MZ as you recommended... Do you have any ideas on that?

Then I've run the same script on weight data (attached Twin_Weight_residuals.png) and now I have a basic question (this is the first time I am in contact with the concept of residuals, so this might be a really simple one) about the interpretation. Residuals in the case of height are much smaller (up to 5) than in the case of weight (up to 100): would it mean that the relationship between twins in the case of height is well defined by age and sex and it is poorly defined by these covariates in the case of weight?

And one more question - when running the same model without covariates on height additive genetic effects=0.26, however when including covariates it becomes 0.56. Therefore is my interpretation here is any close to plausible: the relationship between twins height here is well defined by the covariates only (age and sex is responsible for the correlation between twin1 and twin2), therefore when not including the into the model it drives down the weight of additive genetic effects? Before running this analysis I was expecting residuals for DZ to be much "flatter" than for MZ twins... (the relationship between DZ twins explained by covariates and between MZ twins it would be much less so).

Thank you so much for your help! It was incredibly helpful!

AdminRobK's picture
Offline
Joined: 01/24/2014 - 12:15
This thread is making me

This thread is making me realize that I've never closely looked at how mxEval() works. But, I'm glad you figured out an mxEval() call that does what you need. Nice work!

In the future, you can save yourself some headaches if you put all of the necessary matrices and algebras into the MZ and DZ submodels, with both submodels having an identical copy of each. Then, the only thing in the overall container model would be the multigroup fitfunction.

Residuals in the case of height are much smaller (up to 5) than in the case of weight (up to 100): would it mean that the relationship between twins in the case of height is well defined by age and sex and it is poorly defined by these covariates in the case of weight?

The difference you're observing could easily be due to the measuring units of the two traits. If weight is measured in kilos in your dataset, you could simply make the residuals smaller by converting to metric tons. Likewise, if height is measured in centimeters, you could make the residuals bigger by converting to millimeters. If you want to know how well each trait is predicted by age and sex, you should compare the variance estimates for each trait in models run with versus without the covariates. Basically, you'd want an index conceptually similar to a multiple R-squared.

when running the same model without covariates on height additive genetic effects=0.26, however when including covariates it becomes 0.56. Therefore is my interpretation here is any close to plausible: the relationship between twins height here is well defined by the covariates only (age and sex is responsible for the correlation between twin1 and twin2), therefore when not including the into the model it drives down the weight of additive genetic effects? Before running this analysis I was expecting residuals for DZ to be much "flatter" than for MZ twins...

Your plots look "OK" to me, in the sense that if you wanted to make scatterplots of twin #1's residuals and twin #2's residuals, you accomplished your goal. There's nothing wrong with inspecting residuals, but it's not clear from your posts whether there are specific questions you're trying to answer by doing so. Anyhow, I assume the figures of 0.26 and 0.56 are standardized variance components (i.e., variance proportions), correct? I would venture a guess that the reason why the proportion of variance attributable to additive-genetic effects is larger when you condition on the covariates is that the covariates are "soaking up" shared-environmental variance. Specifically, if both twins in a given pair in your dataset were always assessed at the same age, and you don't adjust for age, then the effect of age "looks like" shared-environmental variance, because it equally contributes within-pair resemblance of MZ and DZ twins. Because the three proportions need to sum to 1, C's loss is necessarily A's (and E's) gain. I think it will probably be more informative to look at the raw (i.e., unstandardized) variance components instead.

Aurina's picture
Offline
Joined: 01/05/2016 - 01:12
Thanks again! I've tried to

Thanks again!
I've tried to make copies of the necessary matrices in MZ and DZ submodels, however I didn't delete the original ones, therefore, it didn't work - I should have deleted the original ones as well.

The reason I wanted to look at residuals was to check how well covariates explain the correlation between twin1 and twin2 data (as a secondary measure just to ensure that the model does what it was supposed to be doing), therefore if the additive genetic effect is high I would expect MZ residuals to be much more correlated (the relationship is not fully explained by covariates) than in the case of DZ twins (where a big part of the correlation could be explained by covariates). It's true that all twins were examined at the same age, therefore it influenced the shared-environmental variance in both twin pairs.

After you comment about raw variance components I've checked it and it's exactly as you said - when the model doesn't include covariates, shared-environmental variance is very high (10) in comparison to additive genetic variance (4) and with covariates the number are 2.4 and 4 respectively.

I am very grateful for your help - it saved me a lot of time and explained lots of things I had doubts about.

Best Regards,
Aurina