Attachment | Size |
---|---|
![]() | 5.55 KB |
Hello all,
I am really new in twin modeling as well as in R, therefore I would be very grateful if someone could advise me on a few issues regarding the script.
I am trying to run an ACE model with covariates (sex and age) on some twin height data. It would be very helpful to know whether the script (attached: test_height.R) does not have any major flaws and does what is expected of it. And one more thing, I would like to plot the residuals after performing the regression on the data. Could you please help me to find the most straight forward way to do that?
Thanks a lot for your help!
Best Regards,
Aurina
It's difficult to evaluate a script without any data available to actually run it. Nonetheless, after quickly looking over it, it appears OK. Does it run, and if so, do its results look reasonable?
To obtain the regression residuals, you could try something like
but I'm not sure if both uses of
as.vector()
are necessary. Then,mzresids
would be a matrix of residuals for the MZ twins. You would then do similarly for the DZ twins.Thanks for your help a lot!
I was trying to calculate something similar before, however there was the same error message:
when I am running the lines you recommended (as in the script attached), there is an error message regarding the model name:
Error in generateLabelsHelper(model, data.frame()) :
object 'twinACEFit.MZ' not found
However when I change model to
model=twinACEFit (all other lines the same)
the error message is regarding the expression:
Error in eval(expr, envir, enclos) : object 'expMeanMZ' not found
Maybe you know what went wrong in here? In general I have some difficulties understanding how to name and call for variables if they are not assigned as in here (estMean <- mxEval(mean, twinACEFit)) and just given names inside the model. Could you please help me with that, because all the variables I am trying to access from inside the model are "not found".
I couldn't attach the data file here, but please find the output (Model_output.png) and the model summary (Model_summary.png) screenshots attached. Do they look reasonable to you?
Thanks a lot for your time!
Oops, I meant
twinACEFit$MZ
, nottwinACEFit.MZ
.I guess the most important thing to understand is that an MxModel's namespace is not the same thing as R's global environment. R's global environment is its "outermost frame;" it stores all the R objects you create via assignment. An MxModel's namespace comprises what I will call the "Mxnames" of its constituent parts. This namespace is not understood by R, and is only understood by OpenMx. For instance, on line 40 of your script, you create your MxModel. It is an R object, of S4 class 'MxModel', with R symbol 'twinACE'. It happens that you also gave it the Mxname 'twinACE', which matches its R symbol.
In Mxnames, the dot means "member of", a la C/C++. For instance, you use it in line 77 of your script to refer to the MxAlgebra with Mxname 'expCovDZ', which is a member of the MxModel with Mxname 'twinACE'. R does not interpret it that way; a period is just another character in an R symbol. In R, the
$
is used to access elements of lists and columns of dataframes. With S4 classes, the behavior of$
can be defined as part of the class, and MxModels are defined so that$
allows the user to access objects within an MxModel by Mxname.The reason why you need to use
twinACEFit$MZ
instead oftwinACEFit.MZ
in themxEval()
statement is thatmxEval()
needs an MxModel object as the value for argumentmodel
. UsingtwinACEFit$MZ
provides it with the MxModel with Mxname 'MZ' contained within the MxModel with R symbol 'twinACEFit'. UsingtwinACEFit.MZ
doesn't work, because there is no object with symbol 'twinACEFit.MZ' in R's global environment.The reason why you get an error when using
model=twinACEFit
is that there is no object with Mxname 'expMeanMZ' contained in the matrices, algebras, etc. of twinACEFit. Instead, the algebra with Mxname 'expMeanMZ' is in a submodel, with Mxname 'MZ', which in turn is contained in twinACEFit.That's the best explanation I can give. Unfortunately, it's not intuitive unless you know OpenMx really well, and even then you can still get confused (like I did).
A very good explanation - thanks so much! Very helpful!
However I've tried to run the script with twinACEFit$MZ and got an error that object twinACE.beta was not found. Then I thought that maybe it is looking for twinACE.beta matrix in a MZ submodel only (I am not sure if that's the case) and tried another way (the whole script attached: Test_Height_residuals.R):
Now the model is twinACEFit and all names are called from its perspective. Here I've also changed as.vector to as.matrix, because there was an error
and it looks that it solved the problem. The result attached in Twin_Height_residuals.png
Is it a correct way to do it?
I'am still not sure, why it didn't work before with expression being expMeanDZ and model twinACEFit$MZ as you recommended... Do you have any ideas on that?
Then I've run the same script on weight data (attached Twin_Weight_residuals.png) and now I have a basic question (this is the first time I am in contact with the concept of residuals, so this might be a really simple one) about the interpretation. Residuals in the case of height are much smaller (up to 5) than in the case of weight (up to 100): would it mean that the relationship between twins in the case of height is well defined by age and sex and it is poorly defined by these covariates in the case of weight?
And one more question - when running the same model without covariates on height additive genetic effects=0.26, however when including covariates it becomes 0.56. Therefore is my interpretation here is any close to plausible: the relationship between twins height here is well defined by the covariates only (age and sex is responsible for the correlation between twin1 and twin2), therefore when not including the into the model it drives down the weight of additive genetic effects? Before running this analysis I was expecting residuals for DZ to be much "flatter" than for MZ twins... (the relationship between DZ twins explained by covariates and between MZ twins it would be much less so).
Thank you so much for your help! It was incredibly helpful!
This thread is making me realize that I've never closely looked at how
mxEval()
works. But, I'm glad you figured out anmxEval()
call that does what you need. Nice work!In the future, you can save yourself some headaches if you put all of the necessary matrices and algebras into the MZ and DZ submodels, with both submodels having an identical copy of each. Then, the only thing in the overall container model would be the multigroup fitfunction.
The difference you're observing could easily be due to the measuring units of the two traits. If weight is measured in kilos in your dataset, you could simply make the residuals smaller by converting to metric tons. Likewise, if height is measured in centimeters, you could make the residuals bigger by converting to millimeters. If you want to know how well each trait is predicted by age and sex, you should compare the variance estimates for each trait in models run with versus without the covariates. Basically, you'd want an index conceptually similar to a multiple R-squared.
Your plots look "OK" to me, in the sense that if you wanted to make scatterplots of twin #1's residuals and twin #2's residuals, you accomplished your goal. There's nothing wrong with inspecting residuals, but it's not clear from your posts whether there are specific questions you're trying to answer by doing so. Anyhow, I assume the figures of 0.26 and 0.56 are standardized variance components (i.e., variance proportions), correct? I would venture a guess that the reason why the proportion of variance attributable to additive-genetic effects is larger when you condition on the covariates is that the covariates are "soaking up" shared-environmental variance. Specifically, if both twins in a given pair in your dataset were always assessed at the same age, and you don't adjust for age, then the effect of age "looks like" shared-environmental variance, because it equally contributes within-pair resemblance of MZ and DZ twins. Because the three proportions need to sum to 1, C's loss is necessarily A's (and E's) gain. I think it will probably be more informative to look at the raw (i.e., unstandardized) variance components instead.
Thanks again!
I've tried to make copies of the necessary matrices in MZ and DZ submodels, however I didn't delete the original ones, therefore, it didn't work - I should have deleted the original ones as well.
The reason I wanted to look at residuals was to check how well covariates explain the correlation between twin1 and twin2 data (as a secondary measure just to ensure that the model does what it was supposed to be doing), therefore if the additive genetic effect is high I would expect MZ residuals to be much more correlated (the relationship is not fully explained by covariates) than in the case of DZ twins (where a big part of the correlation could be explained by covariates). It's true that all twins were examined at the same age, therefore it influenced the shared-environmental variance in both twin pairs.
After you comment about raw variance components I've checked it and it's exactly as you said - when the model doesn't include covariates, shared-environmental variance is very high (10) in comparison to additive genetic variance (4) and with covariates the number are 2.4 and 4 respectively.
I am very grateful for your help - it saved me a lot of time and explained lots of things I had doubts about.
Best Regards,
Aurina