doubt with univariate moderation result
Posted on
Chen SJ
Joined: 04/02/2020
Attachment | Size |
---|---|
syntax of the univariate moderation.txt | 12.76 KB |
the result of moderation models.jpg | 264.17 KB |
Forums
Hi everyone!
I am trying to run a univariate moderation model with binary moderator(smoke or not) and binary outcome(T2DM or not). However, I have doubts about the result. I have run a full moderation model ( I chose the ACE model, because when calculating the heritability of T2DM and smoke separately, the ACE model was the best fitted model for both of them) and several nested models. Nested models are compared with the full model by Chi square test. The best model is chosed by the Chi square test(p>0.05) and the smallest AIC.
I am trying to run a univariate moderation model with binary moderator(smoke or not) and binary outcome(T2DM or not). However, I have doubts about the result. I have run a full moderation model ( I chose the ACE model, because when calculating the heritability of T2DM and smoke separately, the ACE model was the best fitted model for both of them) and several nested models. Nested models are compared with the full model by Chi square test. The best model is chosed by the Chi square test(p>0.05) and the smallest AIC.
Here is my doubt:
(1) In the full model(ACE_XYZ), the moderation on the addictive genetic variance (βa) is not significant( CI contains 0), while in the best fitted model(ACE_XZ), βa is significant. That's so weird.
(2) In the ACE_YZ model, βe is not significant. However, when dropping βe, the ACE_Y model is significantly worse than the full model(p<0.05).
I am wondering what causes the phenomenon described above. Is my result reliable? How should I choose the best fitting model? The result of my moderation model and the synax I used are in the supplementary material.
I'll be really appreciated if anyone can give some oppinions or suggestions!
convergence
That doesn't surprise me a whole lot. You're estimating one less free parameter, so your estimates of the free parameters will tend to have smaller sampling variance. Also, notice that something similar happens with βc if you compare ACE_XYZ and ACE_YZ.
The comparison of ACE_Y with ACE_XYZ concerns both βa and βe. However, if you compare ACE_Y with ACE_YZ (i.e., you test only βe), you still get a difference of 10.61, which for chi-square on 1df is indeed significant at an alpha of 0.05. So I agree that something seems amiss here. My guess is that either your ACE_Y model did not converge, or that at least one of the confidence limits for βe in the ACE_YZ model did not converge.
If it's the case that ACE_Y did not converge, then try running it with `mxTryHardOrdinal()` instead of `mxRun()`, and see if it can reach a smaller fitfunction value. If it's the case that one of βe's confidence limits from ACE_YZ didn't converge, then that's something easy to diagnose. Make a new MxModel based on the fitted ACE_YZ model, fix βe to -1.05, and run it. Then, do the same thing again, but this time, instead fix βe to 0.42. If each confidence limit is correct, then the fitfunction value at each of the two new solutions should be about 3.84 higher than at the solution of the ACE_YZ model.
However, there could also be something interesting going on. I'd like to see the `summary()` output for your ACE_YZ model, as well as its CI details table. See here for guidance on how to save the table down as a .csv file.
BTW, what is your `mxVersion()` output?
Log in or register to post comments
In reply to convergence by AdminRobK
test convergence
My mxVersion() output is :
OpenMx version: 2.17.2 [GIT v2.17.2]
R version: R version 3.5.3 (2019-03-11)
Platform: x86_64-w64-mingw32
Default optimizer: NPSOL
NPSOL-enabled?: Yes
OpenMP-enabled?: No
1. use OpenMx version: 2.17.2
(1) optimizer ---SLSQP
a. test whether ACE_Y coverged or not
I have run all models with mxTryHardOrdinal(), all of the -2LL value and the AIC value were the same as the result of mxRun(). The result of ACE_Y model in mxTryHardOrdinal() was almost the same as that in mxRun().
b. test whether βe's confidence limits in ACE_YZ converged
the -2LL value and the AIC value of ACE_YZ model in mxTryHardOrdinal() were the same as those in mxRun(). For ACE_YZ models, the -2LL and AIC did not change when the start value of βe was set to be 0.1, -1.05, 0.42. However, there was a difference of βe's point estimate between the start value -1.05 and 0.42 (-0.87 vs 0.24), although their CI was the same.
(2) optimizer ---NPSOL
I also tried to change the optimizer into NPSOL. All models were run with mxTryHardOrdinal(). Again, the -2LL and AIC did not change. The point estimate of βe in all ACE_YZ models was the same with those in SLSQP , but the CI was different---- with optimizer NPSOL, the CI of βe did not contain 0 in all models. It is worth noting that when the start value of βe was set 0.42, the point estimate of βe was positive, while when the start value was set 0.1 or -1.05, the point estimate was negative which could lead to different interpretation of the moderation effect.
2. try openmx 2.17.4 in R 4.0.2
The result of openmx 2.17.4 with mxRun() is almost the same with that of openmx 2.17.2. As for βe, -2LL and AIC, they are just the same.
Sorry, I don’t know how to get useful information from modelsumm$CIdetail right now. All results are in the supplementary materials.
Professor Robert, what do you think of the result? Could you please give some advise? Thank you so much!
Log in or register to post comments
In reply to test convergence by Chen SJ
fixed value
NaMEffectsModel2 <- omxSetParameters(NaMEffectsFit, labels=c('eM11'), values=-1.05, free=F,name="ACE_YZ2")
NaMEffectsFit2 <- mxRun(NaMEffectsModel2)
Then, compare the -2LLs of `NaMEffectsFit` and `NaMEffectsFit2`. Is the latter's fit about 3.841 higher than the former's?
Log in or register to post comments
In reply to fixed value by AdminRobK
diffLL is smaller than 3.841 but very close
I have tried again according to your suggestions. Here is the result of using NPSOL optimizer and mxTryHardOrdinal. (BTW,the results of SLQP optimizer and mxRun are almost the same)
base comparison ep minus2LL df AIC diffLL diffdf p 9 4294.340 34022 -63749.66 NA NA NA
1 NaMEffects
2 NaMEffects NaMEffects2 8 4298.175 34023 -63747.83 3.834572 1 0.05020582
3 NaMEffects NaMEffects3 8 4298.039 34023 -63747.96 3.699165 1 0.05443970
NaMEffects means the start value of βe was set to 0.1, free=T
NaMEffects2 means the start value of βe was fixed to -1.05
NaMEffects3 means the start value of βe was fixed to 0.42.
As you can see, the diffLL between NaMEffects and NaMEffects2 is little smaller than 3.841, but very colse, so is the diffLL between NaMEffects and NaMEffects2.
Professor Robert, what do you think of the result, especially in my former reply, using NPSOL optimizer, the CI of βe does not contain 0? Thank you very much!
Log in or register to post comments
In reply to diffLL is smaller than 3.841 but very close by Chen SJ
OK, then it looks as though
Try validating the upper limit of -0.73, just as you did with the confidence limits that SLSQP found. How large is your sample, anyhow? I'd also like to see the standard errors of the free parameters from your attempts to fit the model.
I have a few other remarks as well. First, the parameterization you're using may be making interpretation more difficult than it needs to be. Specifically, each variance component is quadratic in its corresponding free parameters and in the moderator. That's a useful thing if the moderator is continuous, but it doesn't really add anything if the moderator is dichotomous. You could try making the variance components linear in the moderator and in their free parameters. The only downside there, and the reason people typically don't do that, is that your model may end up predicting negative variance components. Just a thought; it might not be worth the trouble.
Also, you should probably be using the so-called "full bivariate" form of the moderation model, under which the moderator is itself biometrically decomposed.
Log in or register to post comments
In reply to OK, then it looks as though by AdminRobK
The upper limit seems fine
(1) The upper limit of -0.73 seems OK. Here is the result:
base comparison ep minus2LL df AIC diffLL diffdf p
1 NaMEffects
2 NaMEffects NaMEffects2 8 4297.926 34023 -63748.07 3.586121 1 0.05826409
(2) My sample size
There are 11301 MZ twin pairs and 5714 DZ twin pairs in my analysis. Though the sample size seems large, there are only 403 persons with T2DM in MZ, 166 persons with T2DM in DZ.
(3)"You could try making the variance components linear in the moderator and in their free parameters". Sorry, I'm quite new for SEM. Do you mean that the moderator directly moderates the variance components(ie, A C E), not through the path coefficient(a, c, e)?
(4) full bivariate moderation
Yes, I have tried the full bivariate moderation model. The results are even more weird. There is a warning in my result: "The model does not satisfy the first-order optimality conditions to the required accuracy..." From the forums, I know that the warning is quite common for categorical variables. The weird thing is that I ran the model twice without any change, but the result was quite different, not only the point estimate, but also the -2LL . Yes, I have set set.seed() with the same number and used mxTryHardOrdinal() with extraTries = 20. Maybe the model chose the different "best fit model" in the twice running.
I have seen the discussion about the bivariate moderation model for the binary variables from the forums. And, I also saw your reply to Mr iloo with the word that "it appears to be inherently difficult to use finite-differences gradient-based optimization with likelihood functions that are defined in terms of the multivariate-normal (and multivariate-t!) probability integral, applied to low-prevalence dichotomous variables." So, I'm wondering whether my binary data fits the bivariate moderation model or not. Is the result of bivariate moderation model with my data reliable?
Thank you so much professor Robert! I'm looking forward for your reply.
Log in or register to post comments
In reply to The upper limit seems fine by Chen SJ
remarks
Basically, yes. Consider the additive-genetic variance component. Currently, it's equal to (a + βₐx)², or a² + 2aβₐx + βₐ²x², where x represents the moderator. If x is 0, then the variance component is just a². If x is 1, then the variance component is a² + 2aβₐ + βₐ². Since the moderator only has two possible values, the variance component only has two possible values, and you don't really gain anything from having the variance component be quadratic in the moderator or coefficients. You could just instead define the variance component as a + βₐx.
iloo's difficulty was computational, not statistical. I would not conclude that the full bivariate moderation model fits your data poorly merely because of similar computational difficulties. If you want, you could try using a custom compute plan with Nelder-Mead, as I describe in iloo's thread.
So something interesting IS happening here. The comparison of ACE_YZ with ACE_Y makes it clear that the data are not consistent with zero moderation of _E_. And yet, there are valid upper 95% confidence limits on opposite sides of zero! T̶h̶e̶ ̶w̶e̶i̶r̶d̶ ̶t̶h̶i̶n̶g̶ ̶i̶s̶,̶ ̶t̶h̶o̶u̶g̶h̶,̶ ̶I̶ ̶c̶a̶l̶c̶u̶l̶a̶t̶e̶ ̶f̶r̶o̶m̶ ̶t̶h̶e̶ ̶i̶n̶f̶o̶r̶m̶a̶t̶i̶o̶n̶ ̶i̶n̶ ̶t̶h̶e̶ ̶'̶C̶I̶ ̶d̶e̶t̶a̶i̶l̶s̶'̶ ̶t̶a̶b̶l̶e̶s̶ ̶t̶h̶a̶t̶ ̶t̶h̶e̶ ̶m̶o̶d̶e̶l̶ ̶w̶o̶u̶l̶d̶ ̶p̶r̶e̶d̶i̶c̶t̶ ̶a̶ ̶*̶n̶e̶g̶a̶t̶i̶v̶e̶*̶ ̶_̶E̶_̶ ̶v̶a̶r̶i̶a̶n̶c̶e̶ ̶a̶t̶ ̶N̶P̶S̶O̶L̶'̶s̶ ̶s̶o̶l̶u̶t̶i̶o̶n̶ ̶o̶f̶ ̶-̶0̶.̶7̶3̶,̶ ̶b̶u̶t̶ ̶n̶o̶t̶ ̶a̶t̶ ̶S̶L̶S̶Q̶P̶'̶s̶ ̶s̶o̶l̶u̶t̶i̶o̶n̶ ̶o̶f̶ ̶0̶.̶4̶2̶:̶
ee <- function(a,b){
out <- c(NA_real_,NA_real_)
out[1] <- (a^2) #<--E variance component for non-smokers
out[2] <- (a+b)^2 #<--E variance component for smokers
return(out)
}
ee(0.30,-0.73)
ee(0.29,0.42)
That leads me to ask, how do you have your moderator variable coded? As 0 for non-smoker and 1 for smoker? And, what type of data is it; did you convert it into an MxFactor? Try running `str()` on your dataset.
Edit: syntax
Log in or register to post comments
In reply to remarks by AdminRobK
reply
Hello professor Robert, Thanks for your reply.
(1) define the variance component as a + βₐx
I think I have gotten what you means. It sounds theoretically reasonable to make the variance components linear in the moderator, but I have not seen any literature using that model right now. Besides, achieving this attempt through sytax is a little difficult for me. Maybe I'll try it after completing the univariate and bivariate moderation model put forward by Purcell .
(2) try using a custom compute plan
Thank you for your suggestion. I'm trying that. BTW, I'm just wondering whether the bivariate moderation model fits the situation that both the moderator and trait are categorial, OR the arithmetic is accurate enough to find the optimal and reliable result.
(3) predict a negative E variance at NPSOL's solution of -0.73
Emm, should the formula be "out[2] <- (a^2) + (2*a*b) + (b^2)" when calculate E variance component for smokers? Why you use "out[2] <- (a^2) + (2*a*b) * (b^2)"? I'm confused about that. Since the E variance component is calculated by the expression=(e+ Mod1%*%eM) %*% t(e+ Mod1%*%eM), it can not be negative, can't it?
(4) variable types
Yes, I code my moderator as 0 for non-smoker and 1 for smoker. The moderator is a numberical variable according to your suggetion for yaning(https://openmx.ssri.psu.edu/node/4548). I convertd the trait(T2DM) into a MxFactor, not the moderator. Here is the result of str(dataMZ)
Formal class 'MxDataStatic' [package "OpenMx"] with 19 slots
..@ observed :'data.frame': 11301 obs. of 8 variables:
.. ..$ smk_cur1: num [1:11301] 1 0 0 0 0 0 0 1 0 0 ...
.. ..$ T2DM1 : Ord.factor w/ 2 levels "0"<"1": 1 1 1 1 1 1 1 1 1 1 ...
.. .. ..- attr(*, "mxFactor")= logi TRUE
.. ..$ smk_cur2: num [1:11301] 1 0 0 0 0 0 0 1 0 0 ...
.. ..$ T2DM2 : Ord.factor w/ 2 levels "0"<"1": 1 1 1 1 1 1 1 1 1 1 ...
.. .. ..- attr(*, "mxFactor")= logi TRUE
.. ..$ sex1 : num [1:11301] 1 2 2 2 2 2 2 1 2 1 ...
.. ..$ age1 : num [1:11301, 1] 0.938 -0.202 1.552 1.025 0.412 ...
.. ..$ sex2 : num [1:11301] 1 2 2 2 2 2 2 1 2 1 ...
.. ..$ age2 : num [1:11301, 1] 0.938 -0.202 1.552 1.025 0.412 ...
Maybe I should convert the moderator into an MxFactor, and I have tried that. The result is in the supplementary material. It seems that the -2LL and AIC did not change, but the point estimate and CI of parameters changed a lot. So, should I convert the moderator into an MxFactor? Moreover, should I set the threshold for the moderator?
Thanks a lot! Looking forward for your reply.
Log in or register to post comments
In reply to reply by Chen SJ
my mistake; MxFactor
You're right. My mistake. It should just be `(a+b)^2`.
No. I asked about that to make sure you *hadn't* turned it into an MxFactor. If you do, then it is no longer coded as 0 & 1, but as 1 & 2, which makes interpretation more difficult. See this post in yaning's thread. If you do the "full bivariate" moderation model, you will need to have the smoking variable twice in your dataset: once as an MxFactor, to be analyzed as an endogenous variable along with T2DM, and again as a numeric variable, to be used as a "definition variable". Likewise, you would only need to worry about the threshold of the moderator if you do the full bivariate model.
Log in or register to post comments
In reply to my mistake; MxFactor by AdminRobK
How to choose the result
Let's come back to the result of univariate moderation result.
1. How should I choose the optimal model? Should I choose the model with smallest AIC and the goodness of fit not worse than the full model, ignoring whether coefficients are significant or not in the nested models? Then, just interprtate the result of the optimal model?
2. Since the significance of βe is different between the optimizer of SLSQP and NPSOL, which one should I choose?
Thank you for your patient explanation! Looking forward for your reply.
Log in or register to post comments
In reply to How to choose the result by Chen SJ
results
If you want to report results from only one model, that sounds reasonable to me, although I recommend ignoring pairwise model-comparison chi-square tests and focusing only on AIC. If you want to report results from more than one model, see the documentation for
mxModelAverage()
and the references therein.I brought up this problem during the OpenMx Development Team meeting last week. Since the AIC-best model has βe free, you at least have some evidence that forcing βe to be zero adversely affects model quality, so I think you can reasonably conclude that there is moderation of the nonshared-environmental variance. The two different optimizers are giving you differently signed point estimates and different confidence intervals. And yet, you've validated (at least some of) those confidence limits, and ACE_XZ has the same -2logL at the differing point estimates. The other developers and I concluded that there is some kind of "symmetry" of fit w/r/t βe, probably centered at some negative number that is close to zero. Again, I think your choice of parameterization is making your results unnecessarily difficult to interpret. Please consider parameterizing the variance components as linear rather than quadratic in the moderator and free parameters, as I previously suggested. Your moderator takes only 0 and 1 as values; you gain nothing by making the variance components quadratic in the moderator, and you make interpreting your results for the free parameters more complicated.
Log in or register to post comments
In reply to results by AdminRobK
Thank you so much
Log in or register to post comments