You are here

Including covariates to saturated model

8 posts / 0 new
Last post
Aurina's picture
Offline
Joined: 01/05/2016 - 01:12
Including covariates to saturated model

Hello,

I'm trying to run an ACE model for some imaging-derived measures and hope you could help me to solve some of the issues.

Initially, I am aiming to run the ACE model and test it against the saturated model to evaluate if fitting a twin model is appropriate for the data. In the ACE model I used sex and age as covariates, however, I'm struggling to incorporate those covariates in the saturated model. So the main question is how to appropriately incorporate covariates in the saturated model? (please find my script attached (data used is as presented in scatter plots - 60 DZ and 117 MZ pairs)).

In addition, when running the ACE model I quite often get a warning message 5 or 6, e.g.
"In model 'twinACE' Optimizer returned a non-zero status code 5. The Hessian at the solution does not appear to be convex. See ?mxCheckIdentification for possible diagnosis (Mx status RED)."
I tried to re-run the model twice by inputting the output of one run to the next run to resolve this issue, but it helped only in a few cases.
Using this data presented in the scatterplot it gives code 5 warning when fitting the saturated model. Is it likely that the data is problematic? or I'm running the model incorrectly?

I would very much appreciate your help.

Best regards,
Aurina

AdminRobK's picture
Offline
Joined: 01/24/2014 - 12:15
Adding covariates to your

Adding covariates to your saturated model should be pretty straightforward. For instance, try:

MZsat <- mxModel(
    "MZsat",
    mxMatrix( type = "Full", nrow=1, ncol=2, free=T, c(0,0), labels=c("b0_mz1","b0_mz2"), name="Intercepts" ),
    mxMatrix( type="Full", nrow=1, ncol=2, free=TRUE, values= 0, label=c("betaAge","betaSex"), name="beta"),
    mxMatrix( type="Full", nrow=2, ncol=2, free=F, 
        label=c("data.ageT1MZ","data.sexT1MZ","data.ageT2MZ","data.sexT2MZ"), name="MZDefVars"),
    mxAlgebra( expression=Intercepts + beta %*% MZDefVars, name="expMeanMZ"),
    mxMatrix( type = "Lower", nrow=2, ncol=2, free=T, .5, name="CholMZ" ),
    mxAlgebra( CholMZ %*% t(CholMZ), name="expCovMZ"),
    mxData( mzData_measure, type="raw"), mxFitFunctionML(),
    mxExpectationNormal( "expCovMZ", "expMeanMZ", mylabels))
 
DZsat <- mxModel(
    "DZsat",
    mxMatrix( type = "Full", nrow=1, ncol=2, free=T, c(0,0), labels=c("b0_dz1","b0_dz2"), name="Intercepts" ),
    mxMatrix( type="Full", nrow=1, ncol=2, free=TRUE, values= 0, label=c("betaAge","betaSex"), name="beta"),
    mxMatrix( type="Full", nrow=2, ncol=2, free=F, 
        label=c("data.ageT1DZ","data.sexT1DZ","data.ageT2DZ","data.sexT2DZ"), name="DZDefVars"),
    mxAlgebra( expression=Intercepts + beta %*% DZDefVars, name="expMeanDZ"),
    mxMatrix( type = "Lower", nrow=2, ncol=2, free=T, .5, name="CholDZ" ),
    mxAlgebra( CholDZ %*% t(CholDZ), name="expCovDZ"),
    mxData( dzData_measure, type="raw"), mxFitFunctionML(),
    mxExpectationNormal( "expCovDZ", "expMeanDZ", mylabels))
 
mytwinSatModel <- mxModel(model='twinSat', MZsat, DZsat, mxFitFunctionMultigroup(c('MZsat', 'DZsat'))) 

This code will estimate 4 regression intercepts (MZ twin #1, MZ twin #2, DZ twin #1, DZ twin #2) and 2 regression slopes (1 each for age and sex). Depending on how saturated you want the model to be, you could estimate as many as 4 intercepts, 4 slopes for age, and 4 slopes for sex.

In addition, when running the ACE model I quite often get a warning message 5 or 6, e.g.
"In model 'twinACE' Optimizer returned a non-zero status code 5. The Hessian at the solution does not appear to be convex. See ?mxCheckIdentification for possible diagnosis (Mx status RED)."
I tried to re-run the model twice by inputting the output of one run to the next run to resolve this issue, but it helped only in a few cases.
Using this data presented in the scatterplot it gives code 5 warning when fitting the saturated model. Is it likely that the data is problematic? or I'm running the model incorrectly?

You could try mxCheckIdentification(), but I don't see any reason to think that there are unidentified parameters. I suggest that you try better start values. You can get good start values for the regression coefficients from using lm() in R, and I suspect that sqrt((CovMZ[1,1]/3)) or sqrt((CovMZ[2,2]/3)) will be better start values for "a", "c", and "e" than sqrt((CovDZ[1,2]/3)). Also, consider using mxTryHard() in place of mxRun().

Edit: Changed "MZ" to "DZ" in copy-pasted syntax.

Aurina's picture
Offline
Joined: 01/05/2016 - 01:12
Thank you very much for your

Thank you very much for your help and such a quick answer!

I have tried to modify the script according to your suggestions and added the covariates to the saturated model and ran mxCheckIdentification for both saturated and ACE model. In the saturated model there were 6 unidentified parameters ("b0_mz1" "b0_mz2" "betaAge" "betaSex" "b0_dz1" "b0_dz2") while in the ACE there were 2 ("mean" "betaAge") - full function output is attached. I can't figure out why these parameters would be not identified in both models... Any thoughts how to solve this issue?

Also changed the a,c,e start values and tried mxTryHard function, however, the solution still seems not to be convex.
Attached is the modified version of the same code (the one that gives unidentified parameters).

Aurina's picture
Offline
Joined: 01/05/2016 - 01:12
General questions about saturated models

I haven't really worked with twin modeling before, therefore, while trying several different things, some general questions arose. It would really help me if you could point me in the right directions. Probably these questions are really basic, but some clarity would really help.

  1. The idea of my experiment is to estimate the heritability of multiple imaging-derived measures using ACE and it's nested models (AE, CE, AC). In this case, do I first need to run the saturated model on each imaging-derived measure and compare it outputs to ACE model? If so, if I'm using ACE with covariates, should the same covariates be used in the saturated model?

  2. From reading in forums it seems like the correct way test the data is first to run the saturated model and then constrain means (and variances) for twin1/twin2 and zygosity separately and only if constrained models are not different from the original saturated model, continue with ACE. Is this correct? would it be wrong just to compare a saturated model to ACE and if there is no significant difference, start testing nested ACE models?

  3. What if the saturated model is not locally defined? As in my previous post, there are 6 parameters in the saturated model that are not defined. In addition to that, it gives warning 6 ("The model does not satisfy the first-order optimality conditions to the required accuracy, and no improved point for the merit function could be found during the final linesearch (Mx status RED) "). Does this mean that something is wrong with the data (another example of the data is attached as a scatter plot Rplot_data.png) - the output of the model is in saturetedWITHcovariates.png.

  4. In addition, what if the results from model comparisons are significantly different when using mxTryHard() instead of mxRun()? For example, comparing a saturated model with no limitations to one where means are set to be equal? Should the result from mxTryHard() trusted more?

  5. If the saturated model with covariates needs to be used and checked against constrained saturated models (for mean and variance equality), what modifications to the attached script should be made to allow changes in mean and variance definitions when covariates are included (saturatedWITHOUTcovariates.R). At the moment comparisons can be performed without including the covariates. Script named saturatedWITHcovariates.R runs with covariates but does not test for constrained saturated models?

Any help would be really appreciated. Would be great if you could also point me towards some reading (a step-by-step guide with some explanations would be amazing) that could help with those general questions.
Thank you very much!

Best,
Aurina

AdminRobK's picture
Offline
Joined: 01/24/2014 - 12:15
some answers

First of all--two things before I forget:
1. What is your mxVersion() output?
2. What are the measuring units of age in your dataset? If age is measured in years, try dividing it by 100 to convert it to centuries. That's an old trick that can help make the optimization problem more numerically stable. You might also want to switch optimizer, to SLSQP or (if your OpenMx build has it) NPSOL, e.g. mxOption(NULL,"Default optimizer","SLSQP")

The idea of my experiment is to estimate the heritability of multiple imaging-derived measures using ACE and it's nested models (AE, CE, AC). In this case, do I first need to run the saturated model on each imaging-derived measure and compare it outputs to ACE model? If so, if I'm using ACE with covariates, should the same covariates be used in the saturated model?

You are interested in the variance decomposition of each trait, so yes, be sure that the mean of the trait is conditioned on the same covariates in the models you compare to one another.

From reading in forums it seems like the correct way test the data is first to run the saturated model and then constrain means (and variances) for twin1/twin2 and zygosity separately and only if constrained models are not different from the original saturated model, continue with ACE. Is this correct? would it be wrong just to compare a saturated model to ACE and if there is no significant difference, start testing nested ACE models?

I consider the primary purpose of fitting the saturated and cross-constrained models to be detection of data-handling errors, and non-error quirks in the dataset (such as outliers). I recommend you fit all of the models that you think are a priori plausible, and compare their AICs. Using "stepwise" likelihood-ratio tests for model selection is not advised. If AIC favors the saturated or cross-constrained models over the ACE model for a trait, you'll need to explain why that might be in the ensuing paper or presentation; your subject-matter knowledge about the trait might help you with that. In that case, you could still report a heritability estimate from the biometric model with the smallest AIC.

What if the saturated model is not locally defined? As in my previous post, there are 6 parameters in the saturated model that are not defined. In addition to that, it gives warning 6 ("The model does not satisfy the first-order optimality conditions to the required accuracy, and no improved point for the merit function could be found during the final linesearch (Mx status RED) "). Does this mean that something is wrong with the data (another example of the data is attached as a scatter plot Rplot_data.png) - the output of the model is in saturetedWITHcovariates.png.

It could simply be that the model appears locally unidentified at the start values. Try changing start values of zero to something better. Also, my suggestions to re-scale the age variable, use better start values, and use mxTryHard() are meant to help with your optimization difficulties (status code 6). Beyond that, I can't really be sure. I'll look again at your script to see if I can spot any possible source of unidentification.

Should the result from mxTryHard() trusted more?

Yes. mxTryHard() makes multiple attempts to optimize the model, and randomly perturbs the start values between attempts. mxRun() (which is called by mxTryHard()) only makes one attempt. Therefore, mxTryHard() should give better results.

Aurina's picture
Offline
Joined: 01/05/2016 - 01:12
Thank you so much for your

Thank you so much for your help!

  1. mxVersion output is 2.8.3 (full output is attached in the mxVersion.png).
    This version doesn't have NPSOL optimizer. Is it worth updating the version to use it?

  2. age was in years, I've changed it to centuries - thanks!

  3. You were saying "...I recommend you fit all of the models that you think are a priori plausible, and compare their AICs. Using "stepwise" likelihood-ratio tests for model selection is not advised. If AIC favors the saturated or cross-constrained models over the ACE model for a trait, you'll need to explain why that might be in the ensuing paper or presentation; your subject-matter knowledge about the trait might help you with that. In that case, you could still report a heritability estimate from the biometric model with the smallest AIC".

Please correct me if I'm wrong, but does that mean that in my case I would need to run:
Saturated model without constraints (if the assumption is that data doesn't have outliers and is acquired from the same population - in my study all are young adults, there are no expected differences in variance or mean of the measure based on the population)
and
ACE model + nested models.
And instead of comparing them using mxCompare and evaluating p values, I would need to directly look at AIC values and choose the one with the smallest value? It seems like the saturated model is by definition the one with the lowest AIC as we don't put any constraints on it, so it can fit the data best. How should be AIC values interpreted in this case?

  1. In regards to the constrained saturated models, do I still need to test them and consider their AIC values when choosing the best model to fit the data if there is no reason to expect mean/variance differences in the data and I'm interested in the ACE model fit? If they still need to be evaluated, is there a way to do that using Cholesky parametrisation (I noticed that changing the parametrisation ended up giving warning 6 when Cholesky parametrisation on the saturated model was settling on the solution). An also, how should AIC values then should be compared/interpreted if all those models are considered: saturated(unconstrained), satudated(constrained), ACE, AE, AC, EA.

  2. In addition to all those questions, I've noticed that when running the ACE model (where standardized additive genetic variance = 0.44, unique environmental variance = 0.53, shared environmental variance = almost zero) and its submodels AE and CE models were running fine using mxTryHard. However, AC model is failing to run and gives an an error ("All fit attempts resulted in errors - check starting values or model specification"). Does it fail because the value for the unique environmental variance is the highest (0.53) and excluding its contribution the model cannot be defined? Considering that additive genetic variance also had a high value of 0.44, shouldn't it also happen in the CE model then? The current version of the script is attached (ACE_wcovariates_Saturated_nested.R). Is this order of steps in the script correct?

Thank you very much for your help - I really appreciate it!
Best,
Aurina

AdminRobK's picture
Offline
Joined: 01/24/2014 - 12:15
still more answers
1. mxVersion output is 2.8.3 (full output is attached in the mxVersion.png).
This version doesn't have NPSOL optimizer. Is it worth updating the version to use it?

It's a good idea to update to the newest release, which you can get either from CRAN or from our package repository. You have to install from our package repository if you want NPSOL, though.

2. age was in years, I've changed it to centuries - thanks!

Did it make a difference?

3. You were saying "...I recommend you fit all of the models that you think are a priori plausible, and compare their AICs..."

My main point here is that likelihood-ratio testing is a poor way to do model-selection, since it only compares two models at a time: a model and a nested submodel. It's possible for, say, a test of an ACE model against a CE model to imply conclusions that differ from a test of an AE model against an E model, and I know from experience that tends to confuse novices. Using the LRT for model-selection is even more objectionable if there is no correction for multiple comparisons, and if the set of models under consideration is dependent upon the results of previous comparisons (in a "stepwise" manner). In contrast, AIC can be used to simultaneously compare any number of models to one another, and they need not be a sequence of nested submodels.

It seems like the saturated model is by definition the one with the lowest AIC as we don't put any constraints on it, so it can fit the data best. How should be AIC values interpreted in this case?

No. AIC = -2logL + 2k, where k is the number of free parameters. Simplistically, AIC is a "penalized fit index," that penalizes model fit to the data (-2logL) proportionally to how many parameters the model needs to estimate from the data. At a deeper level, AIC is an approximation to model-selection via cross-validation. I refer you to a relevant publication of mine, which discusses the theory underlying AIC, and explains my perspective on model-selection. BTW, in the next release of OpenMx, there will be functions to automate model-averaging and multimodel inference.

4. In regards to the constrained saturated models, do I still need to test them and consider their AIC values when choosing the best model to fit the data if there is no reason to expect mean/variance differences in the data and I'm interested in the ACE model fit?

Like I said, the saturated and cross-constrained models are often mainly of interest to detect data-handling errors and legitimate data quirks. They're often not of substantive interest themselves, and it sounds like that's the case for you. Personally, I don't consider them necessary, but I don't see any harm in fitting them, and a lot of people do fit them as a matter of routine. Reviewers of your manuscript may want to know how well the ACE model fits in comparison to the saturated model, though, even if you end up reporting results from one of the biometrical models.

If they still need to be evaluated, is there a way to do that using Cholesky parametrisation (I noticed that changing the parametrisation ended up giving warning 6 when Cholesky parametrisation on the saturated model was settling on the solution).

Yes. It will require an MxConstraint, though.

n addition to all those questions, I've noticed that when running the ACE model (where standardized additive genetic variance = 0.44, unique environmental variance = 0.53, shared environmental variance = almost zero) and its submodels AE and CE models were running fine using mxTryHard. However, AC model is failing to run and gives an an error ("All fit attempts resulted in errors - check starting values or model specification"). Does it fail because the value for the unique environmental variance is the highest (0.53) and excluding its contribution the model cannot be defined? Considering that additive genetic variance also had a high value of 0.44, shouldn't it also happen in the CE model then? The current version of the script is attached (ACE_wcovariates_Saturated_nested.R). Is this order of steps in the script correct?

The AC model is a complete non-starter in biometrical variance-components analysis via maximum-likelihood. The AC model implies that the MZ covariance equals the phenotypic variance. That makes the model-expected MZ covariance matrix singular, i.e. not a valid covariance matrix, at every value of the free parameters.

AdminRobK's picture
Offline
Joined: 01/24/2014 - 12:15
more answers
Would be great if you could also point me towards some reading (a step-by-step guide with some explanations would be amazing) that could help with those general questions.

Maybe look around in the materials from the Boulder Workshop from earlier this year? There's also the OpenMx User Guide.

Anyhow, I don't see that any of your saturated model's parameters are unidentified. It's possible that some of them could be empirically unidentified due to limitations of your dataset, but I rather doubt that--it's not a complicated model, and doesn't have that many free parameters. I think mxCheckIdentification()'s results are spurious, and probably due to the start values.

If the saturated model with covariates needs to be used and checked against constrained saturated models (for mean and variance equality), what modifications to the attached script should be made to allow changes in mean and variance definitions when covariates are included (saturatedWITHOUTcovariates.R). At the moment comparisons can be performed without including the covariates. Script named saturatedWITHcovariates.R runs with covariates but does not test for constrained saturated models?

Try this (note that it uses a different parameterization of the covariance matrices):

MZsat <- mxModel(
    "MZsat",
    mxMatrix( type = "Full", nrow=1, ncol=2, free=T, c(0,0), labels =c("b0_mz1","b0_mz2"), name="Intercepts" ),
    mxMatrix( type="Full", nrow=1, ncol=2, free=TRUE, values= 0, labels=c("betaAge","betaSex"), name="beta"),
    mxMatrix( type="Full", nrow=2, ncol=2, free=F, labels=c("data.ageT1MZ","data.sexT1MZ","data.ageT2MZ","data.sexT2MZ"), name="MZDefVars"),
    mxAlgebra( expression=Intercepts + beta %*% MZDefVars, name="expMeanMZ"),
    mxMatrix(type="Symm",nrow=2,free=T,values=c(0.25,0,0.25),labels=c("mzv1","mzc","mzv2"),lbound=c(0.0001,NA,0.0001),name="expCovMZ"),
    mxData( myDataMZ, type="raw"), mxFitFunctionML(),
    mxExpectationNormal( "expCovMZ", "expMeanMZ", mylabels))
 
DZsat <- mxModel(
    "DZsat",
    mxMatrix( type = "Full", nrow=1, ncol=2, free=T, c(0,0), labels=c("b0_dz1","b0_dz2"), name="Intercepts" ),
    mxMatrix( type="Full", nrow=1, ncol=2, free=TRUE, values= 0, labels=c("betaAge","betaSex"), name="beta"),
    mxMatrix( type="Full", nrow=2, ncol=2, free=F, labels=c("data.ageT1DZ","data.sexT1DZ","data.ageT2DZ","data.sexT2DZ"), name="DZDefVars"),
    mxAlgebra( expression=Intercepts + beta %*% DZDefVars, name="expMeanDZ"),
    mxMatrix(type="Symm",nrow=2,free=T,values=c(0.25,0,0.25),labels=c("dzv1","dzc","dzv2"),lbound=c(0.0001,NA,0.0001),name="expCovDZ"),
    mxData( myDataDZ, type="raw"), mxFitFunctionML(),
    mxExpectationNormal( "expCovDZ", "expMeanDZ", mylabels))
 
SatModel <- mxModel("twinSat", MZsat, DZsat, mxFitFunctionMultigroup(c('MZsat', 'DZsat')))
 
#Equate intercepts across twin order:
sub1 <- omxSetParameters(model=SatModel,labels=c("b0_mz1","b0_mz2"),newlabels="b0_mz",name="Submodel1")
sub1 <- omxSetParameters(model=sub1,labels=c("b0_dz1","b0_dz2"),newlabels="b0_dz",name="Submodel1")
 
#Equate intercepts across zygosity:
sub2 <- omxSetParameters(model=sub1,labels=c("b0_mz","b0_dz"),newlabels="b0",name="Submodel2")
 
#Equate variance across twin order:
sub3 <- omxSetParameters(model=sub2,labels=c("mzv1","mzv2"),newlabels="mzv",name="Submodel3")
sub3 <- omxSetParameters(model=sub3,labels=c("dzv1","dzv2"),newlabels="dzv",name="Submodel3")
 
#Equate variance across zygosity:
sub4 <- omxSetParameters(model=sub3,labels=c("mzv","dzv"),newlabels="v",name="Submodel4")