You are here

Constraining Components Across Gender Results in Significant Loss of Model Fit?

13 posts / 0 new
Last post
RFrank's picture
Offline
Joined: 06/08/2012 - 14:22
Constraining Components Across Gender Results in Significant Loss of Model Fit?

I'm running a simple ACE model of a continuous variable in openMX. I'm running both a constrained and sex-limited model to see which provides a better fit, and then running various constrained or sex-limited submodels where a or c are dropped.

Here is what I have noticed -- in a sex-limited submodel where c is dropped, the a and e variance components and confidence intervals are extremely similar across males and females -- AND, extremely similar to the components for a constrained model where c is dropped.

However, the sex-limited model is a SIGNIFICANTLY better fit. There is no comparison. It's waaaaay better, and yet, in reporting the variance components, it looks like it's about equal to the constrained model that's more parsimonious.

I'm having trouble wrapping my mind around this conceptually. Anyone have any insights? Would it be helpful to post my syntax?

AdminRobK's picture
Offline
Joined: 01/24/2014 - 12:15
The models' summary() output

The models' summary() output would probably be more informative, though the syntax might be helpful, too. If you post them, attach them to your post as text files, or at least use the <code> HTML tags if you include them in the body of your post.

RFrank's picture
Offline
Joined: 06/08/2012 - 14:22
Here it is

I do apologize for the delay - this is still a question I'm interested in. I've attached a textfile with the actual syntax for my models. In the body of my post are the summaries for my models:

twinACE2 = constrained across gender, full ACE
TwinAe2 = constrained across gender, drop C

twinACE3 = sex limited, full ACE
TwinAe3 = sex limited, drop C in both males and females

I also included a mxCompare for the TwinAe2Sum and the TwinAe3Sum, where TwinAe2 is much worse despite very similar estimates for the variance components.

> twinACE2Summ
data:
$MZm.data
     happy1             happy2        
 Min.   :-0.99600   Min.   :-1.16477  
 1st Qu.:-0.69904   1st Qu.:-0.70367  
 Median :-0.61150   Median :-0.61391  
 Mean   :-0.05889   Mean   :-0.02597  
 3rd Qu.: 0.06369   3rd Qu.: 0.20706  
 Max.   : 3.32268   Max.   : 3.34504  
 NA's   :59         NA's   :57        
 
$DZm.data
     happy1             happy2        
 Min.   :-0.84391   Min.   :-1.01634  
 1st Qu.:-0.68266   1st Qu.:-0.69592  
 Median :-0.60506   Median :-0.61179  
 Mean   : 0.07268   Mean   : 0.09249  
 3rd Qu.: 0.33279   3rd Qu.: 0.37600  
 Max.   : 3.27685   Max.   : 3.26455  
 NA's   :20         NA's   :19        
 
$MZf.data
     happy1            happy2        
 Min.   :-0.6727   Min.   :-0.74258  
 1st Qu.:-0.3339   1st Qu.:-0.33622  
 Median :-0.2724   Median :-0.28683  
 Mean   :-0.0255   Mean   :-0.00853  
 3rd Qu.:-0.2039   3rd Qu.:-0.19122  
 Max.   : 3.6351   Max.   : 3.60820  
 NA's   :46        NA's   :45        
 
$DZf.data
     happy1             happy2         
 Min.   :-0.77595   Min.   :-0.823770  
 1st Qu.:-0.34847   1st Qu.:-0.342329  
 Median :-0.28070   Median :-0.267554  
 Mean   : 0.05976   Mean   : 0.004863  
 3rd Qu.:-0.19968   3rd Qu.:-0.174856  
 Max.   : 3.62864   Max.   : 3.522407  
 NA's   :17         NA's   :23         
 
free parameters:
   name matrix row col      Estimate  Std.Error lbound ubound
1  am11 ACE.am   1   1  6.719720e-01 0.06941471              
2  cm11 ACE.cm   1   1  2.032210e-01 0.19949728              
3  em11 ACE.em   1   1  6.748012e-01 0.01795741              
4 meanM ACE.Mm   1   1 -9.251628e-05 0.03528906              
5 meanF ACE.Mf   1   1  1.757629e-03 0.03281190              
 
confidence intervals:
                   lbound   estimate    ubound
ACE.h2m[1,1] 2.910983e-01 0.47621338 0.5700096
ACE.c2m[1,1] 1.635266e-21 0.04355485 0.2052719
ACE.e2m[1,1] 4.299761e-01 0.48023178 0.5357458
ACE.h2f[1,1] 2.910983e-01 0.47621338 0.5700096
ACE.c2f[1,1] 1.635266e-21 0.04355485 0.2052719
ACE.e2f[1,1] 4.299761e-01 0.48023178 0.5357458
 
observed statistics:  2298 
estimated parameters:  5 
degrees of freedom:  2293 
-2 log likelihood:  6144.707 
saturated -2 log likelihood:  NA 
number of observations:  1292 
chi-square:  NA 
p:  NA 
Information Criteria: 
     df Penalty Parameters Penalty Sample-Size Adjusted
AIC:   1558.707           6154.707                   NA
BIC: -10282.223           6180.527             6164.644
CFI: NA 
TLI: NA 
RMSEA:  NA 
timestamp: 2015-05-11 17:29:11 
frontend time: 0.7330432 secs 
backend time: 10.84262 secs 
independent submodels time: 0 secs 
wall clock time: 11.57566 secs 
cpu time: 11.57566 secs 
openmx version number: 1.4-3060 
 
> twinACE3Summ
data:
$MZm.data
     happy1             happy2        
 Min.   :-0.99600   Min.   :-1.16477  
 1st Qu.:-0.69904   1st Qu.:-0.70367  
 Median :-0.61150   Median :-0.61391  
 Mean   :-0.05889   Mean   :-0.02597  
 3rd Qu.: 0.06369   3rd Qu.: 0.20706  
 Max.   : 3.32268   Max.   : 3.34504  
 NA's   :59         NA's   :57        
 
$DZm.data
     happy1             happy2        
 Min.   :-0.84391   Min.   :-1.01634  
 1st Qu.:-0.68266   1st Qu.:-0.69592  
 Median :-0.60506   Median :-0.61179  
 Mean   : 0.07268   Mean   : 0.09249  
 3rd Qu.: 0.33279   3rd Qu.: 0.37600  
 Max.   : 3.27685   Max.   : 3.26455  
 NA's   :20         NA's   :19        
 
$MZf.data
     happy1            happy2        
 Min.   :-0.6727   Min.   :-0.74258  
 1st Qu.:-0.3339   1st Qu.:-0.33622  
 Median :-0.2724   Median :-0.28683  
 Mean   :-0.0255   Mean   :-0.00853  
 3rd Qu.:-0.2039   3rd Qu.:-0.19122  
 Max.   : 3.6351   Max.   : 3.60820  
 NA's   :46        NA's   :45        
 
$DZf.data
     happy1             happy2         
 Min.   :-0.77595   Min.   :-0.823770  
 1st Qu.:-0.34847   1st Qu.:-0.342329  
 Median :-0.28070   Median :-0.267554  
 Mean   : 0.05976   Mean   : 0.004863  
 3rd Qu.:-0.19968   3rd Qu.:-0.174856  
 Max.   : 3.62864   Max.   : 3.522407  
 NA's   :17         NA's   :23         
 
The final iterate satisfies the optimality conditions to the accuracy requested, but the sequence of iterates has not yet converged. NPSOL was terminated because no further improvement could be made in the merit function (Mx status GREEN). 
 
free parameters:
   name matrix row col     Estimate  Std.Error lbound ubound
1  am11 ACE.am   1   1 8.366214e-01 0.04156916              
2  cm11 ACE.cm   1   1 5.351316e-07 0.38932071              
3  em11 ACE.em   1   1 7.824401e-01 0.02938613              
4  af11 ACE.af   1   1 4.062740e-01 0.09921859              
5  cf11 ACE.cf   1   1 3.784636e-01 0.09152652              
6  ef11 ACE.ef   1   1 5.708559e-01 0.02097595              
7 meanM ACE.Mm   1   1 4.382081e-04 0.04150923              
8 meanF ACE.Mf   1   1 8.345160e-04 0.02694799              
 
confidence intervals:
                   lbound     estimate    ubound
ACE.h2m[1,1] 3.230728e-01 5.334272e-01 0.6024054
ACE.c2m[1,1] 4.188448e-22 2.182420e-13 0.1816084
ACE.e2m[1,1] 3.975946e-01 4.665728e-01 0.5448360
ACE.h2f[1,1] 1.697180e-02 2.602750e-01 0.5137549
ACE.c2f[1,1] 2.252041e-03 2.258618e-01 0.4275499
ACE.e2f[1,1] 4.414490e-01 5.138632e-01 0.5965823
 
observed statistics:  2298 
estimated parameters:  8 
degrees of freedom:  2290 
-2 log likelihood:  5995.339 
saturated -2 log likelihood:  NA 
number of observations:  1292 
chi-square:  NA 
p:  NA 
Information Criteria: 
     df Penalty Parameters Penalty Sample-Size Adjusted
AIC:   1415.339           6011.339                   NA
BIC: -10410.099           6052.651             6027.239
CFI: NA 
TLI: NA 
RMSEA:  NA 
timestamp: 2015-05-11 17:33:11 
frontend time: 0.6750391 secs 
backend time: 17.71001 secs 
independent submodels time: 0 secs 
wall clock time: 18.38505 secs 
cpu time: 18.38505 secs 
openmx version number: 1.4-3060 
 
> TwinAe2Sum
data:
$MZm.data
     happy1             happy2        
 Min.   :-0.99600   Min.   :-1.16477  
 1st Qu.:-0.69904   1st Qu.:-0.70367  
 Median :-0.61150   Median :-0.61391  
 Mean   :-0.05889   Mean   :-0.02597  
 3rd Qu.: 0.06369   3rd Qu.: 0.20706  
 Max.   : 3.32268   Max.   : 3.34504  
 NA's   :59         NA's   :57        
 
$DZm.data
     happy1             happy2        
 Min.   :-0.84391   Min.   :-1.01634  
 1st Qu.:-0.68266   1st Qu.:-0.69592  
 Median :-0.60506   Median :-0.61179  
 Mean   : 0.07268   Mean   : 0.09249  
 3rd Qu.: 0.33279   3rd Qu.: 0.37600  
 Max.   : 3.27685   Max.   : 3.26455  
 NA's   :20         NA's   :19        
 
$MZf.data
     happy1            happy2        
 Min.   :-0.6727   Min.   :-0.74258  
 1st Qu.:-0.3339   1st Qu.:-0.33622  
 Median :-0.2724   Median :-0.28683  
 Mean   :-0.0255   Mean   :-0.00853  
 3rd Qu.:-0.2039   3rd Qu.:-0.19122  
 Max.   : 3.6351   Max.   : 3.60820  
 NA's   :46        NA's   :45        
 
$DZf.data
     happy1             happy2         
 Min.   :-0.77595   Min.   :-0.823770  
 1st Qu.:-0.34847   1st Qu.:-0.342329  
 Median :-0.28070   Median :-0.267554  
 Mean   : 0.05976   Mean   : 0.004863  
 3rd Qu.:-0.19968   3rd Qu.:-0.174856  
 Max.   : 3.62864   Max.   : 3.522407  
 NA's   :17         NA's   :23         
 
free parameters:
   name matrix row col     Estimate  Std.Error lbound ubound
1  am11 ACE.am   1   1 0.7042320230 0.02411151              
2  em11 ACE.em   1   1 0.6722360079 0.01707699              
3 meanM ACE.Mm   1   1 0.0004698117 0.03518650              
4 meanF ACE.Mf   1   1 0.0019849251 0.03272445              
 
confidence intervals:
                lbound  estimate    ubound
ACE.h2m[1,1] 0.4706395 0.5232325 0.5715719
ACE.c2m[1,1] 0.0000000 0.0000000 0.0000000
ACE.e2m[1,1] 0.4284281 0.4767675 0.5293605
ACE.h2f[1,1] 0.4706395 0.5232325 0.5715719
ACE.c2f[1,1] 0.0000000 0.0000000 0.0000000
ACE.e2f[1,1] 0.4284281 0.4767675 0.5293605
 
observed statistics:  2298 
estimated parameters:  4 
degrees of freedom:  2294 
-2 log likelihood:  6144.962 
saturated -2 log likelihood:  NA 
number of observations:  1292 
chi-square:  NA 
p:  NA 
Information Criteria: 
     df Penalty Parameters Penalty Sample-Size Adjusted
AIC:   1556.962           6152.962                   NA
BIC: -10289.131           6173.618             6160.912
CFI: NA 
TLI: NA 
RMSEA:  NA 
timestamp: 2015-05-11 17:36:12 
frontend time: 0.804045 secs 
backend time: 4.429254 secs 
independent submodels time: 0 secs 
wall clock time: 5.233299 secs 
cpu time: 5.233299 secs 
openmx version number: 1.4-3060 
 
 
> TwinAe3Sum
data:
$MZm.data
     happy1             happy2        
 Min.   :-0.99600   Min.   :-1.16477  
 1st Qu.:-0.69904   1st Qu.:-0.70367  
 Median :-0.61150   Median :-0.61391  
 Mean   :-0.05889   Mean   :-0.02597  
 3rd Qu.: 0.06369   3rd Qu.: 0.20706  
 Max.   : 3.32268   Max.   : 3.34504  
 NA's   :59         NA's   :57        
 
$DZm.data
     happy1             happy2        
 Min.   :-0.84391   Min.   :-1.01634  
 1st Qu.:-0.68266   1st Qu.:-0.69592  
 Median :-0.60506   Median :-0.61179  
 Mean   : 0.07268   Mean   : 0.09249  
 3rd Qu.: 0.33279   3rd Qu.: 0.37600  
 Max.   : 3.27685   Max.   : 3.26455  
 NA's   :20         NA's   :19        
 
$MZf.data
     happy1            happy2        
 Min.   :-0.6727   Min.   :-0.74258  
 1st Qu.:-0.3339   1st Qu.:-0.33622  
 Median :-0.2724   Median :-0.28683  
 Mean   :-0.0255   Mean   :-0.00853  
 3rd Qu.:-0.2039   3rd Qu.:-0.19122  
 Max.   : 3.6351   Max.   : 3.60820  
 NA's   :46        NA's   :45        
 
$DZf.data
     happy1             happy2         
 Min.   :-0.77595   Min.   :-0.823770  
 1st Qu.:-0.34847   1st Qu.:-0.342329  
 Median :-0.28070   Median :-0.267554  
 Mean   : 0.05976   Mean   : 0.004863  
 3rd Qu.:-0.19968   3rd Qu.:-0.174856  
 Max.   : 3.62864   Max.   : 3.522407  
 NA's   :17         NA's   :23         
 
The final iterate satisfies the optimality conditions to the accuracy requested, but the sequence of iterates has not yet converged. NPSOL was terminated because no further improvement could be made in the merit function (Mx status GREEN). 
 
free parameters:
   name matrix row col     Estimate  Std.Error lbound ubound
1  am11 ACE.am   1   1 0.8366215095 0.04156926              
2  em11 ACE.em   1   1 0.7824398828 0.02938617              
3  af11 ACE.af   1   1 0.5668069445 0.02706902              
4  ef11 ACE.ef   1   1 0.5590466843 0.01919378              
5 meanM ACE.Mm   1   1 0.0004380475 0.04150877              
6 meanF ACE.Mf   1   1 0.0019770976 0.02664580              
 
confidence intervals:
                lbound  estimate    ubound
ACE.h2m[1,1] 0.4551683 0.5334274 0.6024054
ACE.c2m[1,1] 0.0000000 0.0000000 0.0000000
ACE.e2m[1,1] 0.3975946 0.4665726 0.5448317
ACE.h2f[1,1] 0.4332632 0.5068925 0.5727278
ACE.c2f[1,1] 0.0000000 0.0000000 0.0000000
ACE.e2f[1,1] 0.4272722 0.4931075 0.5667368
 
observed statistics:  2298 
estimated parameters:  6 
degrees of freedom:  2292 
-2 log likelihood:  5999.27 
saturated -2 log likelihood:  NA 
number of observations:  1292 
chi-square:  NA 
p:  NA 
Information Criteria: 
     df Penalty Parameters Penalty Sample-Size Adjusted
AIC:    1415.27           6011.270                   NA
BIC:  -10420.50           6042.254             6023.195
CFI: NA 
TLI: NA 
RMSEA:  NA 
timestamp: 2015-05-11 17:37:36 
frontend time: 0.87605 secs 
backend time: 6.660381 secs 
independent submodels time: 0 secs 
wall clock time: 7.536431 secs 
cpu time: 7.536431 secs 
openmx version number: 1.4-3060 
 
 
> mxCompare(TwinAe3Fit, TwinAe2Fit)
     base comparison ep minus2LL   df      AIC   diffLL diffdf            p
1 Twin3AE       <NA>  6 5999.270 2292 1415.270       NA     NA           NA
2 Twin3AE    Twin2AE  4 6144.962 2294 1556.962 145.6922      2 2.308593e-32
AdminRobK's picture
Offline
Joined: 01/24/2014 - 12:15
I think the explanation is

I think the explanation is that the raw variance components are simply larger for males than for females, to the degree that equating them to be equal induces substantial misfit. If you look at the point estimates from the sex-limited AE model, then (I'm assuming a single phenotype) the additive-genetic variance for males would be about 0.84^2 = 0.71, and the nonshared-environmental variance for males would be about 0.78^2 = 0.61. But for females, additive-genetic variance would be about 0.57^2 = 0.32 and the nonshared-environmental variance would be about 0.56^2 = 0.31.

On the other hand, the standardized variance components (the variance proportions) do appear to be similar for both sexes. For instance, in the sex-limited AE model, males would have a narrow-sense heritability of about 0.71/(0.71+0.61) = 0.54, and females would have a narrow-sense heritability of about 0.32/(0.32+0.31) = 0.51, which (allowing for rounding error) are close to the point estimates of your MxAlgebras, displayed in the output for the confidence intervals.

So, the standardized variance components are similar by sex, but the raw variance components differ by sex; in other words, heritability is similar by sex, but males have greater overall phenotypic variance.

BTW, I notice that your summary() output says you're using OpenMx version 1.4-3060. Does mxVersion() also say you have version 1.4-3060, and if so, are you using R version 3.1.0 or later? If yes, you should DEFINITELY upgrade your version of OpenMx to version 1.4-3475, or better yet, version 2.0 or 2.0.1. There's a reason we have a yellow-highlighted warning about this on the front page of the OpenMx website.

RFrank's picture
Offline
Joined: 06/08/2012 - 14:22
Ah, I see! Thank you for

Ah, I see! Thank you for this very cogent explanation!

Yes, I recently noticed this warning, and much appreciate the heads up. I noticed this yesterday and I think I'm using R 3.0.3 -- that should mean that my output is okay, right? But I need to go ahead and upgrade both OpenMX and R to the latest versions. (Does Open Mx 2.0 run with versoins of R lower than 3.1?)

AdminRobK's picture
Offline
Joined: 01/24/2014 - 12:15
It's OK then

If you're using OpenMx 1.4-3060 with R 3.0.3, you should be fine. If you want to upgrade to OpenMx 2.0.x, you'll be able to, since it's compatible with (I think) R 3.0.2 and later.

RFrank's picture
Offline
Joined: 06/08/2012 - 14:22
Ah, thank you! Looks like we

Ah, thank you! Looks like we were typing replies at the same time.

RFrank's picture
Offline
Joined: 06/08/2012 - 14:22
A follow-up question as I

A follow-up question as I contemplate on this: is it clearly correct to say that it would be better to present a univariate ACE model where gender isn't addressed at all rather than a constrained one that is substantially misfit? I think the answer would be clearly yes. And then would it be fair to say that it's not as unequivocally better to present a sex-limited ACE than a univarate ACE that doesn't address gender at all? Unlike a constrained model with a very poor fit, it's not that the narrow-sense heritability estimates in this generic univariate ACE should be questioned or that they necessarily disproportionately reflect one gender over another, but just that they don't tell the whole story of what's going on in the background compared to the sex-limited.

(Just thinking about how to make analytic plans in the future.)

neale's picture
Offline
Joined: 07/31/2009 - 15:14
Variance difference only

It is possible to fit a model that specifies the same proportions of variance, but a different total variance for the two sexes. Notation got scrambled at some point (possibly my fault) but this has been called a scalar sex difference model. Let's call it a variance difference model. To implement it, I would constrain am=af cm=cf and em=ef, and add a diagonal matrix with starting values equal to one. We only want to scale twin 1 in the OS DZ group if Twin 1 is male, so we need two matrices to "wrap" (i.e., pre and post-multiply) the previous expected covariance matrix with using the quadratic %&% operator. Something like

sdDiff <- mxMatrix("Diag", 2, 2, labels=rep("sdScale",2), free=T, values=1, name=sdScale)
sdOS <- mxMatrix("Diag", 2, 2, labels=c("sdScale",NA), free=c(T,F), values=1, name=sdOS)
mxAlgebra( sdDiff %&% whateverYouHadForMZMales, name="whateverMZMaleCovWasCalledBefore")
mxAlgebra( sdDiff %&% whateverYouHadForDZMales, name="whateverDZMaleCovWasCalledBefore")
mxAlgebra( sdOS %&% whateverYouHadForOSDZandAssumingTwin1isMale, name="whateverDZOSCovWasCalledBefore")

In a multivariate analysis, we'd probably want to do this a bit differently to simplify specification, e.g., using a diag(2) matrix to Kronecker product the variance differences matrices.

AdminRobK's picture
Offline
Joined: 01/24/2014 - 12:15
There don't appear to be OS

There don't appear to be OS DZ twins in this case. It would be pretty simple to re-parameterize the model in terms of standardized variance components (constrained equal across sexes) and raw phenotypic variance or SD (allowed to differ by sex).

Edit: Something like my suggestion in another thread.

RFrank's picture
Offline
Joined: 06/08/2012 - 14:22
Oops

Thank you for these extremely helpful insights! I didn't see your comment until after I saw Dr. Neale's. I updated my code to reflect his suggestions. My sample indeed does not have opposite sex twin pairs, so I modified it for the bivariate case.

It looks to me like the output is as it is supposed to! The raw variance is different across gender but the standardized variance components are the same. I've attached my output here just to confirm that I've put these suggestions into place correctly. (Note that I used a different variable which is why the results are way different than my original post.) Thanks again!

AdminRobK's picture
Offline
Joined: 01/24/2014 - 12:15
Good to hear. You're

Good to hear. You're welcome.

AdminRobK's picture
Offline
Joined: 01/24/2014 - 12:15
I don't think I understand

I don't think I understand your question. What do you mean by "gender isn't addressed at all" and "doesn't address gender at all?"