You are here

Weird model comparison results when using WLS

3 posts / 0 new
Last post
hansfredriksunde's picture
Joined: 12/10/2021 - 11:28
Weird model comparison results when using WLS
AttachmentSize
Binary Data WLS_error_SATtests.R4.55 KB
Binary Data WLS_error_ACEvAE.R5.52 KB

UPDATE!

I have found the issue. A colleague tried to reproduce my error with the code below, unsuccessfully. For him, the results were as expected: SBchisq showed the simple difference in chi-squares, and the comparison was not significant. We therefore concluded that this is a bug which is only apparent in the most recent version of OpenMx - yet to be released as a binary. I run an OpenMx-version compiled from source code (OpenMx version number: 2.20.6.19), as the bug fix I depend on is yet to be released as a binary.

Thanks nonetheless!


Hi everyone

I'm currently building an extended children-of-twins-and-siblings model with partly dichotomous data. The number of free parameters (>10) make maximum likelihood unfeasible (i.e., the processing time is too long), so I therefore use the WLS fit function instead.

I have gotten to the point where I am comparing nested models, but unfortunately, I get results that I do not quite understand. Despite a higher AIC, the unrestricted model is seemingly significantly better than the restricted one. When using WLS, the mxCompare() function reports SBchisq instead of -2LL (i.e., the Satorra-Bentler (2001) scaled difference chi-squared test statistics). On the info page for mxCompare, it says that "Under full weighted least squares, the Satorra-Bentler chi-squared value is equal to the difference in the model chi-squared values". Despite using full weighted least squares, the reported SBchisq is very much larger than the difference between the chi squares of the two models.

I have reproduced the problem with a simple simulation. I have simulated binary data where the AE model is the true model. I then run and compare an ACE model with an AE model. As expected, the C-component in the ACE model has confidence intervals that contain zero, and the AE model has lower AIC indicating better fit. Despite this, mxCompare() reports that the ACE model is significantly better than the AE-model. The true chi square difference is 0.156, but SBchisq reports 777.55 (which is what the p-value seems to be based on). Surely this is not correct?

  base comparison ep    chisq df      AIC  SBchisq diffdf             p
1  ACE       <NA>  3 13.48767  3 19.48767       NA     NA            NA
2  ACE         AE  2 13.64318  4 17.64318 777.5465      1 4.110884e-171

The code is reproduced at the end of the post.

So the question is: Why am I getting the wrong p-value? Have I misspecified something, or is it a bug? Or is it something I don't understand about chi-square tests and WLS which means that this is indeed the true result? Is there a reason I cannot simply take the difference in chi squares and test it manually, like this:?

> pchisq(0.15551, 1, lower.tail = FALSE)
[1] 0.6933245

Thank you very much for your help. I really appreciate the help that is available from this forum
Sincerely
Hans Fredrik

Here is the code (also uploaded as "WLS_error_ACEvAE.R"):

library(OpenMx)
mxOption(key="Default optimizer", value="NPSOL")
 
# PREPARE HELPER FUNCTION
print.CI <- function(fit) {
  x <- fit$output$confidenceIntervals  
  rownames(x) <- colnames(fit$est_var$result)
  return(round(x,2))
}
 
# ----------------------------------------------------------------------------------------------------------------------
# PREPARE DATA (SIMULATION)
 
# Set parameters
set.seed(123456789)
n <- 10000   # n/2 per zygosity
VA1 <- 0.4   # VE1 = 1-VA1
cut <- 0.8   # z-value
 
# Prepares data frame
dt <- data.frame(matrix(NA,nrow=n,ncol=3,dimnames = list(NULL,c("x1","x2","rel"))))
 
# Simulates genetic components
A1_mz <- MASS::mvrnorm(n/2,c(0,0), matrix(c(1,1,
                                            1,1),2,2,T))
A1_dz <- MASS::mvrnorm(n/2,c(0,0), matrix(c( 1,.5,
                                            .5, 1),2,2,T))
# Weighs genetic component and adds environmental component.
dt[1:(n/2)   , 1:2]   <- sqrt(VA1)*A1_mz + rnorm(n,0,sqrt(1-VA1))
dt[(n/2+1):n , 1:2]   <- sqrt(VA1)*A1_dz + rnorm(n,0,sqrt(1-VA1))
 
# Adds relatedness coefficients to data frame
dt[,"rel"] <- c(rep(1,n/2),rep(0.5,n/2))
 
# Dichotomises the data
dt[,1:2] <- ifelse(dt[,1:2] > cut,1,0)
 
# Select Variables for Analysis
selVars      <- c('x1','x2')    # list of variables names
ntv          <- length(selVars)      # number of total variables
 
# Converts to factor variables
dt[,selVars] <- mxFactor(dt[,selVars], levels = c(0,1))
 
 
# ----------------------------------------------------------------------------------------------------------------------
# PREPARE MODEL
 
#####################################################
####          Creates Data Objects              ####
#####################################################
dataMZ    <- mxData( observed=subset(dt, select = selVars, subset = rel == 1), type="raw" )
dataDZ    <- mxData( observed=subset(dt, select = selVars, subset = rel == 0.5), type="raw" )
 
#####################################################
####          Creates Free Parameters            ####
#####################################################
varA1      <- mxMatrix(name="VA1", type="Symm", nrow=1, ncol=1, free=TRUE, values=0, label="est_VA1" ) 
varC1      <- mxMatrix(name="VC1", type="Symm", nrow=1, ncol=1, free=TRUE, values=0, label="est_VC1" ) 
varE1      <- mxMatrix(name="VE1", type="Symm", nrow=1, ncol=1, free=FALSE, values=1, label="est_VE1" )
 
#######################################################
####  Creates Algebra for Expected (Co-)Variances  ####
#######################################################
var1     <- mxAlgebra(expression = VA1+VC1+VE1, name="var1" )
covMZ    <- mxAlgebra(expression = VA1+VC1,     name="covMZ")
covDZ    <- mxAlgebra(expression = 0.5*VA1+VC1, name="covDZ")
 
 
#######################################################
####         Creates Expectation Objects           ####
#######################################################
expCovMZ  <- mxAlgebra( expression= rbind( cbind(    var1,   covMZ),
                                           cbind(   covMZ,    var1)), name="expCovMZ" )
expCovDZ  <- mxAlgebra( expression= rbind( cbind(    var1,   covDZ),
                                           cbind(   covDZ,    var1)), name="expCovDZ" )
 
threG      <- mxMatrix( type="Full", nrow=1, ncol=ntv, free=TRUE, values=0, labels="threshold", name="threG")
 
meanG     <- mxMatrix( type="Full", nrow=1, ncol=ntv, free=FALSE, values=0, labels="mean", name="meanG" )
 
expMZ     <- mxExpectationNormal( covariance = "expCovMZ", 
                                  means      = "meanG", 
                                  thresholds = "threG", 
                                  dimnames   = selVars )
expDZ     <- mxExpectationNormal( covariance = "expCovDZ", 
                                  means      = "meanG", 
                                  thresholds = "threG", 
                                  dimnames   = selVars )
 
 
#######################################################
####               FINAL PREPARATIONS              ####
#######################################################
 
 
# Creates standardized estimates + confidence intervals
est_var     <- mxAlgebra(expression=cbind(VA1/var1, VC1/var1, VE1/var1, threshold/sqrt(var1)), 
                         name="est_var", dimnames=list("est",c("VA1","VC1","VE1","threshold")) )
CIs <- mxCI(c("est_var"))
 
 
fitFun     <- mxFitFunctionWLS("WLS")
 
modelMZ <- mxModel(threG, meanG, varA1, varC1, varE1, var1, covMZ, expCovMZ, expMZ, dataMZ, fitFun, name = "MZ")
modelDZ <- mxModel(threG, meanG, varA1, varC1, varE1, var1, covDZ, expCovDZ, expDZ, dataDZ, fitFun, name = "DZ")
 
multi     <- mxFitFunctionMultigroup( c("MZ","DZ") )
 
 
 
#######################################################
####               Run first model                 ####
#######################################################
 
model1   <- mxModel(threG, meanG, expCovMZ, expCovDZ, 
                    varA1, varC1, varE1, var1, covMZ, covDZ, 
                    modelMZ, modelDZ, multi, CIs, est_var,
                    name="ACE" )
 
fit1    <- mxRun(model1, intervals = T)
print.CI(fit1)
 
 
#######################################################
####      Prepare and run constrained model        ####
#######################################################
 
 
model2 <- mxModel(name = "AE", fit1)
model2 <- omxSetParameters( model2, label="est_VC1", free=FALSE, values=0 )
 
fit2    <- mxRun(model2, intervals = T)
print.CI(fit2)
 
# Compares models
mxCompare(fit1,fit2)

I have also attached a similar code ("WLS_error_SATtests.R") which compares a saturated model with equality-constrained thresholds. This simulation has similar problems, but the SBchisq and p-value is returned as NaN (Not a number). I guess this has something to do with the chi square for the saturated model being 0.

mhunter's picture
Offline
Joined: 07/31/2009 - 15:26
Hmm ... looking into this

Hi Hans,

Thanks for the clear code and working example! I'll have to take some time to check into this. The Satorra-Bentler chi-squared value seems way off, but I'm not sure what's causing it yet. The mxCompare() behavior for WLS models is still relatively new; I've tested it, but not exhaustively. There is mostly likely a bug that I didn't catch in my initial testing.

At least one mystery is easy to solve: the p-value is for the SBchisq, not the chi-squared in the chisq column.

> pchisq(777.5465, 1, lower.tail = FALSE)
[1] 4.110957e-171

Satorra and Bentler (2001), found that this scaled difference chi-squared behaved better in smaller samples than the standard chi-squared, and performed equally well-for large samples.

I'll update you on this thread as I find out more!

Cheers,
Mike Hunter

hansfredriksunde's picture
Joined: 12/10/2021 - 11:28
Thanks!

Thank you very much. I'll just manually calculate the chi-square difference and p-value in the mean time. Luckily, my sample is very large, so it does not sound like it will make a difference.

Cheers!