Wrong estimates when using definition variables and WLS

Posted by hansfredriksunde (Joined: 12/10/2021)
Hi everyone!

I'm currently building an extended children-of-twins-and-siblings model that will have to handle dichotomous data and quite a large sample size (over 10,000 families). I have written the model with definition variables so that it can handle other relatedness coefficients in the parental generation (half siblings, cousins, etc.) without any modification, and so that it can handle different family structures (e.g., where Father 1 and Father 2 are related versus Mother 1 and Mother 2). I have simulated data to make sure it works, which it does when I use continuous data and maximum likelihood estimation.

The problem arises when I try to convert it to handle dichotomous data. First, the number of free parameters (>10) makes maximum likelihood estimation infeasible with dichotomous data (the processing time is too long). I therefore use the WLS fit function instead. However, this does not work. If I use mxRun, I usually get an error message about infeasible starting values (regardless of what the starting values are). If I try mxTryHardOrdinal, it runs, but the estimates are wrong and the variance constraints are ignored. I cannot figure out why.

I suspect this is because WLS does not handle definition variables, but I'm not sure. Either (1) I have made an error in the code, (2) there is a bug (or this is not yet supported), or (3) there is some mathematical reason that makes this impossible which I have yet to understand. I hope some of you can help.

I was able to recreate the same problem in a simple AE twin model with dichotomous data, a definition variable, and the WLS fit function. Here it is possible to use ML, which confirms that the model otherwise works (i.e., it recovers the correct estimates). Are any of you able to spot the issue?


library(OpenMx)
#mxOption(key="Default optimizer", value="NPSOL")
#mxOption(key="Default optimizer", value="SLSQP")

# ----------------------------------------------------------------------------------------------------------------------
# PREPARE DATA (SIMULATION)

# Set parameters
n <- 10000 # n/2 per zygosity
VA1 <- 0.4 # VE1 = 1-VA1
cut <- 0.8 # z-value

# Prepares data frame
dt <- data.frame(matrix(NA,nrow=n,ncol=3,dimnames = list(NULL,c("x1","x2","rel"))))

# Simulates genetic components
A1_mz <- MASS::mvrnorm(n/2, c(0,0), matrix(c( 1,  1,
                                              1,  1), 2, 2, byrow=TRUE))
A1_dz <- MASS::mvrnorm(n/2, c(0,0), matrix(c( 1, .5,
                                             .5,  1), 2, 2, byrow=TRUE))
# Weights the genetic component and adds the environmental component
dt[1:(n/2) , 1:2] <- sqrt(VA1)*A1_mz + rnorm(n,0,sqrt(1-VA1))
dt[(n/2+1):n , 1:2] <- sqrt(VA1)*A1_dz + rnorm(n,0,sqrt(1-VA1))

# Adds relatedness coefficients to data frame
dt[,"rel"] <- c(rep(1,n/2),rep(0.5,n/2))

# Dichotomises the data
dt[,1:2] <- ifelse(dt[,1:2] > cut,1,0)

# Select Variables for Analysis
selVars <- c('x1','x2') # vector of variable names
ntv <- length(selVars) # number of total variables

# Converts to factor variables
dt[,selVars] <- mxFactor(dt[,selVars], levels = c(0,1))

# ----------------------------------------------------------------------------------------------------------------------
# PREPARE MODEL

# Create Data Objects
dataTW <- mxData( observed=dt, type="raw" )

#####################################################
#### Creates Definition Variables ####
#####################################################
defRel <- mxMatrix(type = "Full", nrow = 1, ncol = 1, free = FALSE, labels = "data.rel", name = "rel")

#####################################################
#### Creates Free Parameters ####
#####################################################
varA1 <- mxMatrix( type="Symm", nrow=1, ncol=1, free=TRUE, values=0.5, labels="est_VA1", name="VA1" )
varE1 <- mxMatrix( type="Symm", nrow=1, ncol=1, free=TRUE, values=0.5, labels="est_VE1", name="VE1" )

threG <- mxMatrix( type="Full", nrow=1, ncol=ntv, free=TRUE, values=0, labels="threshold", name="threG") # one shared label equates thresholds across twins

#######################################################
#### Creates Algebra for Expected (Co-)Variances ####
#######################################################
var1 <- mxAlgebra( expression= VA1+VE1, name="var1" )
covTW <- mxAlgebra(expression = rel*VA1, name = "covTW")

#####################################################
#### Creates Constraint Objects ####
#####################################################
unitP <- mxMatrix(type="Unit",nrow = 1,ncol=1,name="unitP")
varP1 <- mxAlgebra(expression = diag2vec(var1), name="varP1")
consV1 <- mxConstraint(expression = varP1 == unitP, name="consV1")

#######################################################
#### Creates Expectation Objects ####
#######################################################
expCovTW <- mxAlgebra( expression= rbind( cbind( var1,  covTW),
                                          cbind( covTW, var1 )), name="expCov" )

meanG <- mxMatrix( type="Full", nrow=1, ncol=ntv, free=FALSE, values=0, labels="mean", name="meanG" )

expTW <- mxExpectationNormal( covariance = "expCov",
                              means      = "meanG",
                              thresholds = "threG",
                              dimnames   = selVars )

fitFun <- mxFitFunctionWLS("WLS")
#fitFun <- mxFitFunctionML()

# Create Model Object
model1 <- mxModel(defRel, varA1, varE1, var1, covTW,
                  consV1, varP1, unitP,
                  meanG, threG, expCovTW, dataTW, expTW, fitFun,
                  name="model_1" )
# ----------------------------------------------------------------------------------------------------------------------
# RUN MODEL
fit1 <- mxRun(model1)
#fit1 <- mxTryHardOrdinal(model1)
round(fit1$output$estimate,2) # Check estimates (simulation used VA1 = 0.4)
fit1$var1 # Check variance

For now, I have recoded my model with multiple groups, which works fine, but I would really like to use definition variables if possible. Having 5-10 different expected covariance matrices is quite messy, makes mistakes likely, and is not very flexible if I wish to use different family structures. A stripped-down sketch of what I mean is below.
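For reference, here is a minimal sketch of the multigroup version of the toy example above (object names like modelMZ and modelDZ are just illustrative; it reuses the data, matrices, and algebras defined earlier):

# Split the data by relatedness instead of using a definition variable
dtMZ <- dt[dt$rel == 1.0, selVars]
dtDZ <- dt[dt$rel == 0.5, selVars]

# A fixed relatedness coefficient per group replaces data.rel
relMZ <- mxMatrix(type="Full", nrow=1, ncol=1, free=FALSE, values=1.0, name="rel")
relDZ <- mxMatrix(type="Full", nrow=1, ncol=1, free=FALSE, values=0.5, name="rel")

# One submodel per group, each with its own data and WLS fit function.
# Identical parameter labels keep VA1, VE1, and the threshold equal across
# groups, so the constraint only needs to appear in one submodel.
modelMZ <- mxModel("MZ", relMZ, varA1, varE1, var1, covTW,
                   consV1, varP1, unitP,
                   meanG, threG, expCovTW, expTW,
                   mxData(observed=dtMZ, type="raw"),
                   mxFitFunctionWLS("WLS"))
modelDZ <- mxModel("DZ", relDZ, varA1, varE1, var1, covTW,
                   meanG, threG, expCovTW, expTW,
                   mxData(observed=dtDZ, type="raw"),
                   mxFitFunctionWLS("WLS"))

# Container model sums the group fits
modelMG <- mxModel("model_MG", modelMZ, modelDZ,
                   mxFitFunctionMultigroup(c("MZ", "DZ")))
fitMG <- mxRun(modelMG)
round(fitMG$output$estimate, 2)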

Thanks in advance for all help!

- Hans Fredrik Sunde

Replied on Fri, 12/10/2021 - 13:01
jpritikin (Joined: 05/23/2012)

Yeah, WLS cannot deal with that kind of definition variable. The WLS fit is computed from summary statistics (polychoric correlations and their asymptotic weight matrix) estimated from the whole sample, so there is no way for the expected covariance to vary row by row. You'll need to use a separate group for each zygosity.
Replied on Sat, 12/11/2021 - 06:05
hansfredriksunde (Joined: 12/10/2021)

In reply to jpritikin

I see. Thank you for the quick reply. Multigroup method it is, then.

I guess this is because of a mathematical impossibility, and not just because it has not been implemented?

Replied on Fri, 12/17/2021 - 11:06
AdminNeale (Joined: 03/01/2013)

In reply to hansfredriksunde

It would be possible in principle to automagically make subsets of the data, but this could cause trouble because WLS relies on large N. Plus, we try not to automagic things, as it conflicts with the WYSIWWD (what you say is what we do) philosophy, which aims to keep user and software on the same page.

For cases where pihat IBS is desired as a continuous measure of relatedness, an alternative strategy would be needed. I need to think about that one…

Replied on Fri, 04/29/2022 - 06:19
hansfredriksunde (Joined: 12/10/2021)

In reply to AdminNeale

Sorry I didn't reply; I did not see this until now. WYSIWWD is definitely preferable. If making subsets is what the method would require anyway, I might as well make the subgroups manually. That way, I have full control.

(I "luckily" do not have a continuous measure of relatedness, so subsets is feasible, although smaller N is still an issue).