Wrong estimates when using definition variables and WLS
I'm currently building an extended children-of-twins-and-siblings model that will have to handle dichotomous data and a fairly large sample size (N > ~10,000 families). I have written the model with definition variables so that it can handle other relatedness coefficients in the parental generation (half siblings, cousins, etc.) without modification, and also so that it can handle different family structures (e.g., where Father 1 and Father 2 are related versus where Mother 1 and Mother 2 are). I have simulated data to verify that it works, which it does when I use continuous data and maximum likelihood estimation.
The problem arises when I try to convert it to handle dichotomous data. First, the number of free parameters (>10) makes maximum likelihood infeasible with dichotomous data (the processing time is too long), so I use the WLS fit function instead. However, this does not work. If I use mxRun, I usually get an error message about infeasible starting values (regardless of what the starting values are). If I use mxTryHardOrdinal, it runs, but the estimates are wrong and the variance constraints are ignored. I cannot figure out why.
I suspect this is because WLS does not handle definition variables, but I'm not sure. Either (1) I have made an error in the code, (2) this is a bug (or the feature is not yet supported), or (3) there is some mathematical reason that makes it impossible which I have yet to understand. I hope some of you can help.
I was able to recreate the same problem in a simple AE twin model with dichotomous data, a definition variable, and the WLS fit function. Here it is possible to use ML, which confirms that the model otherwise works (i.e., it recovers the correct estimates). Can any of you spot the issue?
library(OpenMx)
#mxOption(key="Default optimizer", value="NPSOL")
#mxOption(key="Default optimizer", value="SLSQP")
# ----------------------------------------------------------------------------------------------------------------------
# PREPARE DATA (SIMULATION)
# Set parameters
n <- 10000 # n/2 per zygosity
VA1 <- 0.4 # VE1 = 1-VA1
cut <- 0.8 # z-value
# Prepares data frame
dt <- data.frame(matrix(NA,nrow=n,ncol=3,dimnames = list(NULL,c("x1","x2","rel"))))
# Simulates genetic components
A1_mz <- MASS::mvrnorm(n/2, c(0,0), matrix(c( 1,  1,
                                              1,  1), 2, 2, TRUE))
A1_dz <- MASS::mvrnorm(n/2, c(0,0), matrix(c( 1, .5,
                                             .5,  1), 2, 2, TRUE))
# Weights the genetic component and adds the environmental component
dt[1:(n/2) , 1:2] <- sqrt(VA1)*A1_mz + rnorm(n,0,sqrt(1-VA1))
dt[(n/2+1):n , 1:2] <- sqrt(VA1)*A1_dz + rnorm(n,0,sqrt(1-VA1))
# Adds relatedness coefficients to data frame
dt[,"rel"] <- c(rep(1,n/2),rep(0.5,n/2))
# Dichotomises the data
dt[,1:2] <- ifelse(dt[,1:2] > cut,1,0)
# Select Variables for Analysis
selVars <- c('x1','x2') # list of variables names
ntv <- length(selVars) # number of total variables
# Converts to factor variables
dt[,selVars] <- mxFactor(dt[,selVars], levels = c(0,1))
# ----------------------------------------------------------------------------------------------------------------------
# PREPARE MODEL
# Create Data Objects
dataTW <- mxData( observed=dt, type="raw" )
#####################################################
#### Creates Definition Variables ####
#####################################################
defRel <- mxMatrix(type = "Full", nrow = 1, ncol = 1, free = FALSE, labels = "data.rel", name = "rel")
#####################################################
#### Creates Free Parameters ####
#####################################################
varA1 <- mxMatrix( type="Symm", nrow=1, ncol=1, free=TRUE, values=0.5, label="est_VA1", name="VA1" )
varE1 <- mxMatrix( type="Symm", nrow=1, ncol=1, free=TRUE, values=0.5, label="est_VE1", name="VE1" )
threG <- mxMatrix( type="Full", nrow=1, ncol=ntv, free=TRUE, values=0, labels="threshold", name="threG")
#######################################################
#### Creates Algebra for Expected (Co-)Variances ####
#######################################################
var1 <- mxAlgebra( expression= VA1+VE1, name="var1" )
covTW <- mxAlgebra(expression = rel*VA1, name = "covTW")
#####################################################
#### Creates Constrain Objects ####
#####################################################
unitP <- mxMatrix(type="Unit",nrow = 1,ncol=1,name="unitP")
varP1 <- mxAlgebra(expression = diag2vec(var1), name="varP1")
consV1 <- mxConstraint(expression = varP1 == unitP, name="consV1")
#######################################################
#### Creates Expectation Objects ####
#######################################################
expCovTW <- mxAlgebra( expression= rbind( cbind( var1, covTW),
cbind( covTW, var1)), name="expCov" )
meanG <- mxMatrix( type="Full", nrow=1, ncol=ntv, free=FALSE, values=0, labels="mean", name="meanG" )
expTW <- mxExpectationNormal( covariance = "expCov",
means = "meanG",
thresholds = "threG",
dimnames = selVars )
fitFun <- mxFitFunctionWLS("WLS")
#fitFun <- mxFitFunctionML()
# Create Model Object
model1 <- mxModel(defRel, varA1, varE1, var1, covTW,
consV1, varP1,unitP,
meanG, threG, expCovTW, dataTW, expTW, fitFun,
name="model_1" )
# ----------------------------------------------------------------------------------------------------------------------
# RUN MODEL
fit1 <- mxRun(model1)
#fit1 <- mxTryHardOrdinal(model1)
round(fit1$output$estimate,2) # Check estimates
fit1$var1 # Check variance
For now, I have recoded my model with multiple groups instead, which works fine, but I would really like to use definition variables if possible. Having 5-10 different expected covariance matrices is quite messy, makes mistakes likely, and is not very flexible if I wish to use different family structures.
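For reference, a multiple-group version of the toy AE model above could look roughly like this. This is only an illustrative sketch (the helper `mkGroup` and the group names are mine, not from the model above): each relatedness value gets its own submodel with a fixed `rel` matrix instead of `data.rel`, shared labels equate the parameters across groups, and writing VE1 as `1 - VA1` enforces unit variance without an mxConstraint.

```r
library(OpenMx)

# Builds one group with a fixed relatedness coefficient rA (illustrative helper).
mkGroup <- function(name, rA, data) {
  mxModel(name,
    mxMatrix("Symm", nrow=1, ncol=1, free=TRUE, values=0.5,
             labels="est_VA1", name="VA1"),
    mxAlgebra(1 - VA1, name="VE1"),  # enforces unit total variance
    mxMatrix("Full", nrow=1, ncol=1, free=FALSE, values=rA,
             name="rel"),            # fixed per group, replaces data.rel
    mxAlgebra(VA1 + VE1, name="var1"),
    mxAlgebra(rel * VA1, name="covTW"),
    mxAlgebra(rbind(cbind(var1,  covTW),
                    cbind(covTW, var1)), name="expCov"),
    mxMatrix("Full", nrow=1, ncol=2, free=FALSE, values=0,
             labels="mean", name="meanG"),
    mxMatrix("Full", nrow=1, ncol=2, free=TRUE, values=0,
             labels="threshold", name="threG"),
    mxExpectationNormal(covariance="expCov", means="meanG",
                        thresholds="threG", dimnames=c("x1","x2")),
    mxFitFunctionWLS("WLS"),
    mxData(data, type="raw"))
}

# Shared labels ("est_VA1", "threshold") equate parameters across groups.
top <- mxModel("top",
  mkGroup("MZ", 1.0, subset(dt, rel == 1,   select=c(x1, x2))),
  mkGroup("DZ", 0.5, subset(dt, rel == 0.5, select=c(x1, x2))),
  mxFitFunctionMultigroup(c("MZ", "DZ")))
fitMG <- mxRun(top)
```

The downside is exactly the one described above: every additional relatedness value or family structure needs its own submodel, whereas a definition variable would handle them all with one expected covariance matrix.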
Thanks in advance for all help!
- Hans Fredrik Sunde
data.rel
In reply to data.rel by jpritikin
Thanks
I guess this is because of a mathematical impossibility, and not because it simply has not been implemented yet?
In reply to Thanks by hansfredriksunde
Partly
For cases where pihat IBS is desired as a continuous measure of relatedness, an alternative strategy would be needed. I need to think about that one…
In reply to Partly by AdminNeale
Thanks!
(I "luckily" do not have a continuous measure of relatedness, so subsets is feasible, although smaller N is still an issue).