Wrong estimates when using definition variables and WLS
I'm currently building an extended children-of-twins-and-siblings model that will have to handle dichotomous data and a fairly large sample size (N > ~10,000 families). I have written the model with definition variables so that it can handle other relatedness coefficients in the parental generation (half siblings, cousins, etc.) without modification, and also so that it can handle different family structures (e.g., where Father 1 and Father 2 are related versus where Mother 1 and Mother 2 are). I have simulated data to verify that it works, which it does when I use continuous data and maximum likelihood estimation.
The problem arises when I try to convert it to handle dichotomous data. First, the number of free parameters (>10) makes maximum likelihood infeasible with dichotomous data (the processing time is too long), so I use the WLS fit function instead. However, this does not work. If I use mxRun, I usually get an error message about infeasible starting values (regardless of what the starting values are). If I use mxTryHardOrdinal, it runs, but the estimates are wrong and the variance constraints are ignored. I cannot figure out why.
I suspect this is because WLS does not handle definition variables, but I'm not sure. Either (1) I have made an error in the code, (2) this is a bug (or the feature is not yet supported), or (3) there is some mathematical reason that makes it impossible which I have yet to understand. I hope some of you can help.
I was able to recreate the same problem in a simple AE twin model with dichotomous data, a definition variable, and the WLS fit function. Here it is possible to use ML, which confirms that the model otherwise works (i.e., it recovers the correct estimates). Can any of you spot the issue?
library(OpenMx)
#mxOption(key="Default optimizer", value="NPSOL")
#mxOption(key="Default optimizer", value="SLSQP")
# ----------------------------------------------------------------------------------------------------------------------
# PREPARE DATA (SIMULATION)
# Set parameters
n <- 10000 # n/2 per zygosity
VA1 <- 0.4 # VE1 = 1-VA1
cut <- 0.8 # z-value
# Prepares data frame
dt <- data.frame(matrix(NA,nrow=n,ncol=3,dimnames = list(NULL,c("x1","x2","rel"))))
# Simulates genetic components
A1_mz <- MASS::mvrnorm(n/2, c(0,0), matrix(c( 1,  1,
                                              1,  1), 2, 2, TRUE))
A1_dz <- MASS::mvrnorm(n/2, c(0,0), matrix(c( 1, .5,
                                             .5,  1), 2, 2, TRUE))
# Weights the genetic component and adds the environmental component
dt[1:(n/2) , 1:2] <- sqrt(VA1)*A1_mz + rnorm(n,0,sqrt(1-VA1))
dt[(n/2+1):n , 1:2] <- sqrt(VA1)*A1_dz + rnorm(n,0,sqrt(1-VA1))
# Adds relatedness coefficients to data frame
dt[,"rel"] <- c(rep(1,n/2),rep(0.5,n/2))
# Dichotomises the data
dt[,1:2] <- ifelse(dt[,1:2] > cut,1,0)
# Select Variables for Analysis
selVars <- c('x1','x2') # list of variables names
ntv <- length(selVars) # number of total variables
# Converts to factor variables
dt[,selVars] <- mxFactor(dt[,selVars], levels = c(0,1))
# ----------------------------------------------------------------------------------------------------------------------
# PREPARE MODEL
# Create Data Objects
dataTW <- mxData( observed=dt, type="raw" )
#####################################################
#### Creates Definition Variables ####
#####################################################
defRel <- mxMatrix(type = "Full", nrow = 1, ncol = 1, free = FALSE, labels = "data.rel", name = "rel")
#####################################################
#### Creates Free Parameters ####
#####################################################
varA1 <- mxMatrix( type="Symm", nrow=1, ncol=1, free=TRUE, values=0.5, label="est_VA1", name="VA1" )
varE1 <- mxMatrix( type="Symm", nrow=1, ncol=1, free=TRUE, values=0.5, label="est_VE1", name="VE1" )
threG <- mxMatrix( type="Full", nrow=1, ncol=ntv, free=TRUE, values=0, labels="threshold", name="threG")
#######################################################
#### Creates Algebra for Expected (Co-)Variances ####
#######################################################
var1 <- mxAlgebra( expression= VA1+VE1, name="var1" )
covTW <- mxAlgebra(expression = rel*VA1, name = "covTW")
#####################################################
#### Creates Constrain Objects ####
#####################################################
unitP <- mxMatrix(type="Unit",nrow = 1,ncol=1,name="unitP")
varP1 <- mxAlgebra(expression = diag2vec(var1), name="varP1")
consV1 <- mxConstraint(expression = varP1 == unitP, name="consV1")
#######################################################
#### Creates Expectation Objects ####
#######################################################
expCovTW <- mxAlgebra( expression= rbind( cbind( var1, covTW),
cbind( covTW, var1)), name="expCov" )
meanG <- mxMatrix( type="Full", nrow=1, ncol=ntv, free=FALSE, values=0, labels="mean", name="meanG" )
expTW <- mxExpectationNormal( covariance = "expCov",
means = "meanG",
thresholds = "threG",
dimnames = selVars )
fitFun <- mxFitFunctionWLS("WLS")
#fitFun <- mxFitFunctionML()
# Create Model Object
model1 <- mxModel(defRel, varA1, varE1, var1, covTW,
consV1, varP1,unitP,
meanG, threG, expCovTW, dataTW, expTW, fitFun,
name="model_1" )
# ----------------------------------------------------------------------------------------------------------------------
# RUN MODEL
fit1 <- mxRun(model1)
#fit1 <- mxTryHardOrdinal(model1)
round(fit1$output$estimate,2) # Check estimates
fit1$var1 # Check variance
For now, I have recoded my model with multiple groups instead, which works fine, but I would really like to use definition variables if possible. Having 5-10 different expected covariance matrices is quite messy, makes mistakes likely, and is not very flexible if I wish to use different family structures.
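For reference, a multiple-group version of the toy AE model above could look roughly like this. This is only an illustrative sketch (the helper `mkGroup` and the group names are mine, not from the model above): each relatedness value gets its own submodel with a fixed `rel` matrix instead of `data.rel`, shared labels equate the parameters across groups, and writing VE1 as `1 - VA1` enforces unit variance without an mxConstraint.

```r
library(OpenMx)

# Builds one group with a fixed relatedness coefficient rA (illustrative helper).
mkGroup <- function(name, rA, data) {
  mxModel(name,
    mxMatrix("Symm", nrow=1, ncol=1, free=TRUE, values=0.5,
             labels="est_VA1", name="VA1"),
    mxAlgebra(1 - VA1, name="VE1"),  # enforces unit total variance
    mxMatrix("Full", nrow=1, ncol=1, free=FALSE, values=rA,
             name="rel"),            # fixed per group, replaces data.rel
    mxAlgebra(VA1 + VE1, name="var1"),
    mxAlgebra(rel * VA1, name="covTW"),
    mxAlgebra(rbind(cbind(var1,  covTW),
                    cbind(covTW, var1)), name="expCov"),
    mxMatrix("Full", nrow=1, ncol=2, free=FALSE, values=0,
             labels="mean", name="meanG"),
    mxMatrix("Full", nrow=1, ncol=2, free=TRUE, values=0,
             labels="threshold", name="threG"),
    mxExpectationNormal(covariance="expCov", means="meanG",
                        thresholds="threG", dimnames=c("x1","x2")),
    mxFitFunctionWLS("WLS"),
    mxData(data, type="raw"))
}

# Shared labels ("est_VA1", "threshold") equate parameters across groups.
top <- mxModel("top",
  mkGroup("MZ", 1.0, subset(dt, rel == 1,   select=c(x1, x2))),
  mkGroup("DZ", 0.5, subset(dt, rel == 0.5, select=c(x1, x2))),
  mxFitFunctionMultigroup(c("MZ", "DZ")))
fitMG <- mxRun(top)
```

The downside is exactly the one described above: every additional relatedness value or family structure needs its own submodel, whereas a definition variable would handle them all with one expected covariance matrix.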
Thanks in advance for all help!
- Hans Fredrik Sunde
data.rel
In reply to data.rel by jpritikin
Thanks
I guess this is because of a mathematical impossibility, and not because it simply has not been implemented yet?
In reply to Thanks by hansfredriksunde
Partly
For cases where pihat IBS is desired as a continuous measure of relatedness, an alternative strategy would be needed. I need to think about that one…
In reply to Partly by AdminNeale
Thanks!
(I "luckily" do not have a continuous measure of relatedness, so subsets is feasible, although smaller N is still an issue).