# Wrong estimates when using definition variables and WLS

I'm currently building an extended children-of-twins-and-siblings model that has to handle dichotomous data and a fairly large sample (N > ~10,000 families). I have written the model with definition variables so that it can handle other relatedness coefficients in the parental generation (half siblings, cousins, etc.) without modification, and also so that it can handle different family structures (i.e., whether Father 1 and Father 2 are related versus Mother 1 and Mother 2, etc.). I have simulated data to make sure it works, and it does when I use continuous data and maximum-likelihood estimation.

The problem arises when I try to convert it to handle dichotomous data. First, the number of free parameters (>10) makes maximum likelihood infeasible with dichotomous data (i.e., the processing time is too long). I therefore use the WLS fit function instead. However, this does not work: if I use mxRun, I usually get an error message about infeasible starting values (regardless of what the starting values are); if I try mxTryHardOrdinal, it runs, but the estimates are wrong and the variance constraints are ignored. I cannot figure out why.

I suspect this is because WLS does not handle definition variables, but I'm not sure. Either (1) I have made an error in the code, (2) there is a bug here (or it is not yet supported), or (3) there is some mathematical reason that makes this impossible which I have yet to understand. I hope some of you can be of help.

I was able to recreate the same problem in a simple AE twin model with dichotomous data, a definition variable, and the WLS fit function. Here it is possible to use ML, which confirms that the model otherwise works (i.e., it gets the correct estimates). Are any of you able to spot the issue?

```r
library(OpenMx)

# mxOption(key = "Default optimizer", value = "NPSOL")
# mxOption(key = "Default optimizer", value = "SLSQP")

# ----------------------------------------------------------------------
# PREPARE DATA (SIMULATION)

# Set parameters
n   <- 10000 # n/2 per zygosity
VA1 <- 0.4   # VE1 = 1 - VA1
cut <- 0.8   # z-value

# Prepare data frame
dt <- data.frame(matrix(NA, nrow = n, ncol = 3,
                        dimnames = list(NULL, c("x1", "x2", "rel"))))

# Simulate genetic components
A1_mz <- MASS::mvrnorm(n/2, c(0, 0), matrix(c(1, 1,
                                              1, 1), 2, 2, TRUE))
A1_dz <- MASS::mvrnorm(n/2, c(0, 0), matrix(c(1, .5,
                                              .5, 1), 2, 2, TRUE))

# Weigh the genetic component and add the environmental component
dt[1:(n/2), 1:2]     <- sqrt(VA1)*A1_mz + rnorm(n, 0, sqrt(1 - VA1))
dt[(n/2 + 1):n, 1:2] <- sqrt(VA1)*A1_dz + rnorm(n, 0, sqrt(1 - VA1))

# Add relatedness coefficients to the data frame
dt[, "rel"] <- c(rep(1, n/2), rep(0.5, n/2))

# Dichotomise the data
dt[, 1:2] <- ifelse(dt[, 1:2] > cut, 1, 0)

# Select variables for analysis
selVars <- c('x1', 'x2')   # variable names
ntv     <- length(selVars) # number of variables

# Convert to ordered factors
dt[, selVars] <- mxFactor(dt[, selVars], levels = c(0, 1))

# ----------------------------------------------------------------------
# PREPARE MODEL

# Create data object
dataTW <- mxData(observed = dt, type = "raw")

# Create definition variable
defRel <- mxMatrix(type = "Full", nrow = 1, ncol = 1, free = FALSE,
                   labels = "data.rel", name = "rel")

# Create free parameters
varA1 <- mxMatrix(type = "Symm", nrow = 1, ncol = 1, free = TRUE,
                  values = 0.5, label = "est_VA1", name = "VA1")
varE1 <- mxMatrix(type = "Symm", nrow = 1, ncol = 1, free = TRUE,
                  values = 0.5, label = "est_VE1", name = "VE1")
threG <- mxMatrix(type = "Full", nrow = 1, ncol = ntv, free = TRUE,
                  values = 0, labels = "threshold", name = "threG")

# Create algebra for expected (co-)variances
var1  <- mxAlgebra(expression = VA1 + VE1, name = "var1")
covTW <- mxAlgebra(expression = rel*VA1, name = "covTW")

# Create constraint objects
unitP  <- mxMatrix(type = "Unit", nrow = 1, ncol = 1, name = "unitP")
varP1  <- mxAlgebra(expression = diag2vec(var1), name = "varP1")
consV1 <- mxConstraint(expression = varP1 == unitP, name = "consV1")

# Create expectation objects
expCovTW <- mxAlgebra(expression = rbind(cbind(var1,  covTW),
                                         cbind(covTW, var1)),
                      name = "expCov")
meanG <- mxMatrix(type = "Full", nrow = 1, ncol = ntv, free = FALSE,
                  values = 0, labels = "mean", name = "meanG")
expTW <- mxExpectationNormal(covariance = "expCov",
                             means      = "meanG",
                             thresholds = "threG",
                             dimnames   = selVars)

fitFun <- mxFitFunctionWLS("WLS")
# fitFun <- mxFitFunctionML()

# Create model object
model1 <- mxModel(defRel, varA1, varE1, var1, covTW,
                  consV1, varP1, unitP,
                  meanG, threG, expCovTW, dataTW, expTW, fitFun,
                  name = "model_1")

# ----------------------------------------------------------------------
# RUN MODEL

fit1 <- mxRun(model1)
# fit1 <- mxTryHardOrdinal(model1)

round(fit1$output$estimate, 2) # Check estimates
fit1$var1                      # Check variance
```

For now, I have recoded my model with multiple groups instead, which works fine, but I would really like to use definition variables if possible. Having 5-10 different expected covariance matrices is quite messy, makes mistakes likely, and is not very flexible if I wish to use different family structures.
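For reference, the multi-group workaround I mention can be sketched roughly as below. This is a hedged sketch, not my exact code: it reuses the objects from the listing above (`dt`, `varA1`, `varE1`, `threG`, `meanG`, `selVars`), and the helper `makeGroup` and the container model `multi` are names I introduce here for illustration. Each relatedness class gets its own submodel with `rel` fixed as a constant matrix rather than a definition variable, so WLS never sees `data.rel`.

```r
# Sketch of a multi-group alternative: one submodel per relatedness class,
# with rel fixed as an ordinary (non-definition) matrix in each group.
# Assumes dt, varA1, varE1, threG, meanG, selVars from the listing above.
library(OpenMx)

# makeGroup is a hypothetical helper: builds one group with a fixed rel value
makeGroup <- function(name, relValue, data) {
  relFix <- mxMatrix(type = "Full", nrow = 1, ncol = 1, free = FALSE,
                     values = relValue, name = "rel")
  mxModel(name, relFix, varA1, varE1, threG, meanG,
          mxAlgebra(VA1 + VE1, name = "var1"),
          mxAlgebra(rel * VA1, name = "covTW"),
          mxAlgebra(rbind(cbind(var1,  covTW),
                          cbind(covTW, var1)), name = "expCov"),
          mxData(observed = data, type = "raw"),
          mxExpectationNormal(covariance = "expCov", means = "meanG",
                              thresholds = "threG", dimnames = selVars),
          mxFitFunctionWLS("WLS"))
}

# One group per relatedness coefficient; parameters are shared via labels
mzGrp <- makeGroup("MZ", 1.0, dt[dt$rel == 1.0, selVars])
dzGrp <- makeGroup("DZ", 0.5, dt[dt$rel == 0.5, selVars])

multi <- mxModel("multi", mzGrp, dzGrp,
                 mxMatrix(type = "Unit", nrow = 1, ncol = 1, name = "unitP"),
                 mxConstraint(MZ.var1 == unitP, name = "consV1"),
                 mxFitFunctionMultigroup(c("MZ", "DZ")))

fitMulti <- mxRun(multi)
round(fitMulti$output$estimate, 2)
```

The obvious drawback is exactly what I describe above: every additional relatedness coefficient or family structure needs its own group, whereas a definition variable would absorb them all into one expected covariance matrix.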

Thanks in advance for all help!

- Hans Fredrik Sunde

## data.rel


In reply to data.rel by jpritikin

## Thanks

I guess this is because of a mathematical impossibility and not because it is not implemented?


In reply to Thanks by hansfredriksunde

## Partly

For cases where pihat IBS is desired as a continuous measure of relatedness, an alternative strategy would be needed. I need to think about that one…


In reply to Partly by AdminNeale

## Thanks!

(I "luckily" do not have a continuous measure of relatedness, so subsets is feasible, although smaller N is still an issue).
