Hi everyone!
I'm currently building an extended children-of-twins-and-siblings model that will have to handle dichotomous data and quite a large sample size (N > ~10000 families). I have written the model with definition variables so that it can easily handle other relatedness coefficients in the parental generation (half siblings, cousins, etc., ) without any modification, and also so that it can handle different family structures (i.e., where Father 1 and Father 2 are related versus Mother 1 and Mother 2, etc.,). I have simulated data to make sure it works, which it does when I use continuous data and maximum likelihood estimation.
The problem arises when I try to convert it to handle dichotomous data. First, the number of free parameters (>10) make maximum likelihood unfeasible when using dichotomous data (i.e., the processing time is too long). I therefore use the WLS fit function instead. However, this does not work. If I use mxRun, I usually get an error message about unfeasible starting values (regardless of what the starting values are). If I try mxTryHardOrdinal, it runs, but the estimates are wrong and the variance constraints are ignored. I cannot figure out why.
I suspect this is because WLS does not handle definition variables, but I'm not sure. Either (1) I have made an error in the code, (2) there is a bug here (or it is not yet supported), or (3) there is some mathematical reason that makes this impossible which I am yet to understand. I hope some of you can be of help.
I was able to recreate the same problem in a simple AE twin model with dichotomous data, a definition variable, and WLS fit function. Here it is possible to use ML, which confirms that the model otherwise works (i.e, it gets the correct estimates). Are any of you able to spot the issue?
library(OpenMx) #mxOption(key="Default optimizer", value="NPSOL") #mxOption(key="Default optimizer", value="SLSQP") # ---------------------------------------------------------------------------------------------------------------------- # PREPARE DATA (SIMULATION) # Set parameters n <- 10000 # n/2 per zygosity VA1 <- 0.4 # VE1 = 1-VA1 cut <- 0.8 # z-value # Prepares data frame dt <- data.frame(matrix(NA,nrow=n,ncol=3,dimnames = list(NULL,c("x1","x2","rel")))) # Simulates genetic components A1_mz <- MASS::mvrnorm(n/2,c(0,0), matrix(c(1,1, 1,1),2,2,T)) A1_dz <- MASS::mvrnorm(n/2,c(0,0), matrix(c( 1,.5, .5, 1),2,2,T)) # Weighs genetic component and adds environmental component. dt[1:(n/2) , 1:2] <- sqrt(VA1)*A1_mz + rnorm(n,0,sqrt(1-VA1)) dt[(n/2+1):n , 1:2] <- sqrt(VA1)*A1_dz + rnorm(n,0,sqrt(1-VA1)) # Adds relatedness coefficients to data frame dt[,"rel"] <- c(rep(1,n/2),rep(0.5,n/2)) # Dichotomises the data dt[,1:2] <- ifelse(dt[,1:2] > cut,1,0) # Select Variables for Analysis selVars <- c('x1','x2') # list of variables names ntv <- length(selVars) # number of total variables # Converts to factor variables dt[,selVars] <- mxFactor(dt[,selVars], levels = c(0,1)) # ---------------------------------------------------------------------------------------------------------------------- # PREPARE MODEL # Create Data Objects dataTW <- mxData( observed=dt, type="raw" ) ##################################################### #### Creates Definition Variables #### ##################################################### defRel <- mxMatrix(type = "Full", nrow = 1, ncol = 1, free = FALSE, labels = "data.rel", name = "rel") ##################################################### #### Creates Free Parameters #### ##################################################### varA1 <- mxMatrix( type="Symm", nrow=1, ncol=1, free=TRUE, values=0.5, label="est_VA1", name="VA1" ) varE1 <- mxMatrix( type="Symm", nrow=1, ncol=1, free=TRUE, values=0.5, label="est_VE1", name="VE1" ) threG <- mxMatrix( type="Full", nrow=1, ncol=ntv, free=TRUE, values=0, labels="threshold", name="threG") ####################################################### #### Creates Algebra for Expected (Co-)Variances #### ####################################################### var1 <- mxAlgebra( expression= VA1+VE1, name="var1" ) covTW <- mxAlgebra(expression = rel*VA1, name = "covTW") ##################################################### #### Creates Constrain Objects #### ##################################################### unitP <- mxMatrix(type="Unit",nrow = 1,ncol=1,name="unitP") varP1 <- mxAlgebra(expression = diag2vec(var1), name="varP1") consV1 <- mxConstraint(expression = varP1 == unitP, name="consV1") ####################################################### #### Creates Expectation Objects #### ####################################################### expCovTW <- mxAlgebra( expression= rbind( cbind( var1, covTW), cbind( covTW, var1)), name="expCov" ) meanG <- mxMatrix( type="Full", nrow=1, ncol=ntv, free=FALSE, values=0, labels="mean", name="meanG" ) expTW <- mxExpectationNormal( covariance = "expCov", means = "meanG", thresholds = "threG", dimnames = selVars ) fitFun <- mxFitFunctionWLS("WLS") #fitFun <- mxFitFunctionML() # Create Model Object model1 <- mxModel(defRel, varA1, varE1, var1, covTW, consV1, varP1,unitP, meanG, threG, expCovTW, dataTW, expTW, fitFun, name="model_1" ) # ---------------------------------------------------------------------------------------------------------------------- # RUN MODEL fit1 <- mxRun(model1) #fit1 <- mxTryHardOrdinal(model1) round(fit1$output$estimate,2) # Check estimates fit1$var1 # Check variance
For now, I have recoded my model with multiple groups instead, which works fine, but I would really like to use definition variables instead if possible. Having 5-10 different expected covariance matrices is quite messy, makes mistakes likely, and is not very flexible if I wish to use different family structures.
Thanks in advance for all help!
- Hans Fredrik Sunde
Yeah, WLS cannot deal with that kind of definition variable. You'll need to use a separate group for each zygosity.
I see. Thank you for the quick reply. Multigroup-method it is then.
I guess this is a because of a mathematical impossibility and not because it is not implemented?
It would be possible in principle to automagically make subsets of the data, but this could cause trouble as WLS relies on large N. Plus we try not to automagic things as it conflicts with WYSIWWD (what you say is what we do) philosophy that aims to keep user and software on the same page.
For cases where pihat IBS is desired as a continuous measure of relatedness, an alternative strategy would be needed. I need to think about that one…
Sorry I didn't reply. I did not see this until now. WYSIWWD is definitely preferable. If making subsets is what the method would require anyway, I can just as well make the subgroups manually. That way, I have full control.
(I "luckily" do not have a continuous measure of relatedness, so subsets is feasible, although smaller N is still an issue).