You are here

error code 6 in Univariate ACE model- Could it be related to missing data?

4 posts / 0 new
Last post
lior abramson's picture
Offline
Joined: 07/21/2017 - 13:13
error code 6 in Univariate ACE model- Could it be related to missing data?

Hello,
I would like to ask for help with an error code 6:

Required details about my openMX run:
OpenMx version: 2.7.12 [GIT v2.7.12-dirty]
R version: R version 3.4.1 (2017-06-30)
Platform: x86_64-w64-mingw32
Default optimiser: CSOLNP

I ran a univariate ACE model with three covariates that are 100% shared between the twins (one continuous and two binary), using paths specification and raw data as an input. I did it twice:
1)one time with all the data, including twin pairs in which the data on the phenotypic observed variable of one of the twins is missing.
2)a second time, only with pairs that have full data.
If I under correctly, openMX runs the analysis with FIML by default, so I didn't enter any command regarding what to do with missing values in my script.
In the first run (with missing data) I got an error code 6 after running the ACE model but not after running the AE model.
In the second time, I got the opposite: an error code 6 after running the AE model but not after running the ACE model. This was the exact error text:

" In model 'ACE/AE' Optimizer returned a non-zero status code 6. The model does not satisfy the first-order optimality conditions to the required accuracy, and no improved point for the merit function could be found during the final linesearch (Mx status RED)"

Any idea what might have caused this?
The syntax of the ACE model follows this message (the AE model is very similar).
Thank you a lot, this forum is a life saver

#telling Mx that sex and place are binary variables
 
DF$sex_a <- mxFactor(DF$sex_a, levels= c(1,2), label="sex")
DF$LabHome6 <-  mxFactor(DF$LabHome6, levels=c(0,1), label="LabHome6")
threshold <- mxThreshold(vars=c("sex_a","LabHome6"), nThresh=c(1,1), free=TRUE, values=c(1,0))
 
#finding reasonable starting values for the ACE model
 
StartMean <-mean(c(mean(DF$Phenotype_1),mean(DF$Phenotype_2)))
StartMean.Age <-mean(DF$Age6)
FAC <-c(DF$Phenotype_1,DF$Phenotype_2) 
StartVar.ACE <-sqrt(var(FAC)/3)
 
require("OpenMx")
 
# Select Variables for Analysis
selVars <- c('Phenotype_1','Phenotype_2', 'Age6', 'sex_a','LabHome6')
aceVars <- c("A1","C1","E1","A2","C2","E2")
 
# Select Data for Analysis
mzData <- subset(DF, zygosity==1,selVars)
dzData <- subset(DF, zygosity==2, selVars)
 
# Path objects for Multiple Groups
manifestVars=selVars
latentVars=aceVars
 
# specify paths for the variances and means of the latent variables
latVariances <- mxPath( from=aceVars, arrows=2,
                        free=FALSE, values=1 )
 
varCovar <- mxPath( from= c('Age6','sex_a','LabHome6'), arrows=2,
                  free=(c(TRUE,FALSE,FALSE)), values=c(1,1,1), label=c('agevar','sexVar','placeVar') )
 
# means of latent variables
latMeans <- mxPath( from="one", to=aceVars, arrows=1,  free=FALSE, values=0 )
 
#specify paths for the means of the observed variables
obsMeans <- mxPath( from="one", to=selVars, arrows=1, free=c(T,T,T,F,F), values=c(StartMean, StartMean,                              StartMean.Age,0,0), labels=c("meanFAC","meanFAC","meanAge","meanSex","meanPlace") )
 
 
# path coefficients for twin 1. 
pathAceT1 <- mxPath( from=c("A1","C1","E1", "Age6","sex_a","LabHome6"), to="Phenotype_1", arrows=1, free=TRUE, 
                                            values=StartVar.ACE, label=c("a","c","e", "age","sex","place") )
# path coefficients for twin 2
pathAceT2 <- mxPath( from=c("A2","C2","E2", "Age6","sex_a","LabHome6"), to="Phenotype_2", arrows=1,free=TRUE, 
                                           values=StartVar.ACE, label=c("a","c","e", "age","sex","place") ) 
 
# covariance between C1 & C2
covC1C2 <- mxPath( from="C1", to="C2", arrows=2,free=FALSE, values=1 )
 
 
# covariance between A1 & A2 in MZ's
covA1A2_MZ <- mxPath( from="A1", to="A2", arrows=2, free=FALSE, values=1 )
# covariance between A1 & A2 in DZ's
covA1A2_DZ <- mxPath( from="A1", to="A2", arrows=2,  free=FALSE, values=.5 )
 
 
#call the data.frame with the MZ raw data, mzData, and the DZ raw data
dataMZ <- mxData( observed=mzData, type="raw" )
dataDZ <- mxData( observed=dzData, type="raw" )
 
 
# Combine Groups
#arrange together all the paths that are the same for MZ and DZ
paths <- list( latVariances, latMeans, obsMeans, pathAceT1, pathAceT2, covC1C2, varCovar )
 
#Build a model separately for MZ and DZ
modelMZ <- mxModel(model="MZ", type="RAM", manifestVars=selVars,
                   latentVars=aceVars, paths, threshold, covA1A2_MZ, dataMZ )
modelDZ <- mxModel(model="DZ", type="RAM", manifestVars=selVars,
                   latentVars=aceVars, paths, threshold, covA1A2_DZ, dataDZ )
 
 
minus2ll <- mxAlgebra( expression=MZ.fitfunction + DZ.fitfunction, name="minus2loglikelihood" )
obj <- mxFitFunctionAlgebra( "minus2loglikelihood" )
modelACE <- mxModel(model="ACE", modelMZ, modelDZ, minus2ll, obj )
 
#Run ACE model
fitACE <- mxRun(modelACE, intervals=TRUE)
sumACE <- summary(fitACE)
AdminRobK's picture
Offline
Joined: 01/24/2014 - 12:15
status code 6
In the first run (with missing data) I got an error code 6 after running the ACE model but not after running the AE model.
In the second time, I got the opposite: an error code 6 after running the AE model but not after running the ACE model. This was the exact error text:

The "6" is an optimizer status code, not an error code, and it throws a warning, not an error. If it actually threw an error, object 'fitACE' in your script would never have been created.

I ran a univariate ACE model with three covariates that are 100% shared between the twins (one continuous and two binary), using paths specification and raw data as an input. I did it twice:
1)one time with all the data, including twin pairs in which the data on the phenotypic observed variable of one of the twins is missing.
2)a second time, only with pairs that have full data.
If I under correctly, openMX runs the analysis with FIML by default, so I didn't enter any command regarding what to do with missing values in my script.

OpenMx does FIML estimation automatically for RAM-type models when the data-type is "raw." Therefore, there's no need to drop incomplete twin pairs from the analysis. In fact, you shouldn't, because that's throwing away available information.

How much missing data are we talking here? Unless it's quite a bit, I doubt it is causing your code 6 (status Red), though it could be a contributing factor. I strongly suspect, though, that you're getting status code 6 because of the two threshold variables in your analysis, especially if the prevalence of either is close to 0.0 or 1.0. It seems that sometimes, status 6 is unavoidable in analyses of threshold variables--that is, in such analyses, you can get status code 6 even when the optimizer has found a local minimum. As I described in a previous post, I'm convinced it's an algorithmic limitation.

So, what to do about it? See here for my general advice concerning status Red. In your case, I'd suggest first replacing mxRun() with mxTryHardOrdinal() in your script, and trying again. Switching optimizers might help; I could say more about that if you'd like.

Another thing you could try is to treat the binary covariates as though they were continuous. That WOULD sidestep the problem of threshold variables, but is probably not a great idea, since the assumption of multivariate normality would obviously be violated, and because a Bernoulli random variable properly has one parameter (a proportion) rather than two (a mean and a variance). The latter issue could be overcome, but not straightforwardly with RAM-path specification.

Your script currently treats the covariates as endogenous random variables and models the joint distribution of phenotypes and covariates, which is theoretically optimal if there are missing values on the covariates, since FIML can simply "work around" the missing values. Alternately, you could instead treat the covariates as exogenous fixed regressors, in which case you'd be modeling the conditional distribution of the phenotypes, given the observed values of the covariates. This would entail using what are called, for historical reasons, "definition variables." For an example, see demo/DefinitionMeans_PathRaw.R in the OpenMx source repository. Note that this is NOT theoretically optimal if there are missing values on covariates, but it doesn't sound like that's a concern for you.

lior abramson's picture
Offline
Joined: 07/21/2017 - 13:13
Thank you so much!

Thank you so much!
I tried the mxTryHardOrdinal instead of the mxRun and it worked! seemed like I got a reasonable solution (not the one I wished for, but one I can trust that actually reflects the data).
I would like to read more about the difference between mxRun and mxTryHardOrdinal, and would appreciate if you say more (or direct me to some good post about it) as you suggested.

Following your response I would like to ask:
Let's say that I don't have missing values on my covariates, and I only want to enter them in the model to make sure that they are not the reason for shared environment effects. is there a theoretical reason to treat them as exogenous fixed regressors instead of endogenous variables? What is better from a theoretical perspective?

Thanks again

AdminRobK's picture
Offline
Joined: 01/24/2014 - 12:15
mxTryHard(); covariates
I would like to read more about the difference between mxRun and mxTryHardOrdinal, and would appreciate if you say more (or direct me to some good post about it) as you suggested.

mxTryHardOrdinal() is a wrapper to mxTryHard(), with default argument values tailored toward analysis of ordinal data; see the help page for mxTryHard(). In brief, mxTryHard() is a function that calls mxRun() multiple times, and randomly perturbs the start values between calls, until an "acceptable" solution is reached or the maximum number of attempts is reached. Note that mxTryHardOrdinal() considers status code 6 to be acceptable, because, as I mentioned, status Red is sometimes just a "fact of life" with ordinal data. But, mxTryHardOrdinal() always exhausts its supply of extra tries (default is 10), so you can be reasonably sure that the solution it finds is at a local minimum, and you can be more sure the more extra tries you use.

Let's say that I don't have missing values on my covariates, and I only want to enter them in the model to make sure that they are not the reason for shared environment effects. is there a theoretical reason to treat them as exogenous fixed regressors instead of endogenous variables? What is better from a theoretical perspective?

In this case, they would equivalently adjust for the covariate, and there's no theoretical advantage for one versus the other (at least, I can't think of one). Depending on what you want to do, though, there may be technical advantages for one over the other. For instance, mxStandardizeRAMpaths() doesn't work very well with definition variables (exogenous regressors).