saturated univariate SEM model of a binary variable

Posted by diana (Joined: 04/17/2019)
Dear all,
I am having some problems fitting a saturated univariate SEM model of a binary variable, "whtr_g". I would appreciate it if anyone could help me.
1. I always get a warning message when fitting the saturated model (see warning message.png), whether I use the SLSQP, NPSOL, or CSOLNP optimizer. What's more, the results from the three optimizers differ. I don't know whether there are errors in my code; please help me check it.
2. mxCheckIdentification() reports "model is not locally identified", which means the saturated model and its estimated parameters are not reliable, right? In that situation, can I still compare nested models (for example, equating the means of twin 1 and twin 2, or equating the means of MZ and DZ twins as in the ACE model) against the saturated model with mxCompare(SatFit, SubmodelFit), in order to test whether means/variances/thresholds can be equated across twin order and zygosity? (See the sketch after my script below for the kind of comparison I mean.)
My code is as follows:

selVars <- c('whtr_g1','whtr_g2')
covVars <- c("age1","age2","region_type1","region_type2")

mzData <- subset(adipodata, zygosity1==0, c(selVars,covVars))
dzData <- subset(adipodata, zygosity1==1, c(selVars,covVars))

mMZ <- mean(unlist(mzData[,1:2]), na.rm=TRUE)   # start value for means
mDZ <- mean(unlist(dzData[,1:2]), na.rm=TRUE)   # start value for means
cMZ <- cov(mzData[,1:2], use='complete.obs')    # start values for variances/covariances
cDZ <- cov(dzData[,1:2], use='complete.obs')    # start values for variances/covariances

nv  <- 1      # number of variables
ntv <- nv*2   # number of variables per pair
nth <- 1      # maximum number of thresholds
laMeMZT <- c("m1MZT","m2MZT")               # labels for means, MZ twins
laMeDZT <- c("m1DZT","m2DZT")               # labels for means, DZ twins
laVaMZT <- c("MZ11","MZ12","MZ12","MZ22")   # labels for (co)variances, MZ twins
laVaDZT <- c("DZ11","DZ12","DZ12","DZ22")   # labels for (co)variances, DZ twins

# tell OpenMx the data are ordinal
mzData$whtr_g1 <- mxFactor(mzData$whtr_g1, levels=c(0:nth))
mzData$whtr_g2 <- mxFactor(mzData$whtr_g2, levels=c(0:nth))
dzData$whtr_g1 <- mxFactor(dzData$whtr_g1, levels=c(0:nth))
dzData$whtr_g2 <- mxFactor(dzData$whtr_g2, levels=c(0:nth))

modelMZ <- mxModel("MZ",
# Define definition variables
mxMatrix( type="Full", nrow=2, ncol=2, free=F, label=c("data.age1","data.region_type1","data.age2","data.region_type2"), name="MZDefVars"),
# define beta
mxMatrix( type="Full", nrow=1, ncol=2, free=TRUE, values=0,labels=c('BaTHage','BaTHregion_type'), name="BageTH"),
# expected mean Matrices
mxMatrix( type="Full", nrow=1, ncol=ntv, free=T, values=mMZ, labels=laMeMZT, name="meanMZT" ), #two labels#
mxAlgebra( expression= meanMZT + BageTH%*%MZDefVars, name="expMeanMZT" ),

# Define threshold matrices : nth--number of thresholds
mxMatrix( type="Full", nrow=nth, ncol=ntv, free=T, values=0.22, lbound=-3, ubound=3,labels=c("Tmz1","Tmz2"), name="expThresMZ"),

# expected variance–covariance Matrices
mxMatrix( type="Symm", nrow=ntv, ncol=ntv, free=T, values=cMZ,labels=laVaMZT,name="expCovMZ"),

# Read in the data
mxData( observed=mzData, type="raw" ),
mxExpectationNormal( covariance="expCovMZ", means="expMeanMZT", dimnames=selVars, thresholds="expThresMZ"),
mxFitFunctionML()
)

modelDZ <- mxModel("DZ",
mxMatrix( type="Full", nrow=2, ncol=2, free=F, label=c("data.age1","data.region_type1","data.age2","data.region_type2"), name="DZDefVars"),
mxMatrix( type="Full", nrow=1, ncol=2, free=TRUE, values=0,labels=c('BaTHage','BaTHregion_type'), name="BageTH"),
mxMatrix( type="Full", nrow=1, ncol=ntv, free=T, values=mDZ, labels=laMeDZT, name="meanDZT" ), #two labels#
mxAlgebra( expression= meanDZT + BageTH%*%DZDefVars, name="expMeanDZT" ),
mxMatrix( type="Full", nrow=nth, ncol=ntv, free=T, values=0.20, lbound=-3, ubound=3,labels=c("Tdz1","Tdz2"), name="expThresDZ"),
mxMatrix( type="Symm", nrow=ntv, ncol=ntv, free=T, values=cDZ,labels=laVaDZT,name="expCovDZ"),
mxData( observed=dzData, type="raw"),
mxExpectationNormal( covariance="expCovDZ", means="expMeanDZT", dimnames=selVars, thresholds="expThresDZ" ),
mxFitFunctionML()
)

obj  <- mxFitFunctionMultigroup(c('MZ.fitfunction','DZ.fitfunction'))
Conf <- mxCI(c('MZ.expCovMZ[1,1]', 'MZ.expCovMZ[2,1]', 'DZ.expCovDZ[2,1]', 'MZ.BageTH[1,1]'))

SatModel <- mxModel("Sat", modelMZ, modelDZ, obj, Conf)
SatFit   <- mxRun(SatModel, intervals=TRUE)
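
For question 2, the kind of submodel comparison I have in mind would be something like this (just a hypothetical sketch; the relabelling equates the twin-1 and twin-2 means within each zygosity group):

# hypothetical submodel: equate means across twin order within zygosity
SubModel <- mxModel(SatModel, name="Sub")
SubModel <- omxSetParameters(SubModel, labels="m2MZT", newlabels="m1MZT")
SubModel <- omxSetParameters(SubModel, labels="m2DZT", newlabels="m1DZT")
SubFit   <- mxRun(SubModel)
mxCompare(SatFit, SubFit)   # likelihood-ratio test of the equality constraints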

Many Thanks!!!!

OpenMx version: 2.12.2
R version: R version 3.3.3

Replied on Fri, 04/19/2019 - 13:08 by AdminRobK (Joined: 01/24/2014)

In reply to diana

Additional question: how can I calculate the tetrachoric correlation using the saturated model mentioned above?

Having looked at your script again, I see another reason why your model is unidentified. You also need to fix the variance of the (latent liability underlying the) binary trait. Change both model-expected covariance matrices to be "standardized" matrices instead:

mxMatrix( type="Stand", nrow=ntv, ncol=ntv, free=T, values=cMZ,labels=laVaMZT,name="expCovMZ"),
#and
mxMatrix( type="Stand", nrow=ntv, ncol=ntv, free=T, values=cDZ,labels=laVaDZT,name="expCovDZ"),

The off-diagonal elements of those two matrices will contain the MZ and DZ tetrachoric correlations, respectively.
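
For instance (a minimal sketch, assuming you rebuild the model with those "Stand" matrices and rerun it, again as SatFit), the point estimates can then be read straight from the fitted matrices:

# assumes the model was rebuilt with the "Stand" covariance matrices and rerun as SatFit
rMZ <- SatFit$MZ.expCovMZ$values[2,1]   # MZ tetrachoric correlation
rDZ <- SatFit$DZ.expCovDZ$values[2,1]   # DZ tetrachoric correlation
c(rMZ=rMZ, rDZ=rDZ)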

Replied on Thu, 04/18/2019 - 11:09 by AdminRobK (Joined: 01/24/2014)

1. I always get a warning message when fitting the saturated model (see warning message.png)

I've never seen that warning message before. It looks like it's being raised by R, not by any OpenMx code. If it's just telling you that some rownames aren't being preserved, then it's probably benign.

2. mxCheckIdentification() reports "model is not locally identified"

I can tell at a glance that your model is unidentified. You need to fix either the regression intercepts 'meanMZT' and 'meanDZT', or fix the thresholds 'expThresMZ' and 'expThresDZ'. Whichever pair you fix, it's simplest to fix it to zero.
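
For example, fixing the intercepts would be a small change to each mean matrix (a sketch; only the free= and values= arguments change):

# regression intercepts fixed at zero; the thresholds stay free
mxMatrix( type="Full", nrow=1, ncol=ntv, free=FALSE, values=0, labels=laMeMZT, name="meanMZT" ),
# and likewise for meanDZT in the DZ model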

The unidentification could explain why you're getting status RED, and why the optimizers don't agree.

Replied on Sat, 04/20/2019 - 05:52 by diana (Joined: 04/17/2019)

In reply to AdminRobK

Thank you so much for spotting that! Now the models are locally identified and fit without warnings.

However, I find that different fitting functions return different results. Because the model contains binary variables, I have tried mxRun(), mxTryHard(), mxTryHard() with finetuneGradient=FALSE, and mxTryHardOrdinal(). It seems that with mxTryHardOrdinal() the optimizer cannot reach an acceptable solution, because the function does not stop until it reaches the maximum number of attempts I set. The different results yielded by the different functions are shown in the uploaded picture. I don't know which result is reliable or which function I should choose.

Regarding the tetrachoric correlations, I'm sorry, but I'm not sure I understand you correctly.
1. My understanding is that type="Symm" and type="Stand" are both valid for the model-expected covariance matrices. If I want tetrachoric correlations, I should choose type="Stand"; otherwise the expected covariance matrices can also be type="Symm". Is that right?
2. Also, if I set type="Symm", must the lower bounds for the variances be defined as lbVas <- matrix(NA,ntv,ntv); diag(lbVas) <- 0.0001? I find that the estimated parameters change if I don't use lbVas.

Many thanks!

Replied on Sat, 04/20/2019 - 12:58 by diana (Joined: 04/17/2019)

In reply to AdminRobK

I subsequently found that when the expected covariance matrices were type="Symm", even with exactly the same code, I got different estimated parameters but the same AIC. Is that reasonable? (I used mxTryHard() with finetuneGradient=FALSE and extraTries=41, and the function stopped before reaching the maximum number of attempts I set.)

Looking forward to your reply. Many thanks!

Replied on Sun, 04/21/2019 - 16:18 by AdminRobK (Joined: 01/24/2014)

It seems that with mxTryHardOrdinal() the optimizer cannot reach an acceptable solution, because the function does not stop until it reaches the maximum number of attempts I set.

By default, `mxTryHardOrdinal()` uses `exhaustive=TRUE`, meaning that it uses all of its extra attempts, and returns the best solution it finds.

The different results yielded by the different functions are shown in the uploaded picture. I don't know which result is reliable or which function I should choose.

The default argument values of `mxTryHardOrdinal()` are tailored toward analysis of threshold data, so it's what I'd recommend. By default, it exhausts all of its extra tries, and returns the solution with the smallest fitfunction value, even if it has status Red.

If you want to get the same result from `mxTryHardOrdinal()` whenever you run your script, set the random-number-generator seed with `set.seed()` before the call to `mxTryHardOrdinal()`. Alternatively, after you run `mxTryHardOrdinal()`, use the start values it used to find the solution as the actual start values in your script (see `omxSetParameters()`), and replace the call to `mxTryHardOrdinal()` with a call to `mxRun()`.
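
For example (a minimal sketch of both options; the second simply carries the solution's estimates over as start values, which serves the same purpose):

# option 1: seed the random jittering of start values so runs are reproducible
set.seed(20190421)
SatFit <- mxTryHardOrdinal(SatModel, extraTries=30, intervals=TRUE)

# option 2: plug the solution back in as start values and switch to mxRun()
SatModel2 <- omxSetParameters(SatModel,
                              labels=names(omxGetParameters(SatFit)),
                              values=omxGetParameters(SatFit))
SatFit2 <- mxRun(SatModel2, intervals=TRUE)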

Regarding the tetrachoric correlations, I'm sorry, but I'm not sure I understand you correctly.
1. My understanding is that type="Symm" and type="Stand" are both valid for the model-expected covariance matrices. If I want tetrachoric correlations, I should choose type="Stand"; otherwise the expected covariance matrices can also be type="Symm". Is that right?

Let me explain something of which you might not be aware. When you analyze a threshold variable in OpenMx, the variable is modeled as a discretized indicator of an underlying, normally-distributed continuum (often called the "liability"). The model-expected mean and variance of the variable are the mean and variance of that underlying normal distribution. If the variable only has two categories (one threshold), then in order to identify the model, its variance must be fixed, and one of either its regression intercept or its threshold must also be fixed. In your script, you're fixing the intercept. But the variance must also be fixed to identify the model. So, if you use `type="Symm"` for the model-expected covariance matrices, the diagonal elements must be fixed to the same value (which is arbitrary, but 1.0 is the simplest choice). If you instead use `type="Stand"`, the diagonal elements will be automatically fixed to 1.0. If the diagonal elements are fixed to 1.0 one way or another, then the off-diagonals will be tetrachoric correlations.
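
For instance, the `type="Symm"` version for the MZ group could look like this (a sketch that keeps your labels; only the free= and values= arguments differ from your current matrix):

# liability variances fixed at 1.0; the MZ correlation is the only free element
mxMatrix( type="Symm", nrow=ntv, ncol=ntv,
          free=c(FALSE,TRUE,TRUE,FALSE),   # fix the two diagonal elements
          values=c(1,0.5,0.5,1),           # 0.5 is just a start value for the correlation
          labels=laVaMZT, name="expCovMZ" ),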

I subsequently found that when the expected covariance matrices were type="Symm", even with exactly the same code, I got different estimated parameters but the same AIC. Is that reasonable?

If the model is unidentified, then there are infinitely many sets of parameter values that will fit the data equally well.

Replied on Mon, 04/22/2019 - 04:08 by diana (Joined: 04/17/2019)

In reply to AdminRobK

Many thanks for your detailed reply!

If I use type="Symm" for the model-expected covariance matrices and fix the diagonal elements to the same value, then I get the same result each time I run my script. However, that constrains the variances of twin 1 and twin 2 to be equal, so the model is no longer the fully saturated model. In fact, I first want to fit the fully saturated model, then fit constrained saturated models (equating the thresholds and variances across twin order and zygosity step by step), and finally report the tetrachoric correlations from the best-fitting saturated model. Therefore, I added these lines to my script, and it seems to work.

#in the most saturated model
mxMatrix( type="Symm", nrow=ntv, ncol=ntv, free=T, values=cDZ,labels=laVaDZT,lbound=lbVas,name="expCovDZ"),
mxAlgebra(expCovDZ[2,1]/sqrt(expCovDZ[1,1]%*%expCovDZ[2,2]),name="CorDZ")

#Sub3Model: equate var across Twin 1 and Twin 2
Sub3Model<- omxSetParameters(Sub2Model, labels="MZ22",newlabels="MZ11", free=T, values=SatFit$MZ.expCovMZ[1,1]$values)
Sub3Model<- omxSetParameters(Sub3Model, labels="DZ22",newlabels="DZ11", free=T, values=SatFit$DZ.expCovDZ[1,1]$values)

#Sub4Model: equate var across zygosity
Sub4Model<- omxSetParameters(Sub3Model, labels="DZ11",newlabels="MZ11", free=T, values=SatFit$MZ.expCovMZ[1,1]$values)
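
To complete the comparison, my plan is along these lines (a sketch; the extraTries value is arbitrary):

# fit each constrained model and test it against the saturated model
Sub3Fit <- mxTryHardOrdinal(Sub3Model, extraTries=30)
Sub4Fit <- mxTryHardOrdinal(Sub4Model, extraTries=30)
mxCompare(SatFit, list(Sub3Fit, Sub4Fit))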

I am not sure whether my approach is right, or whether, for a binary variable, the most saturated model is one that already constrains the variances to be equal across twin order, so that I only need to constrain the thresholds.

Thank you very much for your help!

Replied on Mon, 04/22/2019 - 09:44 by AdminRobK (Joined: 01/24/2014)

I am not sure whether my approach is right, or whether, for a binary variable, the most saturated model is one that already constrains the variances to be equal across twin order, so that I only need to constrain the thresholds.

For a binary variable, the variances (i.e., the diagonal elements) need to be fixed to some value. Otherwise, the model will be unidentified. With only two categories, there's not enough information to freely estimate the variances.