Attachment | Size |
---|---|
code.txt | 13.86 KB |
Hello, I want to run a G*E model (2 binary variables ) using the script from “JuanJMV”(https://openmx.ssri.psu.edu/node/4324). Unfortunately, there seems to be some errors in my model
I just met the following Warning message when I add one of the Constraint lines
Warning message:
In mxTryHard(model = model, greenOK = greenOK, checkHess = checkHess, :
The model does not satisfy the first-order optimality conditions to the required accuracy, and no improved point for the merit function could be found during the final linesearch (Mx status RED)
Constraint on variance of Binary variables
matUnv <- mxMatrix( type="Unit", nrow=nvo, ncol=1, name="Unv1" )
var_constraint <- mxConstraint( expression=diag2vec(V0)==Unv1, name="Var1" )
or
matUnv <- mxMatrix( type="Unit", nrow=nvo, ncol=1, name="Unv1" )
var_constraint <- mxConstraint(expression=rbind(V0[2,2],V0[1,1]) == Unv1, name="Var1")
Another errors occurred when I remove the Constraint lines and define covE as:
myVarE <- mxMatrix( type="Symm",nrow=nv, ncol=nv, free=c(F,T,F), values=c(1, 0.1, 1),
labels=c("e_1_1", "e_2_1", "e_2_2"), lbound=c(0.0001,rep(NA,2)),name="E0")
** Information matrix is not positive definite (not at a candidate optimum).
Be suspicious of these results. At minimum, do not trust the standard errors.
Any suggestions from you?
Thanks!
Due to limited numerical precision of multidimensional integration (so that you don’t have to wait forever for models to run), this optimizer warning often occurs with ordinal data. The solution is very likely correct - or very close to it. It can be tested empirically by re-running from different starting values - if the lowest -2 ln likelihood is the same and the parameters are the same from several different sets of starting values, the model is likely identified and the solution is likely correct.
I would expect the alternative, to fix residual variances, to work, but perhaps it is because the values chosen do not play well with the estimates in other matrices. Let me have a look at the script...
This line seems to lbound a fixed element, maybe should be lbound=c(NA,.0001,NA) but that would constrain E covariance to be non-negative which may not be desired.
myVarE <- mxMatrix(type="Symm",nrow=nv, ncol=nv, free=c(F,T,F), values=c(1, 0.1, 1),
labels=c("e_1_1", "e_2_1", "e_2_2"), lbound=c(0.0001,rep(NA,2)), name="E0")
thanks!
I tried to change the line from lbound=c(0.0001,rep(NA,2)) to lbound=c(NA,.0001,NA) ,but it didn't seem to improve. the same warning message:
** Information matrix is not positive definite (not at a candidate optimum).
Be suspicious of these results. At minimum, do not trust the standard errors.
It seems indicate that the results are unreliable.
Can i trust the output?
Nothing the OP reports is an error. The two messages are warnings, not errors, and as Professor Neale explained, those warnings are not uncommon with threshold traits. The second one (non-PD information matrix, status 5) is more benign than the first one (first-order optimality conditions not satisfied, status 6).
Edit: had status codes backwards
thanks for your explanations.
Though the warnings are common, it seems that my results are not credible.
Any other solutions?
You ran your model through
mxTryHardOrdinal()
with 30 extra tries. Your point estimates are probably trustworthy, even if their standard errors aren't.The matter of status Red when analyzing threshold traits has been discussed endlessly on these forums. Try searching the forums for more information.
They are probably trustworthy. In reality, that's about as good as it gets with any non-trivial empirical analysis. Proof is a purely mathematical thing.
If you have a large sample, with missingness completely at random or of only a few patterns, weighted least squares (WLS or WLSMV available in OpenMx) may be a better way to go.
I took a closer look at yaning's script. I don't see that it identifies the scale of the liabilities anyplace. The lines that create the MxConstraint are commented out, and fixing the nonshared-environmental variance would preclude modeling its moderation. I also noticed that it's doing the "full bivariate" form of the Purcell continuous moderation model, but both the phenotype of interest and the putative moderator are binary variables, of type "factor". I believe R variables of that type stored as unsigned integers, which would get cast to double in the backend when the MxAlgebras using them are evaluated, but I'm not completely sure.
• Professor Neale:
• I tried to learn “weighted least squares (WLS)”, but it is too hard for me.
I will continue to try and let you know if there is any progress.
Professor Rob:
• I’m sorry that I can't understand your meanings very well.
• Did you mean there's something wrong with the script?
• Could you help improve the script?
Yes, there are several things wrong with the script. The fact that both the trait of interest and the putative moderator are binary variables requires considerations that aren't relevant to the continuous-only case.
The script correctly casts variables "smk_cur1" and "smk_cur2" to ordered factors (through
mxFactor()
). In R, variables of type "factor" are a compound data-type, consisting of a vector of integers (one unique integer value per level), as well as a vector of character strings (one unique string per level). The problem is that R begins enumerating levels of the factor at 1, not 0. So, if you use "smk_cur1" and "smk_cur2" as definition variables in MxAlgebras, they will be involved in calculations as 1s and 2s, not 0s and 1s as you might expect. Since the 0-and-1 numeric coding greatly simplifies interpretation, I suggest adding some lines like the following to the script, prior to the calls tomxFactor()
:Do likewise for the DZ dataset. Then, use those new variables with "_num" appended to their names as definition variables, e.g.
. Once you've created the "_num" variables, you can allow for incomplete twin pairs if you don't filter them out earlier in the script (as is currently done in lines 27 & 28). You can do something like the following,
, and likewise for MZ twin #2 and for the DZ twins.
To treat current smoking status as a moderator is equivalent to regressing "coronary" onto interactions between smoking status and the common and unique A, C, and E variables. As a rule, you also need to model the main effect of all variables appearing in interaction terms. In this case, that means you need to adjust for current smoking in the same way you currently adjust for age and sex.
The biggest problem with the script is that the model is unidentified, because the scale of the liabilities underlying smoking status and coronary aren't identified. As explained in Medland et al. (2009), fixing the liability-scale variance to a constant at some arbitrary value of the moderator is sufficient for model identification in the binary case. Replace lines 150 to 167 in the script with this:
The MxConstraint will fix the liability-scales variances of both binary variables to 1.0 when the moderator is zero. Make sure
myVarA
,myVarC
,myVarE
,myVar
,matUnv
, andvar_constraint
all go into either your MZ or DZ model.One minor thing: the script currently fixes the regression intercept at 1.5. There's nothing wrong with that, but most people fix it to zero to make interpretation easier.
Thanks for your help!
• I’ve changed my model according to what you suggested.
• A few questions:
• First, do I need to drop these lines
• #var1 <- mxAlgebra( A1+C1+E1, name="V1" )
• #var2 <- mxAlgebra( A2+C2+E2, name="V2" )
Second, each model(eg. TACE) needs to run for several hours(more than 10), is this normal?
• I tried the Parallel running (https://openmx.ssri.psu.edu/wiki/speed-parallel-running-and-efficiency). However, It seemed to make no difference.
omxDetectCores() # cores available
getOption('mxOptions')$"Number of Threads" # cores used by OpenMx
mxOption(model= yourModel, key="Number of Threads", value= (omxDetectCores() - 1))
• Any solutions to speed it up?
You didn't seem to be using them anywhere in your script.
I agree, 10 hours does seem like a lot of time. On the other hand, you're exhaustively using 30 extra tries per run. Plus, you're analyzing threshold traits in a sample of over 20,000 twin pairs, right!? Also, judging from the paths in your script, you're doing this on a Windows system, meaning that your installation of OpenMx lacks support for parallel processing (i.e., multithreading; we have never been able to build safe multithreaded Windows binaries of OpenMx). Therefore, nothing you do with the "Number of threads" option will make any difference for you. Do you have access to a system running a Unix-like of any kind (e.g., macOS, Linux/GNU)?
I just add some lines as you suggested.
The model has been running for more than 48 hours and still no results(optimizer:CSOLNP).
And same warnning informations when I changed the optimizer to NPSOL:
Uncertain solution found - consider parameter validity, try again, increase extraTries, change inits, change model, or check data!
Start values from best fit:
0.0190247529530963,-0.0821660228941958,0.255028372658143,0.886622060711073,-0.13136971157167,0.821153090102781,0.46235241827819,-0.445377904683978,0.198580657798816,-0.148076541630411,-0.602239460806354,0.396815292763481,-0.212013136782887,0.938572589184176,-0.198590431577647,0.133914730669409,0.452951777540742,-1.30215329797308,0.235172527991134,-1,3.03674274328704
> summary(TACEmodFit)
Summary of ACEmod
The model does not satisfy the first-order optimality conditions to the required accuracy, and no improved point for the merit function could be found during the final linesearch (Mx status RED)
Your ordinal model may converge if you reduce mvnRelEps
try: model <- mxOption(model, 'mvnRelEps', mxOption(model, 'mvnRelEps')/5)
it seemed no differences.
I changed the optimizer to SLSQP or NPSOL), and the model went smoothly.
However, a new problem appeared:
Retry limit reached; Best fit=71041.971 (started at 722950.7) (31 attempt(s): 31 valid, 0 errors)
Start values from best fit:
0.121845773916744,0.041566660123536,0.172376909791369,0.41528135772296,-0.347957531981112,1.53970276966828,0.245400087963739,-0.439950953301151,0.858792357887105,1.54846266486698,0.107660052363704,0.889290865551656,-0.97998733530009,1.20577696541579,-0.316831629947946,0.0324467144483237,0.968433754799487,-0.849153877011488,1.53512886597937,-0.762951559294409,7.5933437584602
> summary(TACEmodFit)
Summary of ACEmod
The nonlinear constraints and bounds could not be satisfied. The problem may have no feasible solution.
In addition, I want to ask the differences between optimizers.
MxConstraints can really slow down CSOLNP sometimes.
As Professor Neale and I have been trying to tell you: sometimes status Red is unavoidable when you're analyzing threshold traits. That's why
mxTryHardOrdinal()
defaults to exhausting its extra tries: we can be reasonably sure we've found at least a local minimum if we keep the solution with the smallest fitfunction value out of all the attempts, even the fitfunction derivatives at that solution make it not look like a minimum. Have you searched these forums like I suggested? This topic has been discussed numerous times on these forums in the past, e.g. here, here, and here.Huh? To which of the two did you switch?
That's even worse than status Red. Didn't
mxTryHardOrdinal()
alert you to that warning? Anyhow, if you're using NPSOL, you could try doingmxOption(NULL,"Feasibility tolerance",1e-5)
somewhere in your script before you run the model, and see if you get a feasible solution.They are similar in many respects. What in particular would you like to know?
I looked at Yaning's improved script. It still has two outstanding problems.
For one: there's no need to do this,
, if you're later in the script going to do this,
. Also, in line 224, the "_num" variables should be what are used as "definition variables", like this: