You are here

Lower bound confidence interval NaN from time to time

43 posts / 0 new
Last post
kiacan's picture
Offline
Joined: 11/19/2018 - 19:32
Lower bound confidence interval NaN from time to time

Hi,

I was wondering about some lower bound NaN values I've come across (or also when, for example the AE model has C set to 0, but the variance component has a lower bound CI NaN value even though all of them should have a value of 0 (lower bound, estimate, upper bound)). Specifically, I have had NaN values for the A standardized variance component lower bound of the confidence interval from time to time. Does anyone know why this may be? Is there a work-around on how to adjust it? It's for relatively few estimates, but it does come up.

I am using the latest OpenMx version and it seems to be running fine otherwise in general (on a univariate model, etc.).

I know there is also calculating a CI via bootstrapping, but I am not entirely sure how to do that as of yet (but maybe that would lend itself toward resolving this issue).

Definitely appreciate any advice!

AdminJosh's picture
Offline
Joined: 12/12/2012 - 12:15
verbose output?

Have you looked at the summary(..., verbose=TRUE) output?

kiacan's picture
Offline
Joined: 11/19/2018 - 19:32
Thanks for the reply! I just

Thanks for the reply! I just included it and it lists the following:

R[write to console]: OpenMx version: 2.19.5 [GIT v2.19.5]
R version: R version 4.1.0 (2021-05-18)
Platform: x86_64-conda-linux-gnu
Default optimizer: SLSQP
NPSOL-enabled?: No
OpenMP-enabled?: Yes

I thought NPSOL was the default optimizer (at some point, though maybe I am mistaken)? Not sure if that may have to do with it. It also gives NaN values for the upper CI bound for the C estimate (when C is fixed to 0) at times as well.

AdminJosh's picture
Offline
Joined: 12/12/2012 - 12:15
verbose CI output

That looks fine. I think SLSQP is used for confidence intervals regardless of which optimizer you select. The output I was after is the detailed CI output. Here's what it outputs by default,

confidence intervals:
                 lbound     estimate     ubound note
common.A[1,1] 0.5566175 6.173024e-01 0.68400870     
common.C[1,1]        NA 2.406416e-13 0.05269798  !!!
common.E[1,1] 0.1537491 1.730463e-01 0.19563705     

and here's the detailed output with summary(model, verbose=TRUE):

CI details:
      parameter  side      value      fit              diagnostic                       statusCode            method         a            c         e     mean
1 common.A[1,1] lower 0.55661745 4071.507                 success                               OK neale-miller-1997 0.7460680 5.399907e-04 0.4244517 21.39288
2 common.A[1,1] upper 0.68400870 4071.519                 success                               OK neale-miller-1997 0.8270482 4.229033e-06 0.4095962 21.39341
3 common.C[1,1] lower 0.00000000 4067.663 alpha level not reached infeasible non-linear constraint neale-miller-1997 0.7856859 0.000000e+00 0.4159883 21.39293
4 common.C[1,1] upper 0.05269798 4071.549                 success infeasible non-linear constraint neale-miller-1997 0.7560895 2.295604e-01 0.4181163 21.39237
5 common.E[1,1] lower 0.15374906 4071.505                 success infeasible non-linear constraint neale-miller-1997 0.7968068 2.489554e-08 0.3921085 21.39306
6 common.E[1,1] upper 0.19563705 4071.512                 success infeasible non-linear constraint neale-miller-1997 0.7729641 9.786281e-08 0.4423088 21.39289
kiacan's picture
Offline
Joined: 11/19/2018 - 19:32
Oh I see, yeah that looks

Oh I see, yeah that looks very in-depth, I didn't print it out, so now I see it. Thanks a lot for the clarification. Here are two outputs (for a C and A lower bound respectively where it's NaN) (hopefully the formatting is okay):

1) (for C lower bound NaN)

confidence intervals:
lbound estimate ubound note
MZ.StdVarComp[1,1] 0.07422233 0.3950067 0.6354808
MZ.StdVarComp[2,1] NA 0.0000000 0.0000000 !!!
MZ.StdVarComp[3,1] 0.36451917 0.6049933 0.9257777

CI details:
parameter side value fit diagnostic
1 MZ.StdVarComp[1,1] lower 0.07422233 -37.95743 success
2 MZ.StdVarComp[1,1] upper 0.63548083 -37.95903 success
3 MZ.StdVarComp[2,1] lower 0.00000000 -41.80563 alpha level not reached
4 MZ.StdVarComp[2,1] upper 0.00000000 -37.93677 success
5 MZ.StdVarComp[3,1] lower 0.36451917 -37.95903 success
6 MZ.StdVarComp[3,1] upper 0.92577767 -37.95743 success
statusCode method a e
1 OK neale-miller-1997 0.05779659 0.2041213
2 OK neale-miller-1997 0.18255742 0.1382638
3 infeasible non-linear constraint neale-miller-1997 0.13495746 0.1670206
4 infeasible non-linear constraint neale-miller-1997 0.17279164 0.1425522
5 infeasible non-linear constraint neale-miller-1997 0.18255742 0.1382638
6 infeasible non-linear constraint neale-miller-1997 0.05779659 0.2041213

2) for the A component case:

confidence intervals:
lbound estimate ubound note
MZ.StdVarComp[1,1] NA 4.752728e-19 0.184599 !!!
MZ.StdVarComp[2,1] 0.0000000 0.000000e+00 0.000000 !!!
MZ.StdVarComp[3,1] 0.8153433 1.000000e+00 1.000000

CI details:
parameter side value fit diagnostic
1 MZ.StdVarComp[1,1] lower 2.842761e-41 -252.6202 alpha level not reached
2 MZ.StdVarComp[1,1] upper 1.845990e-01 -248.7728 success
3 MZ.StdVarComp[2,1] lower 0.000000e+00 -248.7756 success
4 MZ.StdVarComp[2,1] upper 0.000000e+00 -248.7447 success
5 MZ.StdVarComp[3,1] lower 8.153433e-01 -248.7712 success
6 MZ.StdVarComp[3,1] upper 1.000000e+00 -248.8087 success
statusCode method a e
1 infeasible non-linear constraint neale-miller-1997 -5.366488e-22 0.10065144
2 infeasible non-linear constraint neale-miller-1997 -4.407308e-02 0.09262844
3 infeasible non-linear constraint neale-miller-1997 -2.513257e-02 0.10927582
4 infeasible non-linear constraint neale-miller-1997 -2.463694e-02 0.10243448
5 infeasible non-linear constraint neale-miller-1997 -4.407965e-02 0.09262449
6 infeasible non-linear constraint neale-miller-1997 -3.161669e-10 0.09930529

AdminJosh's picture
Offline
Joined: 12/12/2012 - 12:15
CI interpretation

> 1) (for C lower bound NaN)

If the upper bound is zero then you can probably just regard the lower bound as zero. The algorithm is very particular and wants to find the correct amount of misfit, but the model is already backed up into a corner and the optimizer gets stuck.

> 2) for the A component case:

This is similar to the first case. Here, the optimizer got closer the target fit of -248.77, but didn't quite make it because the parameters got cornered again. 2.842761e-41 can be regarded as zero.

It looks like you're using the ACE model that does not allow variance components to go negative. This model is miscalibrated and will result in biased intervals. For better inference, you should use the model that allows the variance components to go negative. You can truncate the intervals at the [0,1] interpretable region for reporting.

kiacan's picture
Offline
Joined: 11/19/2018 - 19:32
Got it. That makes a lot of

Got it. That makes a lot of sense. I guess I can go with that in this case. I really appreciate the clarification and suggestion. Is that normal practice to go ahead and just turn NaN values to zero (in that circumstance), etc.? Otherwise, I can just list that as being what was done in this circumstance (unless there are other workarounds).

Do you have any references on setting it so the variance components are allowed to go negative? I am not too familiar with this, and have seen some posts, but am not entirely sure where would be the best place to look.

AdminRobK's picture
Offline
Joined: 01/24/2014 - 12:15
reference
Got it. That makes a lot of sense. I guess I can go with that in this case. I really appreciate the clarification and suggestion. Is that normal practice to go ahead and just turn NaN values to zero (in that circumstance), etc.? Otherwise, I can just list that as being what was done in this circumstance (unless there are other workarounds).

If a model fixes the shared-environmental component to zero, then under that model, the lower confidence limit for the shared-environmental component is trivially zero (as is the upper confidence limit).

Do you have any references on setting it so the variance components are allowed to go negative? I am not too familiar with this, and have seen some posts, but am not entirely sure where would be the best place to look.

See here.

AdminNeale's picture
Offline
Joined: 03/01/2013 - 14:09
Verhulst & Neale

Hi

There's a paper in Behavior Genetics about estimating unbounded A C and E variance components, instead of the usual implicitly-bounded path coefficient specification, which constrains variance components to be non-negative. It's here: https://pubmed.ncbi.nlm.nih.gov/30569348/
Please let us know what you think, and if there are any remaining questions we could answer that would help you further.

kiacan's picture
Offline
Joined: 11/19/2018 - 19:32
Thank you both (and I

Thank you both (and I apologize for the delay in response). That definitely makes sense. I still need to read the paper more carefully, but I understand the need / benefits for it. Ideally I would like to use it given its benefits, so I am trying to figure out what best way to incorporate it may be. I usually have been using path based implementations of the univariate ACE, and from what I've seen the direct variance approach can either be done in umx or there is a script which is more non-path based (which would require more familiarity, though both do).

As for umx (I am new to this), would it be sufficient to use the umxACEv function, along with the umxConfint function (to get the standardized estimates + standardized CIs)? Or is there some other pointer to look at? That's what I've gathered from one of the tutorials and also the documentation of it.

And if I am to go with this other (direct variance approach) approach, in order to limit the range of the bounds, would I just do that after obtaining the estimates and associated CIs?

And, lastly, it is okay to just approximate the bounds which are very close to 0 but the optimizer fails (like was said above, outside of the triviality of it already being constrained to 0)? So for example the A estimate?

I really appreciate all the great amount of help you all have given!

kiacan's picture
Offline
Joined: 11/19/2018 - 19:32
So, I figured out most of

So, I figured out most of these answers. One question that is coming up, is I am using umx (just for convenience at this point), and I run into an issue as in this thread: https://openmx.ssri.psu.edu/node/4593 for at least one of the estimates. Is it possible to adjust either using umxModify or umxACE the lower bound for the C parameter as in that post? Or is that strictly limited to doing it directly without umx (because it cannot be adjusted with umx as that's not a feature)? I know there is a umx_scale() function, which maybe can help (to a degree with this)--on that note, though, does that not affect the ultimate result and bias it in ways?

I appreciate it. If not, I hope there is a way for umx to skip the estimate instead of crashing, but I am not sure that is feasible, also.

kiacan's picture
Offline
Joined: 11/19/2018 - 19:32
So it looks like this

So it looks like this crashing issue ends up being an issue a lot when it comes to submodels (AE/CE). For now, since I am not sure how that would be addressed consistently using the direct variance approach (without biasing it in some sense), I will just stick with the older approach that is biased since it doesn't throw any errors. But I would definitely be curious if there is a consistent / fair work-around for the above which still is using the direct variance approach based on any of your advice, since that would ultimately be unbiased relative the older approach which would be ideal.

If anyone has any advice, I appreciate it as always!

jpritikin's picture
Offline
Joined: 05/24/2012 - 00:35
crashing?

In the thread you reference, the problem is starting values. If you use a large enough error variance then you should be able to get the model to optimize. In contrast to the referenced thread, I would not add lower or upper bounds since the whole point of the variance parameterization is to not have these bounds. When you report the confidence interval, you can apply the bounds. For example, if the A variance is estimated as -.2 to .2 then you can report the interval [0,.2].

kiacan's picture
Offline
Joined: 11/19/2018 - 19:32
Hi jpritikin, thank you for

Hi jpritikin, thank you for the response! That is helpful (as to how to apply the bounds after the fact).

Hmm, if I am using umx, I am not sure of how to get large enough error variance. By error, do you mean the environmental variance in this case (or just in general the variance that needs to be large for the MZ/DZ twins)? Or would this be something to explore by multiplying the actual values of the data by 100 or so (and would that be okay to do with no other alterations, or would that cause issues elsewhere if it's not renormalized, so to speak)?

I am not sure if the freeToStart, or value variables of umxModify are relevant in this instance (with the freeToStart parameter, for example, or tryHard which didn't seem to work too well). Also maybe xmuValues or xmu_starts would be relevant in this case?

For reference, here are two errors I am explicitly getting:

Error: The job for model 'AE' exited abnormally with the error message: fit is not finite (The continuous part of the model implied covariance (loc2) is not positive definite in data 'MZ.data' row 20. Detail:
covariance = matrix(c( # 2x2
0.0132678091792278, -0.0151264409931152
, -0.0151264409931152, 0.0132678091792278), byrow=TRUE, nrow=2, ncol=2)
)

and

Error: The job for model 'CE' exited abnormally with the error message: fit is not finite (The continuous part of the model implied covariance (loc2) is not positive definite in data 'DZ.data' row 53. Detail:
covariance = matrix(c( # 2x2
0.142828309693162, 0.18385055831094
, 0.18385055831094, 0.142828309693162), byrow=TRUE, nrow=2, ncol=2)
)

kiacan's picture
Offline
Joined: 11/19/2018 - 19:32
I just used the umxModify

I just used the umxModify function to set the starting value for E (just as you would for any fixed parameter, before running, and it seems to work for CE (I will try also for AE). If that sounds right to you, let me know. Otherwise, for now it sounds about right to me. Thanks so much!!

jpritikin's picture
Offline
Joined: 05/24/2012 - 00:35
starting values

Yeah, that sounds like the correct approach. Based on what you wrote, I'm not sure that you realize that umx models are OpenMx models. If you do umxACEv(..., autoRun=FALSE) then you get the OpenMx model which you can adjust before running.

kiacan's picture
Offline
Joined: 11/19/2018 - 19:32
That's actually exactly what

That's actually exactly what I did (so I had two lines one with FALSE and the other with not) :) but I am sure my understanding could always be better (umx is still quite new to me).
One question that came to mind is, does it matter if one identifies the regex parameter for any given component of interest as A_r.c. vs A_r1c1? I saw this post https://openmx.ssri.psu.edu/node/4229 where it's used to drop the entire free parameter set, but I am not sure if it makes any technical difference.

jpritikin's picture
Offline
Joined: 05/24/2012 - 00:35
regex

Regex matching is just for ease of use. You can always list out all the parameters one-by-one.

kiacan's picture
Offline
Joined: 11/19/2018 - 19:32
I see. I was actually

I see. I was actually wondering if there is any difference between the notation for removing all of C (C_r.c.) versus just one part of the matrix (C_r1c1), but I looked into it and I guess that since it is symmetric it is okay (is my guess)?

Also, in the path version of OpenMx the starting values I did by taking, say V = sqrt(phenotypic variance across all twin pairs)/3. Should this equivalently be set for all the parameters A, C and E in umx/is it possible/needed, with the direct variance approach? Right now I have only directly set the E parameter as needed for the subnested models for AE/CE to 0.5 in umxmodify, but am wondering if there is a more systematic approach (i.e. should the 0.5 just be replaced with the V listed above)? Since I know the start value can affect the ultimate CI bounds at the very least.

And, can this be set prior to running ACEv? I am not sure if the xmu_start_value_list() function is relevant.

Finally, is there a way to suppress warnings, or the attempt to print to browser for umx?

kiacan's picture
Offline
Joined: 11/19/2018 - 19:32
quick follow-up

My mistake, from here it looks like since it is concerned directly with variance the square root shouldn't be taken, right? https://github.com/tbates/umx/issues/38

But does this mean that it's already included directly in umx internally? If not, is there a rule of thumb then for making E large or small (when it comes to doing the subnested models), or any starting values in general when doing ACEv?

And the warning suppression + browser suppression may still be helpful (since it prints out browser related information even if I have options(Browser = 'NULL').

jpritikin's picture
Offline
Joined: 05/24/2012 - 00:35
starting values

Regarding starting values, the critical thing is to start with a larger enough error/environmental variance. If your error variance is too small then residuals will get ridiculously large Z scores which can cause optimization failure.

kiacan's picture
Offline
Joined: 11/19/2018 - 19:32
Got it, I see. So just

Got it, I see. So just arbitrarily set it high I am guessing (when it comes to say, umxModify since that's where the issue comes up). The only other question I have related to this is can this be done prior to running umxACEv/is it needed, or should it only be done using umxModify() when it comes to the submodels (using regex, etc. for example)?

Thanks so much for the quick responses!

And I will try and submit a bug within the week for sure regarding the output issue.

jpritikin's picture
Offline
Joined: 05/24/2012 - 00:35
warnings etc

If you can't figure out how to suppress umx output then you might want to file a bug in the umx github.

tbates's picture
Offline
Joined: 07/31/2009 - 14:25
umxModify, umx_set_silent, umx_set_auto_plot, umxSetParameters

> Is there any difference in umxModify between regex = "C_r.c." and update="C_r1c1"

As the matrix is symmetric these are equivalent. It's always easy to check what you've done with
m1$top$C, or parameters(m1) , or, for path-based models, try tmx_show(m1) - it shows all the matrix properties in nice browser tables with roll-overs for properties.

> Do I need to set start values in umx models?

No - umx takes care this for you. But if you want to, you can set them directly. They are just parameters, so just set them: for instance if you wondered about sensitivity to the start value for C, just set the C values quite high to start, e.g. see what the parameters are with parameters(m1), and set with, e.g. umxSetParameters(m1, "C_r1c1", values=1)

> Finally, is there a way to suppress warnings, or the attempt to print to browser for umx?

Yes:

umx_set_silent(TRUE)
umx_set_auto_plot(FALSE)
kiacan's picture
Offline
Joined: 11/19/2018 - 19:32
Thanks so much for the

Thanks so much for the response!

I just figured out about the auto_plot before I saw this, that is exactly what one of them was (and I had umx_set_silent prior too). The only thing that is left at the moment seems to be as a result of the xmu functions (as I found online to match what I am getting). One of them isn't a warning, but the other is. I am wondering if it might be possible to disable these kinds of messages since I am looking into quite a few phenotypes.

The specific popups are from: xmu_show_fit_or_comparison which automatically outputs the log likelihood estimate (this isn't as major but everything adds into computation in terms of print out), and more apparently from
xmu_check_variance:

Polite note: Variance of variable(s) '' and '' is < 0.1.
You might want to express the variable in smaller units, e.g. multiply to use cm instead of metres.
Alternatively umx_scale() for data already in long-format, or umx_scale_wide_twin_data for wide data might be useful.

Given that the phenotypes I am working with are already in their native form, I see this note a lot, and am not sure if it could be suppressed.

And thanks a lot for those clarifications--that all makes plenty of sense.

AdminRobK's picture
Offline
Joined: 01/24/2014 - 12:15
I partially disagree
In contrast to the referenced thread, I would not add lower or upper bounds since the whole point of the variance parameterization is to not have these bounds.

I partially disagree, in that it's not a bad idea to use a bound to ensure that the E variance is strictly positive.

For example, if the A variance is estimated as -.2 to .2 then you can report the interval [0,.2].

Won't that cause CIs to have smaller coverage probability than they're supposed to?

jpritikin's picture
Offline
Joined: 05/24/2012 - 00:35
coverage probability

> > For example, if the A variance is estimated as -.2 to .2 then you can report the interval [0,.2].
>
> Won't that cause CIs to have smaller coverage probability than they're supposed to?

No because variance proportions are proportions. The true values are always between 0 and 1. Or you could regard values outside of 0 and 1 as rejections of the model. For example, if DZ twins are more correlated than MZ twins then there is something else going on besides genetic effects. Hence, it is inappropriate to use the classical ACE model to analyze such data.

kiacan's picture
Offline
Joined: 11/19/2018 - 19:32
Lower/Upper bound for E with direct variance approach

Is it normal to get NaN values (instead of 1) for the upper/lower bound of the E model? This comes up frequently, with Code 3. I am not sure if this is the same case as mentioned by AdminRobK where C is trivially 0, etc. I am guessing this is the case, though and can be safely regarded as 1 (upper or lower) with corresponding log likelihood that is produced?

kiacan's picture
Offline
Joined: 11/19/2018 - 19:32
Example

As an example:

           lbound estimate ubound lbound Code ubound Code

top.A_std[1,1] NA 0 0 NA 3
top.C_std[1,1] NA 0 0 NA 3
top.E_std[1,1] 1 1 NA 3 NA

kiacan's picture
Offline
Joined: 11/19/2018 - 19:32
Phenotypes where E lower bound gets code 3 in alternative models

And, one last question hopefully (outside of the E bound question):
I get very few code 3 NaNs for the lower bound of the E estimate in any model in general (AE, etc.) of the ones I select for. Sometimes this is fixable by a change in the E starting value (a bit higher than what I had already set and not too high in certain cases, though this doesn't always work), and sometimes it is fixable by changing the seed. Are these alterations okay to do in this circumstance, even though it's not necessarily consistent with the rest of what I would be using for the rest of the phenotypes? I definitely appreciate it!

kiacan's picture
Offline
Joined: 11/19/2018 - 19:32
CSOLNP optimizer

I also had type 6 error codes in a few cases (absent upper bound of an A estimate, for example), and along with the type 3 error codes, I could fix these by switching the optimizer to CSOLNP. I guess, again, going to the other question, would this be acceptable even if the vast majority of estimates involve SLSQP as the primary optimizer (it's a bit inconsistent across phenotypes)?

The packages are really nice :)

AdminRobK's picture
Offline
Joined: 01/24/2014 - 12:15
confidence intervals
Is it normal to get NaN values (instead of 1) for the upper/lower bound of the E model? This comes up frequently, with Code 3. I am not sure if this is the same case as mentioned by AdminRobK where C is trivially 0, etc. I am guessing this is the case, though and can be safely regarded as 1 (upper or lower) with corresponding log likelihood that is produced?

Yes, in an E-only model, the upper and lower limits of the confidence interval for the standardized E variance component are trivially 1 (because the standardized E component is fixed to 1 under that model).

And, one last question hopefully (outside of the E bound question):
I get very few code 3 NaNs for the lower bound of the E estimate in any model in general (AE, etc.) of the ones I select for. Sometimes this is fixable by a change in the E starting value (a bit higher than what I had already set and not too high in certain cases, though this doesn't always work), and sometimes it is fixable by changing the seed. Are these alterations okay to do in this circumstance, even though it's not necessarily consistent with the rest of what I would be using for the rest of the phenotypes? I definitely appreciate it!

I also had type 6 error codes in a few cases (absent upper bound of an A estimate, for example), and along with the type 3 error codes, I could fix these by switching the optimizer to CSOLNP. I guess, again, going to the other question, would this be acceptable even if the vast majority of estimates involve SLSQP as the primary optimizer (it's a bit inconsistent across phenotypes)?

If you're getting different results for your CIs by changing the start values, the RNG seed, and/or the optimizer, then I'm concerned that you're also getting a different solution in the primary optimization (i.e., to find the MLE). The fact that changing the start values apparently affects your CI results is especially concerning, since every confidence-limit search begins at the MLE and not at the initial start values. Have you checked whether or not you're getting substantially equivalent point estimates, standard errors, and -2logL each time you try? You might want to first run your MxModel with intervals=FALSE to get a good initial solution, and then use omxRunCI() to subsequently get confidence intervals.

kiacan's picture
Offline
Joined: 11/19/2018 - 19:32
Thanks a lot for the reply.

Thanks a lot for the reply. Definitely helpful.

From what I can tell, it looks like the point estimates are stable (I will look into this more, though). The - 2logL seems consistent as well. The only thing that seems to change is, for example, when changing the E start value, I will get a different lower bound CI (not upper bound, for example) in the cases in which this did not succeed for say, the AE model. Here is an example where it doesn't succeed for the upper bound of the ACE model (info column is -2logLL top row, and AIC second row). This is with no change in the starting value.

             lbound  estimate    ubound note        info

top.A_std[1,1] -1.062900 -0.245001 NaN !!! 139.933203
top.C_std[1,1] -0.182275 0.492349 1.059306 -271.866406
top.E_std[1,1] 0.489673 0.752652 1.078756 0.000000

This is the case when I change the optimizer to CSOLNP

              lbound  estimate    ubound note        info

top.A_std[1,1] -1.064765 -0.245001 0.571626 139.933203
top.C_std[1,1] -0.186698 0.492349 1.060668 -271.866406
top.E_std[1,1] 0.489466 0.752652 1.078679 0.000000

When I look into the AE model with the error and change the starting value for E (from 0.5 to 0.8) for an estimate I get this:

              lbound  estimate    ubound note       info

top.A_std[1,1] 0.179462 0.430153 0.625025 36.693141
top.C_std[1,1] 0.000000 0.000000 0.000000 !!! -67.386281
top.E_std[1,1] NaN 0.569847 0.820538 !!! 0.000000

to this:
lbound estimate ubound note info
top.A_std[1,1] 0.179462 0.430153 0.625025 36.693141
top.C_std[1,1] 0.000000 0.000000 0.000000 !!! -67.386281
top.E_std[1,1] 0.393639 0.569847 0.820538 0.000000

and if I change the above (same AE model case) to the CSOLNP optimizer (instead of the starting value to 0.8, so keep that at 0.5), I get:

              lbound  estimate    ubound note       info

top.A_std[1,1] 0.178853 0.430153 0.624640 36.693141
top.C_std[1,1] 0.000000 0.000000 0.000000 !!! -67.386281
top.E_std[1,1] 0.375360 0.569847 0.821147 0.000000

Curious about your insight on this.

kiacan's picture
Offline
Joined: 11/19/2018 - 19:32
top row - log likelihood

My mistake, small correction: top row/last column is just the log likelihood.

kiacan's picture
Offline
Joined: 11/19/2018 - 19:32
umx equivalent

Outside of the rest of the examples I posted, etc. is there an equivalent option for what you are suggesting specifically in umx? All I know of is the addCI parameter when running the ACE model / variants, so I am not sure how that method translates from OpenMx to the umx interface, etc. So far I haven't changed anything in that respect. The optimizer seems to fail for very few cases, specifically with respect to the Cis (usually it's just one bound of one parameter estimate that it has difficulty with). But out of the phenotypes I am checking, it's relatively few where this occurs (though ideally there would be none).

Thanks a lot for all of the help--I really appreciate it.

kiacan's picture
Offline
Joined: 11/19/2018 - 19:32
Another phenotype example if perturbed

For another estimate (which didn't have any errors), these are the following differences it has if I change the optimizer/starting E value/seed (below with a hashtag):

              lbound  estimate    ubound note       info

top.A_std[1,1] 0.063975 0.342624 0.565213 50.340513
top.C_std[1,1] 0.000000 0.000000 0.000000 !!! -94.681026
top.E_std[1,1] 0.434787 0.657376 0.936025 0.000000

E_start = 0.5, CSOLNP

              lbound  estimate    ubound note       info

top.A_std[1,1] 0.063876 0.342624 0.565174 50.340513
top.C_std[1,1] 0.000000 0.000000 0.000000 !!! -94.681026
top.E_std[1,1] 0.434826 0.657376 0.936090 0.000000

E_start = 0.5, default optimizer

              lbound  estimate    ubound note       info

top.A_std[1,1] 0.063897 0.342624 0.565175 50.340513
top.C_std[1,1] 0.000000 0.000000 0.000000 !!! -94.681026
top.E_std[1,1] 0.434825 0.657376 0.936082 0.000000

E_start = 0.8, default optimizer

              lbound  estimate    ubound note       info

top.A_std[1,1] 0.063876 0.342624 0.565174 50.340513
top.C_std[1,1] 0.000000 0.000000 0.000000 !!! -94.681026
top.E_std[1,1] 0.434826 0.657376 0.936090 0.000000

E_Start = 0.5, default optimizer, different seed

kiacan's picture
Offline
Joined: 11/19/2018 - 19:32
More comprehensive output

Here is a more comprehensive version (with standard errors). It does look like it's relatively pretty consistent even with SEs (if I am not mistaken).

Here is an example of an estimate where there is a CI bound NaN error.

Estimate with error (NaN lowerbound, AE).

seed = first, optimizer = SLSQP, E = 0.5

free parameters:
name matrix row col Estimate Std.Error A
1 expMean_var1 top.expMean means var1 0.60409619 0.018456489
2 A_r1c1 top.A 1 1 0.01609547 0.005227711
3 E_r1c1 top.E 1 1 0.02132252 0.004326436
lbound estimate ubound note info
top.A_std[1,1] 0.179462 0.430153 0.625025 36.693141
top.C_std[1,1] 0.000000 0.000000 0.000000 !!! -67.386281
top.E_std[1,1] NaN 0.569847 0.820538 !!! 0.000000

seed = first, optimizer = SLSQP, E = 0.8

       name      matrix   row   col   Estimate   Std.Error A

1 expMean_var1 top.expMean means var1 0.60409620 0.018456490
2 A_r1c1 top.A 1 1 0.01609547 0.005227712
3 E_r1c1 top.E 1 1 0.02132252 0.004326436
lbound estimate ubound note info
top.A_std[1,1] 0.179462 0.430153 0.625025 36.693141
top.C_std[1,1] 0.000000 0.000000 0.000000 !!! -67.386281
top.E_std[1,1] 0.393639 0.569847 0.820538 0.000000

seed = second, optimizer = SLSQP, E = 0.5

free parameters:
name matrix row col Estimate Std.Error A
1 expMean_var1 top.expMean means var1 0.60409619 0.018456489
2 A_r1c1 top.A 1 1 0.01609547 0.005227711
3 E_r1c1 top.E 1 1 0.02132252 0.004326436
lbound estimate ubound note info
top.A_std[1,1] 0.179462 0.430153 0.625016 36.693141
top.C_std[1,1] 0.000000 0.000000 0.000000 !!! -67.386281
top.E_std[1,1] 0.375239 0.569847 0.820538 0.000000

seed = first, optimizer = CSOLNP, E = 0.5

free parameters:
name matrix row col Estimate Std.Error A
1 expMean_var1 top.expMean means var1 0.60409614 0.018456474
2 A_r1c1 top.A 1 1 0.01609544 0.005227694
3 E_r1c1 top.E 1 1 0.02132248 0.004326420

              lbound  estimate    ubound note       info

top.A_std[1,1] 0.179462 0.430153 0.624903 36.693141
top.C_std[1,1] 0.000000 0.000000 0.000000 !!! -67.386281
top.E_std[1,1] 0.375201 0.569847 0.820538 0.000000

Here is an example of the working case:

Working estimate (AE).

seed = first, optimizer = SLSQP, E = 0.5

ree parameters:
name matrix row col Estimate Std.Error A
1 expMean_var1 top.expMean means var1 0.42486799 0.016192053
2 A_r1c1 top.A 1 1 0.01035623 0.004402426
3 E_r1c1 top.E 1 1 0.01986998 0.004057314

              lbound  estimate    ubound note       info

top.A_std[1,1] 0.063876 0.342624 0.565174 50.340513
top.C_std[1,1] 0.000000 0.000000 0.000000 !!! -94.681026
top.E_std[1,1] 0.434826 0.657376 0.936090 0.000000

seed = first, optimizer = SLSQP, E = 0.8

free parameters:
name matrix row col Estimate Std.Error A
1 expMean_var1 top.expMean means var1 0.42486797 0.016192049
2 A_r1c1 top.A 1 1 0.01035622 0.004402423
3 E_r1c1 top.E 1 1 0.01986997 0.004057314

              lbound  estimate    ubound note       info

top.A_std[1,1] 0.063897 0.342624 0.565175 50.340513
top.C_std[1,1] 0.000000 0.000000 0.000000 !!! -94.681026
top.E_std[1,1] 0.434825 0.657376 0.936082 0.000000

seed = second, optimizer = SLSQP, E = 0.5

       name      matrix   row   col   Estimate   Std.Error A

1 expMean_var1 top.expMean means var1 0.42486799 0.016192053
2 A_r1c1 top.A 1 1 0.01035623 0.004402426
3 E_r1c1 top.E 1 1 0.01986998 0.004057314
lbound estimate ubound note info
top.A_std[1,1] 0.063876 0.342624 0.565174 50.340513
top.C_std[1,1] 0.000000 0.000000 0.000000 !!! -94.681026
top.E_std[1,1] 0.434826 0.657376 0.936090 0.000000

seed = first, optimizer = CSOLNP, E = 0.5

       name      matrix   row   col   Estimate   Std.Error A

1 expMean_var1 top.expMean means var1 0.42486797 0.016192034
2 A_r1c1 top.A 1 1 0.01035620 0.004402407
3 E_r1c1 top.E 1 1 0.01986994 0.004057299
lbound estimate ubound note info
top.A_std[1,1] 0.063819 0.342624 0.565174 50.340513
top.C_std[1,1] 0.000000 0.000000 0.000000 !!! -94.681026
top.E_std[1,1] 0.434826 0.657376 0.936118 0.000000

Based on this, should there be any concerns as to switching the optimizer, or maybe an idea as to what really is going on in the few cases in which one of the CI bounds are not calculated?

AdminRobK's picture
Offline
Joined: 01/24/2014 - 12:15
One miscellaneous hint I

One miscellaneous hint I forgot to mention yesterday: if you're using SLSQP, try setting mxOption "Feasibility tolerance" to a smaller value than its on-load default (say, 1e-3 or 1e-4). That might help, at least as of OpenMx v2.19. What is your mxVersion() output, come to think of it? Edit: Never mind. I just saw your mxVersion() output upthread.

Based on this, should there be any concerns as to switching the optimizer, or maybe an idea as to what really is going on in the few cases in which one of the CI bounds are not calculated?

I do not think there are any concerns relating to switching the optimizer or changing the RNG seed. Neither of those things is considerably changing the point estimates or standard errors, right? But, from what you've included in your posts, I can't really tell what's going on when a confidence limit is reported as NaN. I would need to see at least the first seven columns of the 'CI details' table (which prints when you use summary() with argument verbose=TRUE). I would also need the -2logL at the MLE (which you seem to have intended to include in your posts?). The information in your posts is not very easy to read, either. The tables would be easier to read if they displayed in a fixed-width font, which can be done with Markdown or with HTML tags.

Outside of the rest of the examples I posted, etc. is there an equivalent option for what you are suggesting specifically in umx?

I don't know. Sorry.

kiacan's picture
Offline
Joined: 11/19/2018 - 19:32
That sounds promising, and it

That sounds promising, and it does work with the alternate optimizer. I have tried changing the tolerance using mxOptions() but it doesn't look like that makes a difference (setting it as such mxOption(key="Feasibility tolerance", value=1e-3)), unless umx might be overriding this, etc (which I wouldn't think it would, but I don't know).

I've never really typed in html, takes more time, but yeah I agree it looks much nicer (and I personally also didn't like how it was saving prior).

confidence intervals:

lbound estimate ubound note
top.A_std[1,1] 0.1794616 0.4301532 0.6244231
top.C_std[1,1] 0.0000000 0.0000000 0.0000000 !!!
top.E_std[1,1] NA 0.5698468 0.8205384 !!!

CI Details:

parameter side value fit diagnostic
1 top.A_std[1,1] lower 0.1794616 -69.54434 success
2 top.A_std[1,1] upper 0.6244231 -69.54440 success
1 top.C_std[1,1] lower 0.0000000 -69.54471 success
1 top.C_std[1,1] upper 0.0000000 -69.54464 success
1 top.E_std[1,1] lower 0.4012878 -69.82063 alpha level not reached
1 top.E_std[1,1] upper 0.8205384 -69.54434 success
StatusCode method expMean_var1 A_r1c1 E_r1c1
1 OK neale-miller-1997 0.6039050 0.006549955 0.02994786
2 OK neale-miller-1997 0.6042243 0.026113065 0.01570644
3 OK neale-miller-1997 0.6330415 0.010966001 0.02533962
4 OK neale-miller-1997 0.6384895 0.018864979 0.02289753
5 infeasible non-linear constraint neale-miller-1997 0.6040962 0.022071510 0.01479346
6 infeasible non-linear constraint neale-miller-1997 0.6039050 0.006549955 0.02994786

This is when it fails.
Also there is this information:
Model Statistics:

Parameters Degrees of Freedom Fit (-2lnL units)
3 141 -73.38628

And, outside of that, it looks like the std errors are NA under the "free parameters" column, specifically when I run summary(verbose = TRUE) on the umxConfint result (which is what gives the results/CIs above). But when I run summary(verbose=TRUE) on just the model prior to umxConfint, the SEs (albeit no CIs) are stable (and can be referred to in the other post, etc.).

And no worries--you've helped me a lot! Hopefully this last bit will give a bit of closure. I will say switching the optimizer fixes the issue though for those few estimates this occurs.

kiacan's picture
Offline
Joined: 11/19/2018 - 19:32
Optimizer for CI vs optimizer for the model itself including CI

It looks like I can change the optimizer specifically for the CI or also in general for the model estimate / calculation itself, and maybe these two would give slightly different output. Is there a preference of doing either, or is there a benefit, etc. since I would only be doing this for a few phenotypes out of the majority, anyway?

AdminRobK's picture
Offline
Joined: 01/24/2014 - 12:15
confidence intervals
I've never really typed in html, takes more time, but yeah I agree it looks much nicer (and I personally also didn't like how it was saving prior).

OK, now I can see what's wrong with the lower confidence limit for E. The optimizer was unable to adequately worsen the -2logL. For a 95% CI, the target worsening of fit at both limits is about 3.841. For E's lower limit, the worsening was only by about 3.566, which isn't too far off.

It looks like I can change the optimizer specifically for the CI or also in general for the model estimate / calculation itself, and maybe these two would give slightly different output. Is there a preference of doing either, or is there a benefit, etc. since I would only be doing this for a few phenotypes out of the majority, anyway?

It's basically impossible to give a general recommendation about that. Do whatever seems to work best for you.

kiacan's picture
Offline
Joined: 11/19/2018 - 19:32
Got it--that all makes sense.

Got it--that all makes sense. Thanks for all of the help!!

AdminRobK's picture
Offline
Joined: 01/24/2014 - 12:15
Glad to be of help.

Glad to be of help.