
Standard Error estimate

Math Wolf

I have been fitting cross-lagged models in OpenMx, but I have some questions about the standard errors.

The same models can be fitted with other software, for example SAS, and for exactly the same model I get the same estimates but different standard errors. To test this, I removed all missing values (as SAS and OpenMx handle them differently) and built three different models. As an illustration, I'll give the simplest model here.

  • There are 124 subjects, each measured on 7 days.
  • For each subject on each day, two variables are measured; let's call them X and Y here.
  • This gives a total of 1736 observations (as OpenMx correctly shows in the output).
  • In SAS, the variables are recoded into predictors and responses, so that there are 1488 observed responses (days 2-7) and twice 1488 predictors (days 1-6 autoregressive and days 1-6 cross-lagged).

The model in both cases is:
* Y_{i} = b_1 * Y_{i-1} + b_2 * X_{i-1} + e_{Y,i}
* X_{i} = b_3 * X_{i-1} + b_4 * Y_{i-1} + e_{X,i}
for i = 2,...,7,
with a residual variance for each variable and a residual covariance between X_{i} and Y_{i}.
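
For concreteness, a rough sketch of how such a model might be specified in OpenMx follows; it is only an illustration of the structure, not the actual script used here. The wide-format variable names X1-X7 / Y1-Y7 and the data frame wideData are placeholders, and the shared labels impose the equality of each effect across occasions (consistent with the single estimates reported below).

library(OpenMx)

# Placeholder wide-format variable names: X1-X7 and Y1-Y7 for the two
# measures on days 1-7; 'wideData' stands for the (hypothetical) data frame.
xvars <- paste("X", 1:7, sep = "")
yvars <- paste("Y", 1:7, sep = "")
manifests <- c(xvars, yvars)

crossLag <- mxModel("CrossLagged", type = "RAM",
    manifestVars = manifests,
    # autoregressive paths, held equal across occasions by the shared label
    mxPath(from = xvars[1:6], to = xvars[2:7], arrows = 1, free = TRUE, values = 0.5, labels = "arX"),
    mxPath(from = yvars[1:6], to = yvars[2:7], arrows = 1, free = TRUE, values = 0.5, labels = "arY"),
    # cross-lagged paths
    mxPath(from = yvars[1:6], to = xvars[2:7], arrows = 1, free = TRUE, values = 0, labels = "lagYonX"),
    mxPath(from = xvars[1:6], to = yvars[2:7], arrows = 1, free = TRUE, values = 0, labels = "lagXonY"),
    # variances and covariance of the day-1 (exogenous) variables
    mxPath(from = c("X1", "Y1"), arrows = 2, free = TRUE, values = 1),
    mxPath(from = "X1", to = "Y1", arrows = 2, free = TRUE),
    # residual variances for days 2-7, each held equal across occasions
    mxPath(from = xvars[2:7], arrows = 2, free = TRUE, values = 1, labels = "resX"),
    mxPath(from = yvars[2:7], arrows = 2, free = TRUE, values = 1, labels = "resY"),
    # within-day residual covariance between X and Y
    mxPath(from = xvars[2:7], to = yvars[2:7], arrows = 2, free = TRUE, labels = "resXY"),
    # means are needed with raw data
    mxPath(from = "one", to = manifests, arrows = 1, free = TRUE),
    mxData(observed = wideData, type = "raw")
)

crossLagFit <- mxRun(crossLag)
summary(crossLagFit)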

The summarized results for OpenMx and SAS are the following (standard errors in parentheses):

OpenMx

  • Effects
    AR X 0.622942286 (0.02300498)
    AR Y 0.787465914 (0.01841442)
    X lag Y -0.105527933 (0.02479874)
    Y lag X -0.029917726 (0.01773734)
  • Covariance structure
    Res X 0.556322176 (0.02229294)
    Res Y 0.348904648 (0.01369003)
    Res XY -0.152949023 (0.01257696)

SAS

  • Effects
    AR X 0.6232 (0.03167)
    AR Y 0.7874 (0.02552)
    X lag Y -0.1054 (0.03223)
    Y lag X -0.03000 (0.02508)
  • Covariance structure
    Res X 0.5579 (0.02896)
    Res Y 0.3499 (0.01816)
    Res XY -0.1534 (0.01717)

As one can see, the estimates are very similar, but the errors do differ.

A linear regression on these standard errors and the standard errors of a few similar models showed that, on average, the OpenMx standard errors are linearly related to those of SAS; more precisely, they are about 0.837 times the SAS standard errors (with a small intercept) or 0.788 times (without an intercept).

Consequently, I have two questions:
* How are standard errors calculated in OpenMx?
* Can anyone explain the difference in this case? Does it have to do with the degrees of freedom only, or could there be other reasons? If it were only the degrees of freedom, I would expect the difference to be smaller.

carey

Bet that if you multiply the standard errors from OpenMx by sqrt(2), you will get the standard errors from SAS. The ones from OpenMx (unless they have been changed recently) are incorrect.
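
(For reference, this bet can be checked directly against the numbers already posted; the two vectors below simply copy the OpenMx and SAS standard errors from the first post.)

# Standard errors copied from the first post, in the order
# AR X, AR Y, X lag Y, Y lag X, Res X, Res Y, Res XY
openmxSE <- c(0.02300498, 0.01841442, 0.02479874, 0.01773734,
              0.02229294, 0.01369003, 0.01257696)
sasSE <- c(0.03167, 0.02552, 0.03223, 0.02508,
           0.02896, 0.01816, 0.01717)
round(sasSE / openmxSE, 3)   # ratios of the SAS to the OpenMx standard errors
sqrt(2)                      # ~ 1.414, the conjectured factor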

Google "SAS Hessian standard errors" and you should get a support page for SAS PROC TCALIS that gives the equations.

greg

neale

I agree that the formulae differ in the two packages, and that they differ by a factor of sqrt(2). The SAS documentation at http://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/statug_tcalis_sect072.htm is dogmatic and lacks citations or proofs. I am not saying it is wrong; just that I have not been able to verify it. If it is correct, why is it that OpenMx standard errors agree well with bootstrap standard errors, whereas those provided by TCALIS would agree less well? The thread http://openmx.psyc.virginia.edu/issue/2009/08/error-estimate-summary-wrong-so-hessian contains both Hessian-based and bootstrap-based SEs. Presumably, multiplying the Hessian-based estimates by sqrt(2) should give estimates that agree more closely with the bootstrap estimates, but the reverse seems to be the case:

> hessians
[1] 0.010674776 0.012604163 0.014016030 0.016688139 0.018605535 0.001966411 0.001975071 0.002205340
[9] 0.002347757 0.002583113
> bootstraps
[1] 0.008744156 0.010768885 0.012274162 0.013864563 0.014175315 0.001660683 0.001440713 0.001870993
[9] 0.002094885 0.001910026
> hessians-bootstraps
[1] 0.001930620 0.001835278 0.001741868 0.002823576 0.004430220 0.000305728 0.000534358 0.000334347
[9] 0.000252872 0.000673087
> hessians*sqrt(2)-bootstraps
[1] 0.006352257 0.007056093 0.007547498 0.009736030 0.012136885 0.001120242 0.001352459 0.001247829
[9] 0.001225345 0.001743047
> (hessians-bootstraps)-(hessians*sqrt(2)-bootstraps)
[1] -0.0044216370 -0.0052208153 -0.0058056297 -0.0069124535 -0.0077066649 -0.0008145141 -0.0008181012
[8] -0.0009134817 -0.0009724728 -0.0010699604

The last of these shows that the adjusted (sqrt(2)-scaled) Hessian-based SEs are uniformly further from the bootstrap estimates than the unadjusted ones. Possibly this is an accident due to the Hessian not being computed accurately during optimization, per Greg's comments in the thread http://openmx.psyc.virginia.edu/issue/2009/12/alternative-estimate-hessian. But perhaps the sqrt(2) adjustment and the TCALIS docs are wrong?

carey

running to class, so no time to comment. check the attached simulation routine.

Paras

Could you please provide the SAS (CALIS?) and OpenMx code for fitting the model? I suspect the two approaches may be making different assumptions regarding the joint distribution of the dependent variables.
Are the reported degrees of freedom the same?

Paras

carey

The attached code will do the same simulation in OpenMx as the SAS code sent earlier.
BTW, you need SAS 9.2 to run the SAS code.
greg

neale

Ok:

Parameter mean obsStDev meanHessEst sqrt(2)*meanHessEst
1 lambda1 0.8037281654 0.09509229 0.06646744 0.09399915
2 lambda2 0.4990249335 0.06789536 0.04784936 0.06766921
3 lambda3 0.7012477844 0.08523608 0.05979934 0.08456904
4 lambda4 -0.0004040130 0.05937003 0.04163092 0.05887501
5 specifics1 0.9861626248 0.14562295 0.10144265 0.14346157
6 specifics2 0.9962583208 0.08001066 0.05647445 0.07986693
7 specifics3 0.9921679329 0.11666915 0.08199077 0.11595246
8 specifics4 0.9963982839 0.06295382 0.04485073 0.06342851

I stand convinced. Let's change the standard error formula so that it is multiplied by sqrt(2). Line 177 of MxSummary.R could be changed to adjust the Std.Error column of the summary directly. However, since users can access either that column or the Hessian itself, we could instead multiply the elements of the Hessian by 0.5, so that when it is inverted we get [recalling that inv(k*A) = (1/k)*inv(A) for a non-zero scalar k] 2*inv(Hessian), and the square roots of the diagonal elements then give sqrt(2) times the current standard errors.
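
A minimal numerical sketch of that scaling argument (the Hessian below is made up; only the algebra matters):

# Made-up 2x2 Hessian standing in for the Hessian OpenMx currently reports
H <- matrix(c(200, 30,
               30, 150), nrow = 2)
seCurrent <- sqrt(diag(solve(H)))         # standard errors as currently computed
seHalved  <- sqrt(diag(solve(0.5 * H)))   # after multiplying the Hessian by 0.5
all.equal(seHalved, sqrt(2) * seCurrent)  # TRUE, since inv(0.5*H) = 2*inv(H)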

carey

All,
Pardon my fractionated posts. Have had little concentrated time.

Here is what I suspect is going on with the standard errors: in some cases (perhaps all), OpenMx minimizes -2*log(L) and not -log(L). To examine the consequences, let F1 = -log(L) and F2 = -2*log(L). Let d1 be the vector of first derivatives of F1, d1 = dF1/dx, where the vector x contains the parameters. Then d2 = dF2/dx = 2*dF1/dx = 2*d1. This, however, makes little difference to the solution: the optimizer is trying to find a point at which d1 (or d2) is close to 0, so F1 and F2 should arrive at the same parameter estimates. The gradient of F2, however, is twice the gradient of F1.

Let H1 and H2 denote the matrices of second partial derivatives of F1 and F2 respectively. If H1 = d^2F1/(dx dx'), then H2 = 2*H1. Taking the inverses gives the parameter covariance matrices I1 and I2, with I2 = 0.5*I1. Taking the square roots of the diagonals then gives standard errors from F2 that are 1/sqrt(2) times the standard errors from F1; equivalently, the F1-based standard errors are sqrt(2) times the F2-based ones.
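
A small numerical illustration of this argument, using a plain normal likelihood for a mean with the variance fixed at 1 (not OpenMx code; the numDeriv package is assumed to be available for the numerical Hessian):

library(numDeriv)

set.seed(1)
y <- rnorm(100, mean = 2, sd = 1)

F1 <- function(mu) -sum(dnorm(y, mean = mu, sd = 1, log = TRUE))  # -log(L)
F2 <- function(mu) 2 * F1(mu)                                     # -2*log(L)

muHat <- mean(y)                       # ML estimate of the mean

se1 <- sqrt(1 / hessian(F1, muHat))    # from -log(L): about 1/sqrt(100) = 0.10
se2 <- sqrt(1 / hessian(F2, muHat))    # from -2*log(L): about 0.10/sqrt(2)
se1 / se2                              # ~ sqrt(2)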

I have known about this for a while and have been meaning to get into the raw code to examine the function values that OpenMx uses. I have not had time to do so.

If this is correct, I suggest changing the automatic fit functions to -log(L) instead of multiplying the standard errors by sqrt(2). You may also have to change some of the default parameters (probably the default tolerances) for NPSOL. I am guessing that this will lead to faster and more robust optimization.
best,
greg

neale

Right, this may be a better idea! So changes would be to the fit function being minimized rather than to the summarizing of the Hessian. More changes in more places though. I am reminded of a parameter "UP" in the CERN routine MINUIT, used to effect the same thing.

Math Wolf

Thanks a lot for the replies, they were really helpful.

I must remark that the results posted come from SAS proc mixed, not SAS proc calis, although the results of SAS proc calis are similar.

The ratios between the two sets of standard errors are (as can also be seen in my first post) not exactly sqrt(2), but somewhere between 1 and sqrt(2), closer to the latter. Therefore multiplying by sqrt(2) may inflate the OpenMx standard errors slightly too much (at least, it seems so for my data).

For those who wonder: the reason I prefer SAS proc mixed here is that on the complete data set (with missing values), the structure of the data allows pairwise deletion when proc mixed is used, which seems similar to what OpenMx does. Proc Calis, on the other hand, offers (as far as I know) only listwise deletion.

neale

There are likely some inaccuracies in the standard errors being reported by OpenMx versions up to at least 1070. These estimated SE's are derived from a covariance matrix of parameter estimates built up on the fly during optimization. Quite soon, OpenMx will be implementing a more precise routine to obtain better estimates of the parameter variances, per this thread: http://openmx.psyc.virginia.edu/issue/2009/12/alternative-estimate-hessian

Second, just to clear up a possible confusion, OpenMx's FIML functions for continuous or ordinal data never use pairwise or listwise deletion. The likelihood is computed from all the non-missing observations. This approach can be computationally burdensome, but it has many desirable features: all those of maximum likelihood estimates, for a start.
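
To make the FIML description concrete, here is a toy sketch (not OpenMx internals; the mvtnorm package and made-up numbers are used) of how a data row with a missing value still contributes a likelihood based only on its observed variables:

library(mvtnorm)

# Made-up model-implied means and covariance matrix for three variables
mu <- c(0, 0, 0)
Sigma <- matrix(c(1.0, 0.5, 0.3,
                  0.5, 1.0, 0.4,
                  0.3, 0.4, 1.0), nrow = 3)

row <- c(0.2, NA, -1.1)   # the second variable is missing for this subject
obs <- !is.na(row)

# The row's log-likelihood uses only the observed variables and the matching
# sub-vector/sub-matrix of mu and Sigma; nothing is deleted or imputed.
dmvnorm(row[obs], mean = mu[obs], sigma = Sigma[obs, obs], log = TRUE)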