Standard Errors versus Confidence Intervals

rabil
Standard Errors versus Confidence Intervals
Attachments: example.R (2.53 KB), fake.data_.R (7.95 KB)

When I run this code (a threshold model for ordinal data), the standard errors are incredibly large but the confidence intervals are relatively narrow. Are any of the results trustworthy? I've tried simplifying the thresholds (using labeling to reduce the number of parameters) but it doesn't help. I'm not sure why this data is so hard to model.

If I treat the ordinal data as if it were quantitative, I have no trouble fitting a common factor model.

I've also noticed that when I use mxGenerateData, the resulting data are noticeably different from the original. The original ordinal data go from 0 to 9; mxGenerateData changes this to 1 to 10, which I could live with, but the percentage of 1s in the generated data is MUCH larger than the percentage of 0s in the original for each of the seven ordinal variables.
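
For concreteness, the kind of comparison I mean looks roughly like this (a sketch only; the object names follow the attached scripts and the output posted below):

library(OpenMx)

# Simulate a data set of the same size from the fitted model and compare
# the category proportions of the first variable with the original data.
sim.data <- mxGenerateData(oneFactorFit.npsol.fake, nrows = nrow(fake.data))
round(prop.table(table(fake.data[, 1])), 3)   # original coding
round(prop.table(table(sim.data[, 1])), 3)    # generated data: the lowest category is inflated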

rabil
Here is the output:

> oneFactorFit.npsol.fake <- mxRun(oneFactorThresholdModel,intervals=TRUE)
Running oneFactorThresholdModel with 25 parameters
Warning message:
In model 'oneFactorThresholdModel' Optimizer returned a non-zero status code 6. The model does not satisfy the first-order optimality conditions to the required accuracy, and no improved point for the merit function could be found during the final linesearch (Mx status RED)
> summary(oneFactorFit.npsol.fake)
Summary of oneFactorThresholdModel

The model does not satisfy the first-order optimality conditions to the required accuracy, and no improved point for the merit function could be found during the final linesearch (Mx status RED)

free parameters:
name matrix row col Estimate Std.Error A lbound ubound
1 r1 facLoadings 1 1 0.72393105 43.17916 -0.99 0.99
2 r2 facLoadings 2 1 0.71059798 43.03563 -0.99 0.99
3 r3 facLoadings 3 1 0.69776866 41.61558 -0.99 0.99
4 r4 facLoadings 4 1 0.75890452 49.58525 -0.99 0.99
5 r5 facLoadings 5 1 0.70166711 42.71634 -0.99 0.99
6 r6 facLoadings 6 1 0.78463064 53.51735 -0.99 0.99
7 r7 facLoadings 7 1 0.77898024 52.09968 -0.99 0.99
8 k11 thresholdDeviations 1 v1 0.12058928 24.74546 -Inf
9 k21 thresholdDeviations 2 v1 0.26322885 40.79872 0.01
10 k31 thresholdDeviations 3 v1 0.17018005 47.23743 0.01
11 k41 thresholdDeviations 4 v1 0.22078831 40.55031 0.01
12 k51 thresholdDeviations 5 v1 0.22655569 35.78757 0.01
13 k61 thresholdDeviations 6 v1 0.16270037 37.08574 0.01
14 k71 thresholdDeviations 7 v1 0.09345118 NaN ! 0.01
15 k81 thresholdDeviations 8 v1 0.24889435 26.16626 0.01
16 k91 thresholdDeviations 9 v1 0.16995595 27.66385 0.01
17 k12 thresholdDeviations 1 v2 0.13324363 43.75378 -Inf
18 k22 thresholdDeviations 2 v2 0.19559708 111.85217 0.01
19 k32 thresholdDeviations 3 v2 0.15311304 121.41920 0.01
20 k42 thresholdDeviations 4 v2 0.18729238 106.37887 0.01
21 k52 thresholdDeviations 5 v2 0.19904518 97.32996 0.01
22 k62 thresholdDeviations 6 v2 0.23197829 82.46422 0.01
23 k72 thresholdDeviations 7 v2 0.20277574 78.16480 0.01
24 k82 thresholdDeviations 8 v2 0.14844392 80.63950 0.01
25 k92 thresholdDeviations 9 v2 0.25727615 54.20237 0.01

confidence intervals:
lbound estimate ubound note
r1 0.6211308 0.7239311 0.7389540
r2 0.6946362 0.7105980 0.7913712

observed statistics: 1617
estimated parameters: 25
degrees of freedom: 1592
fit value ( -2lnL units ): 4844.447
number of observations: 231

** Information matrix is not positive definite (not at a candidate optimum).
Be suspicious of these results. At minimum, do not trust the standard errors.

Information Criteria:
| df Penalty | Parameters Penalty | Sample-Size Adjusted
AIC: 1660.447 4894.447 NA
BIC: -3819.882 4980.508 4901.272
To get additional fit indices, see help(mxRefModels)
timestamp: 2016-04-17 18:15:04
Wall clock time (HH:MM:SS.hh): 00:05:20.75
optimizer: NPSOL
OpenMx version number: 2.5.2

AdminRobK
the curse of dimensionality

As it stands, I think the analysis you're trying to do has too many ordinal levels and/or too many variables for the sample size. Keep in mind that the covariance matrix in this model is a polychoric correlation matrix, whose underlying theory amounts to inferring a latent, normally distributed continuum from contingency tables. Note the "zones of rarity" apparent in, say, the two-way table for the first two variables of fake.data:

> table(fake.data[,1], fake.data[,2])
 
      1  2  3  4  5  6  7  8  9 10
  1  89  9  5  6  4  5  1  3  1  3
  2  13  3  3  1  0  2  1  0  0  0
  3   3  1  0  0  2  4  2  1  0  1
  4   6  3  1  2  0  1  1  1  0  2
  5   8  2  0  1  0  1  0  0  2  0
  6   2  0  0  1  1  0  2  1  1  0
  7   1  0  0  1  1  0  0  0  1  0
  8   1  1  0  0  2  1  2  1  0  1
  9   0  0  1  1  0  0  0  1  1  1
  10  3  0  1  0  1  2  3  0  1  0

The abundance of empty cells is only compounded in the seven-variable space where the multivariate normal probability integral is calculated. Basically, I'm saying I think your model is empirically unidentified, at least in part due to the algorithm being used to fit it. You could try a different algorithm, such as WLS or IFA, or just treat your variables as continuous rather than ordinal. Another possibility would be to collapse your ordinal categories into a smaller number of categories.
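
A rough sketch of what I mean by collapsing categories (the cut points and object names are purely illustrative, assuming the ten-level coding shown in the table above):

library(OpenMx)

# Recode the ten ordinal levels into four broader categories before refitting.
collapse <- function(x) cut(as.numeric(x), breaks = c(0, 1, 3, 6, 10), labels = FALSE)
fake.collapsed <- as.data.frame(lapply(fake.data, collapse))
fake.collapsed <- mxFactor(fake.collapsed, levels = 1:4)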

Something else entirely would be to reparameterize your model so that the first two thresholds of each variable are fixed to 0 and 1, respectively, with the latent mean and variance of each variable freely estimated. Then, regularize estimation of the covariance matrix with an inverse-Wishart prior. Your main model would have two submodels: one with your data that fits your factor analysis, and another that computes the -2 log density of the covariance matrix given the inverse-Wishart hyperparameters you selected for the prior. The main model would then combine the fitfunctions of the two submodels via a multigroup fitfunction.
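
A minimal sketch of just the threshold part of that reparameterization (leaving the inverse-Wishart piece aside; matrix names, column names, and starting values are only illustrative):

library(OpenMx)

# Threshold matrix for 7 ordinal items with 10 categories (9 thresholds each).
# The first two rows are fixed at 0 and 1 to identify each item's latent scale,
# so the item means and variances can be freely estimated elsewhere in the model.
nVar <- 7
nThr <- 9
threshFree <- matrix(TRUE, nThr, nVar)
threshFree[1:2, ] <- FALSE
threshVals <- matrix(c(0, 1, seq(1.5, 5, length.out = nThr - 2)), nThr, nVar)
threshMat <- mxMatrix(type = "Full", nrow = nThr, ncol = nVar,
                      free = threshFree, values = threshVals,
                      dimnames = list(NULL, paste0("v", 1:nVar)),
                      name = "thresholdMatrix")
# ...then pass thresholds = "thresholdMatrix" to mxExpectationNormal(), with
# free means and variances for the latent response variables.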

rabil

Thanks. I tried recoding to a smaller number of levels, but it makes no difference. I then tried WLS, but the mxDataWLS function will not work with my data:

> mxDataWLS(
+ data=na.omit(real.data),
+ type="WLS"
+ )
Calculating asymptotic summary statistics ...
Error in checkmvArgs(lower = lower, upper = upper, mean = mean, corr = corr, :
‘corr’ is not a correlation matrix

real.data consists of mxFactor-coded variables.

mxDataWLS works with fake.data but won't work with any version of the actual data.

As I mentioned before, treating the data as quantitative works easily and gives results almost identical to those from a lavaan threshold model (which, I believe, uses WLS in this case). If I try to regress the latent variables on a covariate, the model will not complete without a warning message and status code 6. I've been able to fit the full model in lavaan but would have liked to compare the results with OpenMx.

mhunter
traceback()

Thanks for trying out the OpenMx WLS. When you get the error

Error in checkmvArgs(lower = lower, upper = upper, mean = mean, corr = corr, :
‘corr’ is not a correlation matrix

what does running traceback() from the R console tell you?

I'm betting the issue has something to do with checking the symmetry of the computed correlation matrix. It finds that some element of the lower triangle isn't exactly equal to the corresponding element of the upper triangle: e.g., 0.3948576 is not exactly equal to 0.3948575, so it throws an error. Having a better idea of the sequence of calls that led to this problem will help me fix it.
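
For what it's worth, here's a toy illustration of that kind of near-but-not-exact symmetry (the matrix itself is made up):

# Two off-diagonal entries differing only in the seventh decimal place:
# symmetric for all practical purposes, but not under exact comparison.
x <- matrix(c(1, 0.3948576, 0.3948575, 1), 2, 2)
identical(x, t(x))           # FALSE: exact equality fails
isSymmetric(x)               # FALSE at the default (very strict) tolerance
isSymmetric(x, tol = 1e-6)   # TRUE once a modest tolerance is allowed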

Sorry that it's a problem in the first place!

tbates
equivalent not equal?

Might need to do something like:

x <- cov2cor(x)                        # rescale the computed matrix to a correlation matrix
x[lower.tri(x)] <- t(x)[lower.tri(x)]  # copy the upper triangle over the lower to force exact symmetry
tbates
bug - report it

Sounds like a bug/limitation of mxDataWLS().

I'd share the data confidentially with the lead developer on that function, Michael Hunter.

Best, t