Attachment | Size
---|---
infotest.R | 6.05 KB
I'm experimenting with OpenMx's capabilities to calculate the information matrix (and standard errors) in the context of IFA models with priors on some item parameters. For example, see the 3PL model in the documentation: https://openmx.ssri.psu.edu/docs/OpenMx/latest/ItemFactorAnalysis.html#a-3pl-with-bayesian-priors
Say one makes the following changes to the example inside mxComputeSequence (see also the end of the attached code, and the sketch below):

- Remove mxComputeNumericDeriv()
- Add mxComputeOnce('fitfunction', 'information', "sandwich")
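A minimal sketch of what the modified plan might look like (hedged: 'itemModel' and the rest of the plan are placeholders standing in for the documentation example's actual code):

```r
# Sketch only: submodel and fit-function names are illustrative.
plan <- mxComputeSequence(list(
  mxComputeEM('itemModel.expectation', 'scores',
              mxComputeNewtonRaphson()),                    # EM estimation
  mxComputeOnce('fitfunction', 'information', "sandwich"),  # replaces mxComputeNumericDeriv()
  mxComputeStandardError(),
  mxComputeHessianQuality()
))
```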
If I do this, I do get standard errors, but no information matrix in the output. Just wondering how the standard errors are calculated in this case and how to obtain the information matrix. I do see a note in the example possibly suggesting that the above approach ought not to work or is not recommended in the context of priors, but at this point I'm just experimenting to figure out how things work.
Many thanks!
You'll also need to add mxComputeReportDeriv() to the compute plan to obtain the actual information matrix. It is returned in model$output$hessian.
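For instance (a sketch; assumes the model already carries the compute plan above, with mxComputeReportDeriv() appended):

```r
fit <- mxRun(model)
fit$output$hessian          # the information matrix
fit$output$standardErrors   # the standard errors
```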
> Just wondering how the standard errors are calculated in this case
The method in this case is Yuan, Cheng, & Patton (2013).
> the above approach ought not to work or is not recommended in the context of priors
Yeah, any approach based on Louis (1982) is not going to take the priors into account. In simulations, I've found that Oakes (1999) is the best method (fastest and most accurate).
Not all of the options are well documented yet. If you want to experiment, you might browse through inst/models/enormous/lib/stderrlib.R, which I have used to run simulations.
Also, it looks like the example in the manual needs to be updated. Those univariate priors are now packaged as ifaTools::univariatePrior
I see. So, adding arguments like the following to mxComputeEM ought to yield Oakes' approach, even with priors?
information="oakes1999", infoArgs=list(fitfunction=c('itemModel.fitfunction','gaussModel.fitfunction','betaModel.fitfunction'))
I'll have to read Oakes and compare to the other approaches with which I'm more familiar. Might need to do a few simulations to see what (if anything) works well with the item models I'm using.
mxComputeReportDeriv() should also return the gradient?
Thank you again and hope you all enjoy some vacation!
> So, adding arguments like the following to mxComputeEM ought to yield Oakes' approach, even with priors?
Yes, Oakes will take priors into account. You can compare the accuracy to central difference Richardson extrapolation (mxComputeNumericDeriv).
> information="oakes1999", infoArgs=list(fitfunction=c('itemModel.fitfunction','gaussModel.fitfunction','betaModel.fitfunction'))
I think you only need to provide the top-level multigroup fit function, so
infoArgs=list(fitfunction="demo3pl_xpdse.fitfunction")
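Putting that together, the EM step might look like this (a sketch: the submodel names are placeholders, and demo3pl_xpdse is assumed to be the name of the top-level multigroup model from the attached script):

```r
mxComputeEM('itemModel.expectation', 'scores',
            mxComputeNewtonRaphson(),
            information = "oakes1999",   # Oakes (1999); takes priors into account
            infoArgs = list(fitfunction = "demo3pl_xpdse.fitfunction"))
```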
> mxComputeReportDeriv() should also return the gradient?
Yes
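For example, with mxComputeReportDeriv() in the plan (assuming the fitted model is stored in fit):

```r
fit$output$gradient   # reported alongside fit$output$hessian
```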
> I'll have to read Oakes and compare to the other approaches with which I'm more familiar.
Oakes is elegant. Take a look at src/Compute.cpp line 2222. It's about 40 lines of code.
mxComputeReportDeriv() for some reason does not return the gradient in the attached example. Wondering if I did something wrong. Really I want the gradient at particular values of some item parameters that may not necessarily be at the MLE, but I'll worry about how to do that later.
If I understand correctly, the following sums up the current methods for computing the "information matrix" as implemented in OpenMx:
1. Add something like this to mxComputeSequence():
   mxComputeOnce('fitfunction', 'information', 'meat')
   - "meat" = empirical cross-product (observed?)
   - "bread" = Fisher information matrix (expected?)
   - "sandwich" = same as the name implies (but expected or observed?)
   - "hessian" = M-step
2. Use mxComputeNumericDeriv() to do Richardson extrapolation.
3. Add arguments such as the following to mxComputeEM (see the sketch after this list):
   information="mr1991", infoArgs=list(semMethod="mr", fitfunction='fitfunction')
   For the information argument:
   - "oakes1999" = Oakes (1999); no need to specify semMethod
   - "mr1991" = supplemented EM (S-EM)
   In combination with "mr1991", a particular method for S-EM may be specified using semMethod:
   - "mr" = as applied by Cai (2008) to IFA models
   - "tian" = as specified by Tian, Cai, Thissen, & Xin (2013)
   - "agile" = your method for S-EM
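As referenced in item 3 above, a sketch of the S-EM route (hedged: the submodel name is a placeholder, and semMethod = "tian" is just one of the choices listed):

```r
plan <- mxComputeSequence(list(
  mxComputeEM('itemModel.expectation', 'scores',
              mxComputeNewtonRaphson(),
              information = "mr1991",
              infoArgs = list(semMethod = "tian",
                              fitfunction = 'fitfunction')),
  mxComputeStandardError(),
  mxComputeReportDeriv()
))
```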
Hope this helps others and hope I've covered what's available.
Thank you!
> mxComputeReportDeriv() for some reason does not return the gradient in the attached example. Wondering if I did something wrong. Really I want the gradient at particular values of some item parameters that may not necessarily be at the MLE, but I'll worry about how to do that later.
My hunch is that the gradient is not returned because it is only computed in the M-step and not preserved outside of mxComputeEM.
To get the gradient, take a look at inst/models/nightly/ifa-meat-2d.R
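A hedged guess at the pattern used there (not verified against the actual script): request the derivatives directly from the fit function, outside of mxComputeEM, so they are evaluated at whatever the current free-parameter values happen to be:

```r
plan <- mxComputeSequence(list(
  mxComputeOnce('fitfunction', c('gradient', 'hessian')),
  mxComputeReportDeriv()
))
```

Because this evaluates at the current parameter values rather than at an optimum, it would also cover the case of wanting the gradient at values other than the MLE: set the starting values to the point of interest and skip the optimizer.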
> "meat" = empirical cross-product (observed?)
I am confused about the distinction between observed and expected information. But yeah, "meat" is the empirical cross-product. Mislevy (1984) is implemented for the latent distribution.
Methods "bread" and "sandwich" are implemented according to Yuan, Cheng, & Patton (2013).
With respect to the EM-related methods, "mr" and "tian" require the convergence history, while "oakes1999" and "agile" do not.
You have covered all the options.
> "meat" = empirical cross-product (observed?)
> "bread" = Fisher information matrix (expected?)
> "sandwich" = same as the name implies (but expected or observed?)
There are two different ways of defining the Fisher information: (1) the variance of the first derivative of the loglikelihood and (2) the negative expectation of the second derivative of the loglikelihood. The true Fisher information is generally unknown, because it requires knowledge of the true parameter values and the true functional form (i.e., family) of the distribution.

If we assume a particular functional form of the distribution and "plug in" our MLEs in place of the true parameter values, we can estimate the true Fisher information from data. If the necessary integrals are analytically tractable, then we can estimate the true Fisher information as the expected information matrix. But sometimes those integrals are not so easy, so we instead use the observed information as an estimator, and calculate it either as a quadratic function of the first derivative of the loglikelihood (proportional to the "empirical cross-product" or "meat") or as a first-degree function of the second derivative of the loglikelihood (proportional to the Hessian matrix, or "bread"). When the distribution is correctly specified, those two ways of calculating the observed information will on average give the same answer in large samples.

The "sandwich" is
solve(bread) %*% meat %*% solve(bread)
, and so if the distribution is correctly specified, then asymptotically the product of the first two factors tends toward an identity matrix, and the sandwich merely tends toward the inverted Hessian. But if the distribution is incorrectly specified, it can be shown that the sandwich nonetheless provides a consistent estimator of the repeated-sampling covariance matrix of the MLEs.

Thank you! Do I understand correctly then that all of these approaches (meat, bread, sandwich) are implemented in OpenMx using observed information? The cross-product approach is usually based on observed proportions of the response patterns. What I think is often referred to as Fisher information in the IFA literature is the plug-in method you refer to, where the expected information matrix is calculated, though it isn't really practical for long tests since one must loop over the model-implied proportions of all possible response patterns. I initially thought this was "bread" but it doesn't sound like it now.
The Fisher information method where you need to loop over all possible response patterns is not implemented.
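For readers curious what that would involve, here is a conceptual sketch (the helper functions are hypothetical stand-ins, not OpenMx code, since this method is not implemented there):

```r
# Hypothetical: expected Fisher information by summing over every possible
# response pattern, weighting each pattern's score outer product by its
# model-implied probability.  patternProb() and patternScore() are
# illustrative stand-ins, not real functions.
expectedInfo <- function(patterns, theta) {
  info <- matrix(0, length(theta), length(theta))
  for (u in patterns) {
    p <- patternProb(u, theta)    # model-implied P(U = u; theta)
    s <- patternScore(u, theta)   # d log P(U = u; theta) / d theta
    info <- info + p * tcrossprod(s)
  }
  info  # E[score %*% t(score)] evaluated at theta
}
```

With k dichotomous items the loop runs over 2^k patterns, which is why this becomes impractical for long tests.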