
information matrix with IFA models

falkcarl
information matrix with IFA models
Attachment: infotest.R (6.05 KB)

I'm experimenting with OpenMx's capabilities to calculate the information matrix (and standard errors) in the context of IFA models with priors on some item parameters. See, for example, the 3PL model in the documentation: https://openmx.ssri.psu.edu/docs/OpenMx/latest/ItemFactorAnalysis.html#a-3pl-with-bayesian-priors

Say one makes the following changes to the example's mxComputeSequence() (see also the end of the attached code; a sketch of the resulting plan follows below):
Remove mxComputeNumericDeriv()
Add mxComputeOnce('fitfunction', 'information', "sandwich")
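
A minimal sketch of what that plan would look like (here 'emStep' is just a stand-in name for the mxComputeEM() step from the manual's example, and the standard-error steps are as in that example):

plan <- mxComputeSequence(list(
  emStep,                                                    # the mxComputeEM() step from the example
  mxComputeOnce('fitfunction', 'information', 'sandwich'),   # in place of mxComputeNumericDeriv()
  mxComputeStandardError(),
  mxComputeHessianQuality()))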

If I do this, I do get standard errors, but no information matrix in the output. Just wondering how the standard errors are calculated in this case and how to obtain the information matrix. I do see a note in the example possibly suggesting that the above approach ought not to work or is not recommended in the context of priors, but at this point I'm just experimenting to figure out how things work.

Many thanks!

jpritikin
info matrix

You'll also need to add mxComputeReportDeriv() to the compute plan to obtain the actual information matrix. It is returned in model$output$hessian.
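
For example (a sketch; 'model' stands in for whatever MxModel object you are fitting):

model <- mxRun(model)
model$output$hessian    # the information matrix, once mxComputeReportDeriv() is in the plan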

> Just wondering how the standard errors are calculated in this case

The method in this case is Yuan, Cheng, & Patton (2013).

> the above approach ought not to work or is not recommended in the context of priors

Yeah, any approach based on Louis (1982) is not going to take the priors into account. In simulations, I've found that Oakes (1999) is the best method (fastest and most accurate).

None of these options are really well documented yet. If you want to experiment, you might browse through inst/models/enormous/lib/stderrlib.R, which I have used to run simulations.

jpritikin
ifaTools

Also, it looks like the example in the manual needs to be updated. Those univariate priors are now packaged as ifaTools::univariatePrior.
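
For example, something along these lines (a rough sketch; the 'logit-norm' type, the item label 'item1_g', and the mode are all made up here, so check ?univariatePrior for the exact arguments):

library(ifaTools)
guessPrior <- univariatePrior('logit-norm', 'item1_g', -1)   # prior on a hypothetical 3PL guessing parameter
# the prior's fit function then gets added to the multigroup fit along with the item model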

falkcarl
I see. So, adding arguments

I see. So, adding arguments like the following to mxComputeEM ought to yield Oakes' approach, even with priors?

information="oakes1999", infoArgs=list(fitfunction=c('itemModel.fitfunction','gaussModel.fitfunction','betaModel.fitfunction'))

I'll have to read Oakes and compare to the other approaches with which I'm more familiar. Might need to do a few simulations to see what (if anything) works well with the item models I'm using.

mxComputeReportDeriv() should also return the gradient?

Thank you again and hope you all enjoy some vacation!

jpritikin
oakes

> So, adding arguments like the following to mxComputeEM ought to yield Oakes' approach, even with priors?

Yes, Oakes will take priors into account. You can compare the accuracy to central difference Richardson extrapolation (mxComputeNumericDeriv).
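
For example, a comparison run might use a plan like this (a sketch; 'emStep' stands in for your mxComputeEM() step):

plan <- mxComputeSequence(list(
  emStep,
  mxComputeNumericDeriv(),    # central-difference Richardson extrapolation
  mxComputeStandardError(),
  mxComputeReportDeriv()))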

> information="oakes1999", infoArgs=list(fitfunction=c('itemModel.fitfunction','gaussModel.fitfunction','betaModel.fitfunction'))

I think you only need to provide the top-level multigroup fit function, so

infoArgs=list(fitfunction="demo3pl_xpdse.fitfunction")
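
In other words, something along these lines (a sketch; the expectation/predict/M-step arguments here just follow the manual's 3PL example and may differ for your model):

emStep <- mxComputeEM('itemModel.expectation', 'scores', mxComputeNewtonRaphson(),
                      information = "oakes1999",
                      infoArgs = list(fitfunction = "demo3pl_xpdse.fitfunction"))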

> mxComputeReportDeriv() should also return the gradient?

Yes

> I'll have to read Oakes and compare to the other approaches with which I'm more familiar.

Oakes is elegant. Take a look at src/Compute.cpp line 2222. It's about 40 lines of code.

falkcarl
mxComputeReportDeriv() for

mxComputeReportDeriv() for some reason does not return the gradient in the attached example. Wondering if I did something wrong. Really I want the gradient at particular values of some item parameters that may not necessarily be at the MLE, but I'll worry about how to do that later.

If I understand correctly, the following about sums up the current methods for computing the "information matrix" as implemented in OpenMx:

  1. Ways of computing the information once at the end

Add something like this to mxComputeSequence():
mxComputeOnce('fitfunction', 'information', 'meat')

“meat” = empirical cross-product (observed?)
“bread” = Fisher information matrix (expected?)
“sandwich” = same as the name implies (but, expected or observed?)
“hessian” = Hessian from the M-step

Alternatively, use of mxComputeNumericDeriv() will do Richardson extrapolation.

  2. Ways that require keeping track of things across EM cycles

Add arguments such as the following to mxComputeEM:
information="mr1991",
infoArgs=list(semMethod="mr", fitfunction='fitfunction')

For the information argument:
“oakes1999” = Oakes (1999), no need to specify semMethod
“mr1991” = supplemented EM (S-EM)

In combo with “mr1991”, a particular method for S-EM may be specified using semMethod:
“mr” = as applied by Cai (2008) to IFA models
“tian” = as specified by Tian, Cai, Thissen, & Xin (2013)
“agile” = your method for S-EM

Hope this helps others and hope I've covered what's available.
Thank you!

jpritikin
derivs and information

> mxComputeReportDeriv() for some reason does not return the gradient in the attached example. Wondering if I did something wrong. Really I want the gradient at particular values of some item parameters that may not necessarily be at the MLE, but I'll worry about how to do that later.

My hunch is that the gradient is not returned because the gradient is only computed in the M-step and not preserved outside of mxComputeEM.

To get the gradient, take a look at inst/models/nightly/ifa-meat-2d.R

> “meat” = empirical cross-product (observed?)

I am confused about the distinction between observed and expected information. But yeah, "meat" is the empirical cross-product. Mislevy (1984) is implemented for the latent distribution.

Methods "bread" and "sandwich" are implemented according to Yuan, Cheng, & Patton (2013).

With respect to the EM-related methods, "mr" and "tian" require the convergence history, while "oakes1999" and "agile" do not.

You have covered all the options.

AdminRobK
Fisher information

> “meat” = empirical cross-product (observed?)
> “bread” = Fisher information matrix (expected?)
> “sandwich” = same as the name implies (but, expected or observed?)
There are two different ways of defining the Fisher information: (1) the variance of the first derivative of the loglikelihood, and (2) the expectation of the second derivative of the loglikelihood. The true Fisher information is generally unknown, because it requires knowledge of the true parameter values and the true functional form (i.e., family) of the distribution. If we assume a particular functional form for the distribution and "plug in" our MLEs in place of the true parameter values, we can estimate the true Fisher information from data. If the necessary integrals are analytically tractable, then we can estimate the true Fisher information as the expected information matrix.

But sometimes those integrals are not so easy, so we instead use the observed information as an estimator, and calculate it either as a quadratic function of the first derivative of the loglikelihood (proportional to the "empirical cross-product", or "meat") or as a first-degree function of the second derivative of the loglikelihood (proportional to the Hessian matrix, or "bread"). When the distribution is correctly specified, those two ways of calculating the observed information will on average give the same answer in large samples.

The "sandwich" is solve(bread)%*%meat%*%solve(bread), so if the distribution is correctly specified then, asymptotically, the product of the first two factors tends toward an identity matrix and the sandwich merely tends toward the inverted Hessian. But if the distribution is incorrectly specified, it can be shown that the sandwich nonetheless provides a consistent estimator of the repeated-sampling covariance matrix of the MLEs.
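
A toy illustration of those relationships in R (the 2 x 2 matrices here are made up, not from any real model):

B <- matrix(c(2.0, 0.3, 0.3, 1.5), 2, 2)    # Hessian-based estimate ("bread")
M <- matrix(c(2.1, 0.2, 0.2, 1.4), 2, 2)    # cross-product estimate ("meat")
solve(B)                                    # inverse-Hessian covariance of the MLEs
solve(B) %*% M %*% solve(B)                 # the "sandwich"; close to solve(B) when the model is correctly specified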

falkcarl
Thank you! Do I understand

Thank you! Do I understand correctly then that all of these approaches (meat, bread, sandwich) are implemented in OpenMx using observed information? The cross-product approach is usually based on observed proportions of the response patterns. What I think is often referred to as Fisher information in the IFA literature is the plug-in method you refer to, where the expected information matrix is calculated, though it isn't really practical for long tests since one must loop over the model-implied proportions of all possible response patterns. I initially thought this was "bread" but it doesn't sound like it now.

jpritikin
Re: loop over the model-implied proportions of all possible response patterns

The Fisher information method where you need to loop over all possible response patterns is not implemented.