Hi folks. I mentioned this to Joshua, who asked me to post it here for further input. While wondering about some strange output in one of my functions the other day, I discovered that the matrices output by OpenMx only reflect the last calculated row of data. I believe this only becomes evident when definition variables are used, such that the A, S, or M matrices (when using RAM format, though I don't believe it's limited to this case) depend on the definition variable. I think a better option may be some sort of weighted output based on the number of observations in each row...
You are correct that the returned values of a matrix or algebra that depends upon a definition variable will only reflect the values of that definition variable in the last row of the dataset. I doubt that was a deliberate design decision. Instead, the matrix or algebra simply retains the last value it held during the run. Keep in mind that definition variables, as their name implies, define the specific statistical model applied to each row of the dataset. In other words, the model for each row is defined conditional on its values of the definition variables. In general, every row could have different values of the definition variables, meaning that a slightly different model would be applied to each row. Many times, one or more free parameters will represent how much certain quantities change as a function of the definition variables. An easy example is conditioning the expected means on the definition variables via linear regression; the tell-tale parameters in this case would be the regression coefficients.
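To make the "retains the last value" behaviour concrete, here is a minimal sketch in plain Python. It is not OpenMx code and does not reproduce OpenMx internals; the data values and the names `def_var`, `intercept`, and `beta` are made up for illustration. It conditions an expected mean on a definition variable through a linear regression and shows that a quantity overwritten row by row ends up holding only the final row's value.

```python
# Hypothetical toy data: one definition-variable value per row of the dataset.
def_var = [0.0, 1.0, 2.0, 5.0]

# Hypothetical free parameters: an intercept and a regression coefficient.
intercept = 10.0
beta = 2.0


def expected_mean(d):
    """Model-implied mean for a row whose definition variable equals d."""
    return intercept + beta * d


per_row_means = []
last_mean = None
for d in def_var:
    # The same slot is overwritten for every row, so after the loop it
    # holds only the value computed for the final row.
    last_mean = expected_mean(d)
    per_row_means.append(last_mean)

print(per_row_means)  # one expected mean per row
print(last_mean)      # the value left over from the last row
```

Each row gets its own expected mean, but a single stored "expected mean" can only ever show one of them.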
Could you further describe the behavior you'd prefer to see from OpenMx in this regard?
Ok. For the sorts of models and data I'm used to dealing with (and which I'd have thought would be more common than whichever sorts generate the exceptions, but I am of course ignorant of my unknown unknowns!) an average (with the same weighting function as FIML estimation) would be more meaningful. An average expCov could then be meaningfully plotted against the observed cov - even with definition variables 'altering the model' as such, I've found this can be very informative. If it doesn't make sense to approach it in this way then fair enough... it just took me a while to understand why everything was coming out 'wrong' and I thought I'd raise the issue.
The reason I asked Charles to post here is because I think it is a user interface question. He should be able to get per-row expected covariance matrices out, just like it is possible to get per-row likelihoods. If it is the average covariance he wants then do we want to provide a special API to get that or would he just collect the per-row covariance and compute the average himself?
I see that Mike Hunter pointed out that mxEval can do it.
Just for info: I couldn't come up with a way to sum the expCovs for each row together meaningfully (an average is wrong), so I created a new single-row model with the def vars set to the column means of my original def vars, reinserted the F matrix that went missing, and recalculated expCov as suggested above. This works fine: my expected cov now matches my observed cov when the model is correct.
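A toy illustration of the difference between averaging per-row expected covariances and evaluating the model once at the mean of the definition variable. This is plain Python, not OpenMx, and it is not the model from this thread. It assumes a hypothetical two-variable RAM model y = d*x + e in which the path from x to y equals the definition variable d, all variables are manifest (so F is the identity), and `var_x`, `var_e`, and the `def_vars` values are invented.

```python
def ram_expected_cov(d, var_x=1.0, var_e=0.5):
    """Expected covariance of (x, y) for the toy model y = d*x + e.

    This is the closed form of (I - A)^-1 S (I - A)^-T with
    A = [[0, 0], [d, 0]] and S = diag(var_x, var_e).
    """
    return [[var_x, d * var_x],
            [d * var_x, d * d * var_x + var_e]]


# Hypothetical definition-variable values, one per row.
def_vars = [1.0, 2.0, 3.0]
n = len(def_vars)

# Option 1: unweighted element-wise average of the per-row matrices.
per_row = [ram_expected_cov(d) for d in def_vars]
averaged = [[sum(m[i][j] for m in per_row) / n for j in range(2)]
            for i in range(2)]

# Option 2: evaluate the model once, at the mean definition variable.
mean_d = sum(def_vars) / n
at_mean = ram_expected_cov(mean_d)

print(averaged)
print(at_mean)
```

The two results agree where d enters linearly (the x-y covariance) and differ where it enters nonlinearly (the variance of y, which involves d squared), so the two summaries are not interchangeable. Which one is appropriate depends on the model and on what is being compared against.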
Hi Charles,
It is a design feature to leave the matrices and algebras at their last computed state. You're right that it is general to any model, not just RAM models. Definition variables could be used for many things in many different ways. Thus, OpenMx does not presume to know how you might want matrices or algebras printed. We just leave them in their last computed form. A weighted combination might make sense for your use case, but be meaningless or misleading in others' cases.
The summary() function gives you a nice report of the estimated parameters. And mxEval() can be used to get the computed form of any matrix or algebra for any row of data, by setting compute=TRUE and passing the row number as defvar.row.