Attachment | Size |
---|---|

Rplot.jpeg | 125.44 KB |

Hi,

Does anyone know how to estimate latent scores in the following model?

What package should I use?

best regards,

Krzysztof

Wed, 06/11/2014 - 17:25

#1
latent scores

Attachment | Size |
---|---|

Rplot.jpeg | 125.44 KB |

Does anyone know how to estimate latent scores in the following model?

What package should I use?

best regards,

Krzysztof

The image you attached appears to be a factor model. So the latent scores are factor scores. Here is a thread that discusses getting factor scores in OpenMx and provides some helper functions to do so: http://openmx.psyc.virginia.edu/thread/1294 . OpenMx does not have built-in utilities for this, at least not yet.

Thank you for this explanation.

My model includes four latent factors: two of them represent psychological content, the other two represent different forms of the items. Each of the items load on one of the two "content" factors and on one of the two "form" factors. This model is similar to the multitrait-multimethod model. I'm not sure if in the case of such a model I can calculate the factor scores using one of the methods typically applied for less complex models such as Bartlett, Thomson, or regression based factor scores.

From a modeling perspective, it's still just a factor model. That the factors represent "content" or "form" makes no difference; neither does it matter that we consider this kind of factor model similar to a multi-trait-multi-method model. You can still get factor scores that are just as meaningful. You happen to have a factor model with crossed loadings; that's the only difference. Factor score methods do not require simple structure. You can use regression based factor score methods.

If interested, you could also use maximum likelihood factor scores. A paper by Ryne Estabrook and Michael Neale gives plenty of details on doing this in OpenMx.

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3773873/

Thank you very much for your help!

I tried to apply R code (presented in the article you advised me) to estimate ML factor scores. I tested this code in several factor models consisting of a different number of factors. In the case of my one-factor model it worked pretty well, although expected posterior weights implementation to some cases generated the following warning message:

In model 'Weighted Factor Score Model' NPSOL returned a non-zero status

code 1. The final iterate satisfies the optimality conditions to the

accuracy requested, but the sequence of iterates has not yet converged.

NPSOL was terminated because no further improvement could be made in the

merit function (Mx status GREEN).

Much worse was the case of two-factor model for which some data made the

weights implementation return error, like below:

Error: The job for model 'Weighted Factor Score Model' exited abnormally with the error message: Objective function returned a value of NaN at iteration 30.3.

In addition: Warning message:

In model 'Weighted Factor Score Model' NPSOL returned a non-zero status code 6. The model does not satisfy the first-order optimality conditions to the required accuracy, and no improved point for the merit function could be found during the final linesearch (Mx status RED)

The case of four-factor model (like the one I presented in my earlier post) was the worst. For all of my data I received the messages like this:

Error: The job for model 'Weighted Factor Score Model' exited abnormally with the error message: Objective function returned an infinite value at iteration 26.6.

How can I improve the performance of this code? I would be grateful for

any suggestions and information.

It would be good to give a strictly positive error for the residual components. If you can post the script you are using it makes it easier to check whether you have done, e.g., x, y and z already.

I am assuming that your problems arise in the estimation of the model, not of the factor scores, correct?

No, these problems arise in the estimation of the maximum likelihood factor scores. I was using R code presented in the paper by Ryne Estabrook and Michael Neale.

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3773873/.

It seems that the more factors in the model, the more serious errorrs arise in the estimation of the maximum likelihood factor scores.

I have personally used the Estabrook/Neale method of estimating ML factor scores on models with 15 factors. It worked fine. With that being said, there are several nuances to doing this. You might have missed one. From memory, first you must fit your regular factor model to all the data. Then you fix all of the previously estimated parameters, free the factor means, and use only a single row of raw data. The only free parameter should be the factor means. The estimated factor means for row k are the ML factor score estimates for row k.

The other piece is whether or not you use the latent covariance structure to get the posterior ML factor scores, a kind of empirical Bayes ML factor score. I found less sensitivity when using straight ML than in the posterior ML method.

I did everything like you wrote with one exception. By mistake I did not fix factor covariance. After correcting this error, everything works fine, although for some row of data (it works in the loop) still I get the warning:

In model 'Weighted Factor Score Model' NPSOL returned a non-zero status code 1. The final iterate satisfies the optimality conditions to the accuracy requested, but the sequence of iterates has not yet converged. NPSOL was terminated because no further improvement could be made in the merit function (Mx status GREEN).

I'm very glad that it worked! Little bugs like that often get in the way. I wouldn't worry too much about the Mx status Green. From our Wiki

"A value of 1 means that an optimal solution was found, but that the sequence of iterates did not converge. There are several reasons this can happen, including starting at the correct values. It generally does not indicate a problem. These estimates can generally be considered correct solutions, so this code is labeled (Mx status GREEN)."

http://openmx.psyc.virginia.edu/wiki/errors

Glad you and Hunter were able to resolve this. If you do have problems with larger numbers of factors, please let me know and I'll look into the method.

Ryne

Hi Ryne,

Can you share how to translate this weight algebra into RAM form from the LISREL-style? matrices used in Estabrook and Neale 2013?

mxAlgebra(name = "weight",

1 / (sqrt( 2 * pi) * sqrt(det(model.phi))) * exp(-.5 * (model.mu %&% solve(model.phi)))

)

Is phi the latent corner of S, and mu the latent columns in M?

PS: Is there a reference for the purpose of each matrix in the CFA implementation you and Mike used as an example?

If you have missing item data, Bartlett/Thompson regression factor scores can be more awkward to compute. ML factor scores will take care of missingness patterns very conveniently. However, the error distribution - and the expected variance of the latent trait factor scores - will vary as a function of how many and (if it's not a Rasch model) which of the items were measured.

Factor scores derived by ML from 0/1 type items are typically more precise, if the item responses are thought to derive from an underlying and normally distributed liability dimension.

I do not have missing data and my data are ordinal. If I understand you well, it is better in this situation to use the Bartlett / Thompson regression.

I think it's better to use ML, which takes care of the fact that the distances 1-2-3-4 are not necessarily equal intervals when the data are ordinal. The normal theory approach assumes an underlying normal distribution of liability with thresholds, so if, for example, only 1% of people answer category 1, then the left-most threshold would be -2.33 z-score units, and the average of people in that category would be less than that. If the next category accounted for 80%, then they would have an average z-score around -.5. Well, I hope you get the picture that using information about the relative frequencies in the different categories makes a difference when computing the factor score.