I’m a PhD student in economics and very new to SEM and the Mx software.

I’d appreciate your feedback on the following basic questions.

Let’s construct a SEM model in which three measured variables, X, Y and Z, load on a latent variable G, and assume that the factor loadings of G on X, Y and Z are 0.3, 0.4 and 0.1 respectively. The model is fitted to the 3x3 correlation matrix of the measured variables.

Then, given that factor loadings can be regarded as regression coefficients:

Question 1: could I estimate that G = 0.3*X + 0.4*Y + 0.1*Z?

Question 2: could you confirm whether the regression coefficients obtained from the correlation matrix are standardized or unstandardized?

Thanks

Question 1: could I estimate that G = 0.3*X + 0.4*Y + 0.1*Z?

You could, but you'd need other information about G. The factor model you described is more like the following.

Question 2: could you confirm whether the regression coefficients obtained from the correlation matrix are standardized or unstandardized?

The regression coefficients would be standardized.

If the model regresses X, Y, and Z onto G, then the one-headed paths run from G to the observables (similarly to what's depicted in the attached diagram), and you would be doing a simple factor analysis: G would be a common factor, and the factor loadings would indeed equal the regressions of X, Y, and Z onto G. If you're doing factor analysis using the correlation matrix of X, Y, and Z, then the loadings will be standardized regression coefficients. More specifically, in this case they would be the correlations between each observable variable and the latent G, since there is only one common factor.

As for Question 1: no, the equation G = 0.3*X + 0.4*Y + 0.1*Z won't work, because factor analysis regresses the observable variables onto the latent variables, not the other way around. The equations defining this model are:

X = 0.3*G + Ux

Y = 0.4*G + Uy

Z = 0.1*G + Uz

Where Ux, Uy, and Uz are the unique factors corresponding respectively to X, Y, and Z. What factor analysis does is partition variance in the observable variables into (1) variance common to multiple observable variables, and represented by common factors, and (2) variance unique to each observable, represented by the unique factors.
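To make the variance partition concrete, here is a minimal sketch in Python with NumPy (not Mx syntax). It assumes that G is standardized and that the unique factors are uncorrelated with G and with each other. Under those assumptions the loadings from the question imply a particular correlation matrix for X, Y, and Z:

```python
import numpy as np

# Loadings of X, Y, Z on the single common factor G (values from the question).
loadings = np.array([0.3, 0.4, 0.1])

# With standardized variables, the common variance of each observable is its
# squared loading, and the unique variance is whatever is left over.
common_var = loadings ** 2
unique_var = 1.0 - common_var

# Model-implied correlation matrix: (loadings)(loadings)' + diag(unique variances).
implied_corr = np.outer(loadings, loadings) + np.diag(unique_var)

print("common variance:", common_var)
print("unique variance:", unique_var)
print(np.round(implied_corr, 3))
```

Each off-diagonal entry is just the product of two loadings, which is why a one-factor model with small loadings can only reproduce weak correlations among the observables.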

You can, however, predict scores on a factor. In psychometrics, there is a literature on factor-score prediction, and various methods exist for doing so. I could say more about that if you're curious.
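As one illustration, the so-called regression method is among the simplest of those approaches: the weights are the inverse of the correlation matrix of the observables, post-multiplied by the loadings. The sketch below is Python with NumPy, not Mx syntax. The data in `raw` are random numbers invented purely for the example, and the model-implied correlation matrix is used in place of a sample one.

```python
import numpy as np

# Loadings from the question; the implied correlation matrix assumes a
# standardized factor and uncorrelated unique factors.
loadings = np.array([0.3, 0.4, 0.1])
implied_corr = np.outer(loadings, loadings) + np.diag(1.0 - loadings ** 2)

# Regression-method weights: solve implied_corr @ weights = loadings.
weights = np.linalg.solve(implied_corr, loadings)

# Hypothetical raw data (5 cases, columns X, Y, Z), for illustration only.
rng = np.random.default_rng(0)
raw = rng.normal(size=(5, 3))

# Standardize each column, then apply the weights to get predicted G scores.
z = (raw - raw.mean(axis=0)) / raw.std(axis=0, ddof=1)
g_hat = z @ weights

print("weights:", np.round(weights, 4))
print("predicted factor scores:", np.round(g_hat, 4))
```

Note that the weights are not the loadings themselves, which is another way of seeing why G = 0.3*X + 0.4*Y + 0.1*Z is not the right formula.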

Note that you can do a model where G is regressed onto the observable variables, but such a model would not be considered factor analysis, and would require something additional in order to be identified.

Edit: I didn't notice Mike Hunter's post below while I was writing mine.

Thanks for your comments (also for Mike's). They clarify a lot.

Please find attached a drawing of the model I am analyzing.

I understand from your explanations that I can regress and calculate:

DIS = 0.24*GOV + 0.70*MUT + 0.29*CONFLICT + 0.20*WEALTH

where GOV, MUT, CONFLICT and WEALTH are latent variables.

However, I have the following issue:

If we consider the particular case of WEALTH, it loads onto the measurable variables TECHINBAND, GDCAP and TECHBROADIN. The factor loadings are 0.65, 0.85 and 0.93 respectively. I need to estimate the value of WEALTH for each observation in the analysis (N=142) based on the values of the measurable variables. Is that a situation in which I would need to use factor-score prediction? In fact, I need to predict values of the 4 latent variables and in the end calculate the predicted values of DIS. Could you please give me some more guidance on how to proceed?

Thanks in advance.

Are the factor scores themselves of interest? Or are they just intermediate steps toward predicting DIS? If they're not of interest themselves, then you can predict DIS from the model-expected correlation matrix as long as none of your variables are ordinal. It's easy to do if you don't have any missing data. You would invert the model-expected correlation matrix for all the observable variables other than DIS. Then, post-multiply that inverted matrix by a column vector containing the correlations between DIS and each other observable variable. You will then have a vector of model-expected, standardized regression coefficients, from which scores on DIS can be predicted using raw data. To do so, you would need to standardize the observed variables, and keep in mind that the predicted DIS scores are on the standardized scale.
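A minimal sketch of those steps in Python with NumPy (not Mx syntax) follows. The 4x4 matrix, the variable order, and the raw data are all made up for illustration. In practice `expected_corr` would be the model-expected correlation matrix from your fitted model and `raw_predictors` your complete raw data on the other observables.

```python
import numpy as np

# Hypothetical model-expected correlation matrix.
# Row/column order: DIS, then three other observables (A, B, C).
expected_corr = np.array([
    [1.00, 0.50, 0.40, 0.30],
    [0.50, 1.00, 0.35, 0.20],
    [0.40, 0.35, 1.00, 0.25],
    [0.30, 0.20, 0.25, 1.00],
])

# Expected correlations among the observables other than DIS,
# and between DIS and each of them.
r_xx = expected_corr[1:, 1:]
r_xy = expected_corr[1:, 0]

# Invert, then post-multiply by the vector of correlations with DIS.
beta = np.linalg.inv(r_xx) @ r_xy  # standardized regression coefficients

# Hypothetical complete raw data: 142 cases on A, B, C.
rng = np.random.default_rng(1)
raw_predictors = rng.normal(loc=10.0, scale=3.0, size=(142, 3))

# Standardize the observed variables before applying the coefficients.
z = (raw_predictors - raw_predictors.mean(axis=0)) / raw_predictors.std(axis=0, ddof=1)

# Predicted DIS scores, on the standardized scale.
dis_hat = z @ beta
print("standardized coefficients:", np.round(beta, 4))
```

Here the predictors are standardized with their sample means and standard deviations, which is one reasonable choice when the model was fitted to a correlation matrix.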

If you have missing data, or if some of your variables are ordinal, it would be a little more complicated.

If the factor scores are actually of interest, then yes, proceed with factor-score prediction. But I wouldn't use the predicted factor scores to in turn predict DIS. The model-specified structural relations among the latent variables, and between the latent and observable variables, are represented in the model-expected correlation matrix for the observable variables.

Thanks for suggesting the inverted correlation matrix as a way to predict the values of the variable DIS. I think it works well, and the results make sense. The variable DIS captures the “order” of an economy. Using the SEM model described above, I have computed DIS values for the 142 countries observed (N=142).

May I have your opinion as regards the fit measures of my model?

ML ChiSq 908.185 (p = 0.000)

Df: 161

Parameters: 49

AIC: 586.195

RMSEA: 0.181

Normed Fit Index 1: 0.779

Normed Fit Index 2: 0.811

Tucker Lewis Index: 0.775

Parsimonious Fit Index 1: 0.660

Parsimonious Fit Index 2: 0.111

Relative Non-Centrality Index: 0.809

Centrality Index: 0.072

Null Model Chi_Square Fit: 4107.14

Null Model Degree of Freedom: 190

Obviously the fit is not so good. May I justify/excuse it somehow and go on with the analysis? Is the fit so crucial for the validity of the analysis?

The poor fit means that the paths and variables you have do not represent the measured data well.

If you build a model that fits well, the paths you are basing your estimates on might change, or might no longer exist at all.

So getting to acceptable fit is important before interpreting the model or scores on latent traits etc.
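For reference, several of the indices quoted above can be recomputed from the two chi-square values, their degrees of freedom, and N=142. The sketch below is plain Python and uses common textbook formulas. I am assuming, not asserting, that these are the definitions the software used. For RMSEA in particular, some programs divide by N rather than N-1.

```python
import math

# Values reported in the post above.
chi_sq, df = 908.185, 161
null_chi_sq, null_df = 4107.14, 190
n = 142

# Normed fit index: proportional improvement in chi-square over the null model.
nfi = (null_chi_sq - chi_sq) / null_chi_sq

# Tucker-Lewis index: the same idea, based on chi-square/df ratios.
tli = (null_chi_sq / null_df - chi_sq / df) / (null_chi_sq / null_df - 1.0)

# Relative non-centrality index: based on chi-square minus df.
rni = ((null_chi_sq - null_df) - (chi_sq - df)) / (null_chi_sq - null_df)

# Parsimony-adjusted NFI: NFI scaled by the ratio of degrees of freedom.
pfi = nfi * df / null_df

# RMSEA, using the N - 1 convention (an assumption; see the note above).
rmsea = math.sqrt(max(chi_sq - df, 0.0) / (df * (n - 1)))

for name, value in [("NFI", nfi), ("TLI", tli), ("RNI", rni),
                    ("PFI", pfi), ("RMSEA", rmsea)]:
    print(f"{name}: {value:.3f}")
```

With these formulas the results agree with the reported values to three decimals. An RMSEA near 0.18 is far above the range usually taken to indicate acceptable fit, and the incremental indices are all well below the 0.90 that is often used as a lower bound.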