Hi All,
I need to calculate variance explained for my endogenous latent variables after fitting the pooled data to a structural model in Stage 2 (using tssem2). I have specified a model (all observed variables) with 3 dependent variables (two of which are mediators, with one key "final" dependent variable).
(I guess I'll drop in the A matrix so you know what it looks like:)
KSSE Prestige EnjHlpg BossExp ExtrMotiv Recip TrustB ATS ITS KS
KSSE "0" "0" "0" "0" "0" "0" "0" "0" ".3*KSSE_ITS" ".3*KSSE_KS"
Prestige "0" "0" "0" "0" "0" "0" "0" ".3*Prest_ATS" "0" "0"
EnjHlpg "0" "0" "0" "0" "0" "0" "0" ".3*EnjHlp_ATS" "0" "0"
BossExp "0" "0" "0" "0" "0" "0" "0" "0" ".3*BossExp_ITS" "0"
ExtrMotiv "0" "0" "0" "0" "0" "0" "0" ".3*ExtrMotiv_ATS" "0" "0"
Recip "0" "0" "0" "0" "0" "0" "0" ".3*Recip_ATS" "0" "0"
TrustB "0" "0" "0" "0" "0" "0" "0" ".3*Trst_ATS" "0" "0"
ATS "0" "0" "0" "0" "0" "0" "0" "0" ".3*ATS_ITS" "0"
ITS "0" "0" "0" "0" "0" "0" "0" "0" "0" ".3*ITS_KS"
KS "0" "0" "0" "0" "0" "0" "0" "0" "0" "0"
The S matrix looks (in part) like this:
$labels
KSSE Prestige EnjHlpg BossExp ExtrMotiv Recip TrustB ATS ITS KS
KSSE "e1" "cov" "cov" "cov" "cov" "cov" "cov" NA NA NA
Prestige "cov" "e2" "cov" "cov" "cov" "cov" "cov" NA NA NA
EnjHlpg "cov" "cov" "e3" "cov" "cov" "cov" "cov" NA NA NA
BossExp "cov" "cov" "cov" "e4" "cov" "cov" "cov" NA NA NA
ExtrMotiv "cov" "cov" "cov" "cov" "e5" "cov" "cov" NA NA NA
Recip "cov" "cov" "cov" "cov" "cov" "e6" "cov" NA NA NA
TrustB "cov" "cov" "cov" "cov" "cov" "cov" "e7" NA NA NA
ATS NA NA NA NA NA NA NA "e8" NA NA
ITS NA NA NA NA NA NA NA NA "e9" NA
KS NA NA NA NA NA NA NA NA NA "e10"
In the output of the stage 2 analysis, I get parameter estimates for all the paths as expected, and I get values for the error terms e1-e10:
Estimate Std.Error lbound ubound z value Pr(>|z|)
KSSE_KS 0.142375 NA 0.038505 0.240741 NA NA
KSSE_ITS 0.400807 NA 0.309218 0.495398 NA NA
Prest_ATS 0.422256 NA 0.355886 0.488867 NA NA
EnjHlp_ATS 0.522933 NA 0.443224 0.602747 NA NA
BossExp_ITS 0.437108 NA 0.376581 0.497560 NA NA
ExtrMotiv_ATS 0.296702 NA 0.219903 0.373522 NA NA
Recip_ATS 0.505664 NA 0.448948 0.562794 NA NA
Trst_ATS 0.609971 NA 0.548550 0.671707 NA NA
ATS_ITS 0.742128 NA 0.686109 0.799690 NA NA
ITS_KS 0.592750 NA 0.540885 0.644920 NA NA
e8 0.449247 NA 0.360477 0.529136 NA NA
e4 0.808936 NA 0.752439 0.858183 NA NA
e3 0.726541 NA 0.636709 0.803560 NA NA
e5 0.911968 NA 0.860482 0.951646 NA NA
e9 0.648647 NA 0.584078 0.707454 NA NA
e10 1.000000 NA NA NA NA NA
e1 0.751432 NA 0.701149 0.796394 NA NA
e2 0.821700 NA 0.760925 0.873350 NA NA
e6 0.744304 NA 0.683333 0.798446 NA NA
e7 0.627935 NA 0.548822 0.699112 NA NA
I need R-squared values for all three dependent variables (ATS, ITS, and KS). These are not output by default, but I think they can be calculated from some of the output. However, the key dependent variable (the one for which I'd really like an R-squared value) is KS (associated with e10), and its error term is being constrained automatically by the metaSEM package (e10 is fixed at 1.000000, with no bounds).
I can't imagine I'm the first to want to calculate r-square values using metaSEM, but I didn't see any reference to their calculation in the documentation or here in the forums. Any help you can provide will be most useful.
Thanks!
Hi, David.
R2 = 1 - error variance for correlation structure models. For ATS, that is 1 - e8. You may refer to the following example:
https://courses.nus.edu.sg/course/psycwlm/internet/metaSEMbook/index.html#a-regression-model-on-sat-math
BTW, KS is an independent variable in your model. There is no R2 on it.
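A quick check of that rule against the Stage 2 output posted above, as a minimal base-R sketch (the estimates are typed in by hand rather than extracted from the fitted object). Reading the A matrix row-wise, ATS and ITS each have exactly one incoming path in the model as fitted (ITS to ATS via ATS_ITS, KS to ITS via ITS_KS), so with standardized variables their R2 should also equal the squared path coefficient.

```r
# Residual (error) variances copied by hand from the Stage 2 output above.
# Assumption: correlation structure model, so each variable's total variance is 1.
err_var <- c(ATS = 0.449247, ITS = 0.648647)

# R^2 = 1 - error variance for each endogenous variable
r2 <- 1 - err_var
print(round(r2, 4))

# Cross-check: in the model as fitted, ATS has one predictor (ITS) and ITS has one
# predictor (KS), so R^2 should match the squared standardized path coefficient.
paths <- c(ATS = 0.742128, ITS = 0.592750)
print(round(paths^2, 4))

# The two routes agree up to the rounding of the printed estimates
stopifnot(all(abs(r2 - paths^2) < 1e-5))
```

Both routes give about 0.551 for ATS and 0.351 for ITS. With more than one predictor the squared-path shortcut no longer applies, but 1 - error variance still does.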
Mike
Thanks, Mike. This is helpful. (I knew the R2 was simple, but didn't know it was that simple.)
But I'm confused about your indication that KS is an independent variable in my model. The A matrix specifies two direct paths to KS (from KSSE and ITS). (Note also that I've specified all variables as observed variables, though I'm not sure that's relevant.)
Doesn't this mean that KS is a dependent variable for which an R2 applies? Or have I been thinking about this incorrectly in metaSEM for a few years? And if so, how can I get metaSEM to provide the error term so that R2 can be calculated?
A-matrix code pasted below for reference:
Hi, David.
You may have confused the directions in the RAM specification. In the A matrix, rows are the dependent variables and columns are the predictors. Your A matrix therefore has direct effects from ITS and KS to KSSE (see row KSSE). Since there is no direct effect on KS (all entries in row KS are zeros), KS is an independent variable.
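One way to check directions without drawing the model is to list the paths straight from the A matrix. The sketch below uses base R only; the helper names list_paths and exogenous are hypothetical, not metaSEM functions. It rebuilds the posted A matrix labels and applies the RAM convention that a non-"0" entry A[i, j] is a path from column j to row i. The last step assumes the labels were meant as predictor_outcome (e.g., KSSE_ITS meaning KSSE predicts ITS); under that assumption the intended model is the transpose of the posted matrix.

```r
# Rebuild the posted A matrix labels ("0" = no path).
# RAM convention: A[i, j] != "0" means a path FROM column variable j TO row variable i.
vars <- c("KSSE", "Prestige", "EnjHlpg", "BossExp", "ExtrMotiv",
          "Recip", "TrustB", "ATS", "ITS", "KS")
A <- matrix("0", nrow = 10, ncol = 10, dimnames = list(vars, vars))
A["KSSE", "ITS"]      <- ".3*KSSE_ITS"
A["KSSE", "KS"]       <- ".3*KSSE_KS"
A["Prestige", "ATS"]  <- ".3*Prest_ATS"
A["EnjHlpg", "ATS"]   <- ".3*EnjHlp_ATS"
A["BossExp", "ITS"]   <- ".3*BossExp_ITS"
A["ExtrMotiv", "ATS"] <- ".3*ExtrMotiv_ATS"
A["Recip", "ATS"]     <- ".3*Recip_ATS"
A["TrustB", "ATS"]    <- ".3*Trst_ATS"
A["ATS", "ITS"]       <- ".3*ATS_ITS"
A["ITS", "KS"]        <- ".3*ITS_KS"

# Hypothetical helper: list every path as "predictor -> outcome"
list_paths <- function(A) {
  idx <- which(A != "0", arr.ind = TRUE)
  paste(colnames(A)[idx[, "col"]], "->", rownames(A)[idx[, "row"]])
}

# Hypothetical helper: variables with an all-"0" row have no incoming path (exogenous)
exogenous <- function(A) rownames(A)[rowSums(A != "0") == 0]

print(list_paths(A))   # as posted, e.g. "ITS -> KSSE" and "KS -> KSSE"
print(exogenous(A))    # as posted, only KS has no incoming path

# Assumption: labels read as predictor_outcome, so the intended model is t(A)
A_intended <- t(A)
print(exogenous(A_intended))                        # the seven predictors
print(colnames(A_intended)[A_intended["KS", ] != "0"])  # predictors of KS
```

Under the transpose, KS receives paths from KSSE and ITS, which matches the model described in the original question.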
I would suggest plotting the model graphically so that you can check the directions imposed. There are some materials that may be helpful. https://courses.nus.edu.sg/course/psycwlm/internet/MASEMworkshop/slides2.html
Mike
Mike,
Your comment led me down a week-long rabbit hole, but it was incredibly instructive. You're right; I have been thinking about this incorrectly for some time. I have corrected my model and now get the correct estimates, including the error terms for the DVs.
This has brought up three follow-up questions, though, and I wonder if you'd be kind enough to enlighten me:
1) There are NA terms included in several (many, actually) of the LB interval estimates. This makes it difficult to establish whether paths are significant (i.e., whether their intervals cross zero). I also see that in your "Examples of MASEM" page, the Digman97 dataset also produces NAs in the interval estimates. Can you explain what this means and whether there is a way around it?
2) Related to #1, I'm aware of the ability to do a bootstrapped estimation of z-scores for each of the parameter estimates (by using the default intervals.type = "z" option). Is there a way to alter the number of replications in that bootstrap operation?
3) Now that my model is correctly specified, I am calculating error terms for my DVs. You have indicated that the R2 for each of these DVs can be calculated as (1-error). However, my use of this formula has produced, in every case, a decrease in explained variance as more explanatory variables are added to my models. In my experience with regression and other variance-explaining procedures, I have always seen R2 go up, even if only slightly, as many more variables are added (and the increase is obviously larger when good predictors are added). Can you reconcile this seeming paradox for me?
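For reference on the R2 question: when R2 comes from ordinary least squares on one fixed correlation matrix, R2 = r' Rxx^-1 r cannot decrease as predictors are added. The base-R sketch below demonstrates this with a hypothetical correlation matrix invented for illustration (not the data in this thread); r2_from_cor is a hypothetical helper name. A Stage 2 path model with mediators is not necessarily equivalent to a set of separate saturated regressions on the pooled matrix, so treat this as a baseline for comparison, not as an explanation of the results described above.

```r
# Hypothetical correlation matrix for illustration only (NOT data from this thread):
# one outcome y and three candidate predictors x1, x2, x3.
R <- matrix(c(
  1.0, 0.5, 0.4, 0.3,
  0.5, 1.0, 0.2, 0.1,
  0.4, 0.2, 1.0, 0.3,
  0.3, 0.1, 0.3, 1.0
), nrow = 4, byrow = TRUE,
dimnames = list(c("y", "x1", "x2", "x3"), c("y", "x1", "x2", "x3")))

# OLS R^2 for regressing y on predictors xs, from correlations alone:
# standardized weights beta = Rxx^{-1} r_xy, and R^2 = beta' r_xy.
r2_from_cor <- function(R, y, xs) {
  rxy  <- R[xs, y]
  Rxx  <- R[xs, xs, drop = FALSE]
  beta <- solve(Rxx, rxy)
  sum(beta * rxy)
}

preds <- c("x1", "x2", "x3")
# Nested models: y ~ x1, then y ~ x1 + x2, then y ~ x1 + x2 + x3
r2 <- sapply(seq_along(preds), function(k) r2_from_cor(R, "y", preds[seq_len(k)]))
print(r2)

# In this saturated single-equation setting R^2 never drops as predictors are added;
# the standardized error variance is 1 - R^2.
stopifnot(all(diff(r2) >= 0))
print(1 - r2)
```

With these numbers the first two R2 values are 0.25 and 0.34375, and the third is larger again.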
Again, I very much appreciate your help.
Hi, David.
1) My guess is that you are referring to the NAs in the z value and p value columns when LB intervals are requested (see the following example). When the 95% LBCI does not include the hypothesized value, say 0, the estimate is statistically significant at .05. For example, all of the following estimates are statistically significant at .05.
95% confidence intervals: Likelihood-based statistic
Coefficients:
Estimate Std.Error lbound ubound z value Pr(>|z|)
Alpha_A 0.56258 NA 0.53239 0.59285 NA NA
Alpha_C 0.60512 NA 0.57510 0.63532 NA NA
Alpha_ES 0.71913 NA 0.68863 0.75031 NA NA
Beta_E 0.78200 NA 0.71899 0.85587 NA NA
Beta_I 0.55089 NA 0.49939 0.60232 NA NA
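The rule in (1) can be applied mechanically. A minimal base-R sketch, with the lbound/ubound values typed in from the example output above; in your own analysis you would copy them from your own summary table (how to extract them from the fitted object is not shown here). A parameter is flagged when its 95% likelihood-based interval lies entirely above or entirely below the hypothesized value of 0.

```r
# Likelihood-based 95% CI bounds typed in from the example output above
ci <- data.frame(
  param  = c("Alpha_A", "Alpha_C", "Alpha_ES", "Beta_E", "Beta_I"),
  lbound = c(0.53239, 0.57510, 0.68863, 0.71899, 0.49939),
  ubound = c(0.59285, 0.63532, 0.75031, 0.85587, 0.60232)
)

# Significant at .05 (two-sided) when the interval excludes the hypothesized value.
# Parameters with NA bounds (e.g. a fixed parameter) come out as NA, i.e. not tested.
null_value <- 0
ci$significant <- ci$lbound > null_value | ci$ubound < null_value
print(ci)
```

All five parameters in this example are flagged. A fixed parameter with NA bounds, such as e10 in the earlier output, yields NA rather than TRUE or FALSE.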
2) Given the reply in (1), bootstrapped SEs are usually not required. Starting from metaSEM 0.9.6, tssem2(diag.constraints=FALSE) is correct for all models, including models with mediators, and it provides SEs for the estimates. If you really want to use bootstrapped SEs, you may try:
fit <- tssem2(stage1, Amatrix=A, Smatrix=S, diag.constraints=TRUE)  # stage1: your tssem1() output; A, S: your RAM matrices
Then you may use:
summary(fit, R=50)
You may increase the no. of bootstrap samples by using a larger R.
3) Could you provide the data and output for me to check? If the data are sensitive, you may send them to me by email.
Mike
Thanks, Mike. I sent a reply via email. Look forward to hearing from you.