Inequality of the sample size

Eva CUI

Hello,

I am conducting a MASEM study using the random-effects TSSEM approach. There are six variables in the hypothesized model (i.e., PS, PA, MA, Voc, WR, and RC). In the tested model, PS is the predictor, RA serves as a latent variable indicated by WR and RC, and PA, MA, and Voc serve as mediators. I have several questions, listed below:

  1. Would the unequal sample sizes lead to biased parameter estimates? And what is the ideal number of studies/participants for fitting the proposed model?

Below, please kindly find the detailed information about my study:

1) In total, 68 studies are included. The average sample size per study is 86.

2) Number of studies:
     PS  WR  RC  PA  MA Voc
PS   68  67  23  55  15  36
WR   67  67  23  52  16  32
RC   23  23  25  18   3  14
PA   55  52  18  58  15  32
MA   15  16   3  15  17  11
Voc  36  32  14  32  11  38

3) Number of participants:
      PS   WR   RC   PA   MA  Voc
PS  5511 5396 1852 4572 1607 2847
WR  5396 5396 1820 4623 1899 2723
RC  1852 1820 1999 1391  550  832
PA  4572 4623 1391 4873 1831 2661
MA  1607 1899  550 1831 2002 1207
Voc 2847 2723  832 2661 1207 3021

4) Sample sizes for each study:
[1] 126 98 103 110 16 133 6 49 33 61 103 93 70
[14] 85 86 70 104 25 25 20 23 45 80 18 33 115
[27] 44 85 62 75 93 50 114 70 81 370 30 77 80
[40] 80 63 54 76 211 48 60 76 66 69 56 130 199
[53] 123 44 25 24 110 35 78 29 18 84 64 31 161
[66] 180 54 302

  2. There are 67 studies reporting the correlation between PS and WR, but the numbers of studies for the other pairs of correlations drop dramatically. One reason is probably that our inclusion criteria stated that every study included in the meta-analysis must report the correlation between PS and WR (to address the first research question and to control the scope of the study). I have noticed that some MASEM studies instead include any study that reports at least one of the correlations of interest.

In short, does this inclusion criterion limit the estimation of the model (which is, admittedly, not the full picture)? Or are there other potential methodological problems?

  3. In generating the list of correlation matrices, I encounter some warnings. How should I interpret and fix the warning "In mat[lower.tri(mat, diag = FALSE)] <- x : number of items to replace is not a multiple of replacement length"?

R code:

cormatrices <- readstack(dataSEM04, no.var = 6, var.names = varnames, diag = FALSE)

  4. Are there any approaches to generate the pooled correlation matrix in one-stage MASEM?

  5. To address the missing data, are there any functions in metaSEM for evaluating the missing-data rate? And to implement FIML, is it necessary to run separate code, or is FIML already integrated into tssem()?

Any help would be highly appreciated!! Thank you for your contribution to the open science community! :)

Best regards,
Eva

Mike Cheung

Hi Eva,

1) I am unaware of any simulation study showing that unbalanced sample sizes lead to biased parameter estimates. It is hard to give a concrete recommendation on the number of studies and the sample sizes required; the best way to answer this question is to run a simulation study. The following preprint may give you some ideas: https://openmx.ssri.psu.edu/node/4798
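
For example, a quick bias check could look like the sketch below (illustrative population values and deliberately unequal sample sizes; these numbers are not from your data, and a real simulation would replicate this many times):

library(metaSEM)
library(MASS)

set.seed(42)
## Illustrative population correlation matrix for three variables
P <- matrix(c(1, .5, .4,
              .5, 1, .3,
              .4, .3, 1), nrow = 3,
            dimnames = list(c("x", "m", "y"), c("x", "m", "y")))
## Deliberately unequal per-study sample sizes
n.i <- sample(c(20, 50, 300), size = 40, replace = TRUE)
## Simulate raw data per study and keep the observed correlation matrices
Rs <- lapply(n.i, function(n) {
  R <- cor(mvrnorm(n, mu = rep(0, 3), Sigma = P))
  dimnames(R) <- dimnames(P)
  R
})
## Pool with random-effects TSSEM and compare against the population values .5, .4, .3
fit <- tssem1(Rs, n.i, method = "REM")
vec2symMat(coef(fit, select = "fixed"), diag = FALSE)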

2) The inclusion/exclusion criteria will affect the results, and researchers have to defend their choices. If you are concerned that this decision may affect your results, you can conduct a sensitivity analysis comparing the results with and without the studies in question.

3) It is hard to tell without a reproducible example.
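
For reference, a minimal sketch of what typically triggers this warning: with six variables and diag = FALSE, each study must supply choose(6, 2) = 15 correlations, and R recycles the replacement vector whenever a row has a different length.

mat <- diag(6)
x <- rep(0.3, 14)                       # one value short of the 15 expected
mat[lower.tri(mat, diag = FALSE)] <- x  # warns: number of items to replace is
                                        # not a multiple of replacement length

So it is worth checking that every row of dataSEM04 supplies exactly 15 values (with NA for any missing correlations).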

4) You may use tssem1REM() to get an average correlation matrix, which is identical to the one estimated by osmasem().
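
A minimal sketch, assuming cormatrices and n hold your list of correlation matrices and vector of sample sizes:

TS1 <- tssem1(cormatrices, n, method = "REM")
## Average (pooled) correlation matrix based on the fixed effects
vec2symMat(coef(TS1, select = "fixed"), diag = FALSE)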

5) You have already reported the missing rates with pattern.na(). If you want more detail on the missing rates, you may try the functions in the mice package. tssem1() and osmasem() use FIML, so you don't need a separate analysis.
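
For instance (a sketch, with cormatrices and n as above):

library(metaSEM)
## Number of studies missing each correlation (show.na = TRUE)
## or reporting it (show.na = FALSE)
pattern.na(cormatrices, show.na = TRUE)
pattern.na(cormatrices, show.na = FALSE)
## Accumulated sample size per correlation
pattern.n(cormatrices, n)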

Mike

Eva CUI

Dear Mike,

Thank you so much for your detailed and precise comments!! Things are much clearer to me now. I do have some further questions, and I would highly appreciate your insights.

1) I am testing a path model (specified below) and am stuck on testing the direct and indirect effects. To estimate them, I basically followed the instructions in Cheung (2022). Is it possible to conduct statistical tests of the direct and indirect effects, or have I misunderstood something?

SEM13 <- 'RC ~ e*WR + c*Voc + a*PS
Voc ~ PA + b*PS + MA
WR ~ MA + PA + d*PS
PA ~ PS
MA ~ PS
PS ~~ 1*PS
RC ~~ RC
Voc ~~ WR
PA ~~ MA
Direct := a
Indirect := b*c'

plot(SEM13)

R code:

TS1 <- tssem1(cormatrices13, n, method = "REM")
summary(TS1)

averageR <- vec2symMat(coef(TS1, select = "fixed"), diag = FALSE)

RAM <- lavaan2RAM(SEM13, obs.variables = varnames13)

TS2 <- tssem2(TS1, RAM = RAM)
summary(TS2)

calEffSizes(model = SEM13, n = n, Cov = averageR)

Output:
    Direct   Indirect
0.10543856 0.04041951

2) In one-stage MASEM, can I apply a categorical moderator with multiple levels on the Ax?

I have also attached my R code. Thank you again for your time and attention.

Best regards,
Eva

Mike Cheung

Dear Eva,
1) A likelihood-based (LB) CI can be used to test the indirect and direct effects.

TS2 <- tssem2(TS1, RAM = RAM, intervals.type = "LB")

The LBCIs of Direct and Indirect will appear in the mxAlgebras section of summary(TS2).

2) Similar to multiple regression, you may create dummy indicators of 0 and 1 to handle categorical moderators in the one-stage MASEM.
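
A sketch with a hypothetical three-level moderator; the column grade and its levels are illustrative, not from your data, and I am assuming Ax accepts a list of matrices for multiple dummies (otherwise the dummies can be tested one at a time):

library(metaSEM)

## Build the data frame for osmasem from the correlation matrices and sample sizes
my.df <- Cor2DataFrame(cormatrices13, n)
## Dummy-code the three-level moderator, with "lower" as the reference level;
## grade is a hypothetical character vector with one entry per study
my.df$data <- data.frame(my.df$data,
                         middle = as.numeric(grade == "middle"),
                         upper  = as.numeric(grade == "upper"),
                         check.names = FALSE)

## One definition-variable A matrix per dummy
A.middle <- create.modMatrix(RAM, output = "A", mod = "middle")
A.upper  <- create.modMatrix(RAM, output = "A", mod = "upper")

fit.mod <- osmasem(model.name = "Dummy moderators", RAM = RAM,
                   Ax = list(A.middle, A.upper), data = my.df)
summary(fit.mod)
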
Mike

Eva CUI

Dear Mike,

Thank you so much for your instructions! I tried the "LB" option, but I am not sure my interpretation of the results is correct. The z scores and p values are all NA; does that mean the effects are not significant? If PS is the predictor, RC is the outcome, and the others are (multiple) mediators, how do I determine the significance levels of the direct effect (a) and the indirect effects (b*c and d*e)? Below, please kindly find the output.

Thank you again for your help and generous sharing!

Eva

95% confidence intervals: Likelihood-based statistic
Coefficients:
          Estimate Std.Error   lbound   ubound z value Pr(>|z|)
MAONPS    0.266092        NA 0.201725 0.330306      NA       NA
PAONPS    0.286819        NA 0.253663 0.319983      NA       NA
a         0.098143        NA 0.024736 0.170192      NA       NA
c         0.374117        NA 0.290725 0.456512      NA       NA
e         0.381771        NA 0.267457 0.494503      NA       NA
VocONMA   0.403405        NA 0.272429 0.534873      NA       NA
VocONPA   0.126609        NA 0.038196 0.209448      NA       NA
b         0.094454        NA 0.025384 0.159910      NA       NA
WRONMA    0.255140        NA 0.167884 0.342686      NA       NA
WRONPA    0.246552        NA 0.182677 0.307662      NA       NA
d         0.187613        NA 0.143029 0.229854      NA       NA
PAWITHMA  0.315018        NA 0.253525 0.376740      NA       NA
VocWITHWR 0.101139        NA 0.041612 0.157367      NA       NA

Mike Cheung

Dear Eva,

The z and p values are based on the Wald test. When LBCIs are requested, I remove them to avoid confusion.

Since the data are not attached, I don't know the results. It appears that you have missed the LBCIs of the direct and indirect effects.

The results may look like this:

## mxAlgebras objects (and their 95% likelihood-based CIs):
##                   lbound  Estimate    ubound
## Indirect[1,1] 0.06535370 0.1074063 0.1565308
## Direct[1,1]   0.02778714 0.1111548 0.1907302

You may refer to standard statistics textbooks on how to interpret the 95% LBCI. In brief, an effect is statistically significant at the .05 level when its 95% CI excludes zero.

Mike

Eva CUI

Dear Mike,

Thank you so much for providing such detailed instruction. May I attach my R code and data again for your further guidance? It seems that I cannot replicate the results you illustrated.

I have also encountered some other questions. It would be highly appreciated if you are willing to help!! Thank you in advance!

  1. It is known that the LBCI is preferred when the number of studies is small. How small is "small"?
  2. How should I interpret an NA p value when comparing two nested models? Does it mean the two models are not significantly different, or that they are not nested?
  3. Can we compare a saturated model with an over-identified model if they are nested?
  4. If the two models are significantly different, we would choose the better-fitting one. What if the fit indices (e.g., CFI, RMSEA, SRMR) are almost identical?

Last but not least, your guidance has kept me motivated throughout this study. More importantly, I am much inspired by your dedication and spirit of sharing; your generous guidance lights the way, especially for beginning researchers like me. Thank you again for your time and attention.

Best regards,
Eva

Mike Cheung

Dear Eva,

1), 3), and 4): Since MASEM models are SEMs, these questions are not unique to MASEM. You may refer to the SEM literature for the relevant discussions.

2) The R code does not work for me. Have you tried running it in your R console? It is hard to tell what the NA p value means without knowing which models are being compared.

Mike

Eva CUI

Dear Mike,

Thank you so much for your instruction; I will explore the SEM literature further. As for the second question, I think I have figured it out: the two models being compared had identical degrees of freedom.

I am excited about the results of the MASEM analysis. Thank you for your contribution!

Best regards,
Eva