You are here

FIML estimation of multiple missing covariates in a three-level meta-analysis

10 posts / 0 new
Last post
Denise B's picture
Offline
Joined: 05/09/2021 - 04:45
FIML estimation of multiple missing covariates in a three-level meta-analysis
AttachmentSize
File con_se.csv10.51 KB

Dear fellow researchers,

I am currently writing my PhD thesis in psychology. Part of my thesis is a three-level meta-analysis of the relationship between personality (i.e. conscientiousness) and self-efficacy in school students (in primary and secondary education).

My thesis also includes a meta-regression with covariates that contain missing values (MAR). That is why I would like to use the metaSEM package for an FIML estimation of missing covariates. As it seems possible to include multiple covariates simultaneously [through x2= c(X1, X2,...)], in the results I can only find one beta coefficients for all covariates combined. Is there also a way, to get the FIML estimation for each beta coefficient, separately?
My R code is as follows:

con_se.mod.metaSEM <- meta3X(y.c, v.c, cluster= sample_nr,
x2= c(year, interval.months, idv, class, se , pers_rater , pers_retro),
data=con_se, model.name="con_se 3 level metaSEM",
intervals.type="z")

The moderators included are continuous variables (year=year of publication, interval.months=time between assessment of conscientiousness and self-efficacy, idv=individualism index of the country where study took place, class=mean grade level of participants) and factors (se=domain of self-efficacy, pers_rater=self- vs. other-rated personality, pers_retro=whether assessment of personality took place before, after or at the same time as self-efficacy).

Thank you for your time!

Best regards,
Denise Braul

Mike Cheung's picture
Offline
Joined: 10/08/2009 - 22:37
Could you please attach the

Could you please attach the data, R code, and output?

Denise B's picture
Offline
Joined: 05/09/2021 - 04:45
files are attached now

In the first meta-regression with meta3x(..., x2=cbind()) I get an error warning. The second version with meta3x(..., x2=c()) I only get one beta coefficient combined for all moderator variables which is of no use for me. So I tried to substitute the missing moderator values with the median of all other values (per moderator) and then used the meta3(..., x=c()) function where I was able to use x=c() to include multiple moderators and receive one beta coefficient for each moderator variable. To double-check I ran the same analysis (with subtituted missing values) using an alternative R package (metafor) which yielded similar results.

Is there any way to get beta coefficients for each moderator variable with the FIML estimation using meta3x()?

Thanks for your help!
Best regards,
Denise Braul

File attachments: 
Mike Cheung's picture
Offline
Joined: 10/08/2009 - 22:37
Dear Denise,

Dear Denise,

I don't have a version that works for your model. It appears that there are several issues here:

1) x2= c(X1, X2,...) is the wrong sytax because it concatenates all predictors to a single vector.

2) There are 19 clusters (level-3 units) with 45 effect sizes (level-2 units). I wonder whether there are enough data to fit the model with 10 covariates.

3) In meta3(), the covariates are treated as a design matrix. It does not matter whether they are level-2 or level-3 covariates. In meta3X(), it does because different structures are imposed on the covariates depending on whether they are level-2 or level-3 covariates. You have treated all 10 covariates as level-2 variates. It does not seem correct, at least for "year" and "idv."

Hope it helps.

Mike

Denise B's picture
Offline
Joined: 05/09/2021 - 04:45
Dear Mike,

Dear Mike,

thank you for your comment. I will reply to your concerns in the order they were mentioned.

1) Is there another way to write the syntax to include multiple predictors in one analysis using meta3X()? I would very much appreciate it for having an estimation of R² that includes all moderators simultaneously.

2) You are definitely right that the data for this analysis is scarce. My thesis is actually about the relationship between conscientiousness, neuroticism, self-efficacy, and academic achievement. I am planning to do 6 separate meta-analyses of the bivariate relationships and one MASEM with all variables included. I have chosen this data set for my question especially because it is the smallest and therefore easier to upload and faster to run analyses as the same syntax problems arise in the other bivariate meta-analyses as well. But I will definitely reconsider the choice of moderators for this specific meta-analysis. Thank you for noticing!

3) I am aware that "year" and "idv" would normally be Level-3 variables. Unfortunately, my data set includes several duplicate samples (the identical sample was used for publishing multiple papers). To take into account the dependence of these effect sizes, duplicate samples received the same sample_nr. Thus, there are several values for the variable "year" in one "sample_nr". The same is true for the variable "idv" because my data includes includes PISA samples, where several countries are represented in the same "sample_nr". I suppose that means that I could not include those variables as moderators in my analyses (I was honestly hoping that I could get around this issue by treating "year" and "idv" as Level-2 Moderators because they now also vary within-group)?

Thanks a lot for your help!

Best regards,
Denise

Mike Cheung's picture
Offline
Joined: 10/08/2009 - 22:37
Dear Denise,

Dear Denise,

  1. You have already included multiple predictors by using cbind(). I do not understand why you want a different way to do cbind() without using cbind().

  2. There is a cost in handling missing data with FIML. We have to assume distributions on the missing covariates. If you claim that the missing variables, say "year" and "idv" are level-2 variables, the program calculates variables on "year" and "idv" within the level-3 unit. If there is almost no variation, the program will suck there. BTW, if multiple papers are using the same dataset, it does not seem unreasonable to use the first "year" as the "year."

Best,
Mike

Denise B's picture
Offline
Joined: 05/09/2021 - 04:45
Dear Mike,

Dear Mike,

thank your for remark concerning year of publication! I have already updated my data and attached the new file (using the first year of publication for all following publications).

As for the inclusion of multiple moderator in meta3X(), I agree that I do not need another way than cbind(). The only problem is that I do not seem to be able to include all variables. It works as long as I include only numeric variables as moderators. But I have some dummy-coded variables I would also like to include in the analyses (for the domain of self-efficacy judgements: se_academic, se_subject, se_performance, se_general; and for self- vs. other-rated personality: pers_self, pers_others).

So running the following syntax (without dummy-coded variables) works fine:

meta3X(y=y.c, v=v.c, cluster=sample_nr,
x2 = cbind(interval.months, class),
x3=cbind(year, idv),
data=con_se, model.name = "FIML model",
intervals.type="z")

But once I include the dummy-coded variables in the following way:

meta3X(y=y.c, v=v.c, cluster=sample_nr,
x2 = cbind(interval.months, class,
se_academic, se_subject, se_performance,
pers_self),
x3=cbind(year, idv),
data=con_se, model.name = "FIML model",
intervals.type="z")

I get the following error warning :

Error in running the mxModel:
simpleError: The job for model 'Meta analysis with ML' exited abnormally with the error message: fit is not finite (The continuous part of the model implied covariance (loc2) is not positive definite in data 'Meta analysis with ML.data' row 6. Detail:
covariance = matrix(c( # 9x9
0.120297160465344, 0, 0, 0, 0, 0, 0, 0, 0
, 0, 158.42, -17.12, 1.67, -0.53, 0.07, -1.92, -3.7, -9.18
, 0, -17.12, 6.39, 0.14, -0.06, 0.06, 0.45, -0.42, -4.51
, 0, 1.67, 0.14, 0.36, -0.04, 0.09, 0.07, 0.51, 3.76
, 0, -0.53, -0.06, -0.04, 0.24, -0.03, 0.12, 0.16, -5.86
, 0, 0.07, 0.06, 0.09, -0.03, 0.14, 0.2, 0.04, 0.35
, 0, -1.92, 0.45, 0.07, 0.12, 0.2, 0.16, -0.04, 0.12
, 0, -3.7, -0.42, 0.51, 0.16, 0.04, -0.04, 9.87, 21.78
, 0, -9.18, -4.51, 3.76, -5.86, 0.35, 0.12, 21.78, 633.98), byrow=TRUE, nrow=9, ncol=9))

I would be very thnkful if you could point into the direction of the mistake in my code. Thanks a lot!

Best regards,
Denise

File attachments: 
Mike Cheung's picture
Offline
Joined: 10/08/2009 - 22:37
Dear Denise,

Dear Denise,

I am not optimistic about applying the three-level model to your data. There are only 19 clusters (or subjects from an SEM perspective). There is not not enough to fit a model with so many predictors.

> table(con_se$sample_nr)
 
   14    23    29    30    32    33    38    39    41  1181  1198 10002 10051 10057 
    8     1     2     2     1     1     3     6     1     1     1     2     8     1 
10061 10062 10063 10064 10078 
    1     1     1     2     2 

For the level-2 predictors, it is assumed that they have the same variances within each cluster. This does not seem to be a reasonable assumption from your data.

> library(dplyr)
> con_se %>% group_by(sample_nr) %>% summarise(n(), sd(interval.months), sd(class))
# A tibble: 19 x 4
   sample_nr `n()` `sd(interval.months)` `sd(class)`
       <int> <int>                 <dbl>       <dbl>
 1        14     8                 19.2        1.39 
 2        23     1                 NA         NA    
 3        29     2                 NA          0    
 4        30     2                 NA          0.841
 5        32     1                 NA         NA    
 6        33     1                 NA         NA    
 7        38     3                  0          0    
 8        39     6                  0          0    
 9        41     1                 NA         NA    
10      1181     1                 NA         NA    
11      1198     1                 NA         NA    
12     10002     2                  0          0    
13     10051     8                  1.60       0    
14     10057     1                 NA         NA    
15     10061     1                 NA         NA    
16     10062     1                 NA         NA    
17     10063     1                 NA         NA    
18     10064     2                 NA          0    
19     10078     2                 NA          0  

Most of the binary predictors have 0 variance within each cluster. This indicates that they may not be level-2 variables.

> con_se %>% group_by(sample_nr) %>% summarise(n(), sd(se_academic), sd(se_subject), sd(se_performance), sd(pers_self))
# A tibble: 19 x 6
   sample_nr `n()` `sd(se_academic)` `sd(se_subject)` `sd(se_performance)` `sd(pers_self)`
       <int> <int>             <dbl>            <dbl>                <dbl>           <dbl>
 1        14     8                 0            0                    0                   0
 2        23     1                NA           NA                   NA                  NA
 3        29     2                 0            0                    0                   0
 4        30     2                 0            0                    0                   0
 5        32     1                NA           NA                   NA                  NA
 6        33     1                NA           NA                   NA                  NA
 7        38     3                 0            0                    0                   0
 8        39     6                 0            0                    0                   0
 9        41     1                NA           NA                   NA                  NA
10      1181     1                NA           NA                   NA                  NA
11      1198     1                NA           NA                   NA                  NA
12     10002     2                 0            0                    0                   0
13     10051     8                 0            0.463                0.463               0
14     10057     1                NA           NA                   NA                  NA
15     10061     1                NA           NA                   NA                  NA
16     10062     1                NA           NA                   NA                  NA
17     10063     1                NA           NA                   NA                  NA
18     10064     2                 0            0                    0                   0
19     10078     2                 0            0                    0                   0

Best,
Mike

Denise B's picture
Offline
Joined: 05/09/2021 - 04:45
robust estimation of SE

Thank you for your detailed feedback. I've cut down on several predictor variables. However, I would have one more question concerning the meta()/meta3X() code. In both commands, there is the possibility to adjust SEs through:

summary(meta3(y=y.c, v=v.c, cluster=sample_nr,
x2 = cbind(interval.months, class),
x3=cbind(idv),
data=con_se, model.name = "FIML model",
intervals.type="z"), robust=TRUE )

However, I did not find any specification on how this robust estimation was carried out. Could you provide me with any more detail on how the robust=TRUE function works, please?

Thank you for your time!

Best regards,
Denise

Mike Cheung's picture
Offline
Joined: 10/08/2009 - 22:37
You may try ?summary.meta.

You may try ?summary.meta.

The manual indicates that it calls imxRobustSE from OpenMx.