Attachment | Size |
---|---|
Approach 1 - R Code.txt | 5.42 KB |
Approach 1 - Results.txt | 244.54 KB |
Approach 2 - R Code.txt | 6.04 KB |
Approach 2 - Results.txt | 258.94 KB |
Approach 3 Model 1 - R Code.txt | 2.79 KB |
Approach 3 Model 1 - Results.txt | 105.77 KB |
Approach 3 Model 2 - R Code.txt | 3 KB |
Approach 3 Model 2 - Results.txt | 159.69 KB |
Approach 3 Model 3 - R Code.txt | 3.05 KB |
Approach 3 Model 3 - Results.txt | 195.13 KB |
Dear Mike and Colleagues,
Greetings! I hope you are well.
I am working on a TSSEM project that aims to compare three competing models and pick the “best” one. I have attached a file titled Figure Model 1,2,&3.png to illustrate the three models. In this post, I walk through my analyses and ask a few questions about them. It would be of immense help if you could shed some light on them.
My understanding is that these three models are non-nested. From my reading, chi-square difference tests do not apply to non-nested models, so the AIC would be the main index for ascertaining the “best” model. I also learned that AICs are not comparable if non-nested models contain different constructs. It appears that, to bring the AICs to the same scale, the non-nested models need to draw from the same correlation matrix AND the constructs NOT present in a model need to be freely correlated. I am unsure how to set up TSSEM for non-nested models with different constructs so that the AICs are comparable.
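To see why the AICs must come from the same correlation matrix, consider one common chi-square-based form of the information criteria: AIC = chi-square - 2·df and BIC = chi-square - ln(N)·df (smaller is better). For models fitted to the same correlation matrix, these differ from the likelihood-based -2LL + 2k and -2LL + ln(N)·k only by a constant shared by all the models, so the rankings agree; models with different variables have different saturated models, so the constants no longer cancel. The sketch below computes both criteria. The chi-square, df, and N values are made up for illustration (they are not the attached results), and whether a given metaSEM version reports exactly this form should be checked in its documentation.

```python
import math


def chisq_information_criteria(chisq, df, n):
    """Chi-square-based AIC and BIC (smaller is better).

    Assumes every model was fitted to the SAME pooled correlation matrix,
    so the saturated-model constant cancels when the models are compared.
    """
    aic = chisq - 2 * df
    bic = chisq - math.log(n) * df
    return aic, bic


# Hypothetical stage-2 results as (chi-square, df); not real output.
models = {"Model 1": (25.0, 10), "Model 2": (18.0, 8)}
n_total = 5000  # hypothetical total sample size used in stage 2

scores = {
    name: chisq_information_criteria(chisq, df, n_total)
    for name, (chisq, df) in models.items()
}
best_aic = min(scores, key=lambda m: scores[m][0])
best_bic = min(scores, key=lambda m: scores[m][1])

for name, (aic, bic) in scores.items():
    print(f"{name}: AIC = {aic:.2f}, BIC = {bic:.2f}")
print("Preferred by AIC:", best_aic, "| preferred by BIC:", best_bic)
```

With these made-up numbers the two criteria disagree: AIC prefers Model 2, whereas the ln(N) penalty of BIC favors the more parsimonious Model 1 (more df) at this N. Because N enters BIC directly, models estimated with different N values cannot be ranked this way.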
Based on my reading and suggestions in this forum, I found two TSSEM approaches (approaches 1 and 2, described below) that I believe make the models draw from the same correlation matrix, with the constructs NOT present in a model freely correlated. I also describe a third approach, approach 3. I have attached the R code and results for all approaches to this post.
Approach 1: In stage 1, I estimated ONE BIG pooled correlation matrix using random-effects TSSEM. In stage 2, I set up all three models so that the constructs NOT present in a model are freely correlated with every other variable. This code is adapted from your example at this link: https://openmx.ssri.psu.edu/sites/default/files/Test.pdf.
Approach 2: In stage 1, I likewise estimated ONE BIG pooled correlation matrix using random-effects TSSEM. Instead of freely correlating the absent constructs as in approach 1, I used the select-variables approach, in which three correlation matrices, one for each model, are extracted from the BIG pooled correlation matrix. This code is adapted from your example at this link: https://openmx.ssri.psu.edu/sites/default/files/select_variables.pdf.
Approach 3: This uses the standard TSSEM approach, in which each model has its own stage-1 pooled correlation matrix, estimated using ONLY the constructs present in the model to be fitted in stage 2.
Questions:
(1) From a TSSEM setup standpoint, which of these approaches, if any, makes the AICs of my non-nested models with different variables comparable and thus helps me ascertain the “best” model? If none does, could you please provide some guidance?
(2) Interestingly, the results for all three models in approach 1 are identical to those in approach 2. Does this mean that the two approaches are theoretically equivalent, in that approach 1 explicitly models the covariances among the variables, whereas approach 2 does so implicitly by drawing from the same pooled correlation matrix?
(3) If approach 1 or 2 lets me compare the AICs, would it be appropriate, from a reporting standpoint, to report the ONE BIG stage-1 pooled correlation matrix and then the three models in my manuscript? I haven’t seen many TSSEM papers that report results this way. Could you point me to TSSEM studies with similar reporting?
(4) In the approach 3 results, the N (total sample size) and K (number of studies) for models 1 and 2 differ from those for model 3. In approaches 1 and 2, however, I used a single N for the stage-2 estimation, following the code examples. Is there a way to code the three models in approaches 1 and 2 so that each uses its own N and K?
(5) Do the different N and K values make the stage-2 AICs of models 1 and 2 incomparable with the stage-2 AIC of model 3? If so, is there another way to ascertain the “best” model? For example, should we perform TSSEM only with the studies common to all models, so that N and K are the same?
(6) The goodness-of-fit indices are identical for models 2 and 3 (in approaches 1 and 2). I am afraid I have set up my models incorrectly. Do my stage-2 model setups match the model figures?
(7) About 30 of our correlation matrices are not positive definite. From my reading, the usual recommendation is to remove those studies. However, there appear to be alternatives, namely various ways to correct such correlation matrices. Some of these sound tedious, but one simple approach is to set the negative eigenvalues to 0. I am unsure whether that is a good idea, or worth the time, in TSSEM analyses. Here are some resources I found on this:
• http://www.deltaquants.com/manipulating-correlation-matrices
• https://www.r-bloggers.com/2012/10/fixing-non-positive-definite-correlation-matrices-using-r-2/#:~:text=When%20a%20correlation%20or%20covariance,to%20noise%20in%20the%20data
Note: Our effect sizes were corrected for reliability.
(8) In one of your papers (Cheung & Hong, 2017), you used the AIC to ascertain the “best” model. Could I also use the BIC, given that the AIC tends to favor more complex models in large samples?
Cheung, M. W. L., & Hong, R. Y. (2017). Applications of meta-analytic structural equation modelling in health psychology: Examples, issues, and recommendations. Health Psychology Review, 11(3), 265-279.
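The eigenvalue-clipping idea mentioned in question (7) can be sketched as follows: raise eigenvalues below a small threshold to that threshold (rather than to exactly 0, which would leave the matrix only positive semidefinite), rebuild the matrix, and rescale it to a unit diagonal. This is a minimal illustration of that one technique, not Higham's nearest-correlation-matrix algorithm and not a metaSEM function. The threshold `eps` and the example matrix are assumptions, and whether such adjusted matrices are appropriate input for tssem1() is not settled in this thread.

```python
import numpy as np


def clip_to_pd(R, eps=1e-6):
    """Return a positive definite correlation matrix close to R.

    Eigenvalues below eps are raised to eps; the result is then rescaled to a
    unit diagonal via D^{-1/2} A D^{-1/2}, which keeps it positive definite.
    """
    R = np.asarray(R, dtype=float)
    vals, vecs = np.linalg.eigh((R + R.T) / 2)  # symmetric eigendecomposition
    vals = np.clip(vals, eps, None)             # remove negative/zero eigenvalues
    A = vecs @ np.diag(vals) @ vecs.T
    d = np.sqrt(np.diag(A))
    C = A / np.outer(d, d)                      # rescale to unit diagonal
    C = (C + C.T) / 2                           # clean up rounding asymmetry
    np.fill_diagonal(C, 1.0)
    return C


# Hypothetical 3x3 "correlation matrix" that is not positive definite.
bad = np.array([[1.0, 0.9, -0.9],
                [0.9, 1.0, 0.9],
                [-0.9, 0.9, 1.0]])
print("eigenvalues before:", np.linalg.eigvalsh(bad).round(3))

fixed = clip_to_pd(bad)
print(fixed.round(3))
print("positive definite after:", np.linalg.eigvalsh(fixed).min() > 0)
```

The adjusted matrix contains correlations that were never observed, and since reliability-corrected correlations can themselves produce non-positive definite matrices, checking the affected studies may be worthwhile before applying any such repair.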
I apologize for the number of questions. I think it is crucial to run this by you and get your thoughts before we move ahead, so that we can be confident in our analysis. I would be grateful for your help. Please let me know if any additional clarification is needed.
Thank you so much for your time and kind consideration of my request.
Best Regards,
Srikanth Parameswaran
Dear Srikanth,
I am sorry I cannot go through all the details and questions. Just a few suggestions:
1) TSSEM is an SEM. Therefore, most statistical theories in SEM, including model comparisons, apply to TSSEM.
2) We cannot compare models with different variables.
3) In principle, we can include all variables in the models by correlating "irrelevant" variables. This may allow us to compare models as the same variables are used. However, this approach may affect the model fit. This issue is related to the fixed-x vs. random-x approach in lavaan and Mplus.
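Point 3 and question (2) above can be connected with a small linear-algebra sketch. Assume the stage-2 model is fitted by WLS with weight matrix W = V^-1 (V being the asymptotic covariance matrix of the pooled correlations) and that every correlation involving the "irrelevant" variable is a completely free parameter. Then, for any values of the remaining parameters, those free correlations can be set to minimize the fit function, and the minimum equals the WLS fit computed from the sub-block of V for the remaining correlations. That would be consistent with approaches 1 and 2 giving identical results. The numbers below are random and hypothetical; this is not metaSEM code.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical asymptotic covariance V of 5 pooled correlations:
# the first 3 are used by the model ("a"); the last 2 involve an
# "irrelevant" variable whose correlations are left completely free ("b").
L = rng.normal(size=(5, 5))
V = L @ L.T + 5 * np.eye(5)         # symmetric positive definite
W = np.linalg.inv(V)                 # WLS weight matrix

e_a = np.array([0.02, -0.01, 0.03])  # hypothetical residuals r_a - rho_a(theta)

# Approach-1 style: the free correlations absorb whatever residual minimises
# e' W e, i.e. e_b = -W_bb^{-1} W_ba e_a.
W_ab, W_bb = W[:3, 3:], W[3:, 3:]
e_b = -np.linalg.solve(W_bb, W_ab.T @ e_a)
e = np.concatenate([e_a, e_b])
fit_full = e @ W @ e

# Approach-2 style: keep only the 3 relevant correlations and their sub-block of V.
fit_sub = e_a @ np.linalg.solve(V[:3, :3], e_a)

print(fit_full, fit_sub)  # equal up to rounding
```

The identity relies on the extra correlations being unconstrained; it says nothing about a model fitted to a separately estimated, smaller stage-1 matrix (approach 3), whose fit can differ.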
Mike
Thanks a lot, Prof. Cheung, for your response—this is very helpful.
I have a follow-up question for your kind consideration. I used a code snippet to put NA on the diagonal for the variable(s) with the fewest present correlations. This step seems somewhat arbitrary for studies in which more than one variable ties for the fewest present correlations; consequently, the 1s on the diagonal are assigned arbitrarily. Is there a way to handle any bias or reproducibility issues arising from this preprocessing step?
For example, in the following study (rows and columns in the order X, Y, A, B, C, M, D, E, F), the snippet places 1.00 on the diagonal of F. I noticed that the results differ if 1.00 is placed on the diagonal of another variable, say E.
       X      Y      A      B      C      M      D      E      F
X  1.000     NA  0.071  0.049     NA     NA  0.063  0.103  0.104
Y     NA     NA     NA     NA     NA     NA     NA     NA     NA
A  0.071     NA     NA     NA     NA     NA     NA     NA     NA
B  0.049     NA     NA     NA     NA     NA     NA     NA     NA
C     NA     NA     NA     NA     NA     NA     NA     NA     NA
M     NA     NA     NA     NA     NA     NA     NA     NA     NA
D  0.063     NA     NA     NA     NA     NA     NA     NA     NA
E  0.103     NA     NA     NA     NA     NA     NA     NA     NA
F  0.104     NA     NA     NA     NA     NA     NA     NA  1.000
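The tie problem in this study can be reproduced with a hypothetical reconstruction of such a preprocessing step. The actual snippet is not shown in the thread, so the function below (`drop_until_complete`) is an assumption: it repeatedly drops the variable with the fewest observed correlations until the remaining submatrix has no missing correlations. When several variables tie, which one survives depends only on the tie-breaking order.

```python
import numpy as np


def drop_until_complete(R, names, priority=None):
    """Hypothetical reconstruction of the preprocessing step (not the original snippet).

    Repeatedly drop the variable with the fewest observed correlations until the
    remaining submatrix has no missing correlations. Ties are broken by `priority`
    (earlier in the list = dropped first); every tie encountered is recorded.
    """
    R = np.array(R, dtype=float)  # copy so the caller's matrix is untouched
    np.fill_diagonal(R, 1.0)      # treat every variable as present at the start
    rank = {name: k for k, name in enumerate(priority or names)}
    present = list(range(len(names)))
    ties = []
    while True:
        sub = R[np.ix_(present, present)]
        if not np.isnan(sub).any():
            break  # remaining submatrix is complete
        # number of observed off-diagonal correlations for each present variable
        counts = (~np.isnan(sub)).sum(axis=1) - 1
        low = counts.min()
        tied = [names[present[j]] for j in range(len(present)) if counts[j] == low]
        if len(tied) > 1:
            ties.append(sorted(tied, key=rank.get))
        drop = min(tied, key=rank.get)
        present.remove(names.index(drop))
    return [names[i] for i in present], ties


nan = np.nan
names = ["X", "Y", "A", "B", "C", "M", "D", "E", "F"]
x_row = [1.0, nan, 0.071, 0.049, nan, nan, 0.063, 0.103, 0.104]
R = np.full((9, 9), nan)
R[0, :] = x_row
R[:, 0] = x_row

kept, ties = drop_until_complete(R, names)
kept_rev, _ = drop_until_complete(R, names, priority=names[::-1])
print("kept:", kept, "| kept with reversed tie order:", kept_rev)
print("tie sets encountered:", ties)
```

With the original ordering this sketch keeps X and F, whereas reversing the tie-breaking order keeps X and A, so fixing and reporting the order at least makes such a step reproducible. According to the reply that follows, the diagonal only matters for tssem1(method="FEM").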
Whether the diagonals are 1.0 or NA only affects tssem1(method="FEM"). This is because the diagonal elements indicate whether the variables are present or missing.
But this has no impact on tssem1(method="REM") as the diagonal elements are not involved in the random-effects model.
Thank you for the response, Prof. Cheung. Your answer was enlightening, and I understood more about TSSEM. However, I am a little bit confused with this code snippet.
If diagonal elements aren't involved in REM, I am not sure why my stage-1 REM results differ after adding this code snippet (i.e., after adding step 4, the last step, in the attached code).
Based on your response and my finding of different results after adding this code snippet, I understand that NA on the diagonal for the variable with the fewest present correlations indicates that the variable is missing. So, if I use this snippet, I am excluding correlations that should not be excluded, because the random-effects model can handle missing data. Therefore, for a random-effects model, I should not include the snippet, i.e., step 4 (the last step). Am I right?
Your thoughts on this question will help me finalize my analysis. Thank you for your kind consideration.
Could you please attach the data and R code to reproduce the results?
I am grateful for your help. Thanks a lot.
I am sharing the data (M2 Data - After Preprocessing.csv) and the R code (REM Stage 1.txt) for your kind consideration.
On a related note, I am not sure when to check positive definiteness. is.pd() seems to produce different outputs depending on where in the pipeline I use it. Without this snippet (i.e., step 4), is.pd() returns many NAs. I am attaching the data (M2 Data - Before Preprocessing.csv) and the R code (PreProcessing.txt).
So, from a preprocessing standpoint, at which step should I check positive definiteness for my REM analysis? Also, should I remove only the studies that is.pd() flags as FALSE?
Your thoughts will significantly help me finalize my pipeline (preprocessing+analysis).
Comparing the data before and after preprocessing shows that the preprocessing replaces spaces in your CSV files with NA (please see the attached screenshot). Thus, the difference has nothing to do with OpenMx or metaSEM.
A matrix is either positive definite or not (see Wikipedia for the definition: https://en.wikipedia.org/wiki/Definite_matrix). However, this is unknown when there are NAs. The is.pd() function in metaSEM tries to follow the above definition. The manual says:
"It tests the positive definiteness of a square matrix or a list of square matrices. It returns TRUE if the matrix is positive definite. It returns FALSE if the matrix is either non-positive definite or not symmetric. Variables with NA in the diagonals will be removed before testing. It returns NA when there are missing correlations even after deleting the missing variables."
If it is still unclear, please let me know.
If you are confident that you don't need the tssem1() function to check positive definiteness, you may assign 1.0 to all the diagonal elements of the correlation matrices. You may refer to the attached example.
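The three rules quoted from the manual can be mirrored in a short sketch, which also shows why assigning 1.0 to all diagonals effectively turns off the check: once every variable counts as present, any missing correlation makes the result NA. This is an illustrative re-implementation of the documented behavior, not metaSEM's code; `None` stands in for R's NA, and the function name `is_pd_like` and the tolerance `tol` are assumptions.

```python
import numpy as np


def is_pd_like(R, tol=1e-6):
    """Mimic the documented is.pd() rules (None plays the role of R's NA)."""
    R = np.asarray(R, dtype=float)
    present = ~np.isnan(np.diag(R))  # variables with NA diagonals are dropped
    sub = R[np.ix_(present, present)]
    if sub.size == 0 or np.isnan(sub).any():
        return None                  # missing correlations remain: undefined
    if not np.allclose(sub, sub.T):
        return False                 # not symmetric
    return bool(np.linalg.eigvalsh(sub).min() > tol)


nan = np.nan
# The example study shown earlier in the thread, with 1.0 on every diagonal.
x_row = [1.0, nan, 0.071, 0.049, nan, nan, 0.063, 0.103, 0.104]
R_all_ones = np.full((9, 9), nan)
R_all_ones[0, :] = x_row
R_all_ones[:, 0] = x_row
np.fill_diagonal(R_all_ones, 1.0)

# Same study, but only X and F marked as present (NA elsewhere on the diagonal).
R_two = R_all_ones.copy()
for i in range(1, 8):
    R_two[i, i] = nan

bad = np.array([[1.0, 0.9, -0.9], [0.9, 1.0, 0.9], [-0.9, 0.9, 1.0]])
print(is_pd_like(R_all_ones), is_pd_like(R_two), is_pd_like(bad))  # prints: None True False
```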
Thanks a lot, Prof. Cheung, for your response.
(1) The column (final_r) shown in your screenshot was not used in the tssem1 REM analysis. I have removed that column in the attached data (M2 data.csv) and rerun the tssem1 REM analysis using the attached code (REM Stage 1.txt). Still, the tssem1 REM results after step 4 differ from those after steps 2 and 3 (see REM Stage 1.txt - Results).
(2) In the example you shared (test.pdf), no data are missing at the correlation level. However, my data (M2 data.csv) have missing correlations. Can I assign 1.0 to all the diagonal elements despite this missingness?
(3) "If you are confident that you don't need the tssem1() function to check the positive definiteness, you may assign 1.0 on all the diagonals on the correlation matrices."
Are you referring to the positive definiteness of the pooled correlation matrix or the individual correlation matrices from primary studies?
I am unsure when I can be confident. Can I be confident if I have removed the non-positive definite matrices myself with the help of the is.pd() function?
(2) Yes, this is the simplest way to address your concern.
(3) If you apply (2), the function won't check whether the individual correlation matrices are positive definite. If the data were entered incorrectly, this would be hard to detect. Of course, you may also check them manually if you want.
Thanks a lot, Prof. Cheung, for your response.
Your presence and feedback in this community have been invaluable to metaSEM users like me.