Importing correlation matrices (beginner)

Posted on Wed, 10/02/2013 - 10:30

nickz Joined: 10/02/2013

Dear all,

I recently started exploring the metaSEM package and worked my way through some of the examples found online.

However, I am new to R, OpenMx, and metaSEM and have some difficulty with importing my own data.

I would like to import a number of correlation matrices, corresponding sample sizes, and clusters, but I don't know how to structure the input.

Is it possible to input the data using a csv file? If so, how should the file be structured? If not, what is the best way to input a large number of correlation matrices?

In a different post I saw a similar solution for the standardized mean difference and its sampling variance (http://openmx.psyc.virginia.edu/thread/1521), but not for correlation matrices.

I hope you can point me in the right direction.

Thanks,

Nick

Replied on Wed, 10/02/2013 - 21:20

Mike Cheung Joined: Oct 08, 2009

Hi Nick,

The best way is to import the correlation matrices, the sample sizes and the clusters separately. The sample sizes and the clusters are just vectors, e.g.,
n <- c(100, 200, 300, 100)
cluster <- c("A", "A", "B", "B")

Suppose there are two correlation matrices to import:
1.0 0.3 0.4
0.3 1.0 0.5
0.4 0.5 1.0

1.0 NA 0.4
NA NA NA
0.4 NA 1.0

metaSEM provides three functions to read correlation matrices.

readFullMat() reads the full matrices, e.g.,
1.0 0.3 0.4
0.3 1.0 0.5
0.4 0.5 1.0
1.0 NA 0.4
NA NA NA
0.4 NA 1.0

readLowTriMat() reads the lower triangle matrices, e.g.,
1.0
0.3 1.0
0.4 0.5 1.0
1.0
NA NA
0.4 NA 1.0

readStackVec() reads the vectors of the correlation matrices, e.g.,
1.0 0.3 0.4 1.0 0.5 1.0
1.0 NA 0.4 NA NA 1.0

You may refer to the examples in the manual by typing ?readFullMat
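On the CSV question: one possible layout, sketched here in base R only, is one study per row holding the sample size, the cluster, and the stacked correlations in the same order as the readStackVec() example above. The column names (study, n, cluster, r11 ... r33) and the helper vec2cor() are made up for illustration and are not part of metaSEM. The sketch only shows how such rows map back to full symmetric matrices.

```r
# Hypothetical CSV content: one study per row, lower triangle stacked
# column by column (r11, r21, r31, r22, r32, r33)
csv_text <- "study,n,cluster,r11,r21,r31,r22,r32,r33
1,100,A,1.0,0.3,0.4,1.0,0.5,1.0
2,200,A,1.0,NA,0.4,NA,NA,1.0"
dat <- read.csv(text = csv_text)

# Hypothetical helper: rebuild a p x p symmetric matrix from a stacked vector
vec2cor <- function(v, p) {
  m <- matrix(NA_real_, p, p)
  m[lower.tri(m, diag = TRUE)] <- v        # filled column by column
  m[upper.tri(m)] <- t(m)[upper.tri(m)]    # mirror into the upper triangle
  m
}

# One matrix per study, plus the two plain vectors
data <- lapply(seq_len(nrow(dat)), function(i) {
  vec2cor(unlist(dat[i, 4:9], use.names = FALSE), p = 3)
})
n <- dat$n
cluster <- as.character(dat$cluster)
```

The resulting list of matrices, n, and cluster have the same shapes as the objects passed to tssem1() elsewhere in this thread.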

Mike

Replied on Thu, 10/03/2013 - 05:15

nickz Joined: Oct 02, 2013

Hi Mike,

Thank you for taking the time to point me in the right direction and also for your quick response.

I can't believe I overlooked the manual (didn't know how to find it...).

I managed to import my own data and am now working on figuring out how to import a large number of (fairly large) matrices efficiently. I'm sure I'll manage with the directions you gave above.

Thanks again,

Nick

Replied on Fri, 11/01/2013 - 11:21

nickz Joined: Oct 02, 2013

importing matrices and missing data

Dear all,

I came across another issue when trying to import correlation matrices from various studies. Only a few of the correlation matrices are complete; some have missing variables and some have missing correlations. I would like to know how to handle missing correlations, rather than missing variables.

Missing variables need to be flagged on the diagonal as NA. The next example shows the second variable to be missing and flagged on the diagonal.

1	NA	0.39
NA	NA	NA
0.39	NA	1

However, I am not sure how to handle missing correlations. In the following example there are no missing variables, but the correlation between variables one and two is missing.

1	NA	0.31
NA	1	0.10
0.31	0.10	1

Inputting the correlation matrix above as is results in an error due to the missing correlation.

> fixed1 <- tssem1(data,n,method="FEM")
Error in eigen(x, only.values = TRUE) : infinite or missing values in 'x'

If I flag the first variable as missing on the diagonal, as above, the error is resolved, but I lose the information in that variable's remaining correlations (here 0.31), which are then treated as missing values.

data[[1]][1,1]<-NA

The following two matrices provide the same result.

NA	NA	0.31
NA	1	0.10
0.31	0.10	1

NA	NA	NA
NA	1	0.10
NA	0.10	1
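To tell the two cases apart before running anything, a small base-R check can list the off-diagonal NAs whose two variables are both present on the diagonal. This is only a sketch: missing_cors() is a hypothetical helper, not a metaSEM function, and the two matrices are the examples from this post.

```r
# Hypothetical helper: positions (lower triangle) where the correlation is NA
# although both variables are flagged as present (diagonal not NA)
missing_cors <- function(m) {
  present <- !is.na(diag(m))
  idx <- which(is.na(m) & lower.tri(m), arr.ind = TRUE)
  idx[present[idx[, "row"]] & present[idx[, "col"]], , drop = FALSE]
}

# Missing variable: variable 2 is NA on the diagonal
miss_var <- matrix(c(1, NA, 0.39,
                     NA, NA, NA,
                     0.39, NA, 1), 3, 3)

# Missing correlation: all variables present, r(1,2) is NA
miss_cor <- matrix(c(1, NA, 0.31,
                     NA, 1, 0.10,
                     0.31, 0.10, 1), 3, 3)

n_missing_var_case <- nrow(missing_cors(miss_var))
n_missing_cor_case <- nrow(missing_cors(miss_cor))
```

Matrices for which the helper returns at least one row are the ones that trigger the problem described here.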

The problem is also mentioned here (Cheung and Chan, 2005, p.45):

"Missing correlation coefficients, rather than missing variables, are observed in MASEM sometimes. It is not easy to handle missing correlation coefficients in SEM."

Is there a way to handle missing correlations without disregarding them? Is there a suggested workaround?
Disregarding the correlations or study coefficients seems arbitrary, and I would rather not do that. Substituting an average value also seems a bit crude.

I've attached the dataset (fulldata_example1.dat) and code file (example_1.r) to clarify the issue.

Thanks,

Nick

Reference:
Cheung, M.W.-L., & Chan, W. (2005). Meta-analytic structural equation modeling: A two-stage approach. Psychological Methods, 10, 40-64.

Replied on Fri, 11/01/2013 - 23:00

Mike Cheung Joined: Oct 08, 2009

Hi Nick,

There are a couple of options. My preferred approach is to use a random-effects model (Cheung, 2013).
## Symmetric matrix for the random effects
random1 <- tssem1(data,n,method="REM")
summary(random1)

# OR diagonal matrix for the random effects
random2 <- tssem1(data,n,method="REM", RE.type="Diag")
summary(random2)

If you still prefer a fixed-effects model, you may fix the variance component of the random effects to zero. This is equivalent to the conventional GLS approach.
fixed4 <- tssem1(data,n,method="REM", RE.type="Zero")
summary(fixed4)

Mike

Cheung, M. W.-L. (2013, June 27). Fixed- and random-effects meta-analytic structural equation modeling: Examples and analyses in R. Behavior Research Methods. Advance online publication. doi:10.3758/s13428-013-0361-y

Replied on Mon, 11/04/2013 - 04:42

nickz Joined: Oct 02, 2013

reply

Hello Mike,

Thank you for your response. I really appreciate the time you have taken to answer my questions. I'm learning a lot.

I do think that the random-effects model is more appropriate, and it runs fine. However, I would like to cluster the studies into groups based on study-level moderators to see if I can resolve at least part of the heterogeneity. The optional argument 'cluster' is disabled when method="REM" is chosen.

I suppose the solution would be to partition the sample and run the random-effects analysis twice (and assess the difference in the Q and I2 statistics). However, if I understand correctly, you lose the ability to use fit statistics to assess homogeneity, as they are not available in the stage 1 analysis under the random-effects model (Cheung, 2013). Why is this?

One of the advantages of a TSSEM is the ability to evaluate heterogeneity using multiple goodness-of-fit indices under the fixed effects model.

Is there a way to do this if missing correlations are present? I would like to run the fixed-effects model to evaluate whether or not the results are homogeneous (using goodness-of-fit indices), and continue (and repeat stage 1) under the random-effects model if the results are not homogeneous.

Kind regards,

Nick

Cheung, M. W.-L. (2013, June 27). Fixed- and random-effects meta-analytic structural equation modeling: Examples and analyses in R. Behavior Research Methods. Advance online publication. doi:10.3758/s13428-013-0361-y

Replied on Mon, 11/04/2013 - 22:40

Mike Cheung Joined: Oct 08, 2009

Hi, Nick.

Fit indices are available for the fixed-effects model because it uses a multiple-group SEM approach.

For the random-effects model, the effect sizes of each study are treated as data points with known sampling variances and covariances. Multivariate meta-analysis (meta() in the metaSEM package) is used to estimate the means and variance components of the effect sizes under a random-effects model. Suppose there are 3 effect sizes per study: there are then 3 means and 6 variances/covariances in total. Thus, the model is saturated. The so-called fit indices do not work here.
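The parameter count can be written out as a small base-R sketch. The helper n_params() is hypothetical; its "Diag" and "Zero" labels mirror the RE.type values used earlier in this thread, and "Symm" is simply the label chosen here for the unrestricted symmetric case.

```r
# Hypothetical helper: number of free parameters in the stage 1
# random-effects model with p effect sizes (correlations) per study
n_params <- function(p, re_type = c("Symm", "Diag", "Zero")) {
  re_type <- match.arg(re_type)
  n_var <- switch(re_type,
                  Symm = p * (p + 1) / 2,  # variances and covariances
                  Diag = p,                # variances only
                  Zero = 0)                # variance component fixed at zero
  c(means = p, variance_components = n_var, total = p + n_var)
}

three_effects <- n_params(3)       # 3 means and 6 variances/covariances
ten_diag <- n_params(10, "Diag")   # 10 means and 10 variances
```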

If you really want to do it within the fixed-effects model, you may contact Suzanne Jak (http://www.uva.nl/over-de-uva/organisatie/medewerkers/content/j/a/s.jak/s.jak.html). She has done some work on how to handle missing correlations within the TSSEM.

Mike

Jak, S., Roorda, D. L., Oort, F. J. & Koomen, H. M. Y. (2013). Meta-analytic structural equation modelling with missing correlations. Netherlands Journal of Psychology, 67, 132 - 139.

Replied on Fri, 11/15/2013 - 15:38

Nicola Joined: Nov 15, 2013

Error when running both random effects options.

Hi Mike,

I have the same issue as Nick expressed. However, when I ran the two random-effects options, they did not run successfully. I get the following errors. Attached is my data set. Below is my sample size matrix.

##MATRIX FOR SAMPLE SIZE OF EACH STUDY
TAMn = matrix (c(425, 73, 964, 225, 137, 243, 275, 102, 128, 362, 109), nrow=1, ncol=11, byrow = TRUE)

When I run with the first option, I get the following error.
## Symmetric matrix for the random effects
random1 <- tssem1(TAMdata,TAMn,method="REM")

Error in running mxModel:

When I run with the second option, I get the following error.
# OR diagonal matrix for the random effects
random2 <- tssem1(TAMdata,TAMn,method="REM", RE.type="Diag")
summary(random2)

Error in solve.default(t(X) %*% V_inv %*% X) :
Lapack routine dgesv: system is exactly singular: U[9,9] = 0

I have tried searching for these errors, but have not found how to resolve the issue. Thank you for your assistance.

Nicola

Replied on Sat, 11/16/2013 - 02:34

Mike Cheung Joined: Oct 08, 2009

Hi Nicola,

There are 10 correlation coefficients in the model in total. Even when a diagonal matrix is imposed on the variance component, there are still 20 parameters (10 for the mean correlations and 10 for their variances). I don't think that 11 studies (with missing values) are sufficient to fit this model.

A fixed-effects model does not work in this example either, because there are no data for the correlation between X3 and X5.
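A quick way to spot such gaps is to count, for every pair of variables, how many studies report the correlation. The sketch below uses base R and toy matrices with three variables (not Nicola's data); the variable names X1 to X3 are assumptions for illustration.

```r
# Toy data, not the attached data set
s1 <- matrix(c(1, 0.3, NA,
               0.3, 1, NA,
               NA, NA, NA), 3, 3)    # X3 not measured in study 1
s2 <- matrix(c(1, 0.2, NA,
               0.2, 1, 0.4,
               NA, 0.4, 1), 3, 3)    # r(X1, X3) not reported in study 2
mats <- list(s1, s2)

# Number of studies contributing to each cell
coverage <- Reduce(`+`, lapply(mats, function(m) (!is.na(m)) + 0L))
dimnames(coverage) <- list(paste0("X", 1:3), paste0("X", 1:3))

# Pairs of variables for which no study reports a correlation
empty <- which(coverage == 0 & lower.tri(coverage), arr.ind = TRUE)

# With 5 variables there are choose(5, 2) = 10 distinct correlations
n_cor_5 <- choose(5, 2)
```

Any pair listed in empty has no data at all, so its pooled correlation cannot be estimated.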

Hope it helps.

Mike

Replied on Mon, 11/18/2013 - 11:49

Nicola Joined: Nov 15, 2013

Thank you!

Hi Mike,

Thank you for your response. I really appreciate the time you have taken to answer my questions. I'm learning a lot. Instead of pooling the 11 studies together, I will group the 11 studies into four groups that have no missing correlations and run a fixed-effects MASEM for the four groups. I assume this is reasonable. Thanks again for your help.

Kindest regards,
Nicola

Replied on Tue, 07/26/2016 - 05:26

nastjuscha Joined: Jun 28, 2016

Which option now?

Hi Mike,

We recently tried to run a TSSEM model with a large number of missing correlations and missing variables using REM and RE.type="Diag" plus acov="weighted" (there were error warnings when we tried to run the model without the latter two options). We got a solution returned with this specification, yet with code 6 on line status1.

Unfortunately, re-running the model from its first solution, as suggested, did not help, even after 11 trials. We received the error message that the Hessian had negative eigenvalues, and further trials even worsened the solution (though by minimal margins).

We then produced the stage 1 Matrix using the GLS option (REM with variance component fixed to zero), as suggested above in your third option. This seems to yield a proper solution, as indicated by the status line (code 0).

Now we are left with two options, neither of which appears optimal: either we proceed with the REM solution that may lack optimality (code 6), or we impose a FEM on data that are in all likelihood heterogeneous.

Is there anything you could suggest?

One suspicion we can offer: In the individual study input matrices we used for computing the stage1 solution, we entered 1s in the diagonals even if the respective variable was not measured in that study. Would it help to replace those by NA?

Thanks, Nastja

Replied on Tue, 07/26/2016 - 10:43

Mike Cheung Joined: Oct 08, 2009

Hi Nastja,

The REM solution is not an option, as the results cannot be trusted (error code 6). You may estimate the heterogeneity variances of the correlations by running the analysis on each correlation coefficient separately. This may give you some idea of how heterogeneous the correlations are.

Using 1 or NA in the diagonals should not affect it when you are treating them as correlations.

Best,
Mike

Replied on Wed, 07/27/2016 - 08:11

nastjuscha Joined: Jun 28, 2016

Hi Mike:

Thanks for your response. If I understood you correctly, you are suggesting that we conduct a traditional bivariate meta-analysis on every single coefficient, right?

Provided we do that and find evidence of heterogeneity, what’s next? Conduct an SEM analysis based on bivariate correlations (which would be the approach TSSEM has been proposed to overcome)? Proceed to stage 2 with the stage 1 FEM-based matrix despite heterogeneity? We are admittedly a little lost now.

There are two additional questions that popped up during our trials to solve the problem:

- For two of the correlation matrices that originally appeared proper in initial checks (#8, 34), we now receive FALSE messages. This is strange, as these matrices were not changed at all for the new analyses.
- Another clarification question: for the stage 2 analyses in your sample data, you used weightings for computing the parameter estimates (like .2 or .3*parameter). How do you arrive at those weights?

Sorry for being such a pain in the neck.

Best, Nastja

PS: Now we received a new error warning:
"Error in list2matrix(x = suppressWarnings(lapply(my.df, cov2cor)), diag = FALSE) :
length of 'dimnames' [1] not equal to array extent"
The dimnames are x1 and x2 - what is wrong here?

Replied on Thu, 07/28/2016 - 05:25

Mike Cheung Joined: Oct 08, 2009

Hi Nastja,

I suggest analyzing the bivariate correlations to get a sense of how large the heterogeneity of the correlations is. If the heterogeneity variances are small, it may be possible to justify the use of the fixed-effects model. I still prefer TSSEM to the bivariate analysis, as TSSEM takes the dependence among the correlation coefficients into account.
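For a rough first look at one correlation at a time, the heterogeneity variance can be approximated in base R. Note that the sketch below swaps in the DerSimonian-Laird moment estimator with the usual large-sample variance of a correlation; it is a stand-in for illustration, not the likelihood-based estimation that metaSEM performs, and dl_tau2() and the toy numbers are hypothetical.

```r
# Hypothetical helper: DerSimonian-Laird estimate of the heterogeneity
# variance (tau2) for a single correlation reported by several studies
dl_tau2 <- function(r, n) {
  ok <- !is.na(r)                      # drop studies not reporting it
  r <- r[ok]
  n <- n[ok]
  v <- (1 - r^2)^2 / (n - 1)           # large-sample sampling variance of r
  w <- 1 / v
  r_fixed <- sum(w * r) / sum(w)       # fixed-effects pooled correlation
  Q <- sum(w * (r - r_fixed)^2)        # homogeneity statistic
  k <- length(r)
  tau2 <- max(0, (Q - (k - 1)) / (sum(w) - sum(w^2) / sum(w)))
  I2 <- if (Q > 0) max(0, (Q - (k - 1)) / Q) else 0
  c(k = k, Q = Q, tau2 = tau2, I2 = I2)
}

# Toy input: one correlation across four studies, one of them missing
res <- dl_tau2(r = c(0.30, 0.31, NA, 0.10), n = c(100, 200, 300, 100))
```

Repeating this for every cell of the correlation matrix gives one tau2 per correlation, which can then be inspected for size.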

I don't think there are "correct" solutions to your problems, given the limited data. If you analyze the data with the fixed-effects approach, the standard errors are likely to be underestimated if the data are heterogeneous. Another approach is to simplify your model by dropping some variables. You may need to spend some time comparing the pros and cons of the various approaches.

Regarding the FALSE messages when checking the correlation matrices, are you referring to is.pd()? The formula was changed in release 0.9.9-0 on GitHub (https://github.com/mikewlcheung/metasem/blob/master/NEWS). This will affect you if you installed the metaSEM package from GitHub.

The values in .2 or .3*parameter are the starting values. It is fine to use any reasonable values. The final solutions should not depend on them.

PS. The error message seems to suggest that the matrices have different dimensions. The program expects every matrix to have the same number of variables.
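A quick base-R check for this, assuming the matrices are stored in a named list (the list and its names below are toy examples):

```r
# Toy list: the third matrix has a different number of variables
mats <- list(s1 = diag(3), s2 = diag(3), s3 = diag(2))

# Dimensions of each matrix as a label such as "3x3"
dims <- vapply(mats, function(m) paste(dim(m), collapse = "x"), character(1))
dim_counts <- table(dims)

# Studies whose dimensions differ from the most common one
odd <- names(dims)[dims != names(which.max(dim_counts))]
```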

Best,
Mike

Replied on Fri, 07/29/2016 - 08:41

nastjuscha Joined: Jun 28, 2016

Hi Mike,

thanks for your ideas; we will try the different options.
Unfortunately, installing the metaSEM package from GitHub didn't solve the problem with is.pd() (see attachment).

Do you have any other ideas?

Thanks, Nastja

Replied on Fri, 07/29/2016 - 10:20

Mike Cheung Joined: Oct 08, 2009

Hi Nastja,

It seems that these issues can be better handled offline.
Would you mind sending the data, R code, and the errors to me by email? Thanks.

Best,
Mike

Replied on Mon, 08/24/2015 - 09:07

RobertSuurmond Joined: Aug 24, 2015

Importing correlation matrices [error]

Dear all,

My problem is similar to the one above, which is why I am replying rather than opening a new thread.

I am trying to import 88 correlation matrices with 17 variables. None of these matrices is fully filled, but combined they should give information for all of the cells. However, when I try to read the data I get the following error:

# matrices<-readLowTriMat(file="C:/correlation matrices.dat", no.var=17)
#Read 13455 items
#Error in readLowTriMat(file = "C:/correlation matrices.dat", : No. of elements read != no.var*(no.var+1)*no.of.studies.

Can you help me out on why this happens and how I can fix it? You can find the data file attached.

Best,
Robert

Replied on Mon, 08/24/2015 - 23:13

Mike Cheung Joined: Oct 08, 2009

Hi Robert,

There are 17*18/2=153 elements (including the diagonals) in each correlation matrix. Since you have 88 correlation matrices, the total number of elements should be 153*88=13464. The error message indicates that only 13455 items were read. For some reason, 9 items are missing from the data file.
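To locate the short rows, one option is to count the fields on each line with base R's count.fields() and compare them with the 1, 2, ..., no.var pattern of a lower-triangle file. The sketch below uses a small temporary file with 3 variables rather than the attached data, and assumes every matrix row sits on its own line and no line is missing entirely (a missing line would shift the expected pattern).

```r
no.var <- 3
per_matrix <- no.var * (no.var + 1) / 2      # elements expected per matrix

# Toy file: the last row of the second matrix is one value short
path <- tempfile(fileext = ".dat")
writeLines(c("1.0", "0.3 1.0", "0.4 0.5 1.0",
             "1.0", "NA NA", "0.4 1.0"), path)

fields <- count.fields(path)                 # values found on each line
expected <- rep_len(seq_len(no.var), length(fields))
bad_lines <- which(fields != expected)       # lines with the wrong count

# The same arithmetic for the 88 matrices with 17 variables
missing_items <- 17 * 18 / 2 * 88 - 13455
```

Blank lines are skipped by count.fields() by default, so bad_lines counts non-blank lines only.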

Regards,
Mike

Replied on Tue, 08/25/2015 - 05:33
