OpenMx script for twin data to parallelize
a.ortega
Joined: 02/05/2011
Hi!
I have a script for ordinal twin data that takes about 5-6 hours to run on my laptop. I was wondering whether, in this particular case, it would be worthwhile to parallelize the work in order to reduce the computation time. Would it be possible to adapt this script to do the job?
I have been reading the notes in the OpenMx manual on implementing parallelization with the "snowfall" package, but they seemed a bit odd to me (I don't have much experience with parallelizing in R). Any indication would be much appreciated. I attach the script here in case somebody wishes to have a look.
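For reference, the snowfall approach from the manual notes looks roughly like the minimal sketch below. The submodel names and the saturated one-variable toy models are placeholders, not the attached twin script, and this mechanism only distributes submodels flagged independent=TRUE to the worker processes; it does not split up a single raw-data FIML evaluation.

library(OpenMx)
library(snowfall)

sfInit(parallel = TRUE, cpus = 2)  # set cpus to the number of cores available
sfLibrary(OpenMx)

# Helper that builds a trivial saturated submodel; independent = TRUE is what
# allows OpenMx to farm the submodel out to a worker process.
makeSub <- function(name, dat) {
  mxModel(name,
    mxData(dat, type = "raw"),
    mxMatrix("Full", 1, 1, free = TRUE, values = 0, name = "M"),
    mxMatrix("Symm", 1, 1, free = TRUE, values = 1, name = "S"),
    mxExpectationNormal(covariance = "S", means = "M", dimnames = "x"),
    mxFitFunctionML(),
    independent = TRUE)
}

set.seed(1)
top <- mxModel("container",
               makeSub("subA", data.frame(x = rnorm(500))),
               makeSub("subB", data.frame(x = rnorm(500, mean = 2))))

fit <- mxRun(top)  # independent submodels are dispatched to the snowfall workers
summary(fit)
sfStop()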
Best regards,
-Alfredo
I will address the question
In reply to I will address the question by mspiegel
Michael, Considering the long
Considering the long computation time with raw data in this job, and the difficulties of parallelization, does it make sense to you to fit this longitudinal growth model to "correlation matrices" instead of to the raw ordinal data?
Do you think that there may be issues with accuracy using this alternative approach?
Thanks!
In reply to Michael, Considering the long by a.ortega
Correlation matrices
A few points about fitting models to matrices of polychoric or tetrachoric correlations. First, unlike covariance matrices, the precision of tetrachoric correlations varies as a function of where the thresholds are. For example, the tetrachoric correlation between two binary variables with a 50:50 split of 0/1 responses will be more precise than one between variables with a 90:10 split. So it is necessary to "tell" the model-fitting procedure how accurate the correlations are, and also how much the correlation statistics covary with each other. One approach is to use a weight matrix; Michael Browne wrote about this method in a seminal 1982 paper on "Asymptotically Distribution Free" (ADF) estimation. At this time, however, OpenMx does not have a simple way of estimating a suitable weight matrix. It might be possible to fit a single model with every correlation (and every threshold) free, and use the calculated Hessian as a weight matrix, but this procedure would need to be made more efficient (it is possible to estimate the correlations in a pairwise fashion, which speeds it up a lot).
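To make the precision point concrete, here is a small sketch using simulated binary data and the polycor package (neither is part of this thread; the package choice is just for illustration). The same underlying correlation of .5 is estimated with a noticeably larger standard error when the items split roughly 90:10 than when they split 50:50.

library(polycor)  # assumed helper package for tetrachoric correlations with SEs
library(MASS)

set.seed(1)
latent <- mvrnorm(2000, mu = c(0, 0), Sigma = matrix(c(1, .5, .5, 1), 2, 2))

# Dichotomize the same latent variables at two different thresholds
x1 <- as.numeric(latent[, 1] > 0)         # 50:50 split
y1 <- as.numeric(latent[, 2] > 0)
x2 <- as.numeric(latent[, 1] > qnorm(.9)) # ~90:10 split
y2 <- as.numeric(latent[, 2] > qnorm(.9))

polychor(x1, y1, std.err = TRUE)  # printed output includes the standard error
polychor(x2, y2, std.err = TRUE)  # same true correlation, larger standard error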
Second, an advantage of FIML is that it provides a natural framework for modeling datasets which contain missing values (and most do; be wary of people who say they have no missing data). Missing data patterns can also create variability in the precision of different correlations, and while FIML handles most types of missing data (MCAR and MAR in Little & Rubin terms) well, other methods often do less well.
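As a small illustration of this point, the following sketch (toy continuous data, not the ordinal twin data under discussion) fits a saturated model by FIML in OpenMx with about 20% of one variable set to missing; rows with an NA still contribute their observed variables to the likelihood, with no listwise deletion.

library(OpenMx)

set.seed(2)
dat <- data.frame(x = rnorm(300))
dat$y <- 0.5 * dat$x + rnorm(300)
dat$y[sample(300, 60)] <- NA  # make about 20% of y missing

fiml <- mxModel("fiml",
  mxData(dat, type = "raw"),
  mxMatrix("Full", 1, 2, free = TRUE, values = 0, name = "M"),
  mxMatrix("Symm", 2, 2, free = TRUE, values = c(1, .2, 1), name = "S"),
  mxExpectationNormal(covariance = "S", means = "M", dimnames = c("x", "y")),
  mxFitFunctionML())

summary(mxRun(fiml))  # estimates use all available data rather than complete cases only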
Third, you are working with longitudinal growth models, which usually involve making predictions about the means (thresholds) as well as the variances. It would be important to include the thresholds in the weight matrix, so that we end up fitting a model to both the thresholds and the correlations.
In light of these issues, if you have the patience - or access to a cluster or grid - then I'd still go with the 5-6 hour version using FIML, and fit a limited set of models. Alternatively, please feel free to write a weight-matrix calculating function :)
In reply to Correlation matrices by neale
Thank you, Michael! I think I
In reply to Thank you, Michael! I think I by a.ortega
Starting values can have a
Currently, OpenMx can execute