
Growth curve model with ordinal data!

Sunny Wu

Hi,
I have 50 binary variables with values of either 0 or 1. I tried to treat these variables as ordinal and fit a growth curve model.
OpenMx gave an error message:

Running Linear Growth Curve Model
Error: The data object 'Linear Growth Curve Model.data' contains 50 ordered factors but our ordinal integration implementation has a limit of 20 ordered factors.

Is it true that I can only fit 20 ordered factors using OpenMx? Could you point me in a direction for fixing this problem? Any help or suggestions would be appreciated. Thanks!

Ryne

You've hit a limit of OpenMx: we only allow 20 ordinal variables for regular FIML modeling. I'll explain why, then tell you what you can do about it. If I'm too detailed or technical, let me know. If I'm speaking below your level, remember that I'm not only speaking to you, but to all future readers of this forum who might need more help.

OpenMx handles ordinal data by integrating the multivariate normal distribution implied by your model's expected covariance matrix and means for the given parameters, generating an expected likelihood for the categorical responses in each data row under that covariance structure. Say we had one person and one ordinal variable with five categories, and the person selects the middle category. OpenMx would figure out the probability of endorsing that category like so:

- Look at the model-implied covariance and means to see what normal distribution underlies that item (say, a mean of 0 and a variance of 1).
- Find the thresholds that border this response. There will be four thresholds for five categories, with those thresholds splitting the normal distribution into five sections. Given that our individual endorsed the middle category, we'd select the second and third thresholds, which might have values of -0.5 and 0.5.
- Integrate the probability density function (pdf) of that normal distribution (mean = 0, var = 1) between the thresholds (-0.5 and 0.5). This is the likelihood of that response given the model.

For those readers unfamiliar with this type of calculus, the pdf is the bell curve you may see when people discuss the normal distribution. The total area between that curve and the x-axis is always 1.0, and the total area under any section of the curve corresponds to the probability of having a score in that section. When we integrate the standard normal distribution between -0.5 and 0.5, we essentially draw vertical lines at those points, and ask how much area is in the section of the bell curve between these lines. For nice univariate problems like this one, you can figure this out using pnorm: pnorm(0.5) - pnorm(-0.5) will show how much of the standard normal distribution is less than 0.5 but not less than -0.5, which is the same thing.
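To make that concrete, here is a tiny base-R check (nothing OpenMx-specific; the thresholds -0.5 and 0.5 are just the made-up values from the example above) showing that the CDF difference and the direct integral of the pdf agree:

# Area under the standard normal pdf between the two thresholds
pnorm(0.5) - pnorm(-0.5)                           # CDF at upper threshold minus CDF at lower
integrate(dnorm, lower = -0.5, upper = 0.5)$value  # same area by direct integration of the pdf
# both return about 0.383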

So this process is relatively straightforward for univariate cases. We use Alan Genz's routines for multivariate normal integration (http://www.math.wsu.edu/faculty/genz/software/software.html), which is fantastic stuff that interested researchers should read all about. But as you start adding variables and parameters, two things happen. First, multivariate normal integration gets much more computationally intensive, with the processing time for each integration increasing rapidly with each additional variable. Second, every new parameter in your model adds a new dimension to the likelihood space, doubling or more the processing time associated with each optimization step. For these reasons, Genz's algorithm and OpenMx place a cap of 20 variables; problems requiring a higher-dimensional integration should be approached some other way (more on that below).

Here's a Genz paper comparing different multivariate integration algorithms:
http://www.sci.wsu.edu/math/faculty/genz/papers/mvtcmpn.pdf
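
If you want to see what the multivariate version of that integral looks like, here is a small sketch using the mvtnorm package, which wraps Genz's routines; the thresholds and correlation below are invented for illustration, not taken from your model:

library(mvtnorm)
# Likelihood of one data row on two ordinal items: the probability that a
# bivariate normal falls between each item's lower and upper thresholds.
sigma <- matrix(c(1.0, 0.3,
                  0.3, 1.0), nrow = 2)   # model-implied latent covariance
pmvnorm(lower = c(-0.5, 0.2),            # lower thresholds bordering each response
        upper = c( 0.5, 1.1),            # upper thresholds bordering each response
        mean  = c(0, 0),
        sigma = sigma)
# OpenMx has to evaluate this kind of integral for every row of raw data,
# and the cost grows quickly as the number of ordinal variables increases.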

Okay, so I haven't told you what to do about it yet. Two features being developed for OpenMx 2.0 will address this problem. One is weighted least squares (WLS) estimation: under this method, you convert your data to a set of polychoric correlations and their asymptotic covariance matrix, and OpenMx optimizes your model against this summary of your data rather than the raw data. The other is item factor analysis, which will integrate the factors out of your model (I believe by E-M, but other methods may be available), leaving you with locally independent ordinal variables that avoid this integration problem altogether. Other solutions include item parceling, treating your data as continuous, and using software packages other than OpenMx, each of which I dislike to some degree.
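
To give a sense of the first step of the WLS route, here is a rough sketch (not OpenMx code; 'mydata' is just a placeholder name for your 50-column 0/1 data frame) using the psych package; for binary items the polychoric correlation reduces to the tetrachoric correlation:

library(psych)
# 'mydata': data frame of 0/1 items (placeholder name)
tet <- tetrachoric(mydata)
tet$rho   # estimated latent correlation matrix
tet$tau   # estimated thresholds for each item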

Since your problem is a growth curve model, you may have some other options. If your data are formatted "wide" with lots of missing data, it is possible to reformat your model and data to get under this limit. I'm currently dealing with a similar problem: I have 40+ years (spanning ages 30 to 90-something) of a single ordinal item indicator, but no individual is assessed more than 9 or 10 times. While it's easiest to spread the data out and pretend I have 60 waves of data, I'm currently writing alternate code that reformats the model for each individual to get around this limitation. If that's what you're doing, I'd be happy to share code when I finish it. If you're assessing a latent construct at multiple occasions, you might instead generate factor scores or IRT-based trait scores at each wave, then fit a growth model to those scores (a minimal sketch of that approach follows below). Finally, there are ways to fit both item response models and growth curves as multilevel models: you could nest items within time points within persons and fit a three-level, item-level growth curve, but I don't know that this has actually been done.
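
As an illustration of the factor-score route, here is a minimal sketch of a linear growth curve fit to continuous per-wave scores; it follows the standard OpenMx path-specification growth curve example, and the data frame 'scores' with columns f1-f5 (one estimated score per wave) is a hypothetical placeholder:

library(OpenMx)
waves <- paste0("f", 1:5)    # columns of 'scores': one factor/IRT score per wave
growth <- mxModel("LinearGrowthOnScores", type = "RAM",
    manifestVars = waves,
    latentVars   = c("intercept", "slope"),
    # residual variances (constrained equal across waves by the shared label)
    mxPath(from = waves, arrows = 2, free = TRUE, values = 1, labels = "residual"),
    # latent variances and covariance
    mxPath(from = c("intercept", "slope"), arrows = 2, connect = "unique.pairs",
           free = TRUE, values = c(1, 0.4, 1)),
    # fixed loadings: 1s for the intercept, 0 through 4 for the slope
    mxPath(from = "intercept", to = waves, arrows = 1, free = FALSE, values = 1),
    mxPath(from = "slope",     to = waves, arrows = 1, free = FALSE, values = 0:4),
    # latent means (manifest intercepts stay fixed at zero)
    mxPath(from = "one", to = c("intercept", "slope"), arrows = 1, free = TRUE, values = 0),
    mxData(observed = scores, type = "raw")
)
summary(mxRun(growth))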

Tell me more about your problem and I'll see if I can offer a specific solution. Good luck, and happy modeling!