Ordinal variables with many categories

Posted on
rabil Joined: 01/14/2010
If an ordinal variable has 20 categories, is it practical to try to handle this with thresholds? Or would it be better to treat as continuous?
Replied on Sat, 05/26/2012 - 17:42
neale Joined: 07/31/2009

It is practical, but the difference between a threshold analysis and a continuous analysis would be slight if the distribution across the categories is approximately normal. A highly skewed 20-category variable may still be better analyzed as ordinal, depending on how it got that way. Of course, ordinal data with lots of categories need a lot more parameters and are computationally slower to analyze.
Replied on Fri, 05/03/2013 - 08:25
lingsuer87 Joined: 05/03/2013

In reply to neale

I am quite new to OpenMx. I have not figured out the difference between binary data (with one threshold) and data with more than two categories (say a variable with N categories, so we have N-1 thresholds). What does the script look like? Could you please give me a basic script as an explanation? Thank you so much.
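A minimal sketch of the idea, using base R only and hypothetical counts. The names binary_counts, four_counts and thresholds_from_counts are made up for illustration, and this is not an OpenMx script. Under the usual liability-threshold view, a variable with N ordered categories is produced by cutting a standard normal liability at N-1 thresholds, so a binary variable is simply the special case with one threshold.

```r
# Hypothetical category counts, for illustration only
binary_counts <- c(700, 300)            # 2 categories -> 1 threshold
four_counts   <- c(100, 400, 400, 100)  # 4 categories -> 3 thresholds

# Thresholds implied by the observed proportions, assuming a standard
# normal liability: the z-values that cut off the cumulative proportions.
thresholds_from_counts <- function(counts) {
  cum_prop <- cumsum(counts) / sum(counts)
  # The last cumulative proportion is always 1 (z = +Inf), so drop it
  qnorm(cum_prop[-length(cum_prop)])
}

binary_thr <- thresholds_from_counts(binary_counts)
four_thr   <- thresholds_from_counts(four_counts)

print(binary_thr)  # one threshold
print(four_thr)    # three thresholds, in increasing order
```

In a fitted model the thresholds would be free parameters estimated jointly with the correlations; values like these are only a plausible starting point.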
Replied on Wed, 03/25/2015 - 10:31
tbates Joined: 07/31/2009

In reply to neale

What if the data have lots of levels but are the right-hand side of a normal distribution, with all values below roughly z = 0 set to 0? For example, clinical symptom data with continuous severity above the threshold for having any symptoms:


> table(mxFactor(ocd$OCI_OBSESSION, levels = c(0:20)))
    0    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19   20
 2813  746  462  295  197  107   84   49   48   37   21   13   14    4    9    7    7    1    2    0    2

Then one needs to stay ordinal, to avoid the model treating everyone with a score of 0 as having risk close to that of people with a score of 1.
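To make the point concrete, the tabulated counts can be converted into thresholds on a standard normal liability. This is a rough base-R sketch, not the model actually being fitted: it assumes a single normal liability and uses only the counts for scores 0 to 20 shown in the table.

```r
# Counts for scores 0..20, copied from the table above
counts <- c(2813, 746, 462, 295, 197, 107, 84, 49, 48, 37, 21,
            13, 14, 4, 9, 7, 7, 1, 2, 0, 2)
names(counts) <- 0:20

# Share of the sample piled up in the zero category (a bit over half)
prop_zero <- counts[["0"]] / sum(counts)

# 21 categories imply 20 thresholds on the liability scale
cum_prop <- cumsum(counts) / sum(counts)
thr <- qnorm(cum_prop[-length(cum_prop)])

# The first threshold separates "no symptoms" from any score above 0;
# everyone below it is censored at 0.
first_thr <- thr[[1]]

# Empty categories give two identical neighbouring thresholds
tied <- which(diff(thr) == 0)

print(round(prop_zero, 3))
print(round(first_thr, 3))
print(tied)
```

The empty category at score 19 shows up as a pair of tied thresholds, which is one reason to collapse sparse categories before fitting.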

Given that analysis is prohibitive (it can take hundreds of hours, even running multi-core) for a large multivariate model where variables have this many levels (and some empty categories), is there a logical basis for cutting the data into fewer quantiles? How would one justify a given choice (say 5 vs. 7 vs. 10 or 12 levels)? Use a criterion like no group smaller than 50 or so?


> table(cut2(ocd$OCI_OBSESSION, m=75))  # cut2() is from the Hmisc package
       0       1       2       3       4       5       6 [ 7, 9) [ 9,11) [11,29]
    2813     746     462     295     197     107      84      97      58      69
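For anyone without Hmisc, a "no group smaller than 50" rule can be sketched in base R by greedily merging adjacent categories. This is a simple stand-in and not the algorithm cut2() uses; collapse_sparse is a hypothetical helper name, and the sketch works from the tabulated counts for scores 0 to 20 only.

```r
# Counts for scores 0..20, copied from the first table in this post
counts <- c(2813, 746, 462, 295, 197, 107, 84, 49, 48, 37, 21,
            13, 14, 4, 9, 7, 7, 1, 2, 0, 2)
names(counts) <- 0:20

# Walk up the scale, pooling adjacent categories until each pooled
# group holds at least min_n observations.
collapse_sparse <- function(counts, min_n = 50) {
  group <- integer(length(counts))
  g <- 1
  running <- 0
  for (i in seq_along(counts)) {
    group[i] <- g
    running <- running + counts[i]
    if (running >= min_n) {  # group is large enough: close it
      g <- g + 1
      running <- 0
    }
  }
  # A trailing group that never reached min_n joins the previous group
  if (any(group == g) && g > 1) group[group == g] <- g - 1
  group
}

group <- collapse_sparse(counts, min_n = 50)
group_sizes <- tapply(counts, group, sum)

print(split(names(counts), group))  # which scores were pooled together
print(group_sizes)                  # observations per pooled category
```

Changing min_n shows how quickly the number of categories, and so the number of thresholds to estimate, shrinks as the minimum cell size grows.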

Replied on Wed, 03/25/2015 - 11:36
neale Joined: 07/31/2009

In reply to tbates

A minimum smaller than 50 can work too, but it depends on what the joint distribution with the other variables being analyzed looks like. If those other variables are binary or ordinal and don't have appreciable cell sizes, the polychoric correlation can be indeterminate due to zero cells. So it's not easy to give a rule of thumb for the minimum cell frequency. If the categories are pretty arbitrary, then you might think about approximate deciles as a compromise between continuous information and ordinal data. If the first cell occupies several deciles, well, that will have to do.
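One way the approximate-decile idea could look in base R, as a sketch only. It rebuilds raw scores from the counts tbates posted for scores 0 to 20, and it assumes that tied decile cut points are simply dropped, which is a choice made here for illustration rather than something prescribed above.

```r
# Rebuild raw scores from the posted counts for scores 0..20
counts <- c(2813, 746, 462, 295, 197, 107, 84, 49, 48, 37, 21,
            13, 14, 4, 9, 7, 7, 1, 2, 0, 2)
scores <- rep(0:20, times = counts)

# Inner deciles (10%..90%); type = 1 returns observed values
# rather than interpolating between them
inner <- quantile(scores, probs = seq(0.1, 0.9, by = 0.1),
                  type = 1, names = FALSE)

# Several deciles fall on the same score because of the pile-up at 0,
# so keep each cut point once and bin with right-closed intervals.
breaks <- c(-Inf, unique(inner), Inf)
binned <- cut(scores, breaks = breaks, right = TRUE)

print(inner)
print(table(binned))
```

With these counts the zero category swallows the first several deciles, so far fewer than ten categories survive, and score 0 stays in a category of its own.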