Ordinal variables with many categories

rabil
Ordinal variables with many categories

If an ordinal variable has 20 categories, is it practical to try to handle this with thresholds? Or would it be better to treat it as continuous?

neale
If you have enough data

It is practical, but the difference between a threshold analysis and a continuous one would be slight if the density distribution of the categories is approximately normal. A highly skewed 20-category variable may still be better analyzed as ordinal - depending on how it got that way. Of course, it's a lot more parameters and computationally slower to analyze ordinal data with lots of categories.

lingsuer87
I am quite new to OpenMx

I am quite new to OpenMx. I have not figured out the difference between binary data (with one threshold) and data with more than 2 categories (say a variable with N categories, so N-1 thresholds). What does the script look like? Could you please give me some basic script as an explanation? Thank you so much.

neale
See response to other thread

http://openmx.psyc.virginia.edu/thread/2087
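For illustration, here is a minimal sketch (not the script from the linked thread) of the general idea: a variable with N categories gets N-1 thresholds, and binary data are just the special case with one threshold. The variable name and data below are made up, and it assumes an OpenMx version that provides mxThreshold():

library(OpenMx)

# Simulated example: "item" has 4 ordered categories (0-3), so 3 thresholds
set.seed(1)
df <- data.frame(item = mxFactor(sample(0:3, 200, replace = TRUE), levels = 0:3))

model <- mxModel("oneOrdinal", type = "RAM",
  manifestVars = "item",
  mxData(df, type = "raw"),
  # Fix the mean at 0 and the variance at 1 so the thresholds are identified
  mxPath(from = "one", to = "item", arrows = 1, free = FALSE, values = 0),
  mxPath(from = "item", arrows = 2, free = FALSE, values = 1),
  # N-1 = 3 free thresholds; binary data would need nThresh = 1
  mxThreshold("item", nThresh = 3, free = TRUE, values = c(-1, 0, 1))
)
fit <- mxRun(model)
summary(fit)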

tbates
lots of levels, but values below a cutoff set to 0

What if the data have lots of levels, but are the right-hand side of a normal distribution, with all values below ~ z == 0 set to 0? For example: clinical symptom data with continuous severity above the threshold for having any symptoms.

> table(mxFactor(ocd$OCI_OBSESSION, levels = c(0:20)))
   0    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19   20
2813  746  462  295  197  107   84   49   48   37   21   13   14    4    9    7    7    1    2    0    2

Then one needs to stay ordinal, to avoid the model treating everyone with a score of 0 as having risk similar to those with scores of 1.

Given that analysis is prohibitive (it can take hundreds of hours even running multi-core) for a large multivariate model where variables have this many levels (and some empty categories), is there a logical basis for cutting the data into fewer quantiles? How would one justify a given choice (say 5 vs. 7 vs. 10 or 12 levels)? Use a criterion like no group smaller than 50 or so?

> library(Hmisc)  # cut2() comes from the Hmisc package
> table(cut2(ocd$OCI_OBSESSION, m=75))
      0       1       2       3       4       5       6 [ 7, 9) [ 9,11) [11,29] 
   2813     746     462     295     197     107      84      97      58      69 
neale
Maybe not so large

Smaller than 50 can work too, but it depends on what the joint distribution with the other variables being analyzed looks like. If those other variables are binary or ordinal and some of their cells are sparsely populated, the polychoric correlation can be indeterminate due to zero cells in the joint table. So it's not easy to give a rule of thumb for a minimum cell frequency. If the categories are pretty arbitrary, then you might think about approximate deciles as a compromise between retaining continuous information and keeping the data ordinal. If the first cell occupies several deciles, well, that will have to do.
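
For example, a rough sketch of the decile approach using the cut2() function from above (Hmisc assumed; because the spike at 0 spans several deciles, fewer than 10 groups come back):

library(Hmisc)   # for cut2(), as used above
library(OpenMx)  # for mxFactor()

# Approximate deciles: g = 10 requests quantile groups; duplicate break points
# caused by the large spike at 0 are collapsed, leaving fewer than 10 groups
dec <- cut2(ocd$OCI_OBSESSION, g = 10)
table(dec)

# Convert to an ordered factor for use as ordinal data in OpenMx
ocd$OCI_OBSESSION_dec <- mxFactor(dec, levels = levels(dec))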