It is practical, but the difference between threshold analysis and continuous analysis would be slight if the density distribution of the categories is approximately normal. A highly skewed 20-category variable may still be better analyzed as ordinal, depending on how it got that way. Of course, it takes a lot more parameters and is computationally slower to analyze ordinal data with many categories.
I am quite new to OpenMx. I have not figured out the difference between binary data (with one threshold) and data with more than 2 categories (say a variable with N categories, and hence N-1 thresholds). What does the script look like? Could you please give me some basic script as an explanation? Thank you so much.
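A minimal sketch of the point about thresholds, assuming a one-factor RAM model with made-up variable names and toy data (none of this is from the thread). The only structural difference between the binary and multi-category case is the `nThresh` argument: a variable with N categories gets N-1 thresholds. Identification constraints here (fixed residual variances and means) are one common choice, not the only one.

```r
library(OpenMx)

set.seed(1)
# Toy data: ordinal variables must be mxFactor()s with explicit levels
df <- data.frame(
  bin = mxFactor(sample(0:1, 200, replace = TRUE), levels = 0:1),  # 2 categories
  ord = mxFactor(sample(0:4, 200, replace = TRUE), levels = 0:4)   # 5 categories
)

model <- mxModel("thresh", type = "RAM",
  manifestVars = c("bin", "ord"), latentVars = "F",
  mxData(df, type = "raw"),
  mxPath(from = "F", to = c("bin", "ord"), values = 0.5, free = TRUE),
  mxPath(from = "F", arrows = 2, values = 1, free = FALSE),
  # Residual variances fixed for identification (one common convention)
  mxPath(from = c("bin", "ord"), arrows = 2, values = 1, free = FALSE),
  mxPath(from = "one", to = c("bin", "ord"), free = FALSE, values = 0),
  # Binary: nThresh = 1.  Five categories: nThresh = 5 - 1 = 4.
  mxThreshold(vars = "bin", nThresh = 1, free = TRUE, values = 0),
  mxThreshold(vars = "ord", nThresh = 4, free = TRUE,
              values = mxNormalQuantiles(4))
)
# fit <- mxRun(model); summary(fit)
```

Everything else in the script stays the same regardless of the number of categories; only the `mxThreshold()` call (and the factor levels) changes.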
lots of levels, but values below a cutoff set to 0
What if the data have lots of levels, but are the right-hand side of a normal distribution, with all values below roughly z = 0 set to 0? E.g., clinical symptom data with continuous severity above the threshold for having any symptoms.
Then one needs to stay ordinal, to avoid the model treating everyone with a score of "0" as having risk similar to those with scores of 1.
Given that analysis is prohibitive (it can take hundreds of hours, even running multi-core) for a large multivariate model where variables have this many levels (and some empty categories), is there a logical basis for cutting the data into fewer quantiles? How would one justify a given choice (say 5 vs. 7 vs. 10 or 12 levels)? Use a criterion like no group smaller than 50 or so?
Smaller than 50 can work too, but it depends on what the joint distribution with the other variables being analyzed looks like. If those other variables are binary or ordinal and some joint cells are empty or very sparse, the polychoric correlation can be indeterminate due to zero cells. So it is not easy to give a rule of thumb for minimum cell frequency. If the categories are fairly arbitrary, then you might think about approximate deciles as a compromise between retaining continuous information and keeping the data ordinal. If the first cell occupies several deciles, well, that will have to do.
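The decile-with-a-zero-spike idea above can be sketched in base R. This is hedged: the `severity` vector and cutpoints are hypothetical toy data, not from the thread. Non-zero scores are cut at their deciles, while the zero pile-up is kept as its own lowest category (so the model does not treat zeros like low non-zero scores).

```r
set.seed(1)
# Toy severity score: a spike at zero plus a right tail with many levels
severity <- c(rep(0, 300), round(abs(rnorm(700)) * 10))

# Breaks: zeros are their own category; non-zero values cut at deciles.
# unique() drops duplicated breakpoints produced by ties.
nz  <- severity[severity > 0]
brk <- unique(c(-Inf, 0, quantile(nz, probs = seq(0.1, 0.9, by = 0.1)), Inf))

# cut() with labels = FALSE gives integer codes 1..k; shift to start at 0
sevOrd <- cut(severity, breaks = brk, labels = FALSE) - 1

table(sevOrd)  # check no category is too small (e.g., under ~50)
```

The `table()` call is the practical check against the cell-size concern: if a cell comes out tiny, merge it with a neighbor before handing the variable to `mxFactor()`.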
http://openmx.psyc.virginia.edu/thread/2087