Improving ordinal model specification

Joined: 07/31/2009 - 15:24

smedland has proposed two suggestions for improving the specification of ordinal models. I support both, and I encourage OpenMx users to leave feedback on these proposed changes.

1. Allow neighboring duplicate values to exist within a column of the thresholds matrix. CURRENT: a column must consist of strictly increasing values. CHANGE: a column must consist of non-decreasing values. Internally, neighboring duplicate values are collapsed into a single threshold.

2. In a FIML objective function, if thresholds are specified and expected means are not specified, then internally create an expected means vector of 0's. CURRENT: all FIML objectives require an expected means vector. CHANGE: FIML objectives with thresholds do not require means.
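A tiny base-R sketch of the collapsing behavior described in proposal 1 (the helper name is hypothetical, not part of OpenMx):

```r
# Hypothetical helper illustrating proposal 1: neighboring duplicate values
# in a threshold column collapse into a single threshold internally.
collapse_thresholds <- function(col) {
  # keep an element only when it differs from its predecessor
  col[c(TRUE, diff(col) != 0)]
}

collapse_thresholds(c(-1, 0, 0, 1.5))  # -> -1.0  0.0  1.5
```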

In addition to these two proposed changes by Sarah, I have two additional proposals of my own. These are intended to simplify the current interface.

3. Eliminate the use of NAs at the end of columns in the thresholds matrices. Contingent on proposal (1), fill out a column with duplicate values instead of NAs; possibly use +Inf at the bottom of columns.

4. Eliminate the possibility of multiple threshold matrices. I find this interface confusing and error-prone.

Joined: 07/31/2009 - 14:25
Proposals 1 (allowing neighboring duplicate values, collapsed into a single threshold) and 2 (auto-creating expected means as a vector of 0's) both make sense.

I don't fully understand proposal 3. Are there side effects from proposal 4?

Joined: 07/31/2009 - 15:10
I agree that we need to take a second look at the way that OpenMx specifies thresholds--it seems like we've introduced too much complexity here.

Some issues I see with the current proposals:

1) Duplicate values will cause some problems, I think. If we allow subsequent thresholds to collapse, it's possible for two thresholds made up of free parameters to overlap and cause a level of that variable to have a zero likelihood. Remember that these are recalculated at each iteration in the back-end.

2) We've had a lot of discussion about specifying a default means-model. I'm ok with it either way--it's a question of whether the benefit of forcing users to recognize that they have a means model is cancelled by the inconvenience of actually specifying it. Users, your thoughts?

3) I think you're right about getting rid of NAs. The way that NAs propagate through algebras gets to be fairly inconvenient. Infs don't work any better because Inf*0 == NaN. I'm not quite sure what the appropriate way to specify it should be, however.

4) I kind of like the idea of letting people specify the thresholds column-by-column instead of in a large matrix, but I'm willing to let it go if it's confusing.

Here's an alternative proposal that we've talked about before, but which has resurfaced in a conversation I just had with Mike Neale and Tim Bates.

1) Force people to use R's ordered factors. This requires looking at each column that's ordinal, and using a line like

data$col <- ordered(data$col, levels = 0:max(data$col))  # classic Mx default

or, with named levels:

data$col <- ordered(data$col, levels = c("strongly disagree", "disagree", "don't care", "agree", "strongly agree"))

which ensures that the factor has the correct ordering and number of levels.

2) Thresholds must be specified as a matrix:

One column must exist for each ordered factor in the covariance algebra. That is, for a FIML objective, any data columns specified in the dimnames() of the mxFIMLObjective() statement that are ordered factors are matched, in order, to the columns of the threshold matrix. If these don't match, an error is thrown. If the threshold matrix has dimnames, those should override order when matching data columns to threshold columns.

Within each column, elements must be strictly increasing. For an ordinal factor with N levels, there must be at least (N-1) thresholds; any additional thresholds are ignored. This lets users either use the trick of specifying offsets and multiplying by a lower-triangular matrix, or fill the extra cells with arbitrary values.
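For reference, the lower-triangular trick mentioned above can be sketched in base R as follows (names and values are illustrative only, not OpenMx API):

```r
# Classic trick for keeping thresholds strictly increasing: estimate a first
# threshold plus strictly positive increments, then cumulate them with a
# lower-triangular matrix of ones.
increments <- c(-1.2, 0.5, 0.8)            # first threshold, then positive steps
L <- lower.tri(diag(3), diag = TRUE) * 1   # 3 x 3 lower-triangular matrix of ones
thresholds <- as.vector(L %*% increments)  # -1.2 -0.7 0.1, strictly increasing
```

As long as the increments after the first stay positive, the resulting column satisfies the strictly-increasing constraint automatically.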

3) I'd recommend against actually defaulting the means vector. Instead, I'd recommend something like  means=0  as an argument to the FIML objective.

Thoughts?

Let me know if there's anything unclear in there and I'll try to clarify. Thanks!

Joined: 07/31/2009 - 15:30
1) I second the idea of using ordered factors. An important advantage may be in the multiple-groups case, when different groups may have completely missing data for different categories of the same variable. In this case, ordered factors may allow a natural default.
3) I am not sure what "defaulting the means vector" would entail. It seems the user ought to be aware that means (and standard deviations) are implied, or could be made aware that a default value of 0.0 is used for all ordinal variables. Including means=0 as an argument may be a way of accomplishing that. We should also consider simultaneous treatment of means and standard deviations (or precisions).

Joined: 07/31/2009 - 15:10
On 2):

Yes, that's right. I'm arguing for allowing the "means" argument of the mxFIMLObjective function (for example) to be set to 0. Under normal circumstances, this would be set to an mxMatrix or mxAlgebra containing the model-implied means. Setting means=0 would generate an appropriate-length vector of fixed 0 values that would be used as the means vector for that Objective function. Users would still have to specify a model-implied covariance matrix.
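A minimal base-R sketch of what means=0 might expand to internally (the dimension k and the variable name are assumptions for illustration, not the actual implementation):

```r
# Hypothetical sketch: expand means = 0 into a fixed 1 x k row vector of
# zeros, where k matches the dimension of the expected covariance matrix.
k <- 3                                      # e.g., three observed variables
zero_means <- matrix(0, nrow = 1, ncol = k) # all-fixed, all-zero means vector
```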

I'm not sure what you mean about simultaneous treatment of means and standard deviations. In FIML, both means and covariance parameters are estimated simultaneously.

Are you asking for the ability to pass in a matrix of sums of squares and cross-products in lieu of a vector of means and a covariance matrix?

Joined: 07/31/2009 - 15:30
I was hasty in my earlier response.
Generally speaking, means=0 and std=1 will be needed in one group, for setting the origin and the metric of the latent variable.
As long as these are explicit, either by the user's explicit choice or in the manner in which OpenMx implements these defaults and informs the user of these choices, it is fine. I imagine choices regarding means and sds are likely to be linked.

Joined: 07/31/2009 - 15:24
Tim Brick's proposal has been implemented in the subversion trunk, with the exception of the means = 0 changes. We desperately need a rewrite of models/passing/Ordinal.R and models/passing/OrdinalAlgebra.R. They can now be rewritten the same way as in classic Mx. To the generous person who does the re-writing: please put the original copies of these scripts in models/nightly so they will be executed in our nightly tests.

Joined: 12/22/2009 - 06:13
Hi all,

as a just-starting user, I was wondering about the zero means vector in a threshold model; I would prefer that it not be required in mxFIMLObjective(). It looks redundant but cannot be removed, which I found confusing.

Lot

Joined: 07/31/2009 - 15:12
Hi Lot,

You bring up an interesting point. For many users of threshold models, specifying the means of ordinal variables does not make sense. The thresholds replace the means and variance terms we would find in continuous variables, and the rest of the model builds up from there. This is especially true of item response modeling.

It helps to understand the assumptions of the threshold model. When we estimate thresholds, we assume there is a continuous variable underlying the observed data, and the thresholds identify cutpoints on that continuous variable. For instance, a threshold of zero on a binary (0,1) variable indicates that scores below zero on this underlying continuous variable are observed as a zero in the binary variable, while scores above zero on the continuous dimension are observed as a one on the binary variable.

We do have to scale that underlying variable. Most of the time, we consider it to be standard normal: a mean of zero and a variance of one. If we have k categories, the observed proportions give us only k-1 pieces of information, which must cover not only the k-1 thresholds but also the mean and variance of that continuous underlying variable. Most of the time, we estimate the thresholds and fix the mean and variance of the variable at zero and one, respectively.
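To make that concrete, here is a base-R sketch (the threshold values are illustrative): with a standard-normal underlying variable, each observed category's probability is the normal area between adjacent thresholds.

```r
# Probabilities implied by thresholds on a standard-normal underlying variable.
thresholds <- c(-0.5, 0.5)     # two cutpoints -> three observed categories
cuts <- c(-Inf, thresholds, Inf)
probs <- diff(pnorm(cuts))     # one probability per observed category; sums to 1
```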

However, there are times when we would want to change that. Multiple-group models may have the same thresholds across groups but mean differences in the items, and other approaches may fix some number of item thresholds so that tests on the means and variances become possible. What we're discussing in this thread is whether some kind of shortcut, so you can just say "means=0", would be good for people in the more traditional case, to keep them from having to specify a matrix of all zeros for the means in every model.