Categorical data with membership probabilities

eivind
Joined: 11/04/2009 - 16:02
Categorical data with membership probabilities

I have ordinal data where the probability of membership in each category is known (e.g. individual i has probability p of being a member of category 1 of variable u). Does anybody know if it is possible to use this information in OpenMx, for example by weighting the thresholds? The probabilities would differ across twins and across thresholds of the ordinal variable.

mhunter
Joined: 07/31/2009 - 15:26

Initially, it sounds like you have more information than in typical ordinal data analysis. But I think you actually have less. In typical ordinal situations you also know that individual i has probability p of being a member of category 1 of variable u; it's just that p is either 1 or 0. In other words, you're certain of which category was used. From your description, it sounds like you are not certain, because you only have the probability of each category being used.

I believe you could still estimate an ordinal model with this information, assuming the data really are ordinal. The technique I would suggest is to fix the thresholds based on the probabilities.

For example, if your cumulative probabilities for 4 categories are

sumProbs <- c('0'=0.340, '1' = 0.552, '2' = 0.808) #omit last category bc always 1.0

and the expected variance of the observed variable is 1, and the expected mean of the variable is 0 then the thresholds should be at

threshVals <- qnorm(sumProbs, mean=0, sd=1)
#                 0                 1                 2 
# -0.4124631  0.1307160  0.8705498

If you know a priori what the total mean and variance are, then you could pre-compute these thresholds for each row and use definition variables to define different thresholds for each individual. Otherwise, you'll need to use MxAlgebra statements and definition variables to compute the thresholds for each row.
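
As a rough sketch of the pre-computation step only, assuming the mean and SD of the underlying variable are known: the matrix catProbs, its made-up values, and the column names thr1, thr2, ... are hypothetical and not from the original question.

```r
# Hypothetical membership probabilities: one row per individual,
# one column per category (each row sums to 1). Values are made up.
catProbs <- rbind(
  c(0.340, 0.212, 0.256, 0.192),
  c(0.100, 0.300, 0.400, 0.200),
  c(0.600, 0.200, 0.150, 0.050)
)

# Assumed known a priori: mean and SD of the underlying variable.
liabMean <- 0
liabSD   <- 1

# Cumulative probabilities per row; drop the last column (always 1).
cumProbs <- t(apply(catProbs, 1, cumsum))[, -ncol(catProbs), drop = FALSE]

# Row-specific thresholds on the scale of the underlying variable.
rowThresh <- qnorm(cumProbs, mean = liabMean, sd = liabSD)
colnames(rowThresh) <- paste0("thr", seq_len(ncol(rowThresh)))
```

The first row reproduces the cumulative probabilities from the example above. The columns of rowThresh could then be added to the data set and referred to as definition variables; how best to wire them into the threshold matrix will depend on your model.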

neale
Joined: 07/31/2009 - 15:14
Multiple categories?

It sounds as though you want to fit a model that has one or more thresholds. In the usual case, an individual data vector would be simply 1 or 0 for the probability that they are in that category. This sounds like a weighting problem. The slight difficulty is that we don't know the joint probability of the pair of twins. Still, it may be possible to model this.

Usually, the likelihood that a pair falls into a cell of a contingency table is the integral of the bivariate normal for that cell. In the present case, one could compute the marginal probability (across, say, a row for twin 1 and across a column for twin 2) to arrive at the expected proportions per twin for each of the marginals. Then the likelihood of the pair would be the sum of the marginal likelihoods, each multiplied by their 'observed' probability in the data. This is a bit involved, and I'm not sure that it would work, but it might be an avenue to explore.
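
One possible reading of this idea, sketched in base R rather than OpenMx. Since the joint membership probability of the pair is unknown, the sketch assumes it is the product of the two twins' membership probabilities; that independence assumption, the function names cellProb and pairLik, and the example values are all hypothetical. Each cell's bivariate normal probability is weighted by that product and the results are summed.

```r
# Probability that a standard bivariate normal pair with correlation rho
# falls in the rectangle (lo1, hi1] x (lo2, hi2], computed by
# one-dimensional numerical integration over twin 1's liability.
cellProb <- function(lo1, hi1, lo2, hi2, rho) {
  s <- sqrt(1 - rho^2)
  integrand <- function(x) {
    dnorm(x) * (pnorm((hi2 - rho * x) / s) - pnorm((lo2 - rho * x) / s))
  }
  integrate(integrand, lower = lo1, upper = hi1)$value
}

# Weighted likelihood for one twin pair.
# w1, w2: membership probabilities per category for twin 1 and twin 2.
# thresh: thresholds shared by both twins; rho: twin correlation.
# ASSUMPTION: joint weight for cell (j, m) is w1[j] * w2[m].
pairLik <- function(w1, w2, thresh, rho) {
  cuts <- c(-Inf, thresh, Inf)
  k <- length(w1)
  lik <- 0
  for (j in seq_len(k)) {
    for (m in seq_len(k)) {
      lik <- lik + w1[j] * w2[m] *
        cellProb(cuts[j], cuts[j + 1], cuts[m], cuts[m + 1], rho)
    }
  }
  lik
}

# Made-up example: three categories, uncertain membership for both twins.
thresh <- qnorm(c(0.3, 0.7))
exampleLik <- pairLik(c(0.2, 0.5, 0.3), c(0.6, 0.3, 0.1), thresh, rho = 0.5)
```

When every weight is 0 or 1 this reduces to the usual single-cell likelihood, which is a useful sanity check. Summing -2 * log(pairLik(...)) over pairs would give a fit function to minimise, though whether the resulting estimates behave well is exactly the open question raised above.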