# family link functions v. threshold model on polychoric

Joined: 06/14/2023 - 04:13

Does OpenMx estimate the ordered family using a link function like probit or logit? If not, may I open a feature request on GitHub?

I found a years-old Q&A in this forum stating that the threshold model is akin to the probit, and another Q&A that mentions how to specify a probit model. I am asking this question to get a clearer answer.

It puzzles me that I cannot find software (besides perhaps Stata) that can estimate an SEM with latent factors using family and link functions (e.g., ordered + logit). As far as I understand, only with that method are the factors estimated in relation to the distribution of the items. Please let me know if you believe this is mistaken or not important.

"Categorical Threshold Estimation -- Models with categorical outcomes can be estimated, including thresholds for the categories."
- openmx-features

Maximum likelihood estimation for ordinal variables is done by generating expected covariance and mean matrices for the latent continuous variables underlying the set of ordinal variables, then integrating the multivariate normal distribution defined by those covariances and means. The likelihood for each row of the data is defined as the multivariate integral of the expected distribution over the interval defined by the thresholds bordering that row’s data.

OpenMx uses Alan Genz’s SADMVN routine for multivariate normal integration (see http://www.math.wsu.edu/faculty/genz/software/software.html for more information). When continuous variables are present, OpenMx utilizes a block decomposition to separate the continuous and ordinal covariance matrices for FIML. The likelihood of the continuous variables is calculated normally. The effects of the point estimates of the continuous variables is projected out of the expected covariance matrix of the ordinal data. The likelihood of the ordinal data is defined as the multivariate integral over the distribution defined by the resulting ordinal covariance matrix.
Joined: 03/01/2013 - 14:09
FIML with ordinal vs. WLS with ordinal

Hi

I'm not sure I understand your question, but I'm going to give it a shot:

There are two main approaches for analyzing ordinal (ordered factor) data in OpenMx. One is to supply the raw data and request the maximum likelihood fit function, which proceeds by generating the expected covariance matrix and means (though these may be zero if the thresholds are being estimated) then calculating the integral of the MVN distribution between the thresholds defined by the particular values of the ordinal variable. So, e.g., analyzing just one 3 category 0/1/2 variable, the likelihood, given the model's parameter values (covariances or a path model that defines them in terms of other parameters) would be the MVN integral from minus infinity to the first threshold for a score of 0, from threshold 1 to threshold 2 for those scoring 1, and from threshold 2 to plus infinity for those scoring 2.
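
As a minimal sketch of the single-variable case described above (base R only, not OpenMx code; the threshold values, the made-up scores, and the `category_prob` helper are all invented for illustration), the probability of each category is the normal integral between the thresholds that border it:

```r
# Hypothetical thresholds for one three-category (0/1/2) variable whose
# underlying latent variable is standard normal (mean 0, variance 1).
thresholds <- c(-0.5, 0.8)

# Probability of each category: the normal integral between the two
# thresholds that border it, with -Inf and +Inf as the outer limits.
category_prob <- function(thresholds, mean = 0, sd = 1) {
  cuts <- c(-Inf, thresholds, Inf)
  diff(pnorm(cuts, mean = mean, sd = sd))
}

probs <- category_prob(thresholds)

# Log-likelihood of a small made-up sample of scores coded 0, 1, 2:
# each row contributes the log probability of its observed category.
scores <- c(0, 1, 1, 2, 0, 1)
loglik <- sum(log(probs[scores + 1]))
```

With several ordinal variables the same idea applies, except that the interval becomes a rectangle and the integral is over a multivariate normal distribution.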

WLS initially operates somewhat similarly, but it first estimates all the polychorics and the thresholds, and the covariance matrix of the parameters, whose inverse is used for the weights. Essentially model fitting then proceeds based on the observed and expected statistics: (o-e)' inv(W) (o-e) where o and e are the previously estimated correlations and thresholds.
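
The discrepancy function above can be sketched in a few lines of base R. This is only an illustration of the quadratic form, not what OpenMx runs internally; the statistics in `o` and `e`, the matrix `W`, and the `wls_fit` helper are made up:

```r
# Made-up "observed" summary statistics: two thresholds and one polychoric
# correlation, as if estimated in a first step.
o <- c(-0.40, 0.70, 0.35)

# Model-implied values of the same statistics at some trial parameter values.
e <- c(-0.35, 0.75, 0.30)

# Made-up covariance matrix of the observed statistics (symmetric and
# positive definite); its inverse supplies the weights.
W <- matrix(c(0.010, 0.002, 0.001,
              0.002, 0.012, 0.001,
              0.001, 0.001, 0.008), nrow = 3, byrow = TRUE)

# Quadratic-form discrepancy (o - e)' inv(W) (o - e); solve(W, d) avoids
# forming the inverse explicitly.
wls_fit <- function(o, e, W) {
  d <- o - e
  as.numeric(t(d) %*% solve(W, d))
}

fit <- wls_fit(o, e, W)
```

An optimizer would then adjust the model parameters that produce `e` until this value is as small as possible.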

Does that help?
Joined: 06/14/2023 - 04:13
Hi Neal,

Thank you very much for that kind response.

This was already helpful. I take away that the WLS estimator does not offer what I am asking for. But please let me take a step back: can you help me figure out whether it is possible to use OpenMx a bit like Stata's GSEM command (as in Generalized SEM)? (a) Would that be by the full-information ML method you described? (b) Does it use the probit link function? (c) Would you recommend testing the ordered items for multivariate normality (MVN) before using this method?

GSEM enables estimating latent factors using ologit, ordered logit, where logit is a link function and the ordered/ordinal scale is a distribution family (see e.g. the documentation). As far as I understand, this estimation technique is necessary to ensure that the model is closely informed by the distribution of the data; the ordered items. In contrast, computing from the ordered items a polychoric correlation matrix piped into a model implies that the model (perhaps estimated with WLS) will be less informed by the actual distribution of the data.
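
To make the link-function terminology concrete, here is a small base R sketch (not Stata or OpenMx code; the thresholds, the linear predictor value, and the `ordered_probs` helper are invented for illustration) of how an ordered probit and an ordered logit turn the same thresholds into category probabilities. The only difference between the two is the CDF applied to the threshold minus the linear predictor:

```r
# Invented thresholds for a four-category item and an invented value of the
# linear predictor (e.g., a loading times a factor score).
tau <- c(-1, 0, 1.2)
eta <- 0.3

# Cumulative-link category probabilities: P(Y <= k) = F(tau_k - eta),
# where F is the inverse link (a CDF). Differencing the cumulative
# probabilities gives the probability of each category.
ordered_probs <- function(tau, eta, cdf) {
  diff(c(0, cdf(tau - eta), 1))
}

p_probit <- ordered_probs(tau, eta, pnorm)   # ordered probit
p_logit  <- ordered_probs(tau, eta, plogis)  # ordered logit
```

The two sets of probabilities differ partly because the standard logistic distribution has a larger variance (pi^2/3) than the standard normal, so thresholds and coefficients sit on different scales under the two links.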

A few days ago, I put together the lines below from the documentation (I set them aside in order to ask the question you are helping me answer):

raw <- sample[, c(vars1, vars2)]

raw$item1_ordinal <- mxFactor(raw$item1_ordinal, levels = c(0,1,2,3))
raw$item2_ordinal <- mxFactor(raw$item2_ordinal, levels = c(0,1,2,3))
raw$item3_ordinal <- mxFactor(raw$item3_ordinal, levels = c(0,1,2,3))
raw$item4_ordinal <- mxFactor(raw$item4_ordinal, levels = c(0,1,2,3))
raw$item5_ordinal <- mxFactor(raw$item5_ordinal, levels = c(0,1,2,3))
raw$item6_ordinal <- mxFactor(raw$item6_ordinal, levels = c(0,1,2,3))
raw$item7_ordinal <- mxFactor(raw$item7_ordinal, levels = c(0,1,2,3))
raw$item8_ordinal <- mxFactor(raw$item8_ordinal, levels = c(0,1,2,3))

dataRaw <- mxData(observed = raw, type="raw")

# residual variances
resVars <- mxPath(from = c(vars1, vars2), arrows = 2, # how about 8?
free = TRUE, values = c(1,1,1,1,1,1,1,1),
labels = c("e1", "e2", "e3", "e4", "e5", "e6", "e7", "e8"))

# latent variance and covariance
latVars <- mxPath(from = c("F1", "F2"), arrows = 2, connect = "unique.pairs", # is "unique.pairs" correct?
free = TRUE, values = c(1, .5, 1), labels = c("varF1", "cov", "varF2"))

# factor loadings
# (the opening of each call was cut off in the post; from, to, and arrows are assumed,
# and the numeric-looking labels "11".."18" are assumed to be a typo for "l1".."l8")
loadings1 <- mxPath(from = "F1", to = vars1, arrows = 1,
free=c(FALSE,TRUE,TRUE,TRUE), values=c(1,1,1,1), labels = c("l1","l2","l3","l4"))
loadings2 <- mxPath(from = "F2", to = vars2, arrows = 1,
free=c(FALSE,TRUE,TRUE,TRUE), values=c(1,1,1,1), labels = c("l5","l6","l7","l8"))

# means
means <- mxPath(from = "one", to = c(vars1,vars2,"F1","F2"),
arrows = 1,
free = c(T,T,T,T,T,T,T,T,F,F),
values = c(1,1,1,1,1,1,1,1,0,0),
labels = c("meanitem1","meanitem2","meanitem3","meanitem4","meanitem5","meanitem6","meanitem7", "meanitem8", NA, NA))
# We also need means and intercepts in our model. Exogenous or independent variables have means, while endogenous
# or dependent variables have intercepts. These can be included by regressing both x and y on a constant, which can
# be referred to in OpenMx by "one".

# thresholds
thresholds <- mxThreshold(vars = c(vars1, vars2), nThresh = c(3,3,3,3,3,3,3,3), free = TRUE, values = c(0,1,2,0,1,2,0,1,2,0,1,2,0,1,2,0,1,2,0,1,2,0,1,2)) # four ordinal values implies: three thresholds and three values (for each variable)
# It is also important to remember that specifying thresholds is not sufficient to get an ordinal data model to run. In
# addition, the scale of each ordinal variable must be identified just like the scale of a latent variable. The most common
# method for this involves constraining an ordinal item’s mean to zero and either its total or residual variance to a constant
# value (i.e., one). For variables with two or more thresholds, ordinal variables may also be identified by constraining
# two thresholds to fixed values.

# When using a path specification of the model, the fit function is always RAM which is indicated by using the type
# argument. We don’t have to specify the fit function explicitly with an mxExpectation() and FitFunction()
# argument, instead we simply add the following argument to the model.
twoFactormodel <- mxModel("Two Factor Model Path Specification", type="RAM", # is that the type?
manifestVars=c(vars1, vars2), latentVars=c("F1", "F2"),
dataRaw, resVars, latVars, loadings1, loadings2, means, thresholds)

twoFactorResults <- mxRun(twoFactormodel)

summary(twoFactorResults)
twoFactorResults$output
Joined: 06/14/2023 - 04:13
Ohh yeah, and sure, the question was phrased a bit weirdly. Sorry about that. I guess I meant to ask whether OpenMx is able to estimate models using a (probit/logit) link function to the ordered family distribution.

(I was not able to edit my initial response to your answer, so this comes in a second response.)