Categorical predictors and outcomes

Hello,
I am trying to build a model that includes morbidity count (0, 1, 2, or 3) as both a predictor and an outcome in the same structural equation model.
We created a simulated data set and tested our model, but unfortunately, the results are funky. Morbidity as the count outcome (with a Poisson distribution) works, but issues arise when morbidity count is used as a predictor. I am guessing the model is treating it as a continuous variable, which is not correct. Does anyone have tips or ideas for ensuring that count data are treated as counts when they are used as a predictor?
This is our SEM structure:
morbidity ~ beginningweight
morbidity ~ gender
endingweight ~ beginningweight
endingweight ~ morbidity
endingweight ~ gender
beginningweight ~ gender
Thank you,
Lauren
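Written out as equations, the six paths above give one equation per endogenous variable. This is a sketch under two assumptions not stated in the post: the Poisson part for morbidity uses a log link, and the residuals \(\varepsilon_1, \varepsilon_3\) are normal and mutually independent. The coefficient \(c_2\) is the morbidity -> endingweight path.

```latex
\begin{aligned}
\text{beginningweight}_i &= a_0 + a_1\,\text{gender}_i + \varepsilon_{1i},\\
\text{morbidity}_i \mid \text{beginningweight}_i, \text{gender}_i &\sim \operatorname{Poisson}(\lambda_i),
  \qquad \log \lambda_i = b_0 + b_1\,\text{beginningweight}_i + b_2\,\text{gender}_i,\\
\text{endingweight}_i &= c_0 + c_1\,\text{beginningweight}_i + c_2\,\text{morbidity}_i + c_3\,\text{gender}_i + \varepsilon_{3i}.
\end{aligned}
```

In this formulation the observed count itself enters the endingweight equation with a single slope \(c_2\), i.e., as a quantitative predictor.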
SEM’s exogenous distributions
Hi Lauren
Usually, the distribution of an exogenous variable is not relevant to distributional assumptions such as multivariate normality, because those assumptions concern the distribution of the residuals after the exogenous variables' effects have been removed. In your case, I wonder what the joint and conditional distributions look like. It's hard to tell what's going on without more diagnostic information than "results are funky."
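To illustrate the point about residuals, here is a minimal Python sketch with made-up numbers (nothing here comes from the thread): a Poisson-distributed predictor x and an outcome y = 2 + 3x + normal noise. The marginal distribution of y inherits the skew of x, but the residuals from regressing y on x are close to symmetric, and it is the residuals that the normality assumption is about. The `rpois` helper is a hand-written Knuth-style sampler, since the Python standard library has no Poisson generator.

```python
import math
import random


def rpois(lam, rng):
    """Poisson draw by Knuth's multiplication method (fine for small lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def skewness(values):
    """Moment-based sample skewness."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


rng = random.Random(42)
x = [rpois(1.0, rng) for _ in range(20000)]          # skewed count predictor
y = [2.0 + 3.0 * xi + rng.gauss(0, 1) for xi in x]   # linear in x, normal errors

# Simple least-squares fit of y on x.
mx, my = sum(x) / len(x), sum(y) / len(y)
slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
intercept = my - slope * mx
resid = [b - intercept - slope * a for a, b in zip(x, y)]

print("skewness of y:        ", round(skewness(y), 2))
print("skewness of residuals:", round(skewness(resid), 2))
```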
I take it that the ~'s are causal paths. It's a pity that lavaan did not adopt the RAM notation of -> and <->; despite the extra typing, RAM notation makes it immediately obvious what kind of path each line specifies, without having to look it up.
In reply to SEM’s exogenous distributions by AdminNeale
Thanks for the response. I haven't looked at the joint and conditional distributions yet; I'm new to SEM, so I only have a basic understanding. The issue we saw was that the estimated coefficient for morbidity -> endingweight did not match the value we specified when creating the simulated data set.
Yes, I have been using lavaan so far; I'm just exploring other options for now.
What sort of thing did you have in mind concerning morbidity count as a predictor? Certainly, in a linear regression, a count-variable predictor is treated like any other quantitative predictor.
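A minimal sketch of that point, in Python with only the standard library. It simulates data shaped like the model in the first post, using made-up coefficients (the true morbidity -> endingweight effect is set to -15) and assuming a log-link Poisson for morbidity, then fits the endingweight equation by ordinary least squares with morbidity entered as a plain numeric column. All names and numbers are hypothetical, and the simulated counts are not capped at 3 as the real morbidity variable is. For a recursive path model like this one with uncorrelated residuals, the ML point estimates for the endingweight equation should agree with these OLS estimates.

```python
import math
import random


def rpois(lam, rng):
    """Poisson draw by Knuth's multiplication method (fine for small lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate(n, seed=1):
    """Hypothetical data-generating process following the six paths."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        gender = 1 if rng.random() < 0.5 else 0
        bw = 300 + 20 * gender + rng.gauss(0, 30)                 # beginningweight
        lam = math.exp(-0.5 + 0.005 * (bw - 300) - 0.3 * gender)  # log-link Poisson mean
        morb = rpois(lam, rng)                                    # integer count
        ew = 100 + 1.0 * bw - 15 * morb + 10 * gender + rng.gauss(0, 20)
        rows.append((gender, bw, morb, ew))
    return rows


def solve(A, b):
    """Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= f * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]


def ols(X, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)


data = simulate(5000)
X = [(1.0, bw, morb, gender) for gender, bw, morb, _ in data]  # count used as-is
y = [ew for *_, ew in data]
estimates = ols(X, y)
print(dict(zip(["intercept", "beginningweight", "morbidity", "gender"], estimates)))
```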
Yes, but the ~ has the same meaning that it does in R formulae, i.e., "is regressed onto".
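To make the correspondence between the two notations concrete, here is a small Python sketch that rewrites lavaan-style model lines as RAM-style arrows: `y ~ x` ("y is regressed onto x") becomes `x -> y`, `a ~~ b` becomes `a <-> b`, `f =~ x` becomes `f -> x`, and an intercept `y ~ 1` becomes a path from the constant `one`, as means are specified in OpenMx RAM models. The helper `lavaan_to_ram` is hypothetical and only handles these plain forms, not modifiers such as fixed values or labels.

```python
def lavaan_to_ram(line):
    """Rewrite one plain lavaan model line as RAM-style arrows (hypothetical helper)."""
    if "=~" in line:                       # latent =~ indicators
        latent, indicators = line.split("=~", 1)
        return [f"{latent.strip()} -> {ind.strip()}" for ind in indicators.split("+")]
    if "~~" in line:                       # (co)variance: two-headed arrow
        lhs, rhs = line.split("~~", 1)
        return [f"{lhs.strip()} <-> {term.strip()}" for term in rhs.split("+")]
    if "~" in line:                        # outcome ~ predictors: arrows point at outcome
        outcome, predictors = line.split("~", 1)
        arrows = []
        for term in predictors.split("+"):
            term = term.strip()
            source = "one" if term == "1" else term
            arrows.append(f"{source} -> {outcome.strip()}")
        return arrows
    raise ValueError(f"unrecognized model line: {line!r}")


model = """
morbidity ~ beginningweight
morbidity ~ gender
endingweight ~ beginningweight
endingweight ~ morbidity
endingweight ~ gender
beginningweight ~ gender
"""

for model_line in model.strip().splitlines():
    for arrow in lavaan_to_ram(model_line):
        print(arrow)
```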
demo script
I've attached a script that demonstrates how to use OpenMx to jointly model two continuous variables and a discrete count variable (which follows a negative-binomial distribution). Specifically, the script is for a twin analysis of those 3 traits, so there's a lot in it that isn't really relevant to your case. The way it works is that the count variable is presumed to reflect a latent, normally distributed "liability", and that the liability is what is correlated with the other two variables. The count variable is of type MxFactor, and it has thresholds between integers, as though it were an ordinal variable. However, the thresholds are all functions of the negative-binomial CDF, which in turn depends upon only the two parameters of the negative-binomial distribution, no matter how many thresholds are needed. It's a little bit like a Gaussian copula, except the theory of copulas is for strictly continuous random variables.
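The threshold construction can be sketched in Python as follows. This assumes the (size, mean) parameterization of the negative binomial used by R's `dnbinom(size=, mu=)`; whether the attached script uses the same parameterization is not checked here, and the function names are hypothetical. For a count with categories 0, 1, ..., K-1, the threshold between k and k+1 is the standard-normal quantile of the negative-binomial CDF at k, so however many thresholds there are, they all depend only on the two distribution parameters. In this sketch the top category collects all remaining probability mass.

```python
import math
from statistics import NormalDist


def nb_pmf(k, size, mu):
    """Negative-binomial P(Y = k) in the (size, mean) parameterization; needs mu > 0."""
    p = size / (size + mu)
    q = mu / (size + mu)
    log_pmf = (math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
               + size * math.log(p) + k * math.log(q))
    return math.exp(log_pmf)


def nb_thresholds(n_categories, size, mu):
    """Liability-scale thresholds between adjacent count categories.

    Threshold k sits at the standard-normal quantile of P(Y <= k).
    """
    std_normal = NormalDist()
    cdf = 0.0
    thresholds = []
    for k in range(n_categories - 1):
        cdf += nb_pmf(k, size, mu)
        thresholds.append(std_normal.inv_cdf(cdf))
    return thresholds


# A count taking 4 values (0-3) needs 3 thresholds; size and mu are made up.
print(nb_thresholds(4, size=2.0, mu=1.0))
```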
Are you sure the distribution of your morbidity variable is reasonably well approximated by a Poisson distribution? Count variables assessed in human subjects are quite often overdispersed, i.e., their variance substantially exceeds their mean (whereas the mean and variance of a Poisson distribution are equal).
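A quick way to check this, sketched in Python: the index of dispersion is the sample variance divided by the sample mean, which should be near 1 for Poisson data and well above 1 for overdispersed counts. Under a Poisson model, (n - 1) times the index is approximately chi-square with n - 1 degrees of freedom; the sketch uses a normal approximation to that chi-square to get a rough z score. The counts in the example are made up.

```python
import math
from statistics import mean, variance


def dispersion_index(counts):
    """Sample variance / sample mean; about 1 for Poisson counts."""
    return variance(counts) / mean(counts)


def dispersion_z(counts):
    """Rough overdispersion z score via a normal approximation to chi-square(n - 1)."""
    df = len(counts) - 1
    stat = df * dispersion_index(counts)
    return (stat - df) / math.sqrt(2 * df)


counts = [0, 0, 1, 1, 1, 2, 2, 3]  # made-up morbidity counts
print(dispersion_index(counts), dispersion_z(counts))
```

With real data you would want far more than eight observations; large positive z values point toward overdispersion.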
In reply to demo script by AdminRobK
Great, thank you for the script! This study is actually on feedlot cattle. Morbidity affects weight gain, and the Poisson distribution works well. The data are simulated and definitely not overdispersed.
I've been using lavaan, which is why I used the "~" symbol, but I was exploring other software/package options because the estimated coefficient for morbidity -> endingweight does not match what we specified in the simulation.
data generation?
OK good, that keeps things simple.
Could you share how you're simulating your data? If so, I can probably spot any discrepancy between how you're generating data and what the SEM software is doing. I might also be able to write an OpenMx script tailored to your simulated data.