Hello,

I am trying to build a model that includes morbidity count (0,1,2,3) as both a predictor and an outcome in the same structural equation model.

We created a simulated data set and tested our model, but unfortunately the results are funky. Morbidity as the count outcome (with a Poisson distribution) works, but issues arise with morbidity count as a predictor. I am guessing the software is treating it as a continuous variable, which is not correct. Does anyone have tips or ideas for ensuring that count data is treated as such when it is a predictor?

This is our SEM structure:

morbidity ~ beginningweight

morbidity ~ gender

endingweight ~ beginningweight

endingweight ~ morbidity

endingweight ~ gender

beginningweight ~ gender

Thank you,

Lauren

Hi Lauren,

Usually, the distribution of an exogenous variable is not relevant to assumptions such as multivariate normality, because those assumptions concern the distribution of the residuals after the exogenous variables' effects have been removed. In your case, I wonder what the joint and conditional distributions look like. It's a bit difficult to tell what's going on without more diagnostic information than "results are funky."

I take it that the ~'s are causal paths. It's a pity that lavaan did not adopt the RAM notation of -> and <->; despite the extra typing, RAM notation makes it immediately obvious what kind of path it is without having to look it up.

Thanks for the response. I haven't looked at the joint and conditional distributions... I'm definitely new to SEM, so I have just a basic understanding. The issue we saw was that the estimated coefficient for morbidity -> endingweight was not what we specified when creating the simulated data set.

Yes, I have used lavaan so far. I am just exploring other options for now.

What sort of thing did you have in mind concerning morbidity count as a predictor? Certainly, in a linear regression, a count-variable predictor is treated like any other quantitative predictor.
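To make that point concrete, here is a minimal sketch in Python (standard library only). It assumes a data-generating process that mirrors the path structure posted above, with ending weight depending linearly on the morbidity count, and then fits the endingweight equation by ordinary least squares. Every numeric value, and the names `rpois`, `solve`, `simulate_and_fit` and `TRUE_B_MORB`, are made up for illustration; none of them come from the actual simulation discussed in this thread.

```python
import math
import random

# All numeric values are illustrative assumptions, not taken from the thread.
TRUE_B_MORB = -15.0  # assumed change in ending weight per morbidity event


def rpois(lam, rng):
    """Draw one Poisson(lam) variate (Knuth's method; fine for small lam)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def simulate_and_fit(n=5000, seed=1):
    """Simulate the path model, then regress endingweight on its predictors."""
    rng = random.Random(seed)
    rows, y = [], []
    for _ in range(n):
        gender = rng.randint(0, 1)
        # beginningweight ~ gender
        bw = rng.gauss(300.0 + 20.0 * gender, 30.0)
        # morbidity ~ beginningweight + gender (Poisson with log link)
        lam = math.exp(-0.5 + 0.005 * (bw - 300.0) + 0.2 * gender)
        morb = rpois(lam, rng)
        # endingweight ~ beginningweight + morbidity + gender (linear, normal noise)
        ew = (100.0 + 1.1 * bw + TRUE_B_MORB * morb + 20.0 * gender
              + rng.gauss(0.0, 5.0))
        rows.append([1.0, bw, float(morb), float(gender)])
        y.append(ew)
    k = len(rows[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    names = ["intercept", "beginningweight", "morbidity", "gender"]
    return dict(zip(names, solve(xtx, xty)))


estimates = simulate_and_fit()
print("estimated morbidity -> endingweight:", estimates["morbidity"])
print("value used in the simulation:       ", TRUE_B_MORB)
```

Under these assumptions the count predictor needs no special treatment: the estimate should land close to the simulated value. If the real simulation makes ending weight depend on morbidity in some other way (for example non-linearly, or through the Poisson rate rather than the observed count), a linear path would not be expected to return the specified number.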

Yes, but the `~` has the same meaning that it does in R formulae, i.e. "is regressed onto".

I've attached a script that demonstrates how to use OpenMx to jointly model two continuous variables and a discrete count variable (which follows a negative-binomial distribution). Specifically, the script is for a twin analysis of those 3 traits, so there's a lot in it that isn't really relevant to your case.

The way it works is that the count variable is presumed to reflect a latent, normally distributed "liability", and that the liability is what is correlated with the other two variables. The count variable is of type MxFactor, and it has thresholds between integers, as though it were an ordinal variable. However, the thresholds are all functions of the negative-binomial CDF, which in turn depends upon only the two parameters of the negative-binomial distribution, no matter how many thresholds are needed. It's a little bit like a Gaussian copula, except the theory of copulas is for strictly continuous random variables.
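As a rough illustration of the threshold idea (this is not the attached OpenMx script, just one plausible reading of it sketched in Python), the threshold between counts k and k+1 can be taken as the standard-normal quantile of the negative-binomial CDF at k. The function names `nbinom_pmf` and `count_thresholds`, the (size, prob) parameterization, and the example parameter values are all assumptions made for this sketch.

```python
import math
from statistics import NormalDist


def nbinom_pmf(k, size, prob):
    """P(K = k): number of failures before `size` successes, success prob `prob`."""
    log_p = (math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
             + size * math.log(prob) + k * math.log1p(-prob))
    return math.exp(log_p)


def count_thresholds(size, prob, max_count):
    """Liability thresholds separating counts 0, 1, ..., max_count (top category open)."""
    std_normal = NormalDist()  # latent liability assumed standard normal
    cdf, thresholds = 0.0, []
    for k in range(max_count):
        cdf += nbinom_pmf(k, size, prob)           # negative-binomial CDF at k
        thresholds.append(std_normal.inv_cdf(cdf))  # map it onto the liability scale
    return thresholds


# Hypothetical parameter values; three thresholds separate counts 0, 1, 2 and 3+.
thresholds = count_thresholds(size=2.0, prob=0.5, max_count=3)
print(thresholds)
```

However many thresholds are requested, only `size` and `prob` are free, which is the property described above: the thresholds are determined by the two distribution parameters rather than estimated one by one.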

Are you sure the distribution of your morbidity variable is reasonably well approximated by a Poisson distribution? Count variables assessed in human subjects are quite often overdispersed, i.e., their variance substantially exceeds their mean (whereas the mean and variance of a Poisson distribution are equal).
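A quick way to check this is the dispersion index, i.e. the sample variance divided by the sample mean, which should be close to 1 for Poisson data. A minimal Python sketch follows; the function name `dispersion_index` and the example counts are made up for illustration.

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Sample variance divided by sample mean; about 1 under a Poisson model."""
    return variance(counts) / mean(counts)


# Hypothetical morbidity counts, purely for illustration.
example_counts = [0, 0, 1, 1, 1, 2, 2, 3]
print(dispersion_index(example_counts))
```

Values well above 1 point to overdispersion, in which case a negative-binomial model is usually a better fit than a Poisson one. With a small sample the index is noisy, so treat it as a rough screen rather than a formal test.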

Great, thank you for the script! So this study is actually in feedlot cattle. Morbidity affects weight gain, and the Poisson distribution works well. The data is simulated and definitely not overdispersed.

I've been using lavaan so that's why I used the "~" symbol, but I was exploring other software/package options because what happens is the coefficient for morbidity -> ending weight is not what we specified in the simulation.

OK good, that keeps things simple.

Could you share how you're simulating your data? If so, I can probably spot any discrepancy between how you're generating data and what the SEM software is doing. I might also be able to write an OpenMx script tailored to your simulated data.