Zero-inflated poisson counts using mxThreshold

Posted on Thu, 06/10/2021 - 11:41

bwiernik Joined: 01/30/2014

Forums

I'm working with a latent variable model and I'd like to predict a negative binomial count response variable. I found the clever cludge that @AdminRobK previously shared to use thresholds fixed to parametric quantiles for negative binomial distribution ([#7856](https://openmx.ssri.psu.edu/comment/7856#comment-7856); [180313.R](https://openmx.ssri.psu.edu/sites/default/files/180313.R) ). I think this approach will work well for me. The issue I've got is that my dependent variable is clearly zero-inflated. I've also got some minor observation-number differences that I'd like to incorporate as an offset.

I'm not exactly sure how to marry a zero-inflation model with the threshold parameterization of the negative binomial Rob used there? Can anyone offer a suggestion?

For the offset, with this parameterization, would it be correct to just add the log(number of observations) as a predictor in my model with a fixed coefficient of 1.0?

Thanks!

Replied on Thu, 06/10/2021 - 12:04

AdminRobK Joined: Jan 24, 2014

built-in functionality

Hi, Brenton! As of OpenMx v2.18, that kludge of mine has built-in support, and it includes zero-inflation. The relevant function is mxMarginalNegativeBinomial(), which has an argument for the zero-inflation proportion, `zeroInf`. This test script demonstrates its use (in the OpenMx source repository, it's tests/testthat/test-discrete.R .

Could you say a bit more about the offset to which you refer?

Replied on Thu, 06/10/2021 - 12:16

bwiernik Joined: Jan 30, 2014

Awesome! That's great!

For the offset: I've got substance use days for participants, but the time periods for each participant vary somewhat--some participants have 25 days of data, some 30, etc. If I were fitting an observed variable Poisson model, I would use a linear predictor like this:

log(count_dv) = log(days_observed) + β0 + β1x1 + β2x2 + …

So log(days_observed) is an offset/variable with coefficient fixed to 1.0.

Replied on Thu, 06/10/2021 - 12:38

AdminRobK Joined: Jan 24, 2014

sounds reasonable

For the offset: I've got substance use days for participants, but the time periods for each participant vary somewhat--some participants have 25 days of data, some 30, etc. If I were fitting an observed variable Poisson model, I would use a linear predictor like this:

log(count_dv) = log(days_observed) + β0 + β1x1 + β2x2 + …

So log(days_observed) is an offset/variable with coefficient fixed to 1.0.

OK, I think I get it. Sounds reasonable.

Replied on Fri, 06/11/2021 - 10:34

bwiernik Joined: Jan 30, 2014

Just to be sure I'm

Just to be sure I'm interpreting these results correctly. Below, use_days_1_30_Use and use_days_1_30_Incar are specified with:

umxPath(v1m0 = criteria), mxMarginalNegativeBinomial(criteria, maxCount = 30, size = 4, mu = 1, zeroInf = .01),

The Estimate for the Delta → use_days paths are regression coefficients on the normal latent "liability" variable underlying the negative binomial indicators, correct?

Do the Discrete parameters 1, 2, 3 correspond to the zero-inflation, size/dispersion, and mu parameters, respectively? (Could those be given informative labels in the output?) Are size and prob/mu parameterized as in ?pnbinom (that is NB2)? And the zero-inflation parameter is the marginal probability of 0 parameter from the binomial model?

matrix row col Estimate Std.Error 10 A use_days_1_30_Use use_days_1_30_Incar -2.799097e-01 0.139894372 17 A use_days_1_30_Use Delta 4.135083e-01 0.211776268 18 A use_days_1_30_Incar Delta 3.062733e-01 0.184349627 31 S Delta Delta 5.157468e-01 0.061836412 35 Discrete 1 use_days_1_30_Use 0.000000000 0.212945574 36 Discrete 2 use_days_1_30_Use 0.055341620 0.011971673 37 Discrete 3 use_days_1_30_Use 3.577466689 1.244645467 38 Discrete 1 use_days_1_30_Incar 0.772313030 0.027908882 39 Discrete 2 use_days_1_30_Incar 1.225494822 0.392580227 40 Discrete 3 use_days_1_30_Incar 44.754645867 9.818826942

Thanks!

Replied on Fri, 06/11/2021 - 13:18

AdminJosh Joined: Dec 12, 2012

Discete parameters

> Do the Discrete parameters 1, 2, 3 correspond to the zero-inflation, size/dispersion, and mu parameters, respectively? (Could those be given informative labels in the output?)

You can easily figure it out using code like this,

```
m1 = mxModel("test", type="RAM", manifestVars = c('x1'),
mxMarginalNegativeBinomial(
"x1", maxCount = 30, size = 4, mu = 1, zeroInf = .01))
m1$Discrete$values
```

Where the values show up in `mxMatrix` `Discrete` tells you the meaning of the parameter. You can add labels to the `Discrete$labels` matrix to get labelled output in the summary table.

> Are size and prob/mu parameterized as in ?pnbinom (that is NB2)?

You can use `mxGetExpected(model, "thresholds")` to check whether the proportions match what you expect.

Replied on Fri, 06/11/2021 - 13:24

AdminJosh Joined: Dec 12, 2012

zero-inflation

> And the zero-inflation parameter is the marginal probability of 0 parameter from the binomial model?

Actually I think it is the extra probability of zero beyond what the binomial model would predict.

Replied on Fri, 06/11/2021 - 10:41

bwiernik Joined: Jan 30, 2014

Also, is it possible to

Also, is it possible to disable the zero-inflation parameter?

Replied on Fri, 06/11/2021 - 12:48

AdminJosh Joined: Dec 12, 2012

disable the zero-inflation parameter

Yes, just set it fixed to 0 or whatever value you prefer.

Replied on Fri, 06/11/2021 - 12:49

bwiernik Joined: Jan 30, 2014

Ah, of course.

Replied on Fri, 05/10/2024 - 11:10

p1981thompson Joined: May 10, 2024

simple CFA with zero-inflated Poisson manifest variables- error

I have three zero inflated manifest variables for a CFA model. I assume that it is possible to use the mxMarginalPoisson to specify this model. This will form part of a larger model but I have deliberately kept this piece separate in the first instance to get something working. The model has been specified as follows (I am a novice OpenMX user having only used it briefly around 10 years ago).
model1 <- mxModel("ZIP CFA", type = "RAM", mxData(data, type = "raw"), manifestVars = c("x1", "x2", "x3"), mxPath(from = "F1", to = c("x1", "x2", "x3"),free=c(FALSE,TRUE,TRUE),values=c(1,1,1),labels=c("l1","l2","l3")), mxPath(from = manifestVars, arrows = 2), mxPath(from = "Externalising", arrows = 2, free = FALSE, values = 1.0), mxPath( from=c("x1", "x2", "x3"), arrows=2, free=TRUE, values=c(1,1,1),labels=c("e1","e2","e3") ), mxPath(from = "one", to = manifestVars, arrows = 1, free = TRUE, values = 1.0), mxPath(from = "one", to = "F1", arrows = 1, free = FALSE, values = 0), mxMarginalPoisson(vars=c("x1", "x2", "x3"),c(14,12,13),c(6,6,6),zeroInf=c(0.4,0.4,0.4))

)
The error that I am receiving is:

*Running ZIP CFA with 14 parameters*
*Warning message:*
*In model 'ZIP CFA' Optimizer returned a non-zero status code 6. The model does not satisfy the first-order* *optimality conditions to the required accuracy, and no improved point for the merit function could be found during* *the final linesearch (Mx status RED) *

If I run with 'mxTryHard':

*Retry limit reached; Best fit=2771.4993 (started at 73171074) (11 attempt(s): 11 valid, 0 errors)*

Is the model specification incorrect? OR is this likely to be a data issue? Thanks for any insights.

Replied on Fri, 05/24/2024 - 10:25

AdminRobK Joined: Jan 24, 2014

First of all, the message you

First of all, the message you're seeing is a *warning*, not an *error*. And, it happens to be a pretty common warning for the kind of analysis you're doing. See, when OpenMx is modeling count variables with `mxMarginalPoisson()` and `mxMarginalNegativeBinomial()`, it's using the same algorithm that underlies its modeling of binary/ordinal threshold variables. And, for technical reasons, status code 6 (Mx status RED) is rather common with threshold data even when the optimizer has found a local minimum. This issue has been discussed on these forums many, many, many times before. For more information, I'd suggest searching this website with search terms such as "status code 6", "mx status red", "ordinal", "threshold", etc.

It probably wasn't obvious to you that you are doing an ordinal analysis "under the hood", but you are, so you really should be using `mxTryHardOrdinal()` instead of `mxTryHard()`. Vanilla `mxTryHard()` has defaults that can be downright counterproductive for analyses of ordinal variables.

Replied on Mon, 05/13/2024 - 05:03

p1981thompson Joined: May 10, 2024

Apologies. correction to model above.

model1 <- mxModel("ZIP CFA", type = "RAM", mxData(data, type = "raw"), manifestVars = c("x1", "x2", "x3"), mxPath(from = "F1", to = c("x1", "x2", "x3"),free=c(FALSE,TRUE,TRUE),values=c(1,1,1),labels=c("l1","l2","l3")), mxPath(from = manifestVars, arrows = 2), mxPath(from = "F1", arrows = 2, free = FALSE, values = 1.0), mxPath( from=c("x1", "x2", "x3"), arrows=2, free=TRUE, values=c(1,1,1),labels=c("e1","e2","e3") ), mxPath(from = "one", to = manifestVars, arrows = 1, free = TRUE, values = 1.0), mxPath(from = "one", to = "F1", arrows = 1, free = FALSE, values = 0), mxMarginalPoisson(vars=c("x1", "x2", "x3"),c(14,12,13),c(6,6,6),zeroInf=c(0.4,0.4,0.4))

)

Replied on Fri, 05/24/2024 - 10:15

AdminNeale Joined: Mar 01, 2013

As a mixture? And a concern.

Hi Brenton

One could in principle specify a ZIP with a mixture distribution. However, I am in general leery of ZIPs as a kludge for dealing with distributions that arise due to a scale with a serious floor effect. Thus if one collects data from a community sample of a test intended for clinical populations, often there is a substantial pile-up of people who say no to every item/symptom. Can end up hemi-modal. Under these circumstances, I don't think that ZIP is appropriate. Since we know the generating mechanism, it's better to work with that. Of course the zeroes have the problem that they are less precisely measured than those with higher scores. A cool way to deal with that is to estimate factor scores by ML, then use the standard errors of those FS estimates to weight the residual error variance of the factor score in a two-stage path analysis. See Lai and Hsiao's nice paper for work on this.

News

Recent Posts

Zero-inflated poisson counts using mxThreshold

built-in functionality

Awesome! That's great!

sounds reasonable

Just to be sure I'm

Discete parameters

zero-inflation

Also, is it possible to

disable the zero-inflation parameter

Ah, of course.

simple CFA with zero-inflated Poisson manifest variables- error

First of all, the message you

Apologies. correction to model above.

As a mixture? And a concern.

News

Recent Posts