Multilevel Logistic Regression

Posted on
No user picture. samson91787 Joined: 02/10/2014
Hi,

I'm wondering if I could get some tips on how to fit a multilevel logistic regression. I can fit a multilevel model with a continuous outcome without any problems. However, once I set up a threshold, I get an error code of 5 or 6. I am trying to fit a three-level model, so I went with the example posted at https://github.com/OpenMx/OpenMx/blob/3cb6593af0754be134e107cf4f42315d8c18db2b/inst/models/passing/xxm-3.R and modified the outcome to be binary. The code is below:

data('Pastes')
Pastes\$strength = if_else(Pastes\$strength < 60, 1, 0)
Pastes\$strength = mxFactor(Pastes\$strength, levels = c(0, 1))

batch <- mxModel(
'batch', type="RAM",
latentVars = c('batch'),
mxData(data.frame(batch=unique(Pastes$batch)), 'raw', primaryKey='batch'),
mxPath('batch', arrows=2, values=1))

sample <- mxModel(
'sample', type="RAM", batch,
latentVars = c('sample'),
mxData(as.data.frame(Pastes[!duplicated(Pastes$sample), c('batch','sample')]),
'raw', primaryKey='sample'),
mxPath('sample', arrows=2, values=1),
mxPath('batch.batch', 'sample', values=1, free=FALSE, joinKey='batch'))

strength <- mxModel(
'strength', type='RAM', sample,
manifestVars = c('strength'),
mxData(Pastes, 'raw'),
mxPath('one', 'strength'),
mxPath('strength', arrows=2, values=1),
mxThreshold('strength', nThresh = 1, values = 1),
mxPath('sample.sample', 'strength', free=FALSE, values=1, joinKey="sample"))

strength <- mxRun(strength)

I assumed it was because I am estimating the residual for strength, but if I fix the the residual to 1, with

mxPath('strength', arrows=2, values=1, free = F),

I get the model to converge, but the parameters are totally off from the equivalent model with lme4

(fm01 <- glmer(strength ~ 1 + (1 | sample) + (1 | batch), Pastes, family = binomial(link = 'logit')))

In fact, it appears that I get a negative variance for sample.

I get identical results to lme4 when I don't treat strength as a factor.

Any help would be greatly appreciated.

Replied on Wed, 09/18/2019 - 13:27
Picture of user. AdminRobK Joined: 01/24/2014

OpenMx's multilevel feature doesn't work with threshold variables. I'm surprised your code runs. It should raise an error instead. That might be a bug.
Replied on Wed, 09/18/2019 - 15:45
Picture of user. AdminNeale Joined: 03/01/2013

Sam - yes, there is probably is a way via user-defined fit functions, but these might prove tedious to implement.

It is difficult to see how the OpenMx approach to multilevel would work with ordinal data under the threshold model, because to my knowledge there aren't shortcuts (like marginal maximum likelihood) built in, which in turn would require numerical integration over as many dimensions as there are people within a batch.

Replied on Wed, 09/18/2019 - 15:56
No user picture. samson91787 Joined: 02/10/2014

In reply to by AdminNeale

Thanks! Do you have any resources/examples I could look into to learn how to code that? Also, would a probit model be equally as difficult? The reason I want to use OpenMx, as opposed to lme4, is that I have a substantial amount of missing data (~10-20%), and I wanted to take advantage of OpenMx's Full Information Maximum Likelihood.
Replied on Thu, 09/19/2019 - 11:15
Picture of user. AdminRobK Joined: 01/24/2014

In reply to by samson91787

Also, would a probit model be equally as difficult?

Holding all else constant, doing a probit model in OpenMx would be easier. Doing a non-multilevel probit regression in OpenMx is pretty straightforward.

The reason I want to use OpenMx, as opposed to lme4, is that I have a substantial amount of missing data (~10-20%), and I wanted to take advantage of OpenMx's Full Information Maximum Likelihood.

In OpenMx, if you model the binary response variable conditional on the regressors (i.e., you treat the covariates as "definition variables"), you won't gain any benefit. Observations with missing data on the response will just be ignored, as they would be in lme4. And if you try to run an MxModel when you have `NA`s on covariates, you'll get an error. The only way OpenMx's FIML estimation is gong to help you is if you model the joint distribution of the response and the covariance, and I'm not sure how to do that in OpenMx given the multilevel nature of your data, and the fact that the response is binary (it would be fairly simple to do with non-multilevel data).

I would suggest one of two things. One is to use multiple imputation to deal with your missing data problem, and stick with lme4 to actually fit your model. The other alternative is to use MCMC Bayes, which treats missing values the way it treats unknown parameters: as random variables. In particular, MCMC Bayes is especially useful for really complicated generalized linear mixed-effects models that just won't converge when fitted by maximum likelihood.