Hi,
I'm wondering if I could get some tips on how to fit a multilevel logistic regression. I can fit a multilevel model with a continuous outcome without any problems. However, once I set up a threshold, I get an error code of 5 or 6. I am trying to fit a three-level model, so I went with the example posted at https://github.com/OpenMx/OpenMx/blob/3cb6593af0754be134e107cf4f42315d8c18db2b/inst/models/passing/xxm-3.R and modified the outcome to be binary. The code is below:
data('Pastes') Pastes\$strength = if_else(Pastes\$strength < 60, 1, 0) Pastes\$strength = mxFactor(Pastes\$strength, levels = c(0, 1)) batch <- mxModel( 'batch', type="RAM", latentVars = c('batch'), mxData(data.frame(batch=unique(Pastes$batch)), 'raw', primaryKey='batch'), mxPath('batch', arrows=2, values=1)) sample <- mxModel( 'sample', type="RAM", batch, latentVars = c('sample'), mxData(as.data.frame(Pastes[!duplicated(Pastes$sample), c('batch','sample')]), 'raw', primaryKey='sample'), mxPath('sample', arrows=2, values=1), mxPath('batch.batch', 'sample', values=1, free=FALSE, joinKey='batch')) strength <- mxModel( 'strength', type='RAM', sample, manifestVars = c('strength'), mxData(Pastes, 'raw'), mxPath('one', 'strength'), mxPath('strength', arrows=2, values=1), mxThreshold('strength', nThresh = 1, values = 1), mxPath('sample.sample', 'strength', free=FALSE, values=1, joinKey="sample")) strength <- mxRun(strength)
I assumed it was because I am estimating the residual for strength, but if I fix the the residual to 1, with
mxPath('strength', arrows=2, values=1, free = F),
I get the model to converge, but the parameters are totally off from the equivalent model with lme4
(fm01 <- glmer(strength ~ 1 + (1 | sample) + (1 | batch), Pastes, family = binomial(link = 'logit')))
In fact, it appears that I get a negative variance for sample.
I get identical results to lme4 when I don't treat strength as a factor.
Any help would be greatly appreciated.
OpenMx's multilevel feature doesn't work with threshold variables. I'm surprised your code runs. It should raise an error instead. That might be a bug.
Thanks for the reply! Is there any way to run a multilevel logistic regression in OpenMx?
Currently, there's no easy way. It might be possible to kluge something together with an algebra fitfunction and
mxEvaluateOnGrid()
, but I've never actually tried.Thanks! Do you have any resources you could point me towards to learn how to do that?
Sam - yes, there is probably is a way via user-defined fit functions, but these might prove tedious to implement.
It is difficult to see how the OpenMx approach to multilevel would work with ordinal data under the threshold model, because to my knowledge there aren't shortcuts (like marginal maximum likelihood) built in, which in turn would require numerical integration over as many dimensions as there are people within a batch.
Thanks! Do you have any resources/examples I could look into to learn how to code that? Also, would a probit model be equally as difficult? The reason I want to use OpenMx, as opposed to lme4, is that I have a substantial amount of missing data (~10-20%), and I wanted to take advantage of OpenMx's Full Information Maximum Likelihood.
Holding all else constant, doing a probit model in OpenMx would be easier. Doing a non-multilevel probit regression in OpenMx is pretty straightforward.
In OpenMx, if you model the binary response variable conditional on the regressors (i.e., you treat the covariates as "definition variables"), you won't gain any benefit. Observations with missing data on the response will just be ignored, as they would be in lme4. And if you try to run an MxModel when you have
NA
s on covariates, you'll get an error. The only way OpenMx's FIML estimation is gong to help you is if you model the joint distribution of the response and the covariance, and I'm not sure how to do that in OpenMx given the multilevel nature of your data, and the fact that the response is binary (it would be fairly simple to do with non-multilevel data).I would suggest one of two things. One is to use multiple imputation to deal with your missing data problem, and stick with lme4 to actually fit your model. The other alternative is to use MCMC Bayes, which treats missing values the way it treats unknown parameters: as random variables. In particular, MCMC Bayes is especially useful for really complicated generalized linear mixed-effects models that just won't converge when fitted by maximum likelihood.