Continuous and categorical variables in same model?

Posted on
svrieze Joined: 01/03/2010
Is this possible yet?

-Scott

Replied on Wed, 07/20/2011 - 13:35
mspiegel Joined: 07/31/2009

Yes! Please test joint continuous and ordinal integration, as this is one of the features of the OpenMx 1.2 release. There is more information regarding the beta release in this announcement.
Replied on Wed, 07/20/2011 - 14:26
tbates Joined: 07/31/2009

In reply to by mspiegel

I think this example file needs comments, and a section in the .Rd files about how to set up thresholds for some (but not all) variables...

And also how to go from the example (roughly: lm(cont ~ BIN)) to a mixed-type structural model (roughly, and not real code: sem(~ acont + bord + ccont + dord)).
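
For what it's worth, a minimal sketch of the kind of example being asked for, in the matrix-style OpenMx API; the data, variable names (cont, ord), and starting values are all made up. The key point is that only the ordinal variable is mxFactor()'d and covered by the thresholds matrix (mapped via threshnames):

library(OpenMx)
set.seed(1)

# Hypothetical data: one continuous and one three-category ordinal variable
df <- data.frame(cont = rnorm(200), ord = sample(0:2, 200, replace = TRUE))
df$ord <- mxFactor(df$ord, levels = 0:2)   # only the ordinal variable is mxFactor()'d

jointSat <- mxModel("jointSat",
  # covariance matrix: ord's latent variance fixed at 1 for identification
  mxMatrix("Symm", 2, 2, free = c(TRUE, TRUE, FALSE),
           values = c(1, 0.2, 1), name = "S"),
  # means: ord's latent mean fixed at 0 for identification
  mxMatrix("Full", 1, 2, free = c(TRUE, FALSE), values = 0, name = "M"),
  # thresholds only for ord: 2 thresholds for 3 categories
  mxMatrix("Full", 2, 1, free = TRUE, values = c(-0.5, 0.5), name = "T"),
  mxData(df, type = "raw"),
  mxFIMLObjective(covariance = "S", means = "M",
                  dimnames = c("cont", "ord"),
                  thresholds = "T", threshnames = "ord")
)
fit <- mxRun(jointSat)
summary(fit)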

Replied on Wed, 07/20/2011 - 14:53
svrieze Joined: 01/03/2010

In reply to by tbates

Great! I'm glad to see this is supported, especially because I'm working on a project that requires it, with a deadline in late August!

I'm afraid Tim's right. I wouldn't know how to use that script to inform my model. Are there any examples of a saturated means and var-covar model with some manifest variables dichotomous and some continuous?

The project (which I'm still formulating) will be a longitudinal two-step ACE model of cigarette smoking. Andy Heath has done this sort of thing, at least cross-sectionally. At each of 4 time points, two variables are measured: (1) a dichotomous variable indexing whether the subject has ever smoked, and (2) a quasi-continuous variable representing the number of cigarettes the subject has ever smoked. If the subject has never smoked, then (2) is set to missing.

I'm not sure how this plays out in a longitudinal model (i.e., if and how cross-time correlations are freed or fixed to zero in fitting), but it should be an interesting project.

-Scott

Replied on Mon, 07/25/2011 - 00:16
neale Joined: 07/31/2009

In reply to by svrieze

Hi Scott

We usually call the model you are speaking of a CCC model (causal contingent common pathway), because it involves one variable that is considered causally downstream and is only measured when the upstream variable, initiation, is observed. Although there may be some continuous variables that are only really measurable in those who have initiated, I would not consider quantity smoked to be one of them. Quantity is usually reported on an ordinal scale (or at least subjects have trouble reporting more precisely than a few categories of number of cigarettes smoked). Although such a variable might be better thought of as Poisson distributed, I think it is reasonable to treat it as ordinal with an underlying normal distribution; indeed, I think this is likely much better than treating it as continuous. So I would favor ordinal FIML analysis, declaring both initiation and quantity as ordinal.

While there may exist a measure of lifetime quantity, which might be closer to continuous, I would not want to use it without taking care of censoring: older participants, or those who started earlier, have had more chance to increase their lifetime quantity smoked than younger ones.

HTH
Mike

Replied on Mon, 07/25/2011 - 22:15
svrieze Joined: 01/03/2010

In reply to by neale

Thanks Mike, that's very helpful. I'm using incident smoking during the 12 months prior to assessment. I was using a continuous model because I'm interested in a fixed effect (a SNP score) on the mean of cigarettes per day, as well as in how the heritable variance changes after including the SNP score in the model. With continuous variables this is a very straightforward ACE model with some definition variables. How would the fixed effect work in a model with ordinal measures? Would the covariate effect be on the thresholds?

Do any scripts exist that have definition variables in ordinal models? None seem to exist on the OpenMx website.

-Scott

Replied on Fri, 07/29/2011 - 23:09
neale Joined: 07/31/2009

In reply to by svrieze

Sorry for the delay. We think there is one around, but I am having trouble locating it. In principle it's pretty simple. First, the ordinal variables have to be mxFactor()'d so that OpenMx knows the highest and lowest theoretical categories (since the extremes may not be observed in a given dataset, and we would not want to integrate from a threshold to infinity by mistake). Second, with 3 or more categories you can rescale the model by estimating both means and variances while fixing the first two thresholds to zero and one. In addition, you can make the predicted mean a function of the definition variables, simply by labeling the relevant matrix element as a definition variable. I'll keep trying to dig up an example.
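
In the meantime, a minimal sketch of that recipe, with entirely hypothetical data and names (smk as a 4-category ordinal outcome, snp as the covariate); the three ingredients are mxFactor(), the first two thresholds fixed at 0 and 1, and a predicted mean built from data.snp:

library(OpenMx)
set.seed(1)

# Hypothetical data: 4-category ordinal outcome plus a covariate for the definition variable
df <- data.frame(smk = sample(0:3, 300, replace = TRUE), snp = rnorm(300))
df$smk <- mxFactor(df$smk, levels = 0:3)   # declare the theoretical categories, lowest to highest

ordDef <- mxModel("ordDef",
  mxMatrix("Symm", 1, 1, free = TRUE, values = 1, name = "S"),                    # variance: free
  mxMatrix("Full", 1, 1, free = TRUE, values = 0, labels = "b0", name = "B0"),    # intercept
  mxMatrix("Full", 1, 1, free = TRUE, values = 0, labels = "b1", name = "B1"),    # covariate effect
  mxMatrix("Full", 1, 1, free = FALSE, labels = "data.snp", name = "Def"),        # definition variable
  mxAlgebra(B0 + B1 %*% Def, name = "M"),                                         # per-row predicted mean
  # 3 thresholds for 4 categories; fixing the first two at 0 and 1 sets the scale
  mxMatrix("Full", 3, 1, free = c(FALSE, FALSE, TRUE), values = c(0, 1, 2), name = "T"),
  mxData(df, type = "raw"),
  mxFIMLObjective(covariance = "S", means = "M", dimnames = "smk",
                  thresholds = "T", threshnames = "smk")
)
fit <- mxRun(ordDef)
summary(fit)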
Replied on Sat, 07/30/2011 - 15:00
svrieze Joined: 01/03/2010

In reply to by neale

Thanks again for the help. I think I'm following this. I can get a threshold model to fit when I have a simple means model (no definition variables).

When I add the definition variable effect on the means, I get an error:
"Error: The job for model 'mm' exited abnormally with the error message: Objective function returned an infinite value."

The definition variable is not missing for any subject.

A short script with the two models and their output is attached, in case anyone would like to see what I've tried. Maybe (hopefully) it's a simple syntax error.

-Scott

Replied on Thu, 08/04/2011 - 11:41
mspiegel Joined: 07/31/2009

In reply to by svrieze

Sorry for the delay. I believe this error usually occurs when one of your data rows yields a likelihood value of 0.

One way to debug the problem is to build OpenMx from the source code repository: that version has a new feature where the likelihood vector is stored in the objective function, so you can inspect mmdef$MZ.objective@likelihoods after the model has finished running. Another way is to use the current binary version of OpenMx and add the argument "vector=TRUE" when you call the mxFIMLObjective() functions. You would then need to change your mxAlgebraObjective() algebra to: -2 * (sum(log(MZ.objective)) + sum(log(DZ.objective))). This way you can inspect the likelihood vectors with mxEval(MZ.objective, mmdef) and mxEval(DZ.objective, mmdef) after the model has executed.

In either case, you'll need to disable the safety net that prevents results from being returned when the model throws an error: call mxRun() with the argument "unsafe=TRUE", and the model results will then be available to you even when the error is thrown.
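
A toy single-group version of that second recipe, with made-up model and variable names (the two-group MZ/DZ case follows the same pattern):

library(OpenMx)
set.seed(1)

sub <- mxModel("G",
  mxMatrix("Symm", 1, 1, free = TRUE, values = 1, name = "S"),
  mxMatrix("Full", 1, 1, free = TRUE, values = 0, name = "M"),
  mxData(data.frame(x = rnorm(50)), type = "raw"),
  # vector=TRUE returns the per-row likelihoods instead of the summed -2lnL
  mxFIMLObjective(covariance = "S", means = "M", dimnames = "x", vector = TRUE)
)

top <- mxModel("container", sub,
  # combine the row likelihoods into a single -2 log-likelihood
  mxAlgebra(-2 * sum(log(G.objective)), name = "m2LL"),
  mxAlgebraObjective("m2LL")
)

fit <- mxRun(top, unsafe = TRUE)   # keep results even if an error is thrown
mxEval(G.objective, fit)           # rows with likelihood 0 are the culprits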

Replied on Tue, 10/01/2013 - 11:06
feihe Joined: 05/30/2012

In reply to by mspiegel

Hi,

I have a question about the interpretation of the coefficients in ordinal outcome models. When I ran a simulation of a joint model with a binary variable and a continuous variable, I didn't quite understand the meaning of the coefficients in the ordinal outcome output; they seem totally different from those for a continuous outcome. I understand that for ordinal or binary variables the FIML is based on the cumulative normal distribution, but the interpretation of the coefficients is not clear to me. Are there any documents covering this in OpenMx? I didn't find anything in the guides.

I appreciate your kind help!

Fei

Replied on Tue, 10/01/2013 - 12:39
Ryne Joined: 07/31/2009

In reply to by feihe

There are two things to note about interpreting parameters involving ordinal variables under SEM/FIML.

First, all relationships between an ordinal/binary variable and any other variable (continuous or ordinal) are actually modeled as relationships between the assumed latent continuous variable "underneath" your ordinal variable and the second variable. As an example, a covariance between a binary variable and a continuous variable is modeled in this approach as the covariance between two continuous variables, one of which is measured directly and one of which is dichotomized to create your binary variable. In this example, the modeled correlation between the two variables should be stronger than the observed Pearson correlation between the observed variables.

Second, the scale of this latent variable is set by you. You may say that its total variance is a particular value, or that its residual variance is a particular value, or that the distance between two of the thresholds is a particular value, or something else entirely. The raw values of any regressions and covariances involving that variable will depend on how you identify or scale it.

To sum up, all parameters involving ordinal variables rest on the assumption that underneath your ordinal variable is a continuous normal distribution, and all relationships involving this variable reflect associations involving that continuous normal variable. Since we are only assuming that the continuous variable exists, its mean and variance (i.e., its scale) depend on how you define or identify the ordinal variable.
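
A quick simulation sketch of that first point (plain R, nothing OpenMx-specific; all names are made up), showing how dichotomizing attenuates the observed Pearson correlation relative to the latent correlation a threshold model targets:

set.seed(1)
n <- 1e5
z <- rnorm(n)                                  # latent continuous variable "underneath" the binary item
x <- 0.5 * z + rnorm(n, sd = sqrt(1 - 0.25))   # continuous variable correlated 0.5 with z
y <- as.numeric(z > 0)                         # dichotomize z to create the binary variable

cor(x, z)   # ~0.50: the latent correlation a threshold model estimates
cor(x, y)   # ~0.40: the observed point-biserial correlation is attenuated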

Hope that helps,
Ryne

Replied on Tue, 10/01/2013 - 13:16
feihe Joined: 05/30/2012

In reply to by Ryne

Hi Ryne,

Thanks a lot for the detailed explanation; it really helps me understand the logic behind this. But I ran into a problem in my simulation. Below is my code for a probit model. I found that if I use "ystar" (the latent variable underlying the binary variable) instead of "y" as the response variable, I can recover beta1=3 and beta0=-6 almost perfectly (since "ystar" is continuous). But once I try to estimate the coefficients between "y" (the binary variable I defined from "ystar") and "x", I am lost by the estimates I get (beta1=0.23, beta0=1.19). It's hard for me to find any connection between these estimates based on "y" and the previous estimates based on "ystar". Do you have any idea? Thank you very much.

library(OpenMx)

set.seed(2534)

# Simulate a continuous predictor x and a binary outcome y,
# dichotomized from the latent response ystar = -6 + 3*x + e
simulate <- function(N) {
  x <- 2 * runif(N, 1, 2)
  ystar <- -6 + 3 * x + rnorm(N)
  y <- as.numeric(ystar > 0)
  print(table(y))
  data.frame(y, x)
}

data.simu <- simulate(100)
name <- colnames(data.simu)

# Declare y as ordinal so FIML integrates over its categories
data.simu$y <- mxFactor(data.simu$y, levels = c(0, 1))

CrossLaggedModel1 <- mxModel("probit model",
  type = "RAM",
  manifestVars = name,

  # regression of y on x
  mxPath(from = name[2], to = name[1],
         arrows = 1, free = TRUE, values = 1, labels = "beta1"),

  # variances: residual of y and variance of x
  mxPath(from = name, arrows = 2,
         free = TRUE, values = c(1, 1),
         labels = c("residual", "varx")),

  # means/intercepts
  mxPath(from = "one", to = name,
         arrows = 1, free = TRUE, values = c(1, 1),
         labels = c("beta0", "meanx")),

  mxData(observed = data.simu, type = "raw")
)

fit1 <- mxRun(CrossLaggedModel1)
summary(fit1)

Replied on Mon, 10/28/2013 - 14:57
mhunter Joined: 07/31/2009

In reply to by feihe

I think the issue was still just setting the scale of the latent variable underlying the categorical variable. The attached script produces pretty good estimates of your generating parameters.
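
The attachment isn't reproduced here, but the following is a guess at the identification it likely uses, judging from the free parameters below: the residual variance of y's latent response fixed at 1 and its single threshold fixed at 0, so that beta0 and beta1 are estimated on the latent (probit) scale. mxThreshold() is from newer OpenMx releases:

# Reuse CrossLaggedModel1 from the previous post, changing only the scale of y
CrossLaggedModel2 <- mxModel(CrossLaggedModel1, name = "probit model identified",
  mxPath(from = "y", arrows = 2, free = FALSE, values = 1,
         labels = "residual"),                              # fix latent residual variance at 1
  mxThreshold("y", nThresh = 1, free = FALSE, values = 0)   # fix the threshold at 0
)

fit2 <- mxRun(CrossLaggedModel2)
summary(fit2)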


free parameters:
   name matrix row col   Estimate   Std.Error Std.Estimate     Std.SE
1 beta1      A   y   x  2.8128912 0.109627030    0.8519141 0.03320171
2  varx      S   x   x  0.3344653 0.004729648    1.0000000 0.01414092
3 beta0      M   1   y -5.5888361 0.254768400           NA         NA
4 meanx      M   1   x  3.0056161 0.005783298           NA         NA