Product of Variables Identification
Attachment | Size |
---|---|
model (8).pdf | 56.42 KB |
my aim is to use product of variables to fit a random-intercept random-slope model with latent mean centering. To be precise, my model is as follows. $i$ indexes person, $t$ indexes observations within a person.
$$Y_{it} = \beta_i (X_{it}-meanX_i) + meanY_i + residualY_{ij}$$
$$X_{it} = meanX_i + residualX_{ij} $$
$$\beta_i \sim N(\mu_\beta, \sigma_\beta^2), \quad meanX_i \sim N(\mu_X, \sigma_X^2), \quad meanY_i \sim N(\mu_Y, \sigma_Y^2)$$
If using a fixed slope, that is, $\beta_i=\beta$, this can be fitted in any SEM programm. The path diagram is as attached. Apologies for the difference in notation. To make $\beta$ (or a in the path diagram notation) random, it needs to switch from a path loading to a latent variable, and the regression prediction is essentially represented by a product node involving the latent variable $a$ and $X_1$ or $X_2$ etc. Mplus seems to fit this model using a Bayesian approach, and there is also a tutorial on how to fit similar models using brms/stan (https://vuorre.com/posts/latent-mean-centering/).
After reading the recent paper by Steve et al. about Products of variables, I tried to fit it in OpenMx to get frequentist estimates. Unfortunately, there seems to be a problem with identification. I get the warning message: In model 'SEM' Optimizer returned a non-zero status code. It seems the model is not identified. Can somebody help with that? Is the model truely not identified or is this an OpenMx problem? I am not sure how to assess indentification with product variables. Did I make a mistake?
Below is my code:
library(OpenMx)
library(dplyr)
library(tidyr)
simulateData <- function(n_subjs, n_timepoints) {
N <- n_subjs * n_timepoints
the_data <- data.frame(ID = numeric(N), T = numeric(N), X = numeric(N), Y = numeric(N))
Var_X <- 1
residual_Y <- 1
within_coef_dist <- function() rnorm(1, mean = 0.2)
for (i in 1:n_subjs) {
rIcep1 <- rnorm(1)
rIcep2 <- rIcep1 + rnorm(1)
dev1 <- rnorm(n_timepoints)
var1 <- rIcep1 + dev1
dev2 <- within_coef_dist() * dev1 + rnorm(n_timepoints)
var2 <- rIcep2 + dev2
the_data[(i - 1) * n_timepoints + (1:n_timepoints), ] <- c(rep(i, n_timepoints), 1:n_timepoints, var1, var2)
}
return(the_data)
}
nVars <- 6 #
varNamesX <- paste0("X_", 1:nVars)
varNamesY <- paste0("Y_", 1:nVars)
resXnames <- paste0("res_x", 1:nVars)
resYnames <- paste0("res_y", 1:nVars)
productVars <- paste("coeffcient", resXnames, sep = "_")
# Define model
manifests <- c(varNamesX, varNamesY)
latents <- c("InterceptX", "InterceptY", "coefficient", resXnames, resYnames)
model_template <- mxModel(model = "SEM",
type="RAM",
manifestVars=manifests,
latentVars=latents,
# random intercepts
mxPath(from="one", to=c("InterceptX", "InterceptY"), arrows=1,
free=TRUE, values=1, labels=c("muX","muY")),
mxPath(from=c("InterceptX"), to=varNamesX, arrows=1, free=FALSE, values=1),
mxPath(from=c("InterceptY"), to=varNamesY, arrows=1, free=FALSE, values=1),
mxPath(from = "InterceptX", to = "InterceptY", arrows = 2 , free = TRUE, label = "covXY"),
mxPath(from = "InterceptX", to = "InterceptX", arrows = 2 , free = TRUE, label = "varIX"),
mxPath(from = "InterceptY", to = "InterceptY", arrows = 2 , free = TRUE, label = "varIY"),
# residuals
mxPath(from=resXnames, to=varNamesX, arrows=1, free=FALSE, values=1),
mxPath(from=resYnames, to=varNamesY, arrows=1, free=FALSE, values=1),
mxPath(from=resXnames, to=resXnames, arrows=2, free=TRUE, values=0.1, labels="res_x"),
mxPath(from=resYnames, to=resYnames, arrows=2, free=TRUE, values=0.1, labels="res_y"),
#random slope
mxPath(from = "coefficient", to = "coefficient", arrows = 2 , free = TRUE, label = "varCoef"),
mxPath(from = "one", to = "coefficient", arrows = 1 , free = TRUE, label = "muCoef")
)
# add product nodes for random slope
model_template <- mxModel(model_template, productVars=productVars)
for(i in 1:nVars){
model_template <- mxModel(model_template,
mxPath(from=c("coefficient",resXnames[i]), to=productVars[i], arrows=1, free=FALSE, values=1), # connect the latent predictor and moderator to the product node
mxPath(from=productVars[i], to=varNamesY[i], arrows=1, free=FALSE, values=1) # connect the product node to the outcome
)
}
test <- simulateData(300, nVars)
wide_data <- test %>%
pivot_wider(names_from = "T", values_from = c("X", "Y"))
model <- mxModel(model_template, mxData(wide_data, type = "raw"))
fitted <- mxRun(model)
COnfirmed
I can confirm the issue you report. It seems odd that the optimizer seems happy with the solution, albeit with some very large standard errors for varCoef and res_y yet the local identification check fails. We will be looking into this issue, thank you for raising it!
Best,
Mike
Log in or register to post comments
The problem is that the
Log in or register to post comments
Thanks for clarifying steve!
Log in or register to post comments