Product of Variables Identification

Posted on
No user picture. jkarch Joined: 03/15/2011
Hey,

my aim is to use product of variables to fit a random-intercept random-slope model with latent mean centering. To be precise, my model is as follows. $i$ indexes person, $t$ indexes observations within a person.

$$Y_{it} = \beta_i (X_{it}-meanX_i) + meanY_i + residualY_{ij}$$
$$X_{it} = meanX_i + residualX_{ij} $$

$$\beta_i \sim N(\mu_\beta, \sigma_\beta^2), \quad meanX_i \sim N(\mu_X, \sigma_X^2), \quad meanY_i \sim N(\mu_Y, \sigma_Y^2)$$

If using a fixed slope, that is, $\beta_i=\beta$, this can be fitted in any SEM programm. The path diagram is as attached. Apologies for the difference in notation. To make $\beta$ (or a in the path diagram notation) random, it needs to switch from a path loading to a latent variable, and the regression prediction is essentially represented by a product node involving the latent variable $a$ and $X_1$ or $X_2$ etc. Mplus seems to fit this model using a Bayesian approach, and there is also a tutorial on how to fit similar models using brms/stan (https://vuorre.com/posts/latent-mean-centering/).

After reading the recent paper by Steve et al. about Products of variables, I tried to fit it in OpenMx to get frequentist estimates. Unfortunately, there seems to be a problem with identification. I get the warning message: In model 'SEM' Optimizer returned a non-zero status code. It seems the model is not identified. Can somebody help with that? Is the model truely not identified or is this an OpenMx problem? I am not sure how to assess indentification with product variables. Did I make a mistake?

Below is my code:


library(OpenMx)
library(dplyr)
library(tidyr)

simulateData <- function(n_subjs, n_timepoints) {
N <- n_subjs * n_timepoints
the_data <- data.frame(ID = numeric(N), T = numeric(N), X = numeric(N), Y = numeric(N))
Var_X <- 1
residual_Y <- 1
within_coef_dist <- function() rnorm(1, mean = 0.2)
for (i in 1:n_subjs) {
rIcep1 <- rnorm(1)
rIcep2 <- rIcep1 + rnorm(1)
dev1 <- rnorm(n_timepoints)
var1 <- rIcep1 + dev1
dev2 <- within_coef_dist() * dev1 + rnorm(n_timepoints)
var2 <- rIcep2 + dev2
the_data[(i - 1) * n_timepoints + (1:n_timepoints), ] <- c(rep(i, n_timepoints), 1:n_timepoints, var1, var2)
}
return(the_data)
}

nVars <- 6 #
varNamesX <- paste0("X_", 1:nVars)
varNamesY <- paste0("Y_", 1:nVars)
resXnames <- paste0("res_x", 1:nVars)
resYnames <- paste0("res_y", 1:nVars)
productVars <- paste("coeffcient", resXnames, sep = "_")

# Define model
manifests <- c(varNamesX, varNamesY)
latents <- c("InterceptX", "InterceptY", "coefficient", resXnames, resYnames)
model_template <- mxModel(model = "SEM",
type="RAM",
manifestVars=manifests,
latentVars=latents,

# random intercepts
mxPath(from="one", to=c("InterceptX", "InterceptY"), arrows=1,
free=TRUE, values=1, labels=c("muX","muY")),
mxPath(from=c("InterceptX"), to=varNamesX, arrows=1, free=FALSE, values=1),
mxPath(from=c("InterceptY"), to=varNamesY, arrows=1, free=FALSE, values=1),
mxPath(from = "InterceptX", to = "InterceptY", arrows = 2 , free = TRUE, label = "covXY"),
mxPath(from = "InterceptX", to = "InterceptX", arrows = 2 , free = TRUE, label = "varIX"),
mxPath(from = "InterceptY", to = "InterceptY", arrows = 2 , free = TRUE, label = "varIY"),

# residuals
mxPath(from=resXnames, to=varNamesX, arrows=1, free=FALSE, values=1),
mxPath(from=resYnames, to=varNamesY, arrows=1, free=FALSE, values=1),
mxPath(from=resXnames, to=resXnames, arrows=2, free=TRUE, values=0.1, labels="res_x"),
mxPath(from=resYnames, to=resYnames, arrows=2, free=TRUE, values=0.1, labels="res_y"),

#random slope
mxPath(from = "coefficient", to = "coefficient", arrows = 2 , free = TRUE, label = "varCoef"),
mxPath(from = "one", to = "coefficient", arrows = 1 , free = TRUE, label = "muCoef")
)

# add product nodes for random slope
model_template <- mxModel(model_template, productVars=productVars)
for(i in 1:nVars){
model_template <- mxModel(model_template,
mxPath(from=c("coefficient",resXnames[i]), to=productVars[i], arrows=1, free=FALSE, values=1), # connect the latent predictor and moderator to the product node
mxPath(from=productVars[i], to=varNamesY[i], arrows=1, free=FALSE, values=1) # connect the product node to the outcome
)
}

test <- simulateData(300, nVars)
wide_data <- test %>%
pivot_wider(names_from = "T", values_from = c("X", "Y"))
model <- mxModel(model_template, mxData(wide_data, type = "raw"))
fitted <- mxRun(model)

Replied on Tue, 05/07/2024 - 10:22
Picture of user. AdminNeale Joined: 03/01/2013

Hi

I can confirm the issue you report. It seems odd that the optimizer seems happy with the solution, albeit with some very large standard errors for varCoef and res_y yet the local identification check fails. We will be looking into this issue, thank you for raising it!

Best,
Mike

Replied on Fri, 05/10/2024 - 10:10
Picture of user. sboker Joined: 07/28/2009

The problem is that the residuals always have a mean of zero. Products with multiplicands that have a mean of zero are unidentified in the method we use.