# Product of Variables Identification

Attachment | Size |
---|---|

model (8).pdf | 56.42 KB |

my aim is to use product of variables to fit a random-intercept random-slope model with latent mean centering. To be precise, my model is as follows. $i$ indexes person, $t$ indexes observations within a person.

$$Y_{it} = \beta_i (X_{it}-meanX_i) + meanY_i + residualY_{ij}$$

$$X_{it} = meanX_i + residualX_{ij} $$

$$\beta_i \sim N(\mu_\beta, \sigma_\beta^2), \quad meanX_i \sim N(\mu_X, \sigma_X^2), \quad meanY_i \sim N(\mu_Y, \sigma_Y^2)$$

If using a fixed slope, that is, $\beta_i=\beta$, this can be fitted in any SEM programm. The path diagram is as attached. Apologies for the difference in notation. To make $\beta$ (or a in the path diagram notation) random, it needs to switch from a path loading to a latent variable, and the regression prediction is essentially represented by a product node involving the latent variable $a$ and $X_1$ or $X_2$ etc. Mplus seems to fit this model using a Bayesian approach, and there is also a tutorial on how to fit similar models using brms/stan (https://vuorre.com/posts/latent-mean-centering/).

After reading the recent paper by Steve et al. about Products of variables, I tried to fit it in OpenMx to get frequentist estimates. Unfortunately, there seems to be a problem with identification. I get the warning message: In model 'SEM' Optimizer returned a non-zero status code. It seems the model is not identified. Can somebody help with that? Is the model truely not identified or is this an OpenMx problem? I am not sure how to assess indentification with product variables. Did I make a mistake?

Below is my code:

library(OpenMx)

library(dplyr)

library(tidyr)

```
```simulateData <- function(n_subjs, n_timepoints) {

N <- n_subjs * n_timepoints

the_data <- data.frame(ID = numeric(N), T = numeric(N), X = numeric(N), Y = numeric(N))

Var_X <- 1

residual_Y <- 1

within_coef_dist <- function() rnorm(1, mean = 0.2)

for (i in 1:n_subjs) {

rIcep1 <- rnorm(1)

rIcep2 <- rIcep1 + rnorm(1)

dev1 <- rnorm(n_timepoints)

var1 <- rIcep1 + dev1

dev2 <- within_coef_dist() * dev1 + rnorm(n_timepoints)

var2 <- rIcep2 + dev2

the_data[(i - 1) * n_timepoints + (1:n_timepoints), ] <- c(rep(i, n_timepoints), 1:n_timepoints, var1, var2)

}

return(the_data)

}

nVars <- 6 #

varNamesX <- paste0("X_", 1:nVars)

varNamesY <- paste0("Y_", 1:nVars)

resXnames <- paste0("res_x", 1:nVars)

resYnames <- paste0("res_y", 1:nVars)

productVars <- paste("coeffcient", resXnames, sep = "_")

# Define model

manifests <- c(varNamesX, varNamesY)

latents <- c("InterceptX", "InterceptY", "coefficient", resXnames, resYnames)

model_template <- mxModel(model = "SEM",

type="RAM",

manifestVars=manifests,

latentVars=latents,

# random intercepts

mxPath(from="one", to=c("InterceptX", "InterceptY"), arrows=1,

free=TRUE, values=1, labels=c("muX","muY")),

mxPath(from=c("InterceptX"), to=varNamesX, arrows=1, free=FALSE, values=1),

mxPath(from=c("InterceptY"), to=varNamesY, arrows=1, free=FALSE, values=1),

mxPath(from = "InterceptX", to = "InterceptY", arrows = 2 , free = TRUE, label = "covXY"),

mxPath(from = "InterceptX", to = "InterceptX", arrows = 2 , free = TRUE, label = "varIX"),

mxPath(from = "InterceptY", to = "InterceptY", arrows = 2 , free = TRUE, label = "varIY"),

# residuals

mxPath(from=resXnames, to=varNamesX, arrows=1, free=FALSE, values=1),

mxPath(from=resYnames, to=varNamesY, arrows=1, free=FALSE, values=1),

mxPath(from=resXnames, to=resXnames, arrows=2, free=TRUE, values=0.1, labels="res_x"),

mxPath(from=resYnames, to=resYnames, arrows=2, free=TRUE, values=0.1, labels="res_y"),

#random slope

mxPath(from = "coefficient", to = "coefficient", arrows = 2 , free = TRUE, label = "varCoef"),

mxPath(from = "one", to = "coefficient", arrows = 1 , free = TRUE, label = "muCoef")

)

# add product nodes for random slope

model_template <- mxModel(model_template, productVars=productVars)

for(i in 1:nVars){

model_template <- mxModel(model_template,

mxPath(from=c("coefficient",resXnames[i]), to=productVars[i], arrows=1, free=FALSE, values=1), # connect the latent predictor and moderator to the product node

mxPath(from=productVars[i], to=varNamesY[i], arrows=1, free=FALSE, values=1) # connect the product node to the outcome

)

}

`test <- simulateData(300, nVars)`

wide_data <- test %>%

pivot_wider(names_from = "T", values_from = c("X", "Y"))

model <- mxModel(model_template, mxData(wide_data, type = "raw"))

fitted <- mxRun(model)

## COnfirmed

I can confirm the issue you report. It seems odd that the optimizer seems happy with the solution, albeit with some very large standard errors for varCoef and res_y yet the local identification check fails. We will be looking into this issue, thank you for raising it!

Best,

Mike

Log in or register to post comments

## The problem is that the

Log in or register to post comments

## Thanks for clarifying steve!

Log in or register to post comments