Attachment | Size |
---|---|
data_wide.csv [6] | 9.36 KB |
I am trying to fit a nonlinear latent growth curve model (as part of simulations I am conducting) using the three-parameter logistic (s-shaped change) function below
$$y_{obs} = \frac{diff}{1+e^{\frac{\beta - time_i}{\gamma}}}, $$
where $diff$ represents the difference between the starting and ending values (i.e., how much change occurs over time), $\beta$ represents the number of days needed to reach the midpoint of the logistic curve, and $\gamma$ represents the number of days needed to go from the midpoint to approximately 73% of the observed change (i.e., a satiation point). Because the SEM framework only allows any given outcome variable to be predicted as a weighted sum of the predictor variables (i.e., SEM framework is inherently linear), the logistic function cannot simply be inserted into a latent growth curve model. As per the structured latent curve approach outlined in several articles (e.g., Browne & du Toit, 1991; Grimm, Ram, & Estabrook 2010; Preacher & Hancock 2015), I linearized the logistic function using a first-order Taylor series expansion.
The code below outlines the nonlinear latent growth curve model that I have fit to a generated data set (data_wide.csv
). The generated data set contains scores from 100 people measured at five points over a year period (specifically on days 1, 150, 180, 210, and 360) and is generated from a population model where change is logistic. I have also attached a set of starting values (generated by largely copying the starting value procedure in MPlus) to initialize parameter estimation.
Given that my model contains latent variables (i.e., diff, beta[$\beta$], and gamma [$\gamma$]), I would like to use a parameter estimation algorithm that is well suited to dealing with latent variables. From what I understand, the mxFitFunctionML()
function uses full information maximum likelihood (FIML) estimation to estimate parameter values when raw data are used (as in my case). Unfortunately, (and from what I understand about estimation algorithms), FIML estimation is not well suited to dealing with latent variables since it does not iteratively compute lower bounds of the marginal log-likelihood and maximize these lower bounds (i.e., the e- and m-steps of the EM algorithm). Given the iterative nature of the EM algorithm, it makes sense that I have realized considerable performance improvements (both in time and parameter estimation accuracy) when I have used the EM algorithm (versus FIML) to fit growth mixture models.
Although I was successful in implementing the EM algorithm for growth mixture models, it was largely as a result of copying over the example code from the documentation of mxComputeEM()
. I have looked at other posts and have not been able to work out how to implement the EM algorithm for my nonlinear latent growth curve model. Just wondering if anyone might be able to help me out.
data_wide <- read_csv('data_wide.csv') #data table of randomly generated starting values (procedure largely mirrors process followed in MPlus) starting_values <- data.table(beta_fixed = 65.57385, gamma_fixed = 20, diff_fixed = 80, beta_rand = 21.75747, gamma_rand = 0.5, diff_rand = 0.8290784, epsilon = 0.9832023 ) #setup variables manifest_vars <- c("t1_1", "t2_150", "t3_180", "t4_210", "t5_360") latent_vars <- c('diff', 'beta', 'gamma') measurement_days <- as.numeric(str_extract(string = names(data_wide[ 2:ncol(data_wide)]), pattern = '[^_]*$')) #1 150 180 210 360; sets values for 'time' object #code adapted from Grimm, Ram, & Estabrook (2016; Chapter 12) model <- mxModel(model = 'nonlin_lgc', type = 'RAM', mxData(observed = data_wide, type = 'raw'), manifestVars = manifest_vars, latentVars = latent_vars, #Manifest means manMeans = mxPath(from = 'one', to = manifest_vars, free = FALSE, arrows = 1, values = 0), #Latent variable means (linear parameters). ##Note that the nonlinear parameters (i.e., beta, gamma) do not have estimated means mxPath(from = 'one', to = c('diff'),free = T, arrows = 1, values = starting_values$diff_fixed, labels = 'diff'), #Residual variances; by using one label, they are assumed to all be equal (homogeneity of variance) mxPath(from = manifest_vars, arrows=2, free=TRUE, values = starting_values$epsilon, labels='epsilon'), #Latent variable covariances and variances mxPath(from = latent_vars, connect='unique.pairs', arrows=2, #aa(var_diff), ab(cov_diff_beta), ac(cov_diff_gamma), bb(var_beta), bc(var_beta_gamma), cc(var_gamma) free = c(TRUE, FALSE, FALSE, TRUE, FALSE, TRUE), values=c(starting_values$diff_rand, NA,NA, starting_values$beta_rand, NA, starting_values$gamma_rand), labels=c('var_diff', 'NA(cov_diff_beta)','NA(cov_diff_gamma)', 'var_beta', 'NA(var_beta_gamma)', 'var_gamma')), #Functional constraints mxMatrix(type = 'Full', nrow = length(manifest_vars), ncol = 1, free = TRUE, values = starting_values$diff_fixed, labels = 'diff', name = 'd'), mxMatrix(type = 'Full', nrow = length(manifest_vars), ncol = 1, free = TRUE, values = starting_values$beta_fixed, labels = 'beta', name = 'b'), mxMatrix(type = 'Full', nrow = length(manifest_vars), ncol = 1, free = TRUE, values = starting_values$gamma_fixed, labels = 'gamma', name = 'g'), mxMatrix(type = 'Full', nrow = length(manifest_vars), ncol = 1, free = FALSE, values = measurement_days, name = 'time'), #partial derivative of logistic function with respect to each parameter (diff, beta, gamma, respectively) mxAlgebra(expression = 1/(1 + exp((beta - time)/gamma)), name="Dl"), mxAlgebra(expression = -(diff*(exp((beta - time)/gamma) * (1/gamma))/(1 + exp((beta - time)/gamma))^2), name = 'Bl'), mxAlgebra(expression = diff * (exp((beta - time)/gamma) * ((beta - time)/gamma^2))/(1 + exp((beta - time)/gamma))^2, name = 'Gl'), #Factor loadings; all fixed and, importantly, constrained to equal their partial derivatives mxPath(from = 'diff', to = manifest_vars, arrows=1, free=FALSE, labels = c("Dl[1,1]", "Dl[2,1]", "Dl[3,1]", "Dl[4,1]", "Gl[5,1]")), mxPath(from='beta', to = manifest_vars, arrows=1, free=FALSE, labels = c("Bl[1,1]", "Bl[2,1]", "Bl[3,1]", "Bl[4,1]", "Gl[5,1]")), mxPath(from='gamma', to = manifest_vars, arrows=1, free=FALSE, labels = c("Gl[1,1]", "Gl[2,1]", "Gl[3,1]", "Gl[4,1]", "Gl[5,1]")), #Enable likelihood vector; returns vector of likelihoods for each data point (i.e., p(X|parameter_values)) mxFitFunctionML(vector = TRUE) #implement EM algorithm to estimate parameter values?? ) mxRun(model) #it appears no errors (hopefully) were made in defining the model, but the model does not converge (can I use EM algorithm?)
#not sure how to repurpose this code for the nonlinear latent growth curve model mxModel( "Mixture", data4mx, class1, class2, mxAlgebra((1-Posteriors) * Class1.fitfunction, name="PL1"), mxAlgebra(Posteriors * Class2.fitfunction, name="PL2"), mxAlgebra(PL1 + PL2, name="PL"), mxAlgebra(PL2 / PL, recompute='onDemand', initial=matrix(runif(N,.4,.6), nrow=N, ncol = 1), name="Posteriors"), mxAlgebra(-2*sum(log(PL)), name="FF"), mxFitFunctionAlgebra(algebra="FF"), mxComputeEM( estep=mxComputeOnce("Mixture.Posteriors"), mstep=mxComputeGradientDescent(fitfunction="Mixture.fitfunction")))
References
Browne, M. W., & du Toit, S. H. C. (1991). Models for learning data. In L. M. Collins & J. L. Horn (Eds.), Best methods for the analysis of change (pp. 47–68). Washington, DC: American Psychological Association.
Grimm, K. J., Ram, N., & Estabrook, R. (2010). Nonlinear structured growth mixture models in Mplus and OpenMx. Multivariate Behavioral Research, 45(6), 887–909. https://doi.org/10.1080/00273171.2010.531230
Preacher, K. J., & Hancock, G. R. (2015). Meaningful aspects of change as novel random coefficients: A general method for reparameterizing longitudinal models. Psychological Methods, 20(1), 84–101. http://dx.doi.org/10.1037/met0000028