multiple group LGM when one group has incomplete wave structure

ethibodeau's picture
Joined: 05/16/2018 - 17:05
multiple group LGM when one group has incomplete wave structure

I'm interested in running a multiple group latent growth model (LGM) to test for invariance in growth parameters across two groups, by constraining particular parameters to be equal across groups (vs. freely estimating them) and doing LL chi-square model comparisons.

My groups are a little awkward because they're actually two independent studies. Both studies measured a variable across time in adolescence; the variables have been harmonized across studies and the range in ages/birth cohorts is relatively similar. However, one study had 5 waves of data collection and the other had 3 waves. Say I wanted to know whether the mean and variance of the intercept were invariant across studies.

From what I understand, a multiple group model with parameters constrained across groups cannot be estimated using traditional FIML unless both groups have complete covariance structures. For example, males and females cannot be modeled in a multiple group CFA if all males happen to be missing values for one or more of the CFA manifest indicators. From what I understand, the same logic extends to growth factors. My groups do not come from one study; rather, I'm treating two independent studies as the groups in order to test for invariance of parameter estimates.

If I run this model in Mplus, for example, where I specify a multiple group LGM with five time variables, the model fails to run because the study with three waves is treated as if it has completely missing data on the 4th and 5th wave variables (i.e., an incomplete covariance structure). This is the case even when I use age as the time metric (creating individually varying definition variables and fixing slope factor loadings to each individual's age at the respective wave). A similar problem occurs if I specify a three time point multiple group LGM.

One fix in Mplus is to use a pattern mixture procedure (see Kim et al. and Widaman et al., below) by specifying a growth mixture model with known classes based on the grouping variable (in my case, study). From what I understand, this is a fix because the mixture modeling estimation procedure does not require a complete covariance structure, as each study is assumed to be a derivative of one overall mixture. Parameters can be constrained across latent classes and covariance structures are allowed to be class (group) specific, allowing for LL model comparisons. According to Kim et al.'s simulation, a multiple group LGM in which both groups have complete covariance structures produces equivalent results when run through a pattern mixture procedure. Widaman et al. describe other workarounds not focused on mixture modeling procedures.

Does anyone know 1) whether this is an issue with the FIML estimation procedure used by OpenMx and, if so, 2) whether there are recommended solutions?

thanks,

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3864537/
https://www.tandfonline.com/doi/full/10.1080/10705511.2013.797819

AdminRobK's picture
Joined: 01/24/2014 - 12:15
probably not an issue
"Does anyone know 1) if this is an issue with the FIML estimation procedure used by openmx and if so, 2.) are there recommended solutions?"

It should not be an issue, provided that the group with 3 waves of assessment has a 3x3 expected covariance matrix and an order-3 mean vector, and the group with 5 waves has a 5x5 expected covariance matrix and an order-5 mean vector (I'm assuming here that participants were sampled independently in both studies). In other words, don't construct the 3-wave group with a dataset that has columns full of NAs for waves 4 and 5 and treat all 5 waves as endogenous variables of the model. Obviously, you won't be able to test for factorial invariance for anything more than a first-degree (intercept and slope only) growth curve, since anything higher would be degenerate in the 3-wave group. And, if you allow a different residual variance at each wave, then obviously you won't be able to test for invariance w/r/t the wave 4 & 5 residual variances. How complicated is your model going to be, anyhow?

Although you could do a growth mixture model in OpenMx if you wanted to, it shouldn't be necessary. That approach looks to be a workaround for a peculiarity of Mplus.
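To make the "3x3 for one group, 5x5 for the other" point concrete, here is a minimal sketch in plain Python (not OpenMx code) of the model-implied moments of a linear growth model. The function name, the parameter values, the assumption of a single residual variance shared across waves, and the ages (centered at 16) are all made up for illustration.

```python
def implied_moments(ages, alpha, psi, theta):
    """Model-implied mean vector and covariance matrix of a linear LGM.

    ages  : time scores for the waves (these become the slope loadings)
    alpha : [intercept mean, slope mean]
    psi   : 2x2 covariance matrix of the intercept and slope factors
    theta : residual variance, assumed equal across waves (an assumption)
    """
    # Loading matrix: intercept column fixed at 1, slope column fixed at time.
    lam = [[1.0, float(t)] for t in ages]
    n = len(lam)
    mu = [row[0] * alpha[0] + row[1] * alpha[1] for row in lam]
    sigma = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Element (i, j) of Lambda * Psi * Lambda'
            sigma[i][j] = sum(
                lam[i][a] * psi[a][b] * lam[j][b]
                for a in range(2) for b in range(2)
            )
            if i == j:
                sigma[i][j] += theta
    return mu, sigma


# The same (hypothetical) growth parameters shared by both groups.
alpha = [10.0, 0.5]
psi = [[4.0, 0.2], [0.2, 0.1]]
theta = 1.0

# 3-wave study: ages 14, 16, 18; 5-wave study: ages 14 to 18 (centered at 16).
mu3, sigma3 = implied_moments([-2, 0, 2], alpha, psi, theta)
mu5, sigma5 = implied_moments([-2, -1, 0, 1, 2], alpha, psi, theta)
print(len(mu3), len(sigma3), len(mu5), len(sigma5))
```

The same six parameters generate moments of different order in the two groups, which is why equality constraints on the growth parameters remain meaningful even though the groups have different numbers of waves.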

ethibodeau's picture
Joined: 05/16/2018 - 17:05
what about LR chi-square difference test?

Thanks Rob, this is super helpful indeed! Let's assume I do run a simultaneous multiple group model with one group having a 3x3 expected covariance matrix and an order-3 mean vector, and the other group having a 5x5 expected covariance matrix and an order-5 mean vector. Let's assume that I'm testing for invariance with respect to slope and intercept means, and to do so I run two such multiple group models. In one of these models I place equality constraints on all parameter estimates. In the other I freely estimate the slope/intercept means across groups. I then perform an LR chi-square difference test as a formal test of slope/intercept mean invariance. Doesn't such a test assume some basic degree of configural invariance, at least with respect to number of indicators? I can't seem to find any literature specifically stating this.

I want to run some models. But if I go with the 'pattern mixture', known class approach, because the missing data estimation has different assumptions, I can specify a model in which the 3-wave group has a 5x5 expected covariance matrix. They just happen to be missing completely on waves 4 and 5. In fact, a residual variance for those two indicators gets estimated in the 3-wave group. I'm going to pose this question to SEMNET I think, but I wonder which option is more appropriate: a multiple group model with each group having a unique expected covariance matrix (as you recommend), or a pattern mixture multiple group model where all groups have the same expected covariance matrix and the mixture modeling estimation handles the missing data on the two waves. In any case I imagine the results will be similar. I also wonder if your suggestion will produce the same LL value as a pattern mixture multiple group model where I fix the residual variance of the 4th and 5th wave indicators to zero in the group with three waves.

It's all confusing because at the end of the day, because we're using age definition variables, it's age NOT wave that is the time metric. The three wave data set and the five wave data set both have a similar age range and, when modeled using age definition variables, produce parameter estimates that can be directly compared. For example, the intercept mean, if age is centered at 16 years, is interpreted as the mean level of the variable at age 16. And it's interpreted that way in both data sets regardless of wave structure.
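The LR chi-square difference test described above can be sketched as follows. This is a plain-Python illustration, not output from OpenMx or Mplus: the -2lnL values and parameter counts are invented, and the helper names `chi2_sf` and `lr_test` are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Closed-form starting points: df = 2 (exponential) or df = 1 (erfc).
    if df % 2 == 0:
        q, a = math.exp(-half), 1.0
    else:
        q, a = math.erfc(math.sqrt(half)), 0.5
    # Recurrence Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1).
    while a < df / 2.0:
        q += half ** a * math.exp(-half) / math.gamma(a + 1.0)
        a += 1.0
    return q


def lr_test(m2ll_constrained, m2ll_free, n_par_constrained, n_par_free):
    """Likelihood-ratio test of a constrained model nested in a freer one."""
    stat = m2ll_constrained - m2ll_free   # difference in -2lnL
    df = n_par_free - n_par_constrained   # number of constraints tested
    if df <= 0:
        raise ValueError("the free model must have more parameters")
    return stat, df, chi2_sf(stat, df)


# Invented fit values: freeing the intercept and slope means in one group
# adds two parameters, so the test has 2 degrees of freedom.
stat, df, p = lr_test(5012.4, 5005.1, n_par_constrained=6, n_par_free=8)
print(f"chi-square difference = {stat:.2f}, df = {df}, p = {p:.4f}")
```

The test only requires that the constrained model be nested in the free one and that both be fitted to the same data; nothing in the arithmetic depends on the two groups having the same number of indicators.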

AdminRobK's picture
Joined: 01/24/2014 - 12:15
LGM
"Doesn't such a test assume some basic degree of configural invariance, at least with respect to number of indicators?"

For CFA in general, I'd say yes, at least as far as the interpretation is concerned. But in the special case of LGM, I'd say no, because LGMs are just linear mixed-effects regression recast as factor analysis. Theoretically, we don't think of the indicators in an LGM as different variables; we think of them as the same variable at different times. Further, the loadings aren't unknown parameters--they're a deterministic function of the time metric (i.e., data).

"It's all confusing because at the end of the day, because we're using age definition variables, it's age NOT wave that is the time metric. The three wave data set and the five wave data set both have a similar age range and when modeled using age definition variables produce parameter estimates that can be directly compared. For example, the intercept mean, if age is centered at 16 years, is interpreted as the mean level of the variable at age 16. And it's interpreted that way in both data sets regardless of wave structure."

So why not use age instead of wave as the time metric? You could certainly do that with FIML in OpenMx. For people in the 3-wave group, you'd need to set their fourth and fifth scores on the response variable to NA, and set their fourth and fifth ages to a "pseudo-missing" value like -999. The reason for the pseudo-missing value is that NAs aren't allowed on definition variables, but if the scores on the endogenous variables that correspond to missing definition variables are set to NA, then OpenMx should never see the pseudo-missing value (and if it does, its extreme value should throw off the results enough to alert you that something is wrong). Note that if you are allowing different residual variances by wave (which is perhaps unwise if age is the time metric), you'll need to fix the 4th- and 5th-wave residual variances in the 3-wave group, since they would not be identified if free.
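The data preparation described above might look like the following sketch. It is plain Python rather than R, so `None` stands in for `NA`; the column names (`y1`..`y5`, `age1`..`age5`) and both helper functions are hypothetical.

```python
PSEUDO_MISSING = -999.0  # placeholder for ages that were never observed


def pad_three_wave_row(row, n_waves=5):
    """Extend a 3-wave record to n_waves columns.

    Missing responses become None (NA in R); missing ages get the
    pseudo-missing value, because definition variables cannot be NA.
    """
    padded = dict(row)
    for wave in range(1, n_waves + 1):
        padded.setdefault(f"y{wave}", None)
        padded.setdefault(f"age{wave}", PSEUDO_MISSING)
    return padded


def pseudo_missing_is_safe(row, n_waves=5):
    """True if every pseudo-missing age is paired with a missing response."""
    return all(
        row[f"y{wave}"] is None
        for wave in range(1, n_waves + 1)
        if row[f"age{wave}"] == PSEUDO_MISSING
    )


# One made-up participant from the 3-wave study (age centered at 16).
person = {"y1": 9.1, "y2": 10.3, "y3": 11.0,
          "age1": -2.1, "age2": 0.2, "age3": 1.9}
padded = pad_three_wave_row(person)
print(padded)
print(pseudo_missing_is_safe(padded))
```

A check like `pseudo_missing_is_safe` is worth running on the whole data set before fitting, since a pseudo-missing age paired with an observed response would enter the likelihood as a real loading of -999.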

ethibodeau's picture
Joined: 05/16/2018 - 17:05
let me make sure I understand you correctly

So I have been working in Mplus and I'm wondering if this is easier in OpenMx; it sounds like it is.

I am using age as the time metric, via use of definition variables. If I run a multiple group latent growth model in Mplus, I need to specify an overall model. Even though age is the time metric, it's the number of waves that still determines the covariance structure, I think. So if one of my groups has five waves of data, I have to specify an overall model with a slope and an intercept with five indicators; it just so happens that the factor loadings from the slope are fixed individually to each person's age at each wave. Like you said, I give the folks in the 3-wave study missing values on waves 4 and 5 and a pseudo-missing value for age.

However, when Mplus goes to run this model it stops and says, wait a minute, one of your groups is missing completely on two indicators (coming from waves 4 and 5). Its use of FIML includes a check for complete covariance structure across groups before running a multiple group model, even though age, not wave, is the actual time metric.

This is the case even if you fix the residual variance of those two indicators to zero, because as you say it's not totally identified. The workaround, at least from the Kim et al. article, is to instead specify a growth mixture model with known classes, the known classes being the two groups. Within the mixture modeling missing data estimation framework, classes are allowed to be missing completely on an indicator. Moreover, it's actually possible (at least Mplus provides this) to estimate the residual covariance of an indicator within the mixture modeling framework. It appears as if this is working for me in Mplus, but it's not as efficient as I would like and it's hard to explain in a publication.

So are you saying that OpenMx will run a simultaneous multiple group growth model, with use of age definition variables, when the groups have different wave structures (# of waves), allowing me to constrain parameter estimates and test for factorial invariance? Assuming I fix the residual variance of the fourth indicator in the 3 wave study?

Thanks!

ethibodeau's picture
Joined: 05/16/2018 - 17:05
and assuming I give everyone

and assuming I give everyone missing values on those two indicators and pseudo-missing ages

AdminRobK's picture
Joined: 01/24/2014 - 12:15
yeah
"So are you saying that openMX will run a simultaneous multiple group growth model, with use of age definition variables, when the groups have different wave structures (# of waves), allowing me to constrain parameter estimates and test for factorial invariance? Assuming I fix the residual variance of the fourth indicator in the 3 wave study?"

"and assuming I give everyone missing values on those two indicators and pseudo-missing ages"

Yes (I assume you mean you'll fix the residual variance for the fourth and fifth waves in the 3-wave group). Each group in a multigroup analysis can be an arbitrary MxModel object, except that all the groups' fitfunctions must have the same units. See for instance one of the scripts in our test suite, models/passing/checkStandardizedLoadingsEtcetera.R. The MxModel bigmod is a multigroup model in which the groups are a confirmatory factor analysis, a linear growth model, and a simple monophenotype two-group twin model. Since all the groups' fitfunctions are in -2lnL units, the multigroup fitfunction is able to sum the groups' fitfunction values to get an omnibus fitfunction value. Granted, this is an artificial example, because it is meant to test OpenMx features rather than analyze real data, but it does serve to illustrate that OpenMx doesn't care how different the groups in a multigroup analysis are from one another (as long as they have compatible fit units).

It would also work to use the approach I described in my first reply to this thread.
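As a toy illustration of how a multigroup fit value is just the sum of the groups' -2lnL values, here is a sketch with univariate normal "models" standing in for the groups. This is plain Python, not OpenMx; the data, the parameter values, and the function names are invented.

```python
import math


def neg2ll_normal(values, mean, variance):
    """-2 log-likelihood of independent normal observations."""
    return sum(
        math.log(2.0 * math.pi * variance) + (v - mean) ** 2 / variance
        for v in values
    )


# Two made-up groups of different sizes.
group_a = [9.8, 10.4, 11.1, 9.5]
group_b = [10.9, 10.2, 9.7]


def omnibus_fit(mean_a, var_a, mean_b, var_b):
    """Multigroup fit: the groups' -2lnL values are simply added."""
    return (neg2ll_normal(group_a, mean_a, var_a)
            + neg2ll_normal(group_b, mean_b, var_b))


# With every parameter constrained equal across groups, the omnibus value
# matches the fit of one model applied to the pooled data.
constrained = omnibus_fit(10.2, 0.4, 10.2, 0.4)
pooled = neg2ll_normal(group_a + group_b, 10.2, 0.4)
print(constrained, pooled)
```

Because the sum only requires that each group's fit be in -2lnL units, it does not matter whether the groups have the same number of observed variables.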

ethibodeau's picture
Joined: 05/16/2018 - 17:05
equality constraints

I like that option. I don't think you can do something like that in Mplus unless you run such models within a mixture modeling framework.

One final question related to this. If groups in a multiple group model in OpenMx can be arbitrary with respect to any aspect, can model constraints be trusted when the purpose is to test for factorial invariance?

What I want to do is run a multiple group growth model to test for factorial invariance via the procedure of constraining parameter estimates to equality across groups vs. freeing parameters across groups followed by LR chi-square difference tests.

Isn't the correct estimation of equal parameters across groups, for the purpose of invariance testing, predicated on one overall model implied mean vector and covariance matrix structure? Such that the estimation is trying to find parameter estimates that are equal across groups with the assumption that those groups both have the same model structure? I think this question is slightly different than my question regarding structural invariance needing to be prerequisite for factorial invariance in growth models. Thanks!

AdminRobK's picture
Joined: 01/24/2014 - 12:15
not sure I follow

Sorry, I don't really understand the question.

Also, bear in mind that, because the definition variables may vary from person to person, the model-expected mean vector and covariance matrix may also vary from person to person.

ethibodeau's picture
Joined: 05/16/2018 - 17:05
I'll try asking it this way

Let's assume I was going to do a standard multiple group growth model from one data set that had 5 waves.

Let's assume my grouping variable was sex (male vs. female). I would specify two models, each with an order-5 mean vector and a 5x5 covariance matrix, to be run simultaneously, one for each sex. If I constrained all parameters to equality across the groups, that is equivalent to running the growth model on the data set without regard for group. That is, my group-specific parameter estimates would equal the parameter estimates from the growth model that disregarded sex.

If I run a multiple group growth model in OpenMx with my particular data, where I specify an order-3 mean vector and 3x3 covariance matrix for my 3-wave data and an order-5 mean vector and 5x5 covariance matrix for my 5-wave data, and constrain all parameters to equality across the two models, would you expect that to be equivalent to running one model with an order-5 mean vector and 5x5 covariance matrix where the 3-wave data were given NAs in the 4th and 5th wave columns and pseudo-missing age values? At least with respect to parameter estimates like the means, variances, and covariances of the growth factors. If so, then that's what I mean by correctly estimating the constrained parameters for use in an invariance testing setting. If done as a mixture model, that is what happens. But if the same results are obtained using OpenMx's ability to specify multiple group models with arbitrary group-specific mean and covariance structures, then that would make more sense to me.

"Also, bear in mind that, because the definition variables may vary from person to person, the model-expected mean vector and covariance matrix may also vary from person to person." --- is this why it's very hard to test for absolute fit to the data in growth models that have age definition variables. If people vary in their model-expected mean vector and covariance matrix, how can absolute fit be assessed?

AdminRobK's picture
Joined: 01/24/2014 - 12:15
go for it
"If I run a multiple group growth model in OpenMx with my particular data, where I specify an order-3 mean vector and 3x3 covariance matrix for my 3-wave data and an order-5 mean vector and 5x5 covariance matrix for my 5-wave data, and constrain all parameters to equality across the two models, would you expect that to be equivalent to running one model with an order-5 mean vector and 5x5 covariance matrix where the 3-wave data were given NAs in the 4th and 5th wave columns and pseudo-missing age values?"

Yes. When a row of raw data has missing values on the endogenous variables, OpenMx drops the rows and columns corresponding to the NAs from the mean vector and covariance matrix, for the purpose of evaluating the marginal loglikelihood of the non-missing data in that row.
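The row-wise filtering described above can be sketched in plain Python. This illustrates the idea only and is not OpenMx's implementation; the helper names, parameter values, and data row are invented, and a single residual variance is assumed for all waves.

```python
import math


def linear_lgm_moments(ages, alpha, psi, theta):
    """Implied means and covariances of a linear LGM for one person's ages."""
    mu = [alpha[0] + alpha[1] * t for t in ages]
    sigma = [
        [psi[0][0] + (s + t) * psi[0][1] + s * t * psi[1][1]
         + (theta if i == j else 0.0)
         for j, t in enumerate(ages)]
        for i, s in enumerate(ages)
    ]
    return mu, sigma


def cholesky(m):
    """Lower-triangular Cholesky factor of a positive-definite matrix."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = m[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def mvn_neg2ll(y, mu, sigma):
    """-2 log-likelihood of one complete multivariate-normal row."""
    n = len(y)
    low = cholesky(sigma)
    z = []  # forward substitution: solve low * z = y - mu
    for i in range(n):
        z.append((y[i] - mu[i]
                  - sum(low[i][k] * z[k] for k in range(i))) / low[i][i])
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(n))
    return n * math.log(2.0 * math.pi) + log_det + sum(v * v for v in z)


def fiml_row_neg2ll(y, mu, sigma):
    """Drop the rows/columns for missing responses, then evaluate the row."""
    keep = [i for i, v in enumerate(y) if v is not None]
    return mvn_neg2ll([y[i] for i in keep],
                      [mu[i] for i in keep],
                      [[sigma[i][j] for j in keep] for i in keep])


alpha, psi, theta = [10.0, 0.5], [[4.0, 0.2], [0.2, 0.1]], 1.0

# 3-wave specification for one participant.
mu3, sigma3 = linear_lgm_moments([-2.0, 0.0, 2.0], alpha, psi, theta)
fit3 = mvn_neg2ll([9.1, 10.3, 11.0], mu3, sigma3)

# 5-wave specification for the same participant, padded with missing
# responses (None) and pseudo-missing ages (-999).
mu5, sigma5 = linear_lgm_moments([-2.0, 0.0, 2.0, -999.0, -999.0],
                                 alpha, psi, theta)
fit5 = fiml_row_neg2ll([9.1, 10.3, 11.0, None, None], mu5, sigma5)
print(fit3, fit5)
```

The elements of the 5x5 matrix that involve the -999 ages are wildly off, but they are filtered out before the likelihood is evaluated, so the two specifications give the same value for this row.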

"Also, bear in mind that, because the definition variables may vary from person to person, the model-expected mean vector and covariance matrix may also vary from person to person." --- is this why it's very hard to test for absolute fit to the data in growth models that have age definition variables. If people vary in their model-expected mean vector and covariance matrix, how can absolute fit be assessed?

I don't have a very good answer to that question, but perhaps something like the "quadratic loss" metric I used for cross-validation in this article? It represents both (1) variation of data rows about their model-expected first moment (the mean vector), and (2) variation of outer products of data rows with themselves about their model-expected second moment (see article). If the model predicted each individual's conditional mean and variance perfectly, the quadratic loss would be zero.

ethibodeau's picture
Joined: 05/16/2018 - 17:05
Thanks!

All very helpful, thanks for answering all my questions

AdminRobK's picture
Joined: 01/24/2014 - 12:15
You're welcome.

You're welcome.

ethibodeau's picture
Joined: 05/16/2018 - 17:05
related question

A related question: can you run a quadratic growth model in OpenMx with data that has only 3 waves but uses age as the time metric (via definition variables)? Same problem in Mplus: it's the wave structure that seems limiting even when it's not the time metric being used.

AdminRobK's picture
Joined: 01/24/2014 - 12:15
A quadratic growth model

A quadratic growth model would be degenerate with only three timepoints. There'd be no residual variance.

ethibodeau's picture
Joined: 05/16/2018 - 17:05
can you do the same model in the linear mixed effects framework?

Isn't it possible to run a linear mixed effects variable where age is the time variable, it's run as quadratic model, but it just so happens that there are only 3 waves of data that provided the age variable? If so, why can't such a model be run in the SEM framework? Thanks

ethibodeau's picture
Joined: 05/16/2018 - 17:05
Isn't it possible to run a

Isn't it possible to run a linear mixed effects model**...

AdminRobK's picture
Joined: 01/24/2014 - 12:15
saturated is degenerate

Well, OK, the saturated form of such a model would be degenerate. If there is no random variation in, say, curvature (the quadratic coefficient), then it would be OK.
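A rough parameter count shows the difference between the two cases. The sketch below is plain Python with a hypothetical helper; it assumes fixed time scores shared by everyone (three waves at identical ages) and one residual variance common to all waves. Counting is only a necessary condition for identification, and it does not directly apply when ages vary from person to person through definition variables.

```python
def count_check(n_waves, n_random_effects, n_growth_means, n_residual=1):
    """Compare observed moments with free parameters in a growth model."""
    # Observed statistics: means plus unique variances and covariances.
    statistics = n_waves + n_waves * (n_waves + 1) // 2
    # Free parameters: growth-factor means, their (co)variances, residuals.
    parameters = (n_growth_means
                  + n_random_effects * (n_random_effects + 1) // 2
                  + n_residual)
    return statistics, parameters


# Quadratic model with random intercept, slope, and curvature.
print(count_check(n_waves=3, n_random_effects=3, n_growth_means=3))

# Quadratic mean trend, but no random variation in curvature.
print(count_check(n_waves=3, n_random_effects=2, n_growth_means=3))
```

In the first case the parameters outnumber the statistics, so the residual variance cannot be separated from the random effects; in the second case the count leaves degrees of freedom to spare.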

AdminNeale's picture
Joined: 03/01/2013 - 14:09
Maximum of 3 occasions within-person, many more between persons

With definition variables for age, many more than 3 ages may be assessed across the sample, if people differ in either their ages at assessment or the intervals between their assessments (or both). It seems to me that a quadratic model may be identified under these circumstances - somewhat akin to estimating LD score/GREML genetic correlations between two traits when nobody has been assessed on both traits.

Three equal intervals on a sample of identically-aged subjects would indeed be inadequate for quadratic growth, but I'm not so sure when the intervals vary across persons.