Definition variables, use rows outside of MxData

Posted on
Leo Joined: 01/09/2020
Hey,

I'm unsure whether I've understood the concept of definition variables correctly. I have an ordinal manifest variable, and I would like to add covariate adjustments that are computed outside of the mxData I use to fit the models (i.e., the covariate effects are estimated in a bigger sample, but only a smaller sample is used for the analysis). With continuous variables I would know how to do it: run a linear regression beforehand in the bigger sample and use only the residuals in the actual analysis of the smaller sample. How can I do this with definition variables? Is it even possible?
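
For the continuous case described above, the two-step workaround can be sketched in plain Python (standard library only). This is a hypothetical illustration, not an OpenMx feature: `fit_simple_ols` and `residualize` are made-up helper names and the data are invented. The regression coefficients are estimated in the larger sample and then applied unchanged to the smaller analysis sample.

```python
# Hypothetical sketch of the two-step approach for a *continuous* phenotype:
# estimate the covariate effect in the large sample, then residualize the
# smaller analysis sample using those fixed coefficients.


def fit_simple_ols(x, y):
    """Return (intercept, slope) of an ordinary least-squares fit of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def residualize(x, y, intercept, slope):
    """Subtract the covariate prediction made with externally fitted coefficients."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Invented data: covariate (e.g. age) and phenotype in the large sample ...
big_age = [1.0, 2.0, 3.0, 4.0]
big_pheno = [2.0, 3.0, 5.0, 6.0]
# ... and the smaller subsample that is actually analysed.
small_age = [2.0, 4.0]
small_pheno = [3.0, 6.0]

b0, b1 = fit_simple_ols(big_age, big_pheno)
small_resid = residualize(small_age, small_pheno, b0, b1)
print(b0, b1, small_resid)
```

This trick does not carry over directly to an ordinal phenotype, because the observed categories cannot simply be replaced by continuous residuals.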

Thanks in advance!
P.S. I'm using path specification

Replied on Mon, 07/20/2020 - 08:57
Ryne Joined: 07/31/2009

Hi Leo,

Definition variables don't work exactly the way you think they do, but I think they can help with your problem. Definition variables are observed variables that you think affect the structure of your model and the relationships between other variables; their values can differ from one row of the data to the next. You can use them to do regression as you describe, but they have many other uses, including zygosity or time variables, as shown in the documentation. I'm happy to describe other uses if they're of interest.
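
To make the row-by-row idea concrete for an ordinal outcome: a common use of a definition variable is a covariate that shifts the liability (equivalently, the thresholds) of an ordinal variable separately for each row. The stdlib-only Python sketch below illustrates only the arithmetic, under an assumed probit (standard-normal liability) model; it is not OpenMx code. In OpenMx the row-wise substitution is done by giving a path or matrix element a label of the form `data.<column>`. The helper names `category_probs` and `total_loglik` and all values are hypothetical.

```python
import math


def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def category_probs(thresholds, beta, x_i):
    """Probabilities of each ordinal category for one row.

    The row's covariate value x_i (playing the role of a definition variable)
    shifts the liability mean by beta * x_i, which is equivalent to shifting
    every threshold down by the same amount.
    """
    shift = beta * x_i
    cuts = [-math.inf] + [t - shift for t in thresholds] + [math.inf]
    return [norm_cdf(hi) - norm_cdf(lo) for lo, hi in zip(cuts[:-1], cuts[1:])]


def total_loglik(thresholds, beta, x, y):
    """Sum over rows of the log-probability of each observed category y (0-based)."""
    return sum(
        math.log(category_probs(thresholds, beta, x_i)[y_i])
        for x_i, y_i in zip(x, y)
    )


# Hypothetical values: three ordered categories, a standardized covariate.
thresholds = [-0.5, 0.5]
beta = 0.3
print(category_probs(thresholds, beta, -1.0))
print(category_probs(thresholds, beta, 1.0))
print(total_loglik(thresholds, beta, [-1.0, 0.0, 1.0], [0, 1, 2]))
```

Because x_i enters the likelihood row by row, the covariate column has to be present in the data being analysed.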

OpenMx doesn't have an easy way to change your sample size partway through an analysis, but it does have excellent missing-data handling. Why are you trying to work with two different sample sizes? If it's because you're missing data on a key variable in your second analysis, that's typically not a problem (depending on your missing-data mechanism).

For example, consider a covariate C, a predictor X, and an outcome Y. You're missing data on Y and want to find the effect of X on Y controlling for C. You're proposing a two-step analysis: regress X on C, get the residuals, then regress Y on the residualized X using only the people with data on both variables. In OpenMx, however, you can put all three variables in the same model and fit a multiple regression in which C and X covary and both predict Y, and OpenMx will use all available data for each part of the analysis.
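
The single model can stand in for the two-step procedure because of the Frisch-Waugh-Lovell result: within the same complete-data sample, the slope of Y on X-residualized-for-C equals the partial slope of X in the multiple regression of Y on C and X. The stdlib-only Python sketch below checks this numerically on invented data; the helper names are hypothetical. It does not illustrate OpenMx's full-information maximum likelihood, which additionally lets rows with missing Y contribute to the C and X parts of the model.

```python
def mean(v):
    return sum(v) / len(v)


def centered_cross(a, b):
    """Sum of cross-products of deviations from the mean."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))


def partial_slope_x(c, x, y):
    """Coefficient of X in the OLS regression Y ~ 1 + C + X (two-predictor closed form)."""
    scc, sxx, sxc = centered_cross(c, c), centered_cross(x, x), centered_cross(x, c)
    scy, sxy = centered_cross(c, y), centered_cross(x, y)
    return (sxy * scc - sxc * scy) / (sxx * scc - sxc ** 2)


def two_step_slope(c, x, y):
    """Residualize X on C (with intercept), then regress Y on those residuals."""
    mc, mx = mean(c), mean(x)
    g = centered_cross(x, c) / centered_cross(c, c)
    rx = [(xi - mx) - g * (ci - mc) for xi, ci in zip(x, c)]
    # rx has mean zero, so the slope of Y on rx needs no separate intercept term
    return sum(r * yi for r, yi in zip(rx, y)) / sum(r * r for r in rx)


# Invented complete data for covariate C, predictor X and outcome Y.
c = [1.0, 2.0, 3.0, 4.0, 5.0]
x = [2.0, 1.0, 4.0, 3.0, 6.0]
y = [1.0, 3.0, 2.0, 5.0, 6.0]
print(partial_slope_x(c, x, y), two_step_slope(c, x, y))
```

The equivalence relies on the residuals coming from the same sample; residualizing with coefficients estimated in a different, larger sample generally gives a different slope.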

Would that work for your problem?

Replied on Mon, 07/20/2020 - 15:31
Leo Joined: 01/09/2020

Hi Ryne. Thank you for your reply. My question was really just a first step toward solving my actual problem. I have two measurements of a phenotype from the same twins, taken at different times (t1: 2014-2016, t2: 2017-2018), in three cohorts of different ages. At time 1, heritability increases with age, which is consistent with the literature. At time 2, however, heritability suddenly decreases (although all respondents are now older), and a substantial C component appears even though it wasn't there at time 1. I analyzed the cohorts separately. I strongly believe this is connected to the year of measurement (time 2 was an election year), but I'm having methodological difficulties examining it more closely. My current idea is to try GxE: see whether time of measurement moderates the phenotype (after controlling for age, since everyone is getting older). What would be a way to test this?
Replied on Wed, 07/22/2020 - 08:49
Ryne Joined: 07/31/2009

In reply to Leo

If you're trying to test whether the heritability is the same at each wave, I'd put both waves in the same model (so four observed variables: twin1t1, twin2t1, twin1t2, twin2t2). You can then test the equality of the heritability parameters by constraining them to be equal and comparing the fit of the constrained and unconstrained models.
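
Testing equality by constraint usually means fitting the model twice, once with the wave-1 and wave-2 parameters free and once with them constrained equal, and comparing the fits with a likelihood-ratio test: the difference in -2 log-likelihood is referred to a chi-square distribution with degrees of freedom equal to the number of constraints. OpenMx's `mxCompare()` reports this comparison for fitted models; the stdlib-only Python sketch below only reproduces the arithmetic. The -2LL values are invented, and `chi2_sf` and `likelihood_ratio_test` are hypothetical helper names.

```python
import math


def chi2_sf(stat, df):
    """Upper-tail probability P(X >= stat) for a chi-square with integer df >= 1.

    Computes the upper regularized gamma Q(df/2, stat/2), starting from
    Q(1/2, y) = erfc(sqrt(y)) or Q(1, y) = exp(-y) and stepping with
    Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    """
    if stat <= 0:
        return 1.0
    y = stat / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    while a < df / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q


def likelihood_ratio_test(m2ll_constrained, m2ll_free, df_diff):
    """Chi-square difference test between nested models (constrained vs. free)."""
    stat = m2ll_constrained - m2ll_free
    return stat, chi2_sf(stat, df_diff)


# Invented -2LL values: heritability path equated across waves (1 constraint).
stat, p = likelihood_ratio_test(10234.6, 10228.1, 1)
print(f"chi2 = {stat:.2f}, df = 1, p = {p:.4f}")
```
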
Replied on Wed, 07/22/2020 - 09:22
Leo Joined: 01/09/2020

Hi Ryne,

I have already done that. The second wave does differ in A, C and E. What I'm thinking about is the following: because the time of measurement varies (2014-2018), I was wondering whether it could be analyzed as a continuous variable, perhaps even on a monthly basis, so that I end up with a graph very similar to a GxE graph. I'm thinking of either: year of measurement moderates the phenotype (after controlling for age!), or: age moderates the phenotype, after controlling for year of measurement.
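
One standard way to get such a curve is the moderated-path GxE parameterization associated with Purcell (2002): each path coefficient is a linear function of the moderator M, e.g. a(M) = a0 + a1*M, so the additive genetic variance is (a0 + a1*M)^2, and likewise for C and E. In OpenMx, M would enter as a definition variable. The stdlib-only Python sketch below only evaluates the implied variance components and standardized heritability over a grid of measurement dates; the moderator coding (years since January 2014, sampled every six months), the helper name `moderated_components`, and every parameter value are hypothetical.

```python
def moderated_components(m, a, c, e):
    """Variance components and heritability at moderator value m.

    a, c, e are (intercept, slope) pairs for the A, C and E path coefficients,
    so each variance component is (intercept + slope * m) ** 2.
    """
    var_a = (a[0] + a[1] * m) ** 2
    var_c = (c[0] + c[1] * m) ** 2
    var_e = (e[0] + e[1] * m) ** 2
    total = var_a + var_c + var_e
    return var_a, var_c, var_e, var_a / total


# Hypothetical path estimates as (intercept, slope per year).
a_path = (0.7, -0.1)
c_path = (0.2, 0.1)
e_path = (0.6, 0.0)

# Moderator: years since January 2014, evaluated every 6 months through 2018.
for month in range(0, 60, 6):
    m = month / 12.0
    var_a, var_c, var_e, h2 = moderated_components(m, a_path, c_path, e_path)
    print(f"M = {m:4.1f}  A = {var_a:.3f}  C = {var_c:.3f}  E = {var_e:.3f}  h2 = {h2:.3f}")
```

With a linear moderator each variance component is a quadratic function of M, so a sharp, short-lived change tied to a specific date would need a different moderator coding.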

Replied on Wed, 07/22/2020 - 11:34
Ryne Joined: 07/31/2009

In reply to Leo

You could do that. A lot would depend on the distribution of (monthly) timepoints. If you think the effect is election-related, it's going to be very non-linear. Assuming you mean US elections, you'll have a small spike in the second half of 2016 and a larger one late in 2018. Do you have enough data in those months to fit a sophisticated non-linear monthly model? If not, I'd probably keep it at the year level.
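
Before choosing between a monthly and a yearly moderator, it helps to tabulate how many measurements fall in each month (and year), especially around the dates you suspect matter. The stdlib-only Python sketch below does that; the dates, the minimum count, and the helper names `counts_by_month` and `sparse_months` are hypothetical.

```python
from collections import Counter
from datetime import date


def counts_by_month(dates):
    """Number of measurements in each (year, month)."""
    return Counter((d.year, d.month) for d in dates)


def sparse_months(counts, start, end, min_n):
    """(year, month) cells from start to end inclusive with fewer than min_n measurements."""
    year, month = start
    sparse = []
    while (year, month) <= end:
        if counts.get((year, month), 0) < min_n:
            sparse.append((year, month))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return sparse


# Invented measurement dates.
measured = [date(2016, 10, 3), date(2016, 10, 20), date(2016, 11, 7),
            date(2017, 1, 15), date(2018, 10, 1), date(2018, 10, 30)]

monthly = counts_by_month(measured)
yearly = Counter(d.year for d in measured)
print(yearly)
print(sparse_months(monthly, (2016, 9), (2016, 12), min_n=2))
```

If many months around the dates of interest come back as sparse, the yearly coding is the safer moderator.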