Definition variables, use rows outside of MxData

Joined: 01/09/2020 - 14:36

Hey,

I'm not sure I've understood the concept of definition variables correctly. I have an ordinal manifest variable, and I would like to adjust for covariates that are computed outside of the mxData object I use to fit the models (i.e., the covariates are computed on a bigger sample, but only a smaller sample is used for the analysis). With continuous variables I would know how to do it: perform a linear regression beforehand on the bigger sample and use only the residuals in the actual analysis of the smaller sample. Can I do this with definition variables, and if so, how?

P.S. I'm using path specification

Joined: 07/31/2009 - 15:12
Hi Leo,

Definition variables don't work exactly the way you think they do, but I think they can still help with your problem. Definition variables are observed variables that you believe affect the structure of your model and the relationships between other variables. You can use them to do regression as you describe, but they have many other uses, including zygosity or time variables as shown in the documentation. I'm happy to describe extended uses if that's of interest.
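As a concrete illustration (a minimal sketch, not your actual model: the variable names `Y` and `age` and the data frame `myData` are hypothetical), here is the standard "phantom latent" trick for a definition variable in path specification — the latent's mean is fixed to each row's value via a `data.` label, and its free loading on Y is the regression weight:

```r
library(OpenMx)

defModel <- mxModel("defVarDemo", type = "RAM",
    manifestVars = "Y",
    latentVars   = "ageDef",
    mxData(observed = myData, type = "raw"),   # myData must contain columns Y and age
    # residual variance and intercept of Y
    mxPath(from = "Y", arrows = 2, free = TRUE, values = 1, labels = "resY"),
    mxPath(from = "one", to = "Y", free = TRUE, values = 0, labels = "intY"),
    # the label "data.age" fixes the latent's mean to each row's age value
    mxPath(from = "one", to = "ageDef", free = FALSE, labels = "data.age"),
    # the free loading is the regression of Y on age
    mxPath(from = "ageDef", to = "Y", free = TRUE, values = 0, labels = "betaAge"))
fitDef <- mxRun(defModel)
```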

OpenMx doesn't have an easy way for your sample size to change part of the way through your analysis, but it does have excellent missing data handling. Why are you trying to work with two different sample sizes? If it's because you're missing data on a key variable in your second analysis, that's typically not a problem (conditional on your missing data mechanism).

For example, let's consider a covariate C, a predictor X, and an outcome Y. You're missing data on Y, and want to find the effect of X on Y controlling for C. You're proposing a two-step analysis: regress X on C, get residuals, then regress Y on the residualized X for people with both variables. In OpenMx, however, you can just put all three variables in the same model, fit a multiple regression in which C and X covary and both predict Y, and OpenMx will use all available data for each part of the analysis.
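That one-step model might look like the following sketch (the data frame `myData` with columns C, X, and Y is hypothetical; with raw data, full-information maximum likelihood uses every row, including rows where Y is missing):

```r
library(OpenMx)

vars <- c("C", "X", "Y")
regModel <- mxModel("fimlRegression", type = "RAM",
    manifestVars = vars,
    mxData(observed = myData, type = "raw"),   # FIML: rows with missing Y still contribute
    mxPath(from = c("C", "X"), to = "Y", free = TRUE, labels = c("bC", "bX")),
    mxPath(from = "C", to = "X", arrows = 2, free = TRUE, labels = "covCX"),
    mxPath(from = vars, arrows = 2, free = TRUE, values = 1),   # variances / residual
    mxPath(from = "one", to = vars, free = TRUE))               # means / intercepts
regFit <- mxRun(regModel)
summary(regFit)
```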

Would that work for your problem?

Joined: 01/09/2020 - 14:36
Hi Ryne. Thank you for your reply. My question was just a first step toward my real problem. I have two measurements of a phenotype at different times (t1: 2014-2016, t2: 2017-2018) on the same twins, in 3 cohorts of different ages. At time 1, heritability increases with age, which is consistent with the literature. However, at time 2 heritability suddenly decreases (although all respondents are older now) and a substantial C appears, even though it wasn't there at time 1. I analyzed the cohorts separately. I strongly suspect this is connected to the year of measurement (an election year at time 2), but I'm having methodological difficulties examining this more closely. My current thought is to try a GxE approach: see whether time of measurement moderates the phenotype (after controlling for age, as everyone is getting older). What would be a way to test this?

Joined: 07/31/2009 - 15:12

If you're trying to test whether heritability is the same at each wave, I'd put both waves in the same model (so four observed variables: twin1t1, twin2t1, twin1t2, twin2t2). You can then test the equality of the heritability parameters via constraint.
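One way to run that equality test, assuming a fitted two-wave model `twoWaveFit` in which the wave-specific additive-genetic paths carry the labels `a_t1` and `a_t2` (all three names hypothetical):

```r
library(OpenMx)

# equate the wave-specific A paths by giving them a common label
equalA <- omxSetParameters(twoWaveFit, labels = c("a_t1", "a_t2"),
                           newlabels = "a_eq", name = "equalA")
equalA <- mxRun(equalA)
mxCompare(twoWaveFit, equalA)   # likelihood-ratio test of equal A across waves
```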

Joined: 01/09/2020 - 14:36
Hi Ryne,

I have already done that. The second wave does differ in A, C, and E. I am thinking more about the following: since there is some variation in the time of measurement (2014-2018), could it be analyzed continuously, maybe even on a monthly basis, so that I end up with a graph that looks very similar to GxE graphs? That is: year of measurement moderates the phenotype (after controlling for age!), or age moderates the phenotype after controlling for year of measurement.

Joined: 07/31/2009 - 15:12
You could do that. A lot would depend on the distribution of (monthly) timepoints. If you think it's election-related, you're going to have a very non-linear effect. Assuming you mean US elections, you'd expect a small spike in the second half of 2016 and a larger one late in 2018. Do you have enough data in those months to support a sophisticated monthly non-linear model? Otherwise, I'd probably keep it at the yearly level.
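For reference, a continuous-moderation model along the lines discussed above is usually written in matrix/algebra form rather than as plain fixed paths, since the path coefficient itself depends on the definition variable. A hedged fragment (not runnable on its own; `data.year` and all labels are hypothetical) moderating the A path as a + aMod × year:

```r
# fragment to embed in a larger mxModel, not a complete model:
mxMatrix("Full", 1, 1, free = TRUE,  values = .6, labels = "a",    name = "a0"),
mxMatrix("Full", 1, 1, free = TRUE,  values = 0,  labels = "aMod", name = "aM"),
mxMatrix("Full", 1, 1, free = FALSE, labels = "data.year", name = "defYear"),
# moderated path coefficient, used wherever the A path would appear
mxAlgebra(a0 + aM %*% defYear, name = "aPath")
```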