impact of ignoring higher level clustering

Posted on
George Richardson Joined: 10/18/2018
Hi OpenMx Forum,
I've read before that when cases are nested, ignoring the clustering generally affects standard errors but not the regression coefficients. Is this the case for ACE model parameters as well, e.g., when twin pairs are nested in schools or geographic regions?

Thanks!
George

Replied on Fri, 08/18/2023 - 13:27
mhunter Joined: 07/31/2009

The math suggests that the ACE model parameters should not be affected by ignoring higher levels of nesting *per se*. This should apply if there were only A, C, and E variances that also existed at the school level in addition to the twin pair level. To elaborate, if you had twin pair as level 1 and school as level 2, then you could have an A variance at level 1 and an A variance at level 2. The level 1 A variance estimate should be unaffected by including or excluding the level 2 A variance in the model.

However, if there is some other source of variance, this likely changes things. If there is some other source of variance at the school level and you exclude that from the model, all of your estimates at the twin level could be biased. The key is the structure of the relatedness matrices (see e.g., [Hunter, Garrison, Burt, & Rodgers, 2021 in Behavior Genetics](https://doi.org/10.1007/s10519-021-10055-x)).

For intuition on this, consider the standard ACE twin model as a multilevel model itself (see e.g., [Hunter, 2021 in Behavior Genetics](https://doi.org/10.1007/s10519-021-10045-z)). Level 1 is the person; level 2 is the family. The E variance is entirely level 1; the C variance is entirely level 2. The A variance is partially at level 1 and partially at level 2. The estimates of the E variance certainly change when you add or remove the C variance parameter from the model. The same generally applies to adding other variance components because the additional variance components are not totally independent of the "lower level" variance components.
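To make the level split concrete, here is a rough sketch of the usual ACE expectations written as a two-level decomposition. The notation ($y_{ij}$ for twin $j$ in family $i$, and $\sigma^2_A$, $\sigma^2_C$, $\sigma^2_E$ for the variance components) is assumed here for illustration, and the usual simplifications (additive effects only, random mating) are taken for granted:

$$
\begin{aligned}
y_{ij} &= A_{ij} + C_i + E_{ij} \\
\operatorname{Var}(y_{ij}) &= \sigma^2_A + \sigma^2_C + \sigma^2_E \\
\operatorname{Cov}_{MZ}(y_{i1}, y_{i2}) &= \sigma^2_A + \sigma^2_C \\
\operatorname{Cov}_{DZ}(y_{i1}, y_{i2}) &= \tfrac{1}{2}\sigma^2_A + \sigma^2_C
\end{aligned}
$$

The within-pair covariance is the family-level (level 2) part of the variance, and the remainder is the person-level (level 1) part. Under these assumptions, all of $\sigma^2_A$ sits at level 2 for MZ pairs, while for DZ pairs half of it sits at level 2 and half at level 1. $\sigma^2_C$ is always level 2 and $\sigma^2_E$ is always level 1.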

By contrast, in the multilevel regression case, the higher level variance components are independent of the level 1 residual variance.

Replied on Tue, 08/22/2023 - 16:53
George Richardson Joined: 10/18/2018

In reply to mhunter

Thanks for your response, Michael. These are interesting papers and I wish I had the second one about five years ago when I started trying to understand how univariate (or monophenotype, I see it has been relabeled) ACE factor models map onto two-level MLMs. Maybe I understand now that because there is some cross-level dependence, thinking about the higher levels is not as simple as for traditional MLM, and we seem to need a way to identify higher-level variances like in the AC'RE model you wrote about.

Do you think the estimation of higher-level variances or cross-level interactions is likely to change the overall picture provided by ACE models (i.e., A and E capturing most of the phenotype variance for most traits)? Extrapolating from the findings reported in systematic reviews of non-genetically sensitive MLM studies, I'd guess that there is not a lot of higher-level variance or cross-level interaction, but maybe it could be important if, e.g., the common environmental variance is found to be a bit more important once MLMs are used in some of the ways you have suggested?

Replied on Wed, 08/23/2023 - 20:28
AdminNeale Joined: 03/01/2013

In reply to George Richardson

I doubt if higher level effects would downwardly bias estimates of C. Consider the case of a trait that linearly regresses on age. If one ignored the ages of the twin pairs, the age (level 2) effect would contribute to C.
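As a rough illustration of the age example, the sketch below works with expected (population) covariances rather than fitted data, and uses simple moment-based estimates from the MZ and DZ covariances instead of the maximum-likelihood fit OpenMx would do. All function names and numeric values are hypothetical and chosen only for illustration. The one assumption that matters is that age is identical within a pair, so its variance adds equally to the MZ and DZ covariances.

```python
def expected_twin_moments(a2, c2, e2, shared_extra=0.0):
    """Expected variance and MZ/DZ covariances under an ACE model.

    shared_extra is variance from a pair-level variable (for example
    slope**2 * Var(age)) that both twins share regardless of zygosity.
    """
    var = a2 + c2 + e2 + shared_extra
    cov_mz = a2 + c2 + shared_extra
    cov_dz = 0.5 * a2 + c2 + shared_extra
    return var, cov_mz, cov_dz


def ace_from_moments(var, cov_mz, cov_dz):
    """Moment-based ACE estimates (not the ML estimates OpenMx produces)."""
    a2 = 2.0 * (cov_mz - cov_dz)
    c2 = 2.0 * cov_dz - cov_mz
    e2 = var - cov_mz
    return a2, c2, e2


# Hypothetical true components and a hypothetical linear age effect.
true_a, true_c, true_e = 0.5, 0.2, 0.3
age_slope, age_var = 0.1, 4.0
age_variance_in_trait = age_slope ** 2 * age_var

# Age ignored: its variance is left in the twin covariances.
naive = ace_from_moments(
    *expected_twin_moments(true_a, true_c, true_e, age_variance_in_trait)
)
# Age regressed out: only the ACE components remain.
adjusted = ace_from_moments(*expected_twin_moments(true_a, true_c, true_e))

print("age ignored :", [round(x, 3) for x in naive])
print("age adjusted:", [round(x, 3) for x in adjusted])
```

In this setup the age variance lands entirely in the C estimate when age is ignored, while the A and E estimates are unchanged. That holds only because the omitted variable is shared equally by MZ and DZ pairs; an omitted source of variance with a different covariance structure would not behave so cleanly.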

One has to get a bit tricky to do ACE modeling in twin pairs, because what is level 1 for DZ pairs is level 2 for the MZ pairs (the .5 of the genome not shared by DZs). But thanks to definition variables and smart people like Prof Hunter, we can get around that.

Replied on Thu, 08/24/2023 - 16:11
George Richardson Joined: 10/18/2018

In reply to AdminNeale

Thanks, Mike. That's what I had seen in a few papers, with the inclusion of pairwise variables just reducing C, and I tried it with parental divorce before and found the same (ofc it makes sense because it doesn't vary within the pairs). Also, from what I remember, C seems similar in studies of relatives more discordant on things like divorce (e.g., full sibs and cousins) and maybe even area of residence, so I guess that appears consistent with your doubt about higher level variance components changing things much?

I'm doing some work finding a lack of early environmental effects that kind of coheres well with small estimates of C for the phenotypes involved and have had people wondering if things would look different in samples with more different environments (BTW, the last study used a national USA sample, so even more diverse than that, I guess).

Are there many regional or cross-country studies using behavioral genetic MLMs or multi-group ACE models? I found this recent one on wellbeing, https://journals.sagepub.com/doi/full/10.1177/17456916231178716 , which seems to just simulate global data from existing ACE estimates, and this recent one on height in the BG journal, https://link.springer.com/article/10.1007/s10519-021-10047-x .

Replied on Wed, 08/30/2023 - 08:06
AdminNeale Joined: 03/01/2013

In reply to George Richardson

Hi George

I'm not aware of many studies of this type, but there may be others out there. If there's one thing that seems likely to result in low estimates of C among adults, I would say it is the decay of effects over time. Once moving away from home (or cotwin), sharing environmental factors almost certainly decreases, such that C estimates will drift down to an asymptote. This is in stark contrast to the effects of genotype, which can continue to generate (co-)variation in both twins whether they live together or not. You can run away from your environment, but you cannot escape your genotype. Yet.