ACE including non-twin siblings

Posted on
No user picture. Aurina Joined: 01/05/2016
Dear all,

I'm trying to adapt ACE heritability model to include both twins and their non-twin siblings. The data used for the analysis are imaging-derived phenotypes from the human connectome project (HCP) which includes both MZ and DZ twin pairs as well as their non-twin siblings for a subset of twins. Would be really helpful if anyone could answer a few questions.

Without any additional assumptions about the data, is it acceptable to treat the non-twin siblings of twins as an additional DZ twin? Or should common environmental effects be separated into sibling-specific effects and twin-specific effects? There is also an issue that not all twin pairs have a non-twin sibling. How much missing data could be allowed in such an analysis?

Below is my attempt to modify the ACE model to include non-twin siblings. Could you please advise me if this modification including the use of covariates and matrix specification is correct (in case we're treating non-twin siblings as an additional DZ twin)? Also, can variables used in the heritability analysis be z-scored or should they be used in their original scale?

Thanks a lof for your help!


data_covar = readMat("TESTcovar.mat")
data = readMat("TESTdata.mat")

# mark 0 values as NA because these will be the outliers in the data - we want to exclude them.
data$Output.DZ[data$Output.DZ == 0] <- NA
data$Output.MZ[data$Output.MZ == 0] <- NA

edge=1;
# each connection (phenotype of interest) is denoted as edge - here just selecting first to do the analysis
myDataMZ<-data.frame(data$Output.MZ[,,edge], data_covar$MZ.age[,1], data_covar$MZ.age[,2],data_covar$MZ.age[,3],data_covar$MZ.sex[,1],data_covar$MZ.sex[,2],data_covar$MZ.sex[,3])
myDataDZ<-data.frame(data$Output.DZ[,,edge], data_covar$DZ.age[,1], data_covar$DZ.age[,2],data_covar$DZ.age[,3],data_covar$DZ.sex[,1],data_covar$DZ.sex[,2],data_covar$DZ.sex[,3])

myDataMZ_measure<-data$Output.MZ[,,edge]
myDataDZ_measure<-data$Output.DZ[,,edge]

colnames(myDataMZ) <- c('twin1', 'twin2', 'sib','ageT1MZ', 'ageT2MZ', 'ageSIBMZ', 'sexT1MZ', 'sexT2MZ', 'sexSIBMZ')
colnames(myDataDZ) <- c('twin1', 'twin2', 'sib','ageT1DZ', 'ageT2DZ', 'ageSIBDZ', 'sexT1DZ', 'sexT2DZ', 'sexSIBDZ')
selVars <- c('twin1','twin2', 'sib')

# "complete" specifies use only cases with data in all columns;
# Could this be computed if some twin pairs are missing a non-twin sibling?
CovMZ = cov(data$Output.MZ[,,edge],use="complete")
CovDZ = cov(data$Output.DZ[,,edge],use="complete")

# mean across MZ twins
MeanMZ = colMeans(data$Output.MZ[,1:2,edge],na.rm=TRUE)
MeanMZ = mean(MeanMZ)
# mean across DZ twins
MeanDZ = colMeans(data$Output.DZ[,1:2,edge],na.rm=TRUE)
MeanDZ = mean(MeanDZ)
# mean across siblings
MeanSIBMZ = mean(data$Output.MZ[,3,edge],na.rm=TRUE)
MeanSIBDZ = mean(data$Output.DZ[,3,edge],na.rm=TRUE)

# -----------------------------------------------------------------------
#Fit ACE Model with RawData and Matrices Input
# -----------------------------------------------------------------------
twinACE <- mxModel("twinACE",
# Matrices X, Y, and Z to store a, c, and e path coefficients
mxMatrix( type="Full", nrow=1, ncol=1, free=TRUE, values=sqrt((CovDZ[2,2]/3)), label="a", name="X" ),
mxMatrix( type="Full", nrow=1, ncol=1, free=TRUE, values=sqrt((CovDZ[2,2]/3)), label="c", name="Y" ),
mxMatrix( type="Full", nrow=1, ncol=1, free=TRUE, values=sqrt((CovDZ[2,2]/3)), label="e", name="Z" ),
# Matrices A, C, and E compute variance components
mxAlgebra( expression=X %*% t(X), name="A" ),
mxAlgebra( expression=Y %*% t(Y), name="C" ),
mxAlgebra( expression=Z %*% t(Z), name="E" ),

# Algebra for expected variance/covariance matrix in MZ+sibling
mxAlgebra(
expression= rbind (cbind(A+C+E , A+C, 0.5%x%A+C),
cbind(A+C , A+C+E, 0.5%x%A+C),
cbind(0.5%x%A+C, 0.5%x%A+C, A+C+E)),
name="expCovMZ"),

# Algebra for expected variance/covariance matrix in DZ+siblig
mxAlgebra(
expression= rbind (cbind(A+C+E , 0.5%x%A+C, 0.5%x%A+C),
cbind(0.5%x%A+C , A+C+E, 0.5%x%A+C),
cbind(0.5%x%A+C , 0.5%x%A+C, A+C+E)),
name="expCovDZ"),

mxModel("MZ", mxData( observed=myDataMZ, type="raw" ),
# Algebra for making the means a function of the definition variables age and sex
mxMatrix( type="Full", nrow=1, ncol=3, free=T, c(MeanMZ,MeanMZ,MeanSIBMZ), labels =c("b0_mz1","b0_mz2","b0_mzsib"), name="Intercepts"),
mxMatrix( type="Full", nrow=1, ncol=2, free=T, values= 0, labels=c("betaAge","betaSex"), name="beta"),
mxMatrix( type="Full", nrow=2, ncol=3, free=F, labels=c("data.ageT1MZ","data.sexT1MZ","data.ageT2MZ","data.sexT2MZ","data.ageSIBMZ","data.sexSIBMZ"), name="MZDefVars"),
mxAlgebra( expression=Intercepts + beta %*% MZDefVars, name="expMeanMZ"),
mxExpectationNormal( covariance="twinACE.expCovMZ", means="expMeanMZ", dimnames=selVars ),
mxFitFunctionML()),

mxModel("DZ", mxData( observed=myDataDZ, type="raw" ),
mxMatrix( type="Full", nrow=1, ncol=3, free=T, c(MeanDZ,MeanDZ,MeanSIBDZ), labels=c("b0_dz1","b0_dz2","b0_dzsib"), name="Intercepts"),
mxMatrix( type="Full", nrow=1, ncol=2, free=T, values= 0, labels=c("betaAge","betaSex"), name="beta"),
mxMatrix( type="Full", nrow=2, ncol=3, free=F, labels=c("data.ageT1DZ","data.sexT1DZ","data.ageT2DZ","data.sexT2DZ","data.ageSIBDZ","data.sexSIBDZ"), name="DZDefVars"),
mxAlgebra( expression=Intercepts + beta %*% DZDefVars, name="expMeanDZ"),
mxExpectationNormal( covariance="twinACE.expCovDZ", means="expMeanDZ", dimnames=selVars ),
mxFitFunctionML()),

mxFitFunctionMultigroup( c("MZ.fitfunction", "DZ.fitfunction"))
)
twinACEFit<-mxTryHard(twinACE)

Replied on Wed, 10/17/2018 - 11:26
Picture of user. AdminRobK Joined: 01/24/2014

Your script looks reasonable to me.

Without any additional assumptions about the data, is it acceptable to treat the non-twin siblings of twins as an additional DZ twin? Or should common environmental effects be separated into sibling-specific effects and twin-specific effects?

Well, you could additionally try a model that includes a twin-specific variance component--say, _T_--such that the total phenotypic variance is `A+C+T+E`, the MZ covariance is `A+C+T`, the DZ covariance is `0.5%x%A+C+T`, and the non-twin sib covariance is `0.5%x%A+C`. Then, compare the fit of the ACTE and ACE models.

There is also an issue that not all twin pairs have a non-twin sibling. How much missing data could be allowed in such an analysis?

You're going to be fitting the model via full-information maximum likelihood (FIML), so as long as there are enough datapoints to identify all of the free parameters in the model being fitted, the MxModel should run.

Also, can variables used in the heritability analysis be z-scored or should they be used in their original scale?

Leave them in their original scale.

Replied on Fri, 10/19/2018 - 01:18
No user picture. Aurina Joined: 01/05/2016

In reply to by AdminRobK

Thank you again for your help! In addition to the previous questions, I have a question regarding sample size.
In the dataset that I'm using, there are:
117 pairs of MZ twins, 90 of those pairs have one additional sibling, 18 of those pairs also have a 2nd additional sibling;
60 pairs of DZ twins, 54 of those pairs have one additional sibling, 12 of those pairs also have a 2nd additional sibling;

Given those numbers, is it a good idea to include 2nd additional siblings into the heritability analysis? Or the number of 2nd additional siblings is too low?
Also, when calculating covariances for pairs of twins and siblings, there will be a different number of datapoints in each case (more for twin1 vs twin2 and less for twin1/2 vs sibling). Is it acceptable to calculate covariance using use="pairwise.complete.obs" option, so we wouldn't need to exclude subjects that are missing an additional sibling from the analysis?

Thank you very much!

Replied on Wed, 10/31/2018 - 10:33
Picture of user. AdminRobK Joined: 01/24/2014

In reply to by Aurina

Given those numbers, is it a good idea to include 2nd additional siblings into the heritability analysis? Or the number of 2nd additional siblings is too low?

I don't see any harm in including them. You won't need to introduce any new parameters to model them (except maybe for regression intercepts), and having more data is better than having fewer data.

Also, when calculating covariances for pairs of twins and siblings, there will be a different number of datapoints in each case (more for twin1 vs twin2 and less for twin1/2 vs sibling). Is it acceptable to calculate covariance using use="pairwise.complete.obs" option, so we wouldn't need to exclude subjects that are missing an additional sibling from the analysis?

You're only using the `cov()` function to calculate start values, so I doubt it will make much difference either way.