Dear all,

I'm trying to adapt ACE heritability model to include both twins and their non-twin siblings. The data used for the analysis are imaging-derived phenotypes from the human connectome project (HCP) which includes both MZ and DZ twin pairs as well as their non-twin siblings for a subset of twins. Would be really helpful if anyone could answer a few questions.

Without any additional assumptions about the data, is it acceptable to treat the non-twin siblings of twins as an additional DZ twin? Or should common environmental effects be separated into sibling-specific effects and twin-specific effects? There is also an issue that not all twin pairs have a non-twin sibling. How much missing data could be allowed in such an analysis?

Below is my attempt to modify the ACE model to include non-twin siblings. Could you please advise me if this modification including the use of covariates and matrix specification is correct (in case we're treating non-twin siblings as an additional DZ twin)? Also, can variables used in the heritability analysis be z-scored or should they be used in their original scale?

Thanks a lof for your help!

data_covar = readMat("TESTcovar.mat") data = readMat("TESTdata.mat") # mark 0 values as NA because these will be the outliers in the data - we want to exclude them. data$Output.DZ[data$Output.DZ == 0] <- NA data$Output.MZ[data$Output.MZ == 0] <- NA edge=1; # each connection (phenotype of interest) is denoted as edge - here just selecting first to do the analysis myDataMZ<-data.frame(data$Output.MZ[,,edge], data_covar$MZ.age[,1], data_covar$MZ.age[,2],data_covar$MZ.age[,3],data_covar$MZ.sex[,1],data_covar$MZ.sex[,2],data_covar$MZ.sex[,3]) myDataDZ<-data.frame(data$Output.DZ[,,edge], data_covar$DZ.age[,1], data_covar$DZ.age[,2],data_covar$DZ.age[,3],data_covar$DZ.sex[,1],data_covar$DZ.sex[,2],data_covar$DZ.sex[,3]) myDataMZ_measure<-data$Output.MZ[,,edge] myDataDZ_measure<-data$Output.DZ[,,edge] colnames(myDataMZ) <- c('twin1', 'twin2', 'sib','ageT1MZ', 'ageT2MZ', 'ageSIBMZ', 'sexT1MZ', 'sexT2MZ', 'sexSIBMZ') colnames(myDataDZ) <- c('twin1', 'twin2', 'sib','ageT1DZ', 'ageT2DZ', 'ageSIBDZ', 'sexT1DZ', 'sexT2DZ', 'sexSIBDZ') selVars <- c('twin1','twin2', 'sib') # "complete" specifies use only cases with data in all columns; # Could this be computed if some twin pairs are missing a non-twin sibling? CovMZ = cov(data$Output.MZ[,,edge],use="complete") CovDZ = cov(data$Output.DZ[,,edge],use="complete") # mean across MZ twins MeanMZ = colMeans(data$Output.MZ[,1:2,edge],na.rm=TRUE) MeanMZ = mean(MeanMZ) # mean across DZ twins MeanDZ = colMeans(data$Output.DZ[,1:2,edge],na.rm=TRUE) MeanDZ = mean(MeanDZ) # mean across siblings MeanSIBMZ = mean(data$Output.MZ[,3,edge],na.rm=TRUE) MeanSIBDZ = mean(data$Output.DZ[,3,edge],na.rm=TRUE) # ----------------------------------------------------------------------- #Fit ACE Model with RawData and Matrices Input # ----------------------------------------------------------------------- twinACE <- mxModel("twinACE", # Matrices X, Y, and Z to store a, c, and e path coefficients mxMatrix( type="Full", nrow=1, ncol=1, free=TRUE, values=sqrt((CovDZ[2,2]/3)), label="a", name="X" ), mxMatrix( type="Full", nrow=1, ncol=1, free=TRUE, values=sqrt((CovDZ[2,2]/3)), label="c", name="Y" ), mxMatrix( type="Full", nrow=1, ncol=1, free=TRUE, values=sqrt((CovDZ[2,2]/3)), label="e", name="Z" ), # Matrices A, C, and E compute variance components mxAlgebra( expression=X %*% t(X), name="A" ), mxAlgebra( expression=Y %*% t(Y), name="C" ), mxAlgebra( expression=Z %*% t(Z), name="E" ), # Algebra for expected variance/covariance matrix in MZ+sibling mxAlgebra( expression= rbind (cbind(A+C+E , A+C, 0.5%x%A+C), cbind(A+C , A+C+E, 0.5%x%A+C), cbind(0.5%x%A+C, 0.5%x%A+C, A+C+E)), name="expCovMZ"), # Algebra for expected variance/covariance matrix in DZ+siblig mxAlgebra( expression= rbind (cbind(A+C+E , 0.5%x%A+C, 0.5%x%A+C), cbind(0.5%x%A+C , A+C+E, 0.5%x%A+C), cbind(0.5%x%A+C , 0.5%x%A+C, A+C+E)), name="expCovDZ"), mxModel("MZ", mxData( observed=myDataMZ, type="raw" ), # Algebra for making the means a function of the definition variables age and sex mxMatrix( type="Full", nrow=1, ncol=3, free=T, c(MeanMZ,MeanMZ,MeanSIBMZ), labels =c("b0_mz1","b0_mz2","b0_mzsib"), name="Intercepts"), mxMatrix( type="Full", nrow=1, ncol=2, free=T, values= 0, labels=c("betaAge","betaSex"), name="beta"), mxMatrix( type="Full", nrow=2, ncol=3, free=F, labels=c("data.ageT1MZ","data.sexT1MZ","data.ageT2MZ","data.sexT2MZ","data.ageSIBMZ","data.sexSIBMZ"), name="MZDefVars"), mxAlgebra( expression=Intercepts + beta %*% MZDefVars, name="expMeanMZ"), mxExpectationNormal( covariance="twinACE.expCovMZ", means="expMeanMZ", dimnames=selVars ), mxFitFunctionML()), mxModel("DZ", mxData( observed=myDataDZ, type="raw" ), mxMatrix( type="Full", nrow=1, ncol=3, free=T, c(MeanDZ,MeanDZ,MeanSIBDZ), labels=c("b0_dz1","b0_dz2","b0_dzsib"), name="Intercepts"), mxMatrix( type="Full", nrow=1, ncol=2, free=T, values= 0, labels=c("betaAge","betaSex"), name="beta"), mxMatrix( type="Full", nrow=2, ncol=3, free=F, labels=c("data.ageT1DZ","data.sexT1DZ","data.ageT2DZ","data.sexT2DZ","data.ageSIBDZ","data.sexSIBDZ"), name="DZDefVars"), mxAlgebra( expression=Intercepts + beta %*% DZDefVars, name="expMeanDZ"), mxExpectationNormal( covariance="twinACE.expCovDZ", means="expMeanDZ", dimnames=selVars ), mxFitFunctionML()), mxFitFunctionMultigroup( c("MZ.fitfunction", "DZ.fitfunction")) ) twinACEFit<-mxTryHard(twinACE)

Your script looks reasonable to me.

Well, you could additionally try a model that includes a twin-specific variance component--say,

T--such that the total phenotypic variance is`A+C+T+E`

, the MZ covariance is`A+C+T`

, the DZ covariance is`0.5%x%A+C+T`

, and the non-twin sib covariance is`0.5%x%A+C`

. Then, compare the fit of the ACTE and ACE models.You're going to be fitting the model via full-information maximum likelihood (FIML), so as long as there are enough datapoints to identify all of the free parameters in the model being fitted, the MxModel should run.

Leave them in their original scale.

Thank you very much.

This was really helpful!

Thank you again for your help! In addition to the previous questions, I have a question regarding sample size.

In the dataset that I'm using, there are:

117 pairs of MZ twins, 90 of those pairs have one additional sibling, 18 of those pairs also have a 2nd additional sibling;

60 pairs of DZ twins, 54 of those pairs have one additional sibling, 12 of those pairs also have a 2nd additional sibling;

Given those numbers, is it a good idea to include 2nd additional siblings into the heritability analysis? Or the number of 2nd additional siblings is too low?

Also, when calculating covariances for pairs of twins and siblings, there will be a different number of datapoints in each case (more for twin1 vs twin2 and less for twin1/2 vs sibling). Is it acceptable to calculate covariance using use="pairwise.complete.obs" option, so we wouldn't need to exclude subjects that are missing an additional sibling from the analysis?

Thank you very much!

I don't see any harm in including them. You won't need to introduce any new parameters to model them (except maybe for regression intercepts), and having more data is better than having fewer data.

You're only using the

`cov()`

function to calculate start values, so I doubt it will make much difference either way.