Attachment | Size |
---|---|
Figure 1.jpg | 619.6 KB |
Figure 2.png | 46.32 KB |
Hi,

I have encountered several problems in fitting bivariate SEM models, which are an important part of my graduation project. The main problems are as follows:

1. Can I put a three-category, non-ordinal variable and a binary variable into the bivariate SEM model? E.g. smoking status (0-never smokers, 1-former smokers, 2-current smokers) and abdominal obesity (0-WHtR≤0.5, 1-WHtR>0.5). I didn't know whether a non-ordinal variable would be accepted, so I converted "smoking status" into 3 binary variables reflecting smoking initiation (0-never smokers, 1-former or current smokers), current smoking (0-never or former smokers, 1-current smokers) and smoking persistence (0-former smokers, 1-current smokers).

2. For current smoking and abdominal obesity (both binary traits), the ACE model had a lower AIC than the AE and CE models, and the difference in fit between ACE and AE or CE was statistically significant (P-value < 0.05). However, the phenotypic correlation (rp) was only 0.04, the common environmental correlation (rc) was 0.102, and the unique environmental correlation (re) was -0.143, which is opposite in direction to rp and rc. I don't know how to interpret this.

3. Based on the results above, I wanted to calculate bivariate h2, c2 and e2, but I was not sure about the right calculation formula (see Figure 1). And if a path coefficient such as a11 is not significant (its 95% CI includes zero), is it still necessary to put the estimate of a11 into the formula in Figure 1?

4. After adjusting for age (a continuous variable) in the bivariate model above, the results were different. In particular, I found a negative bivariate h2 and e2, and I don't know how to interpret this (see Figure 2).

5. On top of that, I wanted to know whether a saturated model can output tetrachoric correlations?

I'm looking forward to your help, thanks!

Cheers,

Yaning

No, OpenMx doesn't know what to do with non-ordinal categorical variables.

Interesting idea! For the "smoking persistence" variable, how did you code never-smokers? Were they coded as 0, or as `NA`? I'm guessing `NA`, because otherwise the third variable would be redundant with the second.

At any rate, you can represent all of the information in your 3-category variable with only two binary variables. Have you ever heard of dummy coding? You could use nonsmokers as the "reference group", make the first binary variable a dummy for being a former smoker, and the second binary variable a dummy for being a current smoker. Then, nonsmokers would be coded as (0,0), former smokers as (1,0), and current smokers as (0,1). You might want to try that instead.
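To make the two coding schemes concrete, here is a minimal sketch in plain Python (not OpenMx code). The function names are hypothetical, and the category codes follow the original post (0 = never, 1 = former, 2 = current smoker). It assumes, as guessed above, that "smoking persistence" is missing for never-smokers.

```python
# Category codes as given in the original post.
NEVER, FORMER, CURRENT = 0, 1, 2


def dummy_code(status):
    """Return (former, current) dummies, with never-smokers as the reference group."""
    if status is None:  # missing smoking status stays missing
        return (None, None)
    return (int(status == FORMER), int(status == CURRENT))


def three_indicators(status):
    """Return (initiation, current_smoking, persistence) as in the original post.

    Assumption: persistence is undefined (None, i.e. NA) for never-smokers.
    """
    if status is None:
        return (None, None, None)
    initiation = int(status != NEVER)
    current_smoking = int(status == CURRENT)
    persistence = None if status == NEVER else int(status == CURRENT)
    return (initiation, current_smoking, persistence)


for status in (NEVER, FORMER, CURRENT):
    print(status, dummy_code(status), three_indicators(status))
```

With dummy coding every subject has complete data on both binary variables, whereas the three-indicator scheme leaves persistence missing for never-smokers.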

Are you dichotomizing a continuous variable here?

Could you be more specific about what confuses you here? The shared-environmental correlation of 0.102 indicates that there is only a small degree of overlap in the shared-environmental influences on the two traits. The nonshared-environmental correlation of -0.143 indicates that some nonshared-environmental influences tend to increase one trait and decrease the other.

If it was indeed a free parameter of the model, then yes.

If age (and for that matter, sex) have an effect on the phenotypes, then you shouldn't interpret results you get without adjusting for age (and sex if appropriate). Anyhow, the interpretation of those statistics as "bivariate heritability [etc.]" comes unraveled if any of them are negative (or if any of them exceed 1.0), since they'd not be interpretable as proportions. Nonetheless, they still have an interpretation as the ratio of (say) the genetic covariance to the phenotypic covariance. BTW, if you run `summary()` on your model with argument `verbose=TRUE`, you should be able to see why the confidence limits aren't showing up in the highlighted rows of the table in Figure 2.

I'd need to see your script to be able to answer the other questions.
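As an illustration of the "ratio of covariances" reading, here is a small Python sketch. It assumes a bivariate Cholesky parameterization, in which the cross-trait covariance due to A is a11*a21 (and likewise for C and E). The formula in Figure 1 is not reproduced in this thread, so this may differ from it. The function name and all numeric values are hypothetical and are not estimates from the poster's data.

```python
def bivariate_proportions(a11, a21, c11, c21, e11, e21):
    """Split the cross-trait phenotypic covariance into A, C and E shares.

    Assumes Cholesky paths, so each cross-trait covariance component is
    the product of the two paths from the first latent factor.
    """
    cov_a = a11 * a21
    cov_c = c11 * c21
    cov_e = e11 * e21
    cov_p = cov_a + cov_c + cov_e  # phenotypic covariance between the traits
    if cov_p == 0:
        raise ValueError("phenotypic covariance is zero; ratios are undefined")
    return cov_a / cov_p, cov_c / cov_p, cov_e / cov_p


# Hypothetical paths: A and E covariances are negative, C is positive.
biv_h2, biv_c2, biv_e2 = bivariate_proportions(
    a11=0.6, a21=-0.05, c11=0.5, c21=0.3, e11=0.6, e21=-0.1
)
print(biv_h2, biv_c2, biv_e2)  # approximately -0.5, 2.5, -1.0
```

The three ratios always sum to 1, but when the components have opposite signs and the phenotypic covariance is small, individual ratios can fall below 0 or exceed 1, which is why they stop being interpretable as proportions.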

Many thanks for the reply! Here's more information you might need.

1. Thanks for your advice on dummy coding, I've heard of it but didn't know how to run with dummy coding in bivariate SEM models. Could you please help me check or modify the code of bivariate SEM model? P.S. My code is in the attachment.

In my script, "whtr_g" and "smk_cur" represented two binary traits.

BTW, there are some remaining questions about my script, so could you please give me some advice on it? E.g.

I didn't know how to define age (etc.) as a covariate in my code.

I didn't know how to add a formula or code to calculate the bivariate h2, c2 or e2.

I dichotomized a continuous variable (waist-to-height ratio, WHtR) into a binary variable (abdominal obesity). Waist circumference (WC) and height were collected by self-reported questionnaire, so the values tend to cluster at round numbers (such as 60, 65, 70 and so on). As a result, the distributions of WHtR and WC could not be made normal by log transformation. I chose to dichotomize WHtR because I didn't know whether a non-normally distributed continuous variable could be used in an SEM model. Do you have any suggestions on this issue?

How can I give a reasonable explanation if "bivariate heritability" is negative but bivariate c2 is positive (or exceeds 1.0)?

Thanks for your help!

Yours sincerely,

Yaning

Let's get your phenotypes straightened out before anything else.

Well, you would be analyzing three variables per twin instead of two. But I'm wondering now whether dummy coding is the right way to proceed (it would certainly have some interpretational peculiarities). What are your hypotheses / research questions?

Dichotomizing a continuous variable simply because it has an undesirable distribution is not a good idea. A better choice would be to ordinalize it into more than two ordered categories, and then analyze it as a multi-threshold variable. The choice of how many categories to create, and where the cutoffs (thresholds) between them should be, is always arbitrary to some extent. There are also other things you could do instead. See this previous post of mine for details (note that `imxRobustSE()` now works for some multigroup models).

Thanks for your reply!

1. My research aim is to explore the association between smoking and abdominal obesity. My analysis consists of the following steps:

(1) We explored the association between smoking-related indicators and adiposity with a conditional logistic regression model. For twin individuals, a logistic regression model fitted by generalized estimating equations (GEE) was used.

(2) Comparing the results above, we can infer whether the common genetic and environmental factors may influence the association.

(3) We fitted a bivariate genetic model to analyse the relative roles of genetic and environmental factors quantitatively.

What do you think about my analysis? Is it reasonable?

2. Smoking status was a three-category variable in the regression analysis, but was recoded into 3 binary variables for the bivariate SEM. I was also unsure whether this was reasonable and comparable with the regression analysis results. Note that I also put the 3 binary variables into the regression analysis later, but the results might be difficult to explain because the 3 binary traits have different reference groups.

3. Have you heard of rank order normalization? Maybe I could try normalizing the continuous variable with this method?

Are you interested in waist-to-height ratio per se? Or are you simply using it as an indicator of "obesity disease status" (affected vs. unaffected)? If it's the latter, then I suppose you can justify dichotomizing it.

In any event, if you transform waist-to-height ratio, you'll want to use the same transformation on it in your regression analyses and in your biometrical-genetics analyses.
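For reference, one common form of rank-based normalization can be sketched as below, in plain Python using only the standard library. This is an illustrative assumption about what "rank order normalization" means here, not a method endorsed in this thread: tied values receive their average rank, and the `offset` of 0.5 is one of several conventions in use. The function name is hypothetical, and missing values are not handled.

```python
from statistics import NormalDist


def rank_inverse_normal(values, offset=0.5):
    """Map values to normal scores via their (tie-averaged) ranks."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied values starting at sorted position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()
    # (rank - offset) / n lies strictly between 0 and 1 for offset = 0.5.
    return [std_normal.inv_cdf((r - offset) / n) for r in ranks]


whtr = [0.50, 0.45, 0.50, 0.60]  # hypothetical, heaped values
print(rank_inverse_normal(whtr))
```

Note that heaped values share a rank and therefore the same normal score, so a variable with many ties stays lumpy after the transformation. Whatever transformation is chosen would be applied once and the same transformed variable used in both the regression and the twin models.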

The way you turned the tri-category variable into 3 indicator variables of clinically relevant statuses with respect to smoking was an interesting idea. Concerning doing it that way versus dummy coding: would you rather report and interpret 3 different analyses of two traits at a time, or one analysis of three traits?