Attachment | Size |
---|---|
OrdinalContModel.R | 17.28 KB |
miFunctions3.R | 8.8 KB |
Hi,
this is my first post here, so I hope I am doing things correctly. I could also have posted this in the Categorical Outcomes forum but since I am working with a twin model I decided to post it here. I attached my script but I will give some additional info below. Unfortunately I cannot share the data.
I am working with a relatively big model (10 traits), with 5 zygosity groups (MZm, DZm, MZf, DZf, and DZo). Within each group, there are twins (either one or both), and for some information one 1 sibling is available. However, to account for the gender of the sib, male and female sibs are put in different columns in the data. So in total, within each group there are 40 columns of data (let's say variables for Twin1, Twin2, SibMale and SibFemale, 10 each). Because of the set up, if information is available for SibMale, then by definition information on SibFemale is missing (since only one sib is included for each twin (pair)).
The order of my variables is: (ordinal, ordinal, rep(continuous, 8)). Both ordinal variables have three categories so two thresholds. I fixed the means and variances of the ordinal variables to 0 and 1 respectively in order to estimate the thresholds and included age as a covariate, separately for the continuous variables and on the thresholds of the ordinal variables.
Note that my age is in centuries, so I divided the age by 100. Also, following another post on this forum, I set missing age values to a huge number in order to keep as much data as possible (i.e., age was only missing when all trait values were missing). I used the 'lower matrix' trick to specify the thresholds as an intercept/threshold and increment. Also, I transformed the variables all to have a variance of around 1 to ease optimization. I am also using the helper functions by Hermine Maes (miFunctions3.R).
My problem is that the thresholds seem to be estimated (see below) but they stay really really close to the starting values
This makes me question whether I can trust my results. I am also noticing that my age effects are quite different from when I do regular regression in R (just with lm) in the different groups but this may be due to the relatively large number of missing values I have (and OpenMx uses FIML). Things I have tried to fix it:
- in this script, I estimate 4 age effects (age on threshold 1 and 2 of ordinal variables 1 and 2). In other scripts I saw that only two age effects were estimated (age on both thresholds together, for ord var 1 and for ord var 2). I did this but the same thing happened
- changed the starting values, again, the values stayed around these new values
Note that the correct amount of parameters are estimated (I think): in each group there are (40*(40-39))/2 = 780 covariances, 8x4 = 32 means (only for the continuous vars) and 8x4 = 32 variances, and (2x2)x4=16 thresholds: 780 + 32 + 32 + 16 = 860 parameters in each group. 860 x 5 = 4300 parameters across groups. In total there are 8 age effects on the continuous variables and 4 age effects on the thresholds. 4300 + 8 + 4 = 4312 estimated parameters in total. Because of the size of the model, it takes forever to run the code, so I run it on a cluster computer, therefore the optimizer is set to CSOLNP.
As a final note, I have checked whether the thresholds conditioned on age seem reasonable. To check this I created a loop with the following code for each row in my data (Mzf in this example): mxEval(expression=threMZf,model=fitSAT2$MZf,compute=T,defvar.row=i). If I then average the estimated thresholds across persons then these averages almost exactly match the Z values for the thresholds as observed in my data (e.g., the first ordinal variable has 36% in the first category in my observed data, the average first predicted threshold (using the above method) is -0.29 which corresponds to pnorm(-0.29)=36%). This seems to imply that the results are correct.
Is there anything wrong with my model or are the results trustworthy? Hope someone can help!
I just saw in my code that I only estimate two age effects for the ordinal variables so there are 4310 estimated parameters in this case. But as noted, the issue with the thresholds estimates is similar.