After you use reshape to make your twin data wide, reshape creates two variables for zygosity, "zyg.1" and "zyg.2".
Then the old subset code doesn't work, of course:
Edit: there were multiple things going on here; look at Rob's email below, not this.
You could just change zyg == 1 to use the new name, zyg.1 == 1 (note, as Rob points out below, you want a logical comparison, "==", not assignment, "=").
Alternatively, use the "v.names" parameter of reshape to exclude "zyg" from the list of things reshape thinks should be copied wide.
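A minimal sketch of the reshape() behaviour described above, using a small hypothetical long-format data frame (the column names PAIRID, ZYG, TVAB and DM follow this thread; the values are made up). It shows ZYG being copied wide when only PAIRID is the idvar, and two ways to keep a single ZYG column: adding it to idvar, or naming only the twin-specific variables in v.names.

```r
# Hypothetical long-format twin data: one row per twin,
# TVAB = twin number (1 or 2), ZYG = zygosity (1 = MZ here)
long <- data.frame(
  PAIRID = c(21, 21, 22, 22),
  ZYG    = c(1, 1, 4, 4),
  TVAB   = c(1, 2, 1, 2),
  DM     = c(0, 0, 0, 1)
)

# With only PAIRID as idvar, reshape() treats ZYG as twin-specific
# and copies it wide into ZYG.1 and ZYG.2
wide1 <- reshape(long, idvar = "PAIRID", timevar = "TVAB", direction = "wide")
print(names(wide1))

# Option 1: include ZYG in idvar, so it stays a single column
wide2 <- reshape(long, idvar = c("PAIRID", "ZYG"), timevar = "TVAB",
                 direction = "wide")

# Option 2: list only the twin-specific variables in v.names;
# ZYG is then kept once per pair (it should be constant within a pair)
wide3 <- reshape(long, idvar = "PAIRID", timevar = "TVAB",
                 v.names = "DM", direction = "wide")

# Either way, select MZ pairs with a logical comparison (==)
mz <- subset(wide3, ZYG == 1)
print(mz)
```

If the data were already reshaped with ZYG copied wide, the equivalent filter would be subset(wide1, ZYG.1 == 1).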
Thank you.
I did try:
mzData <- subset(TwD_DM, zyg.1=1, useVars)
but the error message still occurs.
So I did what you have kindly recommended.
TwD_DM <- reshape(D_DM, idvar = c("PAIRID", "ZYG"), timevar = "TVAB", direction = "wide")
and I think the data is now formatted correctly.
> str(TwD_DM)
'data.frame': 43565 obs. of 8 variables:
$ PAIRID: num 21 22 23 24 26 29 31 33 34 35 ...
$ ZYG : num 1 4 2 2 4 1 4 4 4 1 ...
$ DM.1 : num 0 0 NA 0 NA 0 NA 0 NA 0 ...
$ D.1 : num 0 0 NA 1 NA 0 NA 0 NA 0 ...
$ Age.1 : num 73 74 NA 74 NA 74 NA NA NA NA ...
$ DM.2 : num 0 NA 0 0 0 0 0 0 0 0 ...
$ D.2 : num 0 NA 1 0 0 0 0 0 0 0 ...
$ Age.2 : num 73 NA 73 74 74 74 NA NA NA NA ...
- attr(*, "reshapeWide")=List of 5
..$ v.names: NULL
..$ timevar: chr "TVAB"
..$ idvar : chr "PAIRID" "ZYG"
..$ times : num 1 2
..$ varying: chr [1:3, 1:2] "DM.1" "D.1" "Age.1" "DM.2" ...
But I still cannot select the subset of MZ:
> vars <-c('DM','D')
> selVars <-c('DM1', 'D1', 'DM2', 'D2')
> useVars <-c('DM1', 'D1', 'age1', 'DM2', 'D2', 'age2')
> mzData <- subset(TwD_DM, zyg=1, useVars)
Error in subset.data.frame(TwD_DM, ZYG = 1, selVars) :
'subset' must be logical
Yet, if I remove "selVars" from the command and change ZYG<2, the correct subset is selected.
> mzData <- subset(TwDEP_DM, ZYG<2)
> str(mzData)
'data.frame': 12363 obs. of 8 variables:
$ PAIRID: num 21 29 35 38 43 45 232 240 248 255 ...
$ ZYG : num 1 1 1 1 1 1 1 1 1 1 ...
$ DM.1 : num 0 0 0 0 NA 0 0 0 0 0 ...
$ D.1 : num 0 0 0 1 NA 0 0 0 0 0 ...
$ Age.1 : num 73 74 NA NA NA NA 72 74 74 73 ...
$ DM.2 : num 0 0 0 0 0 0 0 0 NA NA ...
$ D.2 : num 0 0 0 0 0 0 0 0 NA NA ...
$ Age.2 : num 73 74 NA NA NA NA 72 74 NA NA ...
Otherwise,
> mzData <- subset(TwD_DM, ZYG<2, useVars)
Error in `[.data.frame`(x, r, vars, drop = drop) :
  undefined columns selected
I do not know if there is something wrong with the way I label useVars as they are categorical data.
Sorry for being so slow with this and thank you for your help.
Carol
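The "'subset' must be logical" error above can be reproduced on a tiny hypothetical data frame. The call line in that error, subset.data.frame(TwD_DM, ZYG = 1, selVars), is a clue: ZYG = 1 is passed as a named argument rather than evaluated as a test, so the vector of column names ends up in the subset slot. A sketch, assuming base R's subset.data.frame():

```r
# Hypothetical toy data with useVars-style column names
df  <- data.frame(ZYG = c(1, 2, 1), DM1 = c(0, 1, 0), D1 = c(1, 0, 0))
sel <- c("DM1", "D1")

# ZYG = 1 is a named argument, not a test. It matches no formal of
# subset.data.frame(x, subset, select, drop = FALSE, ...), so it is
# absorbed by `...`, and the positional `sel` is matched to `subset`.
# A character vector is not logical, hence the error.
msg <- tryCatch(subset(df, ZYG = 1, sel),
                error = function(e) conditionMessage(e))
print(msg)  # "'subset' must be logical"

# ZYG == 1 evaluates to TRUE/FALSE for each row, as subset() requires
mz <- subset(df, ZYG == 1, sel)
print(mz)
```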
Instead of zyg.1=1, you would want to use zyg.1==1.
The double equals-sign, ==, asks 'is zyg.1 equal to 1?' and returns a logical TRUE or FALSE accordingly, which is what the subset() function needs. In R, a single equals-sign, =, does assignment (like <-) or sets the value of function arguments.
I did try ZYG==1, but then I received the following error message, so I thought there might be something wrong in the way I defined useVars:
> mzData <- subset(TwD_DM, ZYG==1, useVars)
Error in `[.data.frame`(x, r, vars, drop = drop) :
  undefined columns selected
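One quick way to check whether useVars is the culprit is to ask which of the requested names are not actual columns. The sketch below uses a hypothetical two-row stand-in with the same column names as the str(TwD_DM) output in this thread:

```r
# Hypothetical stand-in with the same column names as str(TwD_DM) shows
wide <- data.frame(PAIRID = c(21, 22), ZYG = c(1, 4),
                   DM.1 = c(0, 0), D.1 = c(0, 0), Age.1 = c(73, 74),
                   DM.2 = c(0, NA), D.2 = c(0, NA), Age.2 = c(73, NA))
useVars <- c("DM1", "D1", "age1", "DM2", "D2", "age2")

# Which requested names are not actual columns?
absent <- setdiff(useVars, names(wide))
print(absent)  # all six: the real names contain periods and use "Age", not "age"

# The row filter (ZYG == 1) is fine; the column selection is what fails
err <- tryCatch(subset(wide, ZYG == 1, useVars),
                error = function(e) conditionMessage(e))
print(err)  # "undefined columns selected"
```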
Based on what you've posted, it appears that you are correct that it's a problem with the definition of useVars. You've defined useVars as c('DM1', 'D1', 'age1', 'DM2', 'D2', 'age2'), but consider what the actual column names of your dataframe are: PAIRID, ZYG, DM.1, D.1, Age.1, DM.2, D.2 and Age.2.
There are periods in the column (variable) names, and also, the case of "Age" versus "age" doesn't match (R is case-sensitive). I assume you're planning to use these data in an OpenMx analysis, so be advised that OpenMx disallows periods in column names. The simplest thing to do is probably to add this line to your syntax, most likely after the reshape() command.
Thank you for the suggestion on renaming the variables. It finally works!
Thank you for helping me. It is much appreciated.
Carol
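The line suggested above for renaming was not preserved in this thread, so the following is only one possible way to do it, using base R's gsub() on a hypothetical stand-in data frame with the same column names:

```r
# Hypothetical stand-in with the column names produced by reshape()
wide <- data.frame(PAIRID = c(21, 22), ZYG = c(1, 4),
                   DM.1 = c(0, 0), D.1 = c(0, 0), Age.1 = c(73, 74),
                   DM.2 = c(0, NA), D.2 = c(0, NA), Age.2 = c(73, NA))

# Remove the periods. fixed = TRUE makes "." a literal character;
# as a regular expression, "." would match every character.
names(wide) <- gsub(".", "", names(wide), fixed = TRUE)
print(names(wide))
# "PAIRID" "ZYG" "DM1" "D1" "Age1" "DM2" "D2" "Age2"

# Names are case-sensitive, so useVars must say "Age1", not "age1"
useVars <- c("DM1", "D1", "Age1", "DM2", "D2", "Age2")
mzData  <- subset(wide, ZYG == 1, useVars)
print(mzData)
```

Alternatively, reshape() has a sep argument (default ".") that sets the separator in the generated wide names; passing sep = "" should produce names such as DM1, D1 and Age1 directly, which may make the renaming step unnecessary.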