zygosity renamed by reshape
TwD_DM <- reshape(D_DM, idvar = "PAIRID", timevar = "TVAB", direction ="wide")
reshape creates two variables for zygosity, "zyg.1" and "zyg.2".
Then the old subset code doesn't work, of course:
vars <-c('DM','D')
selVars <-c('DM1', 'D1', 'DM2', 'D2')
useVars <-c('DM1', 'D1', 'age1', 'DM2', 'D2', 'age2')
mzData <- subset(TwD_DM, zyg=1, useVars)
edit: there were multiple things going on here... look at Rob's email below, not this....
You could just change zyg to the new name in the condition (note, as Rob points out below, you want a logical comparison "==", not assignment "="):
mzData <- subset(TwD_DM, zyg.1 == 1, useVars)
Alternatively, use the "v.names" parameter of reshape to exclude "zyg" from the list of things reshape thinks should be copied wide.
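A minimal sketch of the v.names approach, assuming a toy long-format data frame whose column names mirror the ones in this thread (the values are invented for illustration). Only the variables listed in v.names are spread wide, so ZYG stays a single column:

```r
# Toy long-format twin data; names follow the thread, values are made up.
long <- data.frame(
  PAIRID = c(1, 1, 2, 2),
  TVAB   = c(1, 2, 1, 2),
  ZYG    = c(1, 1, 2, 2),
  DM     = c(0, 1, 0, 0),
  D      = c(0, 0, 1, 0),
  Age    = c(73, 73, 74, 74)
)

# Only DM, D and Age are treated as time-varying; ZYG is assumed constant
# within a pair and is therefore kept as one column.
wide <- reshape(long,
                idvar     = "PAIRID",
                timevar   = "TVAB",
                v.names   = c("DM", "D", "Age"),
                direction = "wide")

names(wide)
```

Listing ZYG in idvar, as in the reply that follows, has the same effect of keeping it out of the widened columns.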
In reply to zygosity renamed by reshape by tbates
Thank you.
I did try:
mzData <- subset(TwD_DM, zyg.1=1, useVars)
but the error message still occurs.
So I did what you have kindly recommended.
TwD_DM <- reshape(D_DM, idvar = c("PAIRID", "ZYG"), timevar = "TVAB", direction = "wide")
and I think the data is now formatted correctly.
> str(TwD_DM)
'data.frame': 43565 obs. of 8 variables:
$ PAIRID: num 21 22 23 24 26 29 31 33 34 35 ...
$ ZYG : num 1 4 2 2 4 1 4 4 4 1 ...
$ DM.1 : num 0 0 NA 0 NA 0 NA 0 NA 0 ...
$ D.1 : num 0 0 NA 1 NA 0 NA 0 NA 0 ...
$ Age.1 : num 73 74 NA 74 NA 74 NA NA NA NA ...
$ DM.2 : num 0 NA 0 0 0 0 0 0 0 0 ...
$ D.2 : num 0 NA 1 0 0 0 0 0 0 0 ...
$ Age.2 : num 73 NA 73 74 74 74 NA NA NA NA ...
- attr(*, "reshapeWide")=List of 5
..$ v.names: NULL
..$ timevar: chr "TVAB"
..$ idvar : chr "PAIRID" "ZYG"
..$ times : num 1 2
..$ varying: chr [1:3, 1:2] "DM.1" "D.1" "Age.1" "DM.2" ...
But I still cannot select the subset of MZ:
> vars <-c('DM','D')
> selVars <-c('DM1', 'D1', 'DM2', 'D2')
> useVars <-c('DM1', 'D1', 'age1', 'DM2', 'D2', 'age2')
> mzData <- subset(TwD_DM, zyg=1, useVars)
Error in subset.data.frame(TwD_DM, ZYG = 1, selVars) :
'subset' must be logical
Yet, if I remove "SelVars" from the command and change the condition to ZYG<2, the correct subset is selected.
> mzData <- subset(TwDEP_DM, ZYG<2)
> str(mzData)
'data.frame': 12363 obs. of 8 variables:
$ PAIRID: num 21 29 35 38 43 45 232 240 248 255 ...
$ ZYG : num 1 1 1 1 1 1 1 1 1 1 ...
$ DM.1 : num 0 0 0 0 NA 0 0 0 0 0 ...
$ D.1 : num 0 0 0 1 NA 0 0 0 0 0 ...
$ Age.1 : num 73 74 NA NA NA NA 72 74 74 73 ...
$ DM.2 : num 0 0 0 0 0 0 0 0 NA NA ...
$ D.2 : num 0 0 0 0 0 0 0 0 NA NA ...
$ Age.2 : num 73 74 NA NA NA NA 72 74 NA NA ...
Otherwise,
> mzData <- subset(TwD_DM, ZYG<2, useVars)
Error in `[.data.frame`(x, r, vars, drop = drop) :
undefined columns selected
I do not know if there is something wrong with the way I label useVars as they are categorical data.
Sorry for being so slow with this and thank you for your help.
Carol
In reply to Thank you. I did try: mzData by CarolKan
Need to use ==
Instead of
mzData <- subset(TwD_DM, zyg.1=1, useVars)
you would want to use
mzData <- subset(TwD_DM, zyg.1==1, useVars)
The double equals-sign, ==, asks 'is zyg.1 equal to 1?' and returns a logical TRUE or FALSE accordingly, which is what the subset() function needs. In R, a single equals-sign, =, does assignment (like <-) or sets the value of function arguments.
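To make the difference concrete, here is a small sketch on a made-up data frame (the twins object and its values are hypothetical; only the zyg.1 and DM1 names come from the thread):

```r
# Hypothetical toy data using the zyg.1 column name from this reply.
twins <- data.frame(zyg.1 = c(1, 2, 1, 4), DM1 = c(0, 1, 0, 0))

# "==" compares each row with 1 and yields a logical vector.
twins$zyg.1 == 1

# subset() keeps the rows where that vector is TRUE.
mz <- subset(twins, zyg.1 == 1)
nrow(mz)

# With a single "=", zyg.1 = 1 is read as a named argument, so the next
# positional argument ("DM1") lands in the 'subset' slot and the call
# stops with "'subset' must be logical".
res <- try(subset(twins, zyg.1 = 1, "DM1"), silent = TRUE)
inherits(res, "try-error")
```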
In reply to Need to use == by RobK
Error in `[.data.frame`(x, r, vars, drop = drop)
> mzData <- subset(TwD_DM, ZYG==1, useVars)
Error in `[.data.frame`(x, r, vars, drop = drop) :
undefined columns selected
In reply to Error in `[.data.frame`(x, r, vars, drop = drop) by CarolKan
Column (variable) name mismatch
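Based on what you've posted, it appears that you are correct that it's a problem with the definition of useVars. You've defined useVars as
useVars <-c('DM1', 'D1', 'age1', 'DM2', 'D2', 'age2')
, but consider what the actual column names of your dataframe are:
> str(TwD_DM)
'data.frame': 43565 obs. of 8 variables:
$ PAIRID: num 21 22 23 24 26 29 31 33 34 35 ...
$ ZYG : num 1 4 2 2 4 1 4 4 4 1 ...
$ DM.1 : num 0 0 NA 0 NA 0 NA 0 NA 0 ...
$ D.1 : num 0 0 NA 1 NA 0 NA 0 NA 0 ...
$ Age.1 : num 73 74 NA 74 NA 74 NA NA NA NA ...
$ DM.2 : num 0 NA 0 0 0 0 0 0 0 0 ...
$ D.2 : num 0 NA 1 0 0 0 0 0 0 0 ...
$ Age.2 : num 73 NA 73 74 74 74 NA NA NA NA ...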
There are periods in the column (variable) names, and also, the case of "Age" versus "age" doesn't match (R is case-sensitive). I assume you're planning to use these data in an OpenMx analysis, so be advised that OpenMx disallows periods in column names. The simplest thing to do is probably add this line to your syntax,
colnames(TwD_DM) <- c('PAIRID', 'ZYG', 'DM1', 'D1', 'age1','DM2', 'D2', 'age2')
probably after the reshape() command.
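If you would rather not type the names out by hand, they can also be derived from the existing ones. This is only a sketch on a small stand-in for TwD_DM (the values are invented), and it assumes the only mismatches are the periods and the capital "A" in "Age":

```r
# Stand-in for the wide data frame from the thread (values invented).
TwD_DM <- data.frame(
  PAIRID = c(21, 22, 29), ZYG = c(1, 4, 1),
  DM.1 = c(0, 0, 0), D.1 = c(0, 0, 0), Age.1 = c(73, 74, 74),
  DM.2 = c(0, NA, 0), D.2 = c(0, NA, 0), Age.2 = c(73, NA, 74)
)

useVars <- c('DM1', 'D1', 'age1', 'DM2', 'D2', 'age2')

# Requested names that the data frame does not have yet.
setdiff(useVars, colnames(TwD_DM))

# Drop the periods, then lower-case the leading "Age" to match useVars.
newNames <- gsub(".", "", colnames(TwD_DM), fixed = TRUE)
newNames <- sub("^Age", "age", newNames)
colnames(TwD_DM) <- newNames

# Check that every requested column now exists before subsetting.
stopifnot(all(useVars %in% colnames(TwD_DM)))
mzData <- subset(TwD_DM, ZYG == 1, useVars)
```

Running setdiff() before the rename is a quick way to see which names would trigger the "undefined columns selected" error.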
In reply to Column (variable) name mismatch by RobK
Thank you!
Thank you for helping me. It is much appreciated.
Carol