Q

CarolKan
Joined: 04/17/2014 - 11:12
Q

Q

tbates
Joined: 07/31/2009 - 14:25
zygosity renamed by reshape

After you use reshape to make your twin data wide:

TwD_DM <- reshape(D_DM, idvar = "PAIRID", timevar = "TVAB", direction ="wide")

reshape creates two variables for zygosity, "zyg.1" and "zyg.2".

Then the old subset code doesn't work, of course:

vars <-c('DM','D')
selVars <-c('DM1', 'D1', 'DM2', 'D2')
useVars <-c('DM1', 'D1', 'age1', 'DM2', 'D2', 'age2')
mzData <- subset(TwD_DM, zyg=1, useVars)

edit: there were multiple things going on here... look at Rob's email below, not this....

You could just change the condition to use the new name (note, as Rob points out below, that you want the logical comparison "==", not the assignment "="):

mzData <- subset(TwD_DM, zyg.1 == 1, useVars)

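Alternatively, use the "v.names" parameter of reshape to list only the variables that differ between twins, leaving out "zyg". reshape then treats zygosity as constant within a pair and keeps it as a single column instead of copying it wide.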
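
To illustrate the v.names route, here is a minimal sketch on a made-up data frame. The values and the object names D_long and TwD_wide are assumptions for illustration only; the column names follow the ones used in this thread.

```r
# Toy long-format twin data (made-up values; column names follow the thread)
D_long <- data.frame(
  PAIRID = c(21, 21, 22, 22),
  TVAB   = c(1, 2, 1, 2),
  zyg    = c(1, 1, 4, 4),
  DM     = c(0, 0, 0, 1),
  D      = c(0, 1, 0, 0)
)

# Only DM and D are listed in v.names, so only they are spread into
# per-twin columns; zyg is treated as constant within a pair
TwD_wide <- reshape(D_long, idvar = "PAIRID", timevar = "TVAB",
                    v.names = c("DM", "D"), direction = "wide")

# Inspect the resulting column names
colnames(TwD_wide)

# zyg keeps its original name, so it can be used directly in subset()
mzData <- subset(TwD_wide, zyg == 1)
```

This only works cleanly if zygosity really is the same for both members of a pair, which it should be in twin data.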

CarolKan
Joined: 04/17/2014 - 11:12
Thank you. I did try: mzData

Thank you.
I did try:
mzData <- subset(TwD_DM, zyg.1=1, useVars)
but the error message still occurs.

So I did what you have kindly recommended.
TwD_DM <- reshape(D_DM, idvar = c("PAIRID", "ZYG"), timevar = "TVAB", direction = "wide")
and I think the data is now formatted correctly.

> str(TwD_DM)
'data.frame': 43565 obs. of 8 variables:
$ PAIRID: num 21 22 23 24 26 29 31 33 34 35 ...
$ ZYG : num 1 4 2 2 4 1 4 4 4 1 ...
$ DM.1 : num 0 0 NA 0 NA 0 NA 0 NA 0 ...
$ D.1 : num 0 0 NA 1 NA 0 NA 0 NA 0 ...
$ Age.1 : num 73 74 NA 74 NA 74 NA NA NA NA ...
$ DM.2 : num 0 NA 0 0 0 0 0 0 0 0 ...
$ D.2 : num 0 NA 1 0 0 0 0 0 0 0 ...
$ Age.2 : num 73 NA 73 74 74 74 NA NA NA NA ...
- attr(*, "reshapeWide")=List of 5
..$ v.names: NULL
..$ timevar: chr "TVAB"
..$ idvar : chr "PAIRID" "ZYG"
..$ times : num 1 2
..$ varying: chr [1:3, 1:2] "DM.1" "D.1" "Age.1" "DM.2" ...

But I still cannot select the subset of MZ:
> vars <-c('DM','D')
> selVars <-c('DM1', 'D1', 'DM2', 'D2')
> useVars <-c('DM1', 'D1', 'age1', 'DM2', 'D2', 'age2')
> mzData <- subset(TwD_DM, zyg=1, useVars)
Error in subset.data.frame(TwD_DM, ZYG = 1, selVars) :
'subset' must be logical

Yet if I remove "selVars" from the command and change the condition to ZYG<2, the correct subset is selected.
> mzData <- subset(TwDEP_DM, ZYG<2)
> str(mzData)
'data.frame': 12363 obs. of 8 variables:
$ PAIRID: num 21 29 35 38 43 45 232 240 248 255 ...
$ ZYG : num 1 1 1 1 1 1 1 1 1 1 ...
$ DM.1 : num 0 0 0 0 NA 0 0 0 0 0 ...
$ D.1 : num 0 0 0 1 NA 0 0 0 0 0 ...
$ Age.1 : num 73 74 NA NA NA NA 72 74 74 73 ...
$ DM.2 : num 0 0 0 0 0 0 0 0 NA NA ...
$ D.2 : num 0 0 0 0 0 0 0 0 NA NA ...
$ Age.2 : num 73 74 NA NA NA NA 72 74 NA NA ...

Otherwise,
> mzData <- subset(TwD_DM, ZYG<2, useVars)
Error in `[.data.frame`(x, r, vars, drop = drop) :
undefined columns selected

I do not know whether there is something wrong with the way I defined useVars, as the variables are categorical.

Sorry for being so slow with this and thank you for your help.
Carol

RobK
Joined: 04/19/2011 - 21:00
Need to use ==

Instead of

mzData <- subset(TwD_DM, zyg.1=1, useVars)

you would want to use

mzData <- subset(TwD_DM, zyg.1==1, useVars)

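The double equals sign, ==, asks 'is zyg.1 equal to 1?' and returns a logical TRUE or FALSE for each row, which is what the subset() function needs. In R, a single equals sign, =, does assignment (like <-) or sets the value of function arguments. Inside the call, zyg.1=1 is therefore taken as a named argument, so useVars lands in the position where subset() expects the logical condition; that is why R complains that 'subset' must be logical.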
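
A small sketch of the difference, using a made-up data frame (the object name twins and its values are assumptions for illustration, not Carol's data):

```r
# Toy wide-format twin data (made-up values, for illustration only)
twins <- data.frame(
  ZYG = c(1, 2, 1, 4),
  DM1 = c(0, 1, 0, 0),
  DM2 = c(0, 0, 1, 0)
)
useVars <- c("DM1", "DM2")

# '==' compares element-wise and yields one TRUE/FALSE per row
is_mz <- twins$ZYG == 1

# With '==', subset() keeps the rows where the condition is TRUE
mzData <- subset(twins, ZYG == 1, useVars)

# With '=', 'ZYG = 1' is parsed as a named argument, so useVars ends up
# in the 'subset' position and R stops with an error.
# tryCatch() captures the error message instead of halting the script;
# suppressWarnings() hides the unused-argument warning newer R versions add.
err_msg <- tryCatch(
  suppressWarnings(subset(twins, ZYG = 1, useVars)),
  error = function(e) conditionMessage(e)
)
print(err_msg)
```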

CarolKan
Joined: 04/17/2014 - 11:12
Error in `[.data.frame`(x, r, vars, drop = drop)

I did try ZYG==1, but then I received the following error message, so I thought there might be something wrong with the way I defined useVars.

> mzData <- subset(TwD_DM, ZYG==1, useVars)
Error in `[.data.frame`(x, r, vars, drop = drop) :
undefined columns selected

RobK
Joined: 04/19/2011 - 21:00
Column (variable) name mismatch

Based on what you've posted, it appears that you are correct that it's a problem with the definition of useVars. You've defined useVars as

useVars <-c('DM1', 'D1', 'age1', 'DM2', 'D2', 'age2')

but consider what the actual column names of your data frame are:

> str(TwD_DM)
'data.frame': 43565 obs. of 8 variables:
$ PAIRID: num 21 22 23 24 26 29 31 33 34 35 ...
$ ZYG : num 1 4 2 2 4 1 4 4 4 1 ...
$ DM.1 : num 0 0 NA 0 NA 0 NA 0 NA 0 ...
$ D.1 : num 0 0 NA 1 NA 0 NA 0 NA 0 ...
$ Age.1 : num 73 74 NA 74 NA 74 NA NA NA NA ...
$ DM.2 : num 0 NA 0 0 0 0 0 0 0 0 ...
$ D.2 : num 0 NA 1 0 0 0 0 0 0 0 ...
$ Age.2 : num 73 NA 73 74 74 74 NA NA NA NA ...

There are periods in the column (variable) names, and the case of "Age" versus "age" doesn't match either (R is case-sensitive). I assume you're planning to use these data in an OpenMx analysis, so be advised that OpenMx disallows periods in column names. The simplest thing to do is probably to add this line to your syntax,

colnames(TwD_DM) <- c('PAIRID', 'ZYG', 'DM1', 'D1', 'age1','DM2', 'D2', 'age2')

probably after the reshape() command.
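
If you would rather not type the new names by hand, the sketch below checks which requested columns are missing and then derives the new names from the old ones. The data frame wide and its values are made up to mimic the str() output above; this is one possible approach, not the only one.

```r
# Toy data frame mimicking the column names shown by str() above
wide <- data.frame(
  PAIRID = c(21, 22), ZYG = c(1, 4),
  DM.1 = c(0, 0),  D.1 = c(0, 0),  Age.1 = c(73, 74),
  DM.2 = c(0, NA), D.2 = c(0, NA), Age.2 = c(73, NA)
)
useVars <- c('DM1', 'D1', 'age1', 'DM2', 'D2', 'age2')

# Requested names that do not exist as columns; any entry here
# triggers "undefined columns selected" in subset()
missing_before <- setdiff(useVars, colnames(wide))

# Drop the periods (fixed = TRUE treats "." literally, not as a regex)
new_names <- gsub(".", "", colnames(wide), fixed = TRUE)
# Match the lower-case "age" used in useVars
new_names <- sub("^Age", "age", new_names)
colnames(wide) <- new_names

# Nothing should be missing now, and the subset call goes through
missing_after <- setdiff(useVars, colnames(wide))
mzData <- subset(wide, ZYG == 1, useVars)
```

Running setdiff() before a subset() call is a quick way to spot this kind of name mismatch.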

CarolKan
Joined: 04/17/2014 - 11:12
Thank you!

Thank you for the suggestion to rename the variables. It finally works!
Thank you for helping me. It is much appreciated.
Carol