Q

Posted on
CarolKan Joined: 04/17/2014
Replied on Tue, 04/22/2014 - 23:26
tbates Joined: 07/31/2009

After you use reshape to make your twin data wide:


TwD_DM <- reshape(D_DM, idvar = "PAIRID", timevar = "TVAB", direction ="wide")

reshape creates two variables for zygosity, "zyg.1" and "zyg.2".

Then the old subset code doesn't work, of course:

vars <-c('DM','D')
selVars <-c('DM1', 'D1', 'DM2', 'D2')
useVars <-c('DM1', 'D1', 'age1', 'DM2', 'D2', 'age2')
mzData <- subset(TwD_DM, zyg=1, useVars)

edit: there were multiple things going on here... look at Rob's email below, not this....

You could just change zyg to the new name (note, as Rob points out below, you want the logical comparison "==", not the assignment "="):

mzData <- subset(TwD_DM, zyg.1 == 1, useVars)

Alternatively, use the "v.names" parameter of reshape to list only the variables that differ between twins, so that "zyg" is excluded from the things reshape thinks should be copied wide.
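
A minimal sketch of the v.names approach, using a small made-up long-format data frame (the values and the object names long, wide_all and wide_const are illustrative assumptions, not the poster's data). It compares the default behaviour with a call that names the varying columns explicitly:

```r
# Hypothetical long-format twin data: one row per twin (values are made up).
long <- data.frame(
  PAIRID = c(1, 1, 2, 2),
  TVAB   = c(1, 2, 1, 2),
  zyg    = c(1, 1, 2, 2),
  DM     = c(0, 1, 0, 0),
  Age    = c(73, 73, 74, 74)
)

# Default: every column other than idvar and timevar is treated as varying,
# so zyg is copied wide as zyg.1 and zyg.2.
wide_all <- reshape(long, idvar = "PAIRID", timevar = "TVAB",
                    direction = "wide")
names(wide_all)

# Listing only the varying columns in v.names leaves zyg as a single column.
wide_const <- reshape(long, idvar = "PAIRID", timevar = "TVAB",
                      v.names = c("DM", "Age"), direction = "wide")
names(wide_const)
```

With the second call, a plain zyg == 1 condition can be used in subset() without renaming anything.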

Replied on Wed, 04/23/2014 - 07:29
CarolKan Joined: 04/17/2014

In reply to by tbates

Thank you.
I did try:
mzData <- subset(TwD_DM, zyg.1=1, useVars)
but the error message still occurs.

So I did what you have kindly recommended.
TwD_DM <- reshape(D_DM, idvar = c("PAIRID", "ZYG"), timevar = "TVAB", direction = "wide")
and I think the data is now formatted correctly.

> str(TwD_DM)
'data.frame': 43565 obs. of 8 variables:
$ PAIRID: num 21 22 23 24 26 29 31 33 34 35 ...
$ ZYG : num 1 4 2 2 4 1 4 4 4 1 ...
$ DM.1 : num 0 0 NA 0 NA 0 NA 0 NA 0 ...
$ D.1 : num 0 0 NA 1 NA 0 NA 0 NA 0 ...
$ Age.1 : num 73 74 NA 74 NA 74 NA NA NA NA ...
$ DM.2 : num 0 NA 0 0 0 0 0 0 0 0 ...
$ D.2 : num 0 NA 1 0 0 0 0 0 0 0 ...
$ Age.2 : num 73 NA 73 74 74 74 NA NA NA NA ...
- attr(*, "reshapeWide")=List of 5
..$ v.names: NULL
..$ timevar: chr "TVAB"
..$ idvar : chr "PAIRID" "ZYG"
..$ times : num 1 2
..$ varying: chr [1:3, 1:2] "DM.1" "D.1" "Age.1" "DM.2" ...

But I still cannot select the subset of MZ:
> vars <-c('DM','D')
> selVars <-c('DM1', 'D1', 'DM2', 'D2')
> useVars <-c('DM1', 'D1', 'age1', 'DM2', 'D2', 'age2')
> mzData <- subset(TwD_DM, zyg=1, useVars)
Error in subset.data.frame(TwD_DM, ZYG = 1, selVars) :
'subset' must be logical

Yet, if I remove "SelVars" from the command and change ZYG<2, the correct subset is selected.
> mzData <- subset(TwDEP_DM, ZYG<2)
> str(mzData)
'data.frame': 12363 obs. of 8 variables:
$ PAIRID: num 21 29 35 38 43 45 232 240 248 255 ...
$ ZYG : num 1 1 1 1 1 1 1 1 1 1 ...
$ DM.1 : num 0 0 0 0 NA 0 0 0 0 0 ...
$ D.1 : num 0 0 0 1 NA 0 0 0 0 0 ...
$ Age.1 : num 73 74 NA NA NA NA 72 74 74 73 ...
$ DM.2 : num 0 0 0 0 0 0 0 0 NA NA ...
$ D.2 : num 0 0 0 0 0 0 0 0 NA NA ...
$ Age.2 : num 73 74 NA NA NA NA 72 74 NA NA ...

Otherwise,
> mzData <- subset(TwD_DM, ZYG<2, useVars)
Error in `[.data.frame`(x, r, vars, drop = drop) :
undefined columns selected

I do not know if there is something wrong with the way I label useVars as they are categorical data.

Sorry for being so slow with this and thank you for your help.
Carol

Replied on Wed, 04/23/2014 - 10:51
RobK Joined: 04/19/2011

In reply to by CarolKan

Instead of

mzData <- subset(TwD_DM, zyg.1=1, useVars)

you would want to use

mzData <- subset(TwD_DM, zyg.1==1, useVars)

The double equals-sign, ==, asks 'is zyg.1 equal to 1?' and returns a logical TRUE or FALSE accordingly, which is what the subset() function needs. In R, a single equals-sign, =, does assignment (like <- ) or sets the value of function arguments.
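
To illustrate the difference on a toy data frame (df and its values are made up for this sketch): with ==, subset() receives a logical vector. With a single =, R reads ZYG = 1 as a named argument, so the column list ends up in the 'subset' slot, which would explain the "'subset' must be logical" error.

```r
# Hypothetical toy data, not the poster's data set.
df <- data.frame(ZYG = c(1, 2, 1, 4), DM1 = c(0, 1, 0, 0))

# The comparison yields one TRUE/FALSE per row.
is_mz <- df$ZYG == 1

# Logical condition: keeps the rows where ZYG equals 1, and only column DM1.
mz <- subset(df, ZYG == 1, c("DM1"))

# Single equals-sign: ZYG = 1 is taken as a named argument, so the character
# vector c("DM1") is matched to the 'subset' argument and the call fails.
res <- try(subset(df, ZYG = 1, c("DM1")), silent = TRUE)
failed <- inherits(res, "try-error")
```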

Replied on Wed, 04/23/2014 - 11:13
CarolKan Joined: 04/17/2014

In reply to by RobK

I did try ZYG==1, but then I received the following error message, so I thought there may be something wrong with the way I defined useVars.

> mzData <- subset(TwD_DM, ZYG==1, useVars)
Error in `[.data.frame`(x, r, vars, drop = drop) :
undefined columns selected

Replied on Wed, 04/23/2014 - 11:57
RobK Joined: 04/19/2011

In reply to by CarolKan

Based on what you've posted, it appears you are correct that the problem is the definition of useVars. You've defined useVars as

useVars <-c('DM1', 'D1', 'age1', 'DM2', 'D2', 'age2')

but consider what the actual column names of your data frame are:

> str(TwD_DM)
'data.frame': 43565 obs. of 8 variables:
$ PAIRID: num 21 22 23 24 26 29 31 33 34 35 ...
$ ZYG : num 1 4 2 2 4 1 4 4 4 1 ...
$ DM.1 : num 0 0 NA 0 NA 0 NA 0 NA 0 ...
$ D.1 : num 0 0 NA 1 NA 0 NA 0 NA 0 ...
$ Age.1 : num 73 74 NA 74 NA 74 NA NA NA NA ...
$ DM.2 : num 0 NA 0 0 0 0 0 0 0 0 ...
$ D.2 : num 0 NA 1 0 0 0 0 0 0 0 ...
$ Age.2 : num 73 NA 73 74 74 74 NA NA NA NA ...

There are periods in the column (variable) names, and the case of "Age" versus "age" doesn't match (R is case-sensitive). I assume you're planning to use these data in an OpenMx analysis, so be advised that OpenMx disallows periods in column names. The simplest thing to do is probably to add this line to your syntax,

colnames(TwD_DM) <- c('PAIRID', 'ZYG', 'DM1', 'D1', 'age1','DM2', 'D2', 'age2')

probably after the reshape() command.
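
As a rough check of this diagnosis, the sketch below rebuilds the first four rows of the str() output posted above as a small data frame and compares useVars against the column names before and after renaming. The helper names missing_before and missing_after are introduced here for illustration only.

```r
# First four rows of the posted str() output, rebuilt as a small data frame.
TwD_DM <- data.frame(
  PAIRID = c(21, 22, 23, 24),
  ZYG    = c(1, 4, 2, 2),
  DM.1   = c(0, 0, NA, 0),
  D.1    = c(0, 0, NA, 1),
  Age.1  = c(73, 74, NA, 74),
  DM.2   = c(0, NA, 0, 0),
  D.2    = c(0, NA, 1, 0),
  Age.2  = c(73, NA, 73, 74)
)
useVars <- c('DM1', 'D1', 'age1', 'DM2', 'D2', 'age2')

# Requested names that are absent from the data frame; any entry here
# leads to "undefined columns selected".
missing_before <- setdiff(useVars, colnames(TwD_DM))

# Rename as suggested above, then check again.
colnames(TwD_DM) <- c('PAIRID', 'ZYG', 'DM1', 'D1', 'age1', 'DM2', 'D2', 'age2')
missing_after <- setdiff(useVars, colnames(TwD_DM))

# The original subset call should now find every requested column.
mzData <- subset(TwD_DM, ZYG == 1, useVars)
```

Running setdiff() like this before calling subset() is a quick way to spot mismatched names, including differences in case.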