Generating fake data

ebejer
Attachment: repData.R (2.87 KB)

Hi,

I have some data that I want to replicate while maintaining realistic relationships between the variables. I copied the script that Ryne posted for everyone's convenience, and I was able to generate one data set. However, the data I want to replicate contain factors (unordered?), so it seems to me I need to use the second half of the script to retain that information.

I have been trying to run that portion of the script. It runs without errors, but it isn't actually working: no data are generated. Does anyone have suggestions for how I should modify the script (attached) to replicate the information I'm after?

Best wishes
j

Ryne

The issue I see is that you use the variable 'row' to define how many observations to generate. However, you deleted the line of code that defined 'row' as the number of rows in your data, so 'row' gets treated as numeric(0). As a result, the rmvnorm call generates zero rows of data, and fakefac ends up as a data.frame with zero rows.
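
As a rough illustration of the fix (not the attached script itself), the sketch below defines 'row' from the data before drawing multivariate normal values. The data frame `realData` and its columns are made up for the example, and the draw uses a base-R Cholesky approach in place of rmvnorm so that it runs without extra packages.

```r
set.seed(1)

# Hypothetical stand-in for the original data set (names are assumptions)
realData <- data.frame(x = rnorm(50), y = rnorm(50))
realData$z <- realData$x + rnorm(50)

# The step that went missing: 'row' must hold the number of observations
row <- nrow(realData)

# Means and covariances to reproduce in the simulated data
mu    <- colMeans(realData)
sigma <- cov(realData)

# Base-R multivariate normal draw with 'row' observations:
# standard normal draws times the Cholesky factor, then shifted by the means
z    <- matrix(rnorm(row * length(mu)), nrow = row)
fake <- as.data.frame(sweep(z %*% chol(sigma), 2, mu, "+"))
names(fake) <- names(realData)

# The simulated data now has as many rows as the original
nrow(fake) == nrow(realData)
```

If 'row' is left undefined or empty, the same code would have nothing to size the draw with, which matches the zero-row result described above.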

More generally, I think fakeData needs a rewrite. I've received a few too many error reports involving ordinal data. In most of them, hetcor reports a non-positive definite correlation matrix for an existing dataset even though the original data run fine. If anyone has feature suggestions for fakeData 1.1/2.0, let me know.

tbates
mxGenerateData()

The function to use here is mxGenerateData.

This can accept either a model or a data frame as input, handles missing data as you choose, and returns a new (simulated) data set based on the model-implied distribution from your model or dataframe.
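
A minimal sketch of the data-frame route, assuming OpenMx is installed. It uses the `demoOneFactor` example data that ships with OpenMx as a stand-in for your own data frame, and the `nrows` value is an arbitrary choice for illustration. See ?mxGenerateData for the missing-data options.

```r
library(OpenMx)

# Example data shipped with OpenMx, standing in for your own data frame
data(demoOneFactor)

# Simulate a new data set based on the distribution implied by the data frame;
# nrows sets how many simulated observations to return
fake <- mxGenerateData(demoOneFactor, nrows = 200)

head(fake)
```

Passing a fitted model instead of a data frame works the same way, with the simulated data drawn from the model-implied distribution.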