numObs in Multiple Group Models

Posted on
rlucas Joined: 11/17/2009
I'm running a multiple-group model with 17 groups (using raw data). However, the number of observations reported is wrong: instead of the total N across all groups, summary reports N times the number of groups. The full sample size across all groups is 20814, but summary reports numObs as 353838 (which is 20814*17). When I ignore the groups and run the model on the full sample, I get the correct number of observations (20814), but when I run it as 17 groups (each group being a subset of the 20814), I get 353838 observations.

I am pretty sure I have not misspecified the model by inadvertently using the full sample for every group: the results match my Mplus output, and the descriptive statistics and estimates do vary across groups. Is this a bug, or am I missing a reason why it should be reported this way (or could I be doing something wrong that causes it)?
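To make the arithmetic concrete, here is a rough base-R sketch with simulated data (not my actual script; the data frame `dat` and the grouping column `agecat` are stand-ins). It contrasts summing the per-group row counts with counting the full sample once per group:

```r
# Hypothetical stand-in for the real data: 20814 rows spread over 17 groups.
set.seed(1)
n_total  <- 20814
n_groups <- 17
dat <- data.frame(agecat = sample(rep_len(seq_len(n_groups), n_total)),
                  y      = rnorm(n_total))

# Expected bookkeeping: each group contributes only its own rows.
group_n     <- vapply(split(dat, dat$agecat), nrow, integer(1))
sum_correct <- sum(group_n)

# Inflated bookkeeping: every group is credited with the full-sample count.
sum_inflated <- n_groups * nrow(dat)

print(c(correct = sum_correct, inflated = sum_inflated))
```

The second figure is the one summary is giving me, which is what makes me think each group is somehow being counted at the full-sample N.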

Replied on Sun, 02/27/2011 - 13:31
mspiegel Joined: 07/31/2009

Here are a couple of suggestions. They may or may not be relevant to your script, apologies if they fall into the second category.

(1) If you have several submodels in your script, and each submodel contains an identical copy of the data set, then consider placing the data set in the parent model. In other words, if your script looks like this:

model <- mxModel('parent',
    mxModel('submodel1', mxData(theData, type = 'raw'), ...),
    mxModel('submodel2', mxData(theData, type = 'raw'), ...),
    mxModel('submodel3', mxData(theData, type = 'raw'), ...))

Replace it with:

model <- mxModel('parent', mxData(theData, type = 'raw'),
    mxModel('submodel1', ...), # do not put any data here
    mxModel('submodel2', ...), # do not put any data here
    mxModel('submodel3', ...)) # do not put any data here

(2) You can use the "numObs=" argument to the summary() function to explicitly specify the number of observations. See ?summary. That said, we would like more information so we can determine why there is a mismatch in the computed number of observations.

Replied on Sun, 02/27/2011 - 16:24
rlucas Joined: 11/17/2009

In reply to mspiegel

Ah, I figured it out. I had used a snippet of code from a script you had suggested to me a while ago for running multiple submodels in parallel (I realize that my multigroup model cannot be run in parallel, but the function for creating submodels was useful). That code creates a model template for the submodels, and then uses a function (along with lapply and an index) to create them. In the new template I created, I had included the original full data file, assuming that I would just replace it in the function. This is indeed what occurred (which is why I got the correct results), but I didn't realize that the line

model@data@observed <- data[which(data$agecat==index),]

wouldn't just take care of everything. So the original number of observations was kept for each of the subgroups, even though the data were replaced. So now when I go back and examine the data for each submodel, the correct data are there, but the number of observations is still listed as the same as it was for the template.

So I've figured out how to fix this, but it does lead to one question: If I leave out the mxData statement from my template model, will everything just work using the above statement in the function to add the data, or do I need to explicitly add the number of observations and the type with statements like:

model@data@numObs <- 1345
model@data@type <- "raw"

Thanks for your help!

Replied on Sun, 02/27/2011 - 18:10
mspiegel Joined: 07/31/2009

In reply to rlucas

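Ah, yes, I see what is happening. The numObs value is set in the call to the function mxData, so assigning to the observed slot directly replaces the data but leaves the template's numObs in place. I would recommend replacing: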

model@data@observed <- data[which(data$agecat==index),]

with:

model <- mxModel(model, mxData(data[which(data$agecat==index),], "raw"))
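Putting that together with the lapply approach described earlier in the thread, a minimal sketch might look like the following. The simulated data frame `dat`, the helper `group_rows`, and the submodel names are all hypothetical, the template deliberately contains no mxData, and the final check assumes the data object exposes the numObs slot used above. The OpenMx part is guarded so the base-R bookkeeping still runs if the package is not installed.

```r
# Hypothetical stand-in data; 'agecat' is the grouping column from the thread.
set.seed(1)
dat <- data.frame(agecat = rep_len(1:17, 20814), y = rnorm(20814))

# Hypothetical helper: the subset for one group, as in the line being replaced.
group_rows <- function(data, index) data[which(data$agecat == index), ]

# Row counts we expect each submodel to report.
expected_n <- vapply(1:17, function(i) nrow(group_rows(dat, i)), integer(1))

if (requireNamespace("OpenMx", quietly = TRUE)) {
  # Template without any data; each submodel gets its data through mxData().
  template <- OpenMx::mxModel("template")
  submodels <- lapply(1:17, function(index) {
    OpenMx::mxModel(template, name = paste0("group", index),
                    OpenMx::mxData(group_rows(dat, index), type = "raw"))
  })
  # Assumption: numObs is stored on the data object, as in the thread.
  stored_n <- sapply(submodels, function(m) m@data@numObs)
  print(all(stored_n == expected_n))
}
```

Because mxData is called once per group, the type and the number of observations should be filled in from each subset, so there should be no need to set the numObs or type slots by hand.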