Attachment | Size |
---|---|
forumdata1.dat | 559 KB |
forumdata2.dat | 426.41 KB |
forumscript2.R | 2.66 KB |
Dear OpenMx-ers,
For my research, I am running a latent variable model with an interaction effect between two latent variables on a latent outcome variable, all measured by ordinal and skewed manifest variables. I use WLS estimation and the matched pairs approach (Marsh et al., 2004) to model the latent interaction. I attached two simulated datasets:
* forumdata1: the true size of the latent interaction is 0
* forumdata2: the true size of the latent interaction effect is .04
When I run the model specified below on forumdata1 everything works fine, but when I run it on forumdata2 I get the error: Error in svd(X) : infinite or missing values in 'x'.
Can you reproduce this problem? And do you know a solution?
Details:
OpenMx version: 2.7.18 [GIT v2.7.18]
R version: R version 3.2.3 (2015-12-10)
Platform: x86_64-apple-darwin13.4.0
MacOS: 10.11.6
Default optimiser: CSOLNP
NPSOL-enabled?: No
OpenMP-enabled?: No
    library(OpenMx)

    forumdata1<-read.table(file="forumdata1.dat")
    forumdata2<-read.table(file="forumdata2.dat")

    xa.manifests=c("X1","X3","X6","X8","X10","X11","X14")
    xb.manifests=c("X2","X4","X5","X7","X9","X12","X13")
    xab.manifests=c("I1","I2","I3","I4","I5","I6","I7")
    y.manifests=c("Y1","Y2","Y3","Y4","Y5","Y6","Y7","Y8","Y9")

    forumdata1[,y.manifests] <- mxFactor(forumdata1[,y.manifests],levels=0:3)
    forumdata1[,c(xa.manifests,xb.manifests)] <- mxFactor(forumdata1[,c(xa.manifests,xb.manifests)],levels=0:4)
    forumdata2[,y.manifests] <- mxFactor(forumdata2[,y.manifests],levels=0:3)
    forumdata2[,c(xa.manifests,xb.manifests)] <- mxFactor(forumdata2[,c(xa.manifests,xb.manifests)],levels=0:4)

    manifests=c(xa.manifests,xb.manifests,xab.manifests,y.manifests)

    # The model
    mod <- mxModel(name="model1",type="RAM",
      # Data
      mxDataWLS(forumdata2,type="WLS"),
      # Specify variables
      manifestVars=manifests,
      latentVars=c("Xa","Xb","Xab","Y"),
      # Means
      mxPath(from="one",to=manifests,arrows=1,free=T,values=0.8,labels=manifests),
      # Thresholds
      mxThreshold(vars=c(xa.manifests,xb.manifests),nThres=4,free=c(F,T,T,F),values=c(-.5,-.3,.3,.5)),
      mxThreshold(vars=y.manifests, nThres=3,free=c(F,T,F),values=c(-.5,0,.5)),
      # Residual variances
      mxPath(from=manifests, arrows=2,free=T,labels=paste("error",manifests)),
      # Factor loadings
      mxPath(from="Xa", to=xa.manifests,arrows=1,free=T,values=.8,labels=paste("loading",xa.manifests)),
      mxPath(from="Xb", to=xb.manifests,arrows=1,free=T,values=.8,labels=paste("loading",xb.manifests)),
      mxPath(from="Xab", to=xab.manifests,arrows=1,free=T,values=.8,labels=paste("loading",xab.manifests)),
      mxPath(from="Y", to=y.manifests, free=c(F,rep(T,8)), values=1,labels=paste("loading",y.manifests)),
      # Latent (co)variances
      mxPath(from="Xa", arrows=2,free=F, values=1, labels="varXa"),
      mxPath(from="Xb", arrows=2,free=F, values=1, labels="varXb"),
      mxPath(from="Xab", arrows=2,free=F, values=1, labels="varXab"),
      mxPath(from="Xa",to="Xb", arrows=2,free=T, values=.4, labels="covXaXb"),
      mxPath(from="Xa",to="Xab", arrows=2,free=T, values=.4, labels="covXaXab"),
      mxPath(from="Xb",to="Xab", arrows=2,free=T, values=.4, labels="covXbXab"),
      mxPath(from="Y", arrows=2, free=T, values=1, labels="varY"),
      # Structural regression
      mxPath(from=c("Xa","Xb","Xab"), to="Y", arrows=1, free=c(T,T,T), values=c(-.05,.5,.1), labels=c("b_Xa_Y","b_Xb_Y","b_Xab_Y")),
      # Fit model with WLS
      fitFunction <- mxFitFunctionWLS()
    )

    modrun<-mxRun(mod)
The weight matrix for forumdata2 has 578 uniquely placed NAs in it. One of those NAs is on the diagonal. When I noticed that the weight matrix itself was 578x578, I checked if there was an entire row/column of NAs; there is. It looks like poly_X12_Y2 is not identified. That is, we can't get weights for how the polychoric correlation between X12 and Y2 varies with the other parameters. This might have to do with the zeros in the cross tabulations of X12 and Y2.
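A minimal sketch of how such a check could look in base R, run on a toy matrix rather than the actual weight matrix. The helper name `describe_na_pattern` and the toy matrix are illustrative assumptions, and the "unique" count assumes that a symmetric pair of NAs is counted once:

```r
# Summarise the NA pattern of a square, symmetric weight matrix.
# `describe_na_pattern` is a hypothetical helper, not an OpenMx function.
describe_na_pattern <- function(W) {
  stopifnot(is.matrix(W), nrow(W) == ncol(W))
  na_mask <- is.na(W)
  list(
    # NAs in the lower triangle incl. diagonal: each symmetric pair counted once
    n_unique_na   = sum(na_mask[lower.tri(na_mask, diag = TRUE)]),
    n_na_diagonal = sum(diag(na_mask)),
    # rows (and, by symmetry, columns) that consist entirely of NAs
    all_na_rows   = which(rowSums(na_mask) == ncol(W))
  )
}

# Toy 4x4 matrix in which the third statistic has no usable weights
W <- diag(4)
dimnames(W) <- list(paste0("s", 1:4), paste0("s", 1:4))
W[3, ] <- NA
W[, 3] <- NA

na_summary <- describe_na_pattern(W)
```

On the toy matrix this reports one fully missing row. The same call on a real weight matrix would show whether the NAs are confined to a single row and column, as described here.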
The error, although unfriendly, is telling you we can't invert the weight matrix because there are NAs in it. We should probably be checking this and give you a better message about it. However, a better message won't make the model run. You could replace the NAs in the weight matrix with some value and see what happens.
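A minimal sketch of that kind of replacement, written for a plain matrix so it runs without OpenMx. The helper name, the fill values (0 off the diagonal, 1 on it) and the toy matrix are all assumptions chosen for illustration, not recommended settings:

```r
# Replace NA entries of a square weight matrix with fixed placeholder values.
# `replace_na_weights` is a hypothetical helper; the default fill values are arbitrary.
replace_na_weights <- function(W, offdiag_value = 0, diag_value = 1) {
  stopifnot(is.matrix(W), nrow(W) == ncol(W))
  na_diag <- is.na(diag(W))        # remember which diagonal entries were NA
  W[is.na(W)] <- offdiag_value     # fill every NA first
  diag(W)[na_diag] <- diag_value   # then overwrite the formerly-NA diagonal entries
  W
}

# Toy 3x3 weight matrix whose second row and column are entirely NA
W <- diag(3)
W[2, ] <- NA
W[, 2] <- NA

W_filled <- replace_na_weights(W)
```

A separate diagonal value is used here because filling a missing diagonal entry with 0 would leave the matrix singular. With a real WLS data object, the matrix would first have to be pulled out of a copy of that object; `slotNames()` on the copy lists where its matrices are stored.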
Then use d2 as the data object in the model. You'd probably have to do something similar to the $fullWeight part of the data object too. Of course, extreme care should be taken when interpreting those results and standard errors.
That's very insightful.
The ordinal items in my simulation are highly skewed and contain a lot of zeroes. In other scenarios the weight matrix does not contain NAs, but I encounter another problem, especially with lower sample sizes (e.g. N=250; see attachment). In such scenarios I usually get the error that the system is computationally singular and that the pseudo-inverse derivative matrix is used because the first derivative matrix was not invertible. When I continue to run the model, some parameters have NaN estimates. I suspect I cannot trust the remaining parameter estimates.
Do you know a solution to this problem? And does it result from the relatively large number of zeroes in the data or from something else? (The weight matrix does not contain NAs in this case.)
I'm not sure of a general solution for this. The problem is not with having lots of zeros in the data, but rather with having lots of zeros in the frequency table of one variable and another. The two can be related of course. If you have lots of zeros then you don't have lots of other categories used, and then the categories that are used rarely might never occur with another variable.
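One way to look for such empty cells is to cross-tabulate every pair of items and count the zero-frequency cells. The sketch below is base R on made-up data; `find_empty_cells` and the `toy` data frame are hypothetical. It assumes the columns are factors with all categories declared as levels, since `table()` only shows unobserved categories for factors:

```r
# List every pair of variables whose cross-tabulation has at least one empty cell.
# `find_empty_cells` is a hypothetical helper; columns are assumed to be factors.
find_empty_cells <- function(df, vars = names(df)) {
  pairs <- combn(vars, 2, simplify = FALSE)
  n_empty <- vapply(
    pairs,
    function(p) sum(table(df[[p[1]]], df[[p[2]]]) == 0),
    integer(1)
  )
  out <- data.frame(
    var1    = vapply(pairs, function(p) p[1], character(1)),
    var2    = vapply(pairs, function(p) p[2], character(1)),
    n_empty = n_empty
  )
  out[out$n_empty > 0, ]
}

# Toy data: category 2 of A is rare and never co-occurs with B = 1 or C = 1
toy <- data.frame(
  A = factor(c(0, 0, 0, 1, 1, 2), levels = 0:2),
  B = factor(c(0, 0, 1, 0, 1, 0), levels = 0:1),
  C = factor(c(0, 1, 0, 1, 1, 0), levels = 0:1)
)

sparse_pairs <- find_empty_cells(toy)
```

Pairs that show up in the result are candidates for polychoric correlations that are hard or impossible to estimate.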
The ordinal features of OpenMx assume that for each ordinal variable there is an underlying continuous variable that is normally distributed. With highly skewed ordinal data, at some point this assumption falls apart. You may want to model the data as a Poisson count distribution instead of ordinal, or you may want to use a started log transformation on the data.
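For reference, a started log is usually taken to mean log(x + c) with a small positive constant c, so that zeros stay finite. A minimal sketch in base R; the helper name and the default start of 1 are assumptions, not values given in this thread:

```r
# Started log: add a positive constant before taking logs so that 0 maps to a finite value.
# `started_log` and its default `start = 1` are illustrative choices.
started_log <- function(x, start = 1) {
  stopifnot(is.numeric(x), start > 0, all(x + start > 0, na.rm = TRUE))
  log(x + start)
}

scores <- c(0, 0, 0, 1, 3)          # toy skewed item scores
transformed <- started_log(scores)  # zeros map to log(1) = 0
```

The transform needs numeric scores, so it would presumably be applied to the raw columns rather than to columns already converted with mxFactor.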
Another alternative is to use the item factor analysis (IFA) procedures. Fundamentally, IFA will make the same underlying continuous normal variable assumptions, but it uses ML instead of WLS. It's very good for lots of observed variables with only one or two latent variables.
Thanks for your explanation and for suggesting alternative methods to solve my problem. I'll definitely look into that!