
RAM estimation from covariance matrix where ns differ per cell

tbates
RAM estimation from covariance matrix where ns differ per cell

hi all,

When mxData is a cov or cor matrix, numObs is just one number.

Quite often the cells of a covariance matrix are based on different numbers of observations, and it would be useful if the model could take advantage of that.

Two questions: first, has anyone made RAM models with a "numObs" matrix to give a per-cell n? And second, are the assumptions of a covariance-based model violated if all the data do not come from complete subjects?
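
For reference, a minimal sketch of the current API (the data frame df is a placeholder); a pairwise-complete covariance is exactly the case where the per-cell ns differ, yet mxData still takes a single numObs:

    # Current behaviour (sketch): one scalar numObs for the whole matrix,
    # even though the covariances may have been computed pairwise
    covMat  <- cov(df, use = "pairwise.complete.obs")
    dataCov <- mxData(observed = covMat, type = "cov", numObs = nrow(df))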

mhunter
Why not use FIML?

I would think that if you have missing data that cause the different entries of the observed covariance matrix to be based on different numbers of observations, then a preferred method is to use FIML by making your MxData object contain the raw data with its missing values included.

There are two major complications I can see with using a covariance matrix that has different sample sizes for different entries. First, the missing data may have biased the observed covariance matrix; one assumption in SEM is that the data provide an unbiased estimate of the population covariance matrix. The second issue relates to fit indices and, I feel, is more critical. All of the objectively scaled fit indices (e.g. RMSEA, TLI, CFI) depend on a measure of sample size. If you have a different sample size for each entry of the observed covariance matrix, how do you find an appropriate measure of sample size? That said, relative fit indices like AIC and BIC should be fine to use. Chi-square tests of fit would be problematic for the same reasons as RMSEA, TLI, and CFI, but likelihood ratio tests are probably okay for the same reasons as AIC and BIC.

It seems like what you might be trying to do is assign different entries of the covariance matrix different weights based on their sample sizes, on the assumption that more precise entries should be weighted more heavily in estimation. For example, if a particular set of parameters yields good fit to all but one entry in an observed covariance matrix, and that one entry is based on only 3 observations, then those parameters should still be judged to fit well. But if that entry were based on 789 observations, those parameters would be making a critical mistake in describing the data. FIML should handle this case quite adequately, although other weighting procedures are certainly possible.

Is there a compelling reason not to use raw data? For example, do you not have access to the raw data?
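
A minimal sketch of the raw-data route (df is a placeholder data frame; in a type="RAM" model you would also add mean paths so the expected means are defined):

    # FIML via raw data: missing values stay in the data and are handled row by row
    dataRaw <- mxData(observed = df, type = "raw")
    # e.g. means for a RAM model:
    #   mxPath(from = "one", to = manifestVars, arrows = 1, free = TRUE)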

tbates
infinite objective

thanks: I was thinking of the raw-data-not-available case, but also of cases where covariance data are either much quicker to work with or simply more tractable...

So, for models with ordinal columns, I find I get "objective has infinite value" errors that I can't seem to work around. That's really a second forum posting, but while we're here:

It would be great for OpenMx to show which thresholds are being placed where to generate infinite values (I'm assuming it's a problem with the thresholds).

I also never saw a definitive answer to why we no longer need to enforce that each threshold be larger than the preceding one. Can someone answer that: what chunk of code constrains thresholds to be higher, by a meaningful increment, than the previous threshold in that matrix column in an mxRAMObjective with thresholds?

tbrick
You could always use the

You could always use the patterns-of-missingness workaround. If each cell has a different n, that means that there are different patterns of missingness in the data. You could create one covariance matrix for each pattern, then build a multigroup model with one group per covariance matrix, and constrain all the parameters to be equal across groups. Of course, this changes the likelihood, the degrees of freedom, etc.; there's another thread somewhere around here about the differences between the multigroup and FIML solutions.
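
A rough sketch of that workaround, assuming placeholder objects covA/covB and sample sizes nA/nB (one covariance matrix per missingness pattern); shared labels equate the parameters across groups:

    library(OpenMx)

    makeGroup <- function(name, covMat, n) {
      vars  <- colnames(covMat)
      pairs <- combn(sort(vars), 2)   # sort so the same pair gets the same label in every group
      mxModel(name, type = "RAM", manifestVars = vars,
        mxPath(from = vars, arrows = 2, free = TRUE, values = 1,
               labels = paste("var", vars, sep = "_")),                     # variances
        mxPath(from = pairs[1, ], to = pairs[2, ], arrows = 2,
               free = TRUE, values = .3,
               labels = paste("cov", pairs[1, ], pairs[2, ], sep = "_")),   # covariances
        mxData(observed = covMat, type = "cov", numObs = n))
    }

    grpA  <- makeGroup("patternA", covA, nA)
    grpB  <- makeGroup("patternB", covB, nB)
    multi <- mxModel("multi", grpA, grpB,
      mxAlgebra(patternA.objective + patternB.objective, name = "minus2LL"),
      mxAlgebraObjective("minus2LL"))
    fit   <- mxRun(multi)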

It might be difficult to determine which thresholds are causing the problem, especially since it's possible that it's some combination of threshold locations that's putting you in a bad spot. I mean, it's possible that it's only one pair, but if each pair of thresholds is near the edge of the space, there's likely no single pair responsible.

I suppose there could be a helper function that tries the integration with different patterns of missingness to see if removing one set of thresholds is the specific cause of the problem. It might take some time to run all those integrations, though.

For your second question, it's just the function checkIncreasing() in src/omxAlgebraFunctions.c (line 58). It does pretty much what it says on the tin and checks that the thresholds are strictly increasing. It's called from the distributed FIML calculation in src/omxFIMLSingleIteration.c at line 266. If it fails, we essentially raise an error that says that we can't compute an objective value at that location in parameter space, and tell the optimizer to try somewhere else.
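
In R terms (this is just an analogue, not the actual C source; thresholdMatrix is a placeholder), the check amounts to something like:

    # Each column of the thresholds matrix must be strictly increasing
    thresholdsOK <- function(col) all(diff(na.omit(col)) > 0)
    apply(thresholdMatrix, 2, thresholdsOK)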

tbates
Followup: "Objective function returned an infinite value"

I still got the error even with only complete cases (df = df[complete.cases(df), ]) .... so it's not just missingness.

Which made me dig.

One variable had an unused factor level... would having more thresholds than are occupied by subjects matter? Welcome to:
df = droplevels(df)

Still get the error. But now with no levels that do not contain subjects, and no missing data. Well, the target zone is reduced in size.

neale
Thresholds are effectively pre-standardized

First, I would simply implement the forcing constraint: premultiply a matrix of threshold deviations by a lower triangular matrix of 1's, and bound all the deviations in rows 2:nrow to be strictly positive (say at least .001). This practically has to work unless starting values are way off.

Second, remember that internally the thresholds are standardized, so it is important to get the means and the variances right. If the means are 0 and the variances 1, then the thresholds themselves (or the result of the algebra that computes them) should be within a sensible range.

The following is a code snippet from the attached polychoricMatrix.R, which uses the deviation strategy if the default argument useDeviations=TRUE is set.

        model <- mxModel(model, mxMatrix("Lower", name="UnitLower", nrow = maxnthresh, ncol = maxnthresh, free=F, values=1))
        # Threshold differences:
        model <- mxModel(model, mxMatrix("Full", name="thresholdDeviations", nrow = maxnthresh, ncol = nordinal, free=thresholdDeviationFree,   values=thresholdDeviationValues, lbound=thresholdDeviationLbounds, labels = thresholdDeviationLabels))
        model <- mxModel(model, mxAlgebra(UnitLower %*% thresholdDeviations, dimnames=list(tnames,ordnameList), name="thresholds"))
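
A toy numeric illustration of why this works (the deviation values are made up): the first row is the first threshold, and because the rows below it are bounded positive, the cumulative sums must increase.

    UnitLower  <- lower.tri(matrix(1, 3, 3), diag = TRUE) * 1   # lower triangle of 1s
    deviations <- c(-1.5, 0.8, 0.6)       # row 1 free; rows 2+ bounded > 0
    UnitLower %*% deviations              # -1.5, -0.7, -0.1: strictly increasing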

Finally, I think the belief that we could always get away without this strategy was optimistic. True, in some cases the optimizer would work around mis-ordered thresholds, but in others I think it would just get stuck. As it does with the more widely used polycor() function.

tbates
so... working on this now.
  1. If it works, I will smile at a stranger :-)
  2. Q: If thresholds is an mxAlgebra, how does this insert the NAs that mxFIMLObjective is expecting to define unavailable thresholds?

More generally: algebraically constraining the thresholds makes much more sense than just whacking the pessimiser over the head until it gives up.

As I'd already made something very similar to your auto-maker functions, it occurs to me that the boilerplate for chugging over the dataframe and making a thresholds matrix that matches the thresholds found should be built into the RAM model objective; otherwise 90% of people won't get off the ground and the other 10% of us will re-invent the wheel.

PS: polycor::polychor(ML = TRUE) does die on this dataset; only the two-step estimate runs.

mspiegel
Somebody else with expertise

Somebody else with expertise on this problem should be answering. In the meantime, I believe that we removed the requirement that unused cells at the bottom of a column must be NA values for ordinal integration. They can be any value, for they are simply ignored.

neale
Agreed

That is my understanding also. The functionality of the "Highest" command in classic Mx - in which the highest possible category for every variable is listed so as to avoid problems with automatic counting when using multiple groups - is handled by the use of mxFactor() to assign variable categories. So I don't think OpenMx 'cares' what is in the unused rows of the threshold matrix.

tbates
need some documentation

Once I get this working, I will have a go at improving the help file for thresholds in mxFIMLObjective. I think currently they are not even mentioned in the Details, and there is no example that uses them.

tbates
good news and bad news

So: thresholds don't need to be packed with NAs, and the multiply-by-a-lower-triangle-of-1s-to-increment solution works just fine.

Bad news: I get the same problem... "Error: The job for model 'm1_raw' exited abnormally with the error message: Objective function returned an infinite value."

No missing data, no unused thresholds, guaranteed monotonicity...

neale
Starting values?

Do you know if this error occurs at the starting values, or during optimization? Checkpointing, with checkpoints as frequent as possible, can diagnose which; it may also tell you which parameters are the culprits. Let us assume it is not the starting values, because that is usually fairly obvious (take a look, via mxEval, at the eigenvalues of the expected covariance matrices after mxRun with unsafe=TRUE) and straightforward to fix.

If the thresholds are always definitely increasing, there are two main possibilities: 1) the means or thresholds are going wonky; or 2) the covariance matrix is not positive definite. For 2) you could ensure positive definiteness by using a Cholesky decomposition with bounds to keep the diagonal elements strictly positive. I suspect that 2) is your problem, and that it is being caused by insufficient data, such that a non-pd matrix is what the optimizer is trying to tell you is the solution. Also diagnostic here is table() of pairs of variables in the data: if there are empty cells in a 2x2 table, misbehavior is likely to result as the optimizer heads towards a correlation estimate of +1 or -1.

If the problem is 1) then you should be able to bound the means & thresholds to stop things going so haywire. The problem of course is compounded if the variance is going very small - because this would make the thresholds, once standardized, very large. Bounds might be used to help here.
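
A rough diagnostic sketch pulling these suggestions together (m1, df and the ordinal column names are placeholders):

    # 1) Checkpoint every iteration to see where optimization fails
    m1  <- mxOption(m1, "Checkpoint Units", "iterations")
    m1  <- mxOption(m1, "Checkpoint Count", 1)
    fit <- mxRun(m1, checkpoint = TRUE)

    # 2) Is the expected covariance positive definite where it failed?
    round(eigen(fit$objective@info$expCov)$values, 3)

    # 3) Empty cells in a 2x2 table push correlations towards +1 or -1
    table(df$ord1, df$ord2)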

tbates
eigen(m1$objective@info$expCov)

I'm beginning to wonder if this is related to
http://openmx.psyc.virginia.edu/thread/1307#comment-3908

which these data also raised.

All the eigenvalues are positive at fail time

> round(eigen(m1$objective@info$expCov)$values,3)
 [1] 5.137 3.677 1.518 1.000 1.000 0.975 0.891 0.739 0.690 0.117 0.091 0.090

As is the determinant

> det(m1$objective@info$expCov)
[1] 0.01208029

The model$objective@info$likelihoods of the subjects are very low (mostly zero or v.small).

PS, for anyone looking for an SEM-oriented discussion of not positive definite, this page is clear, and has had input from Mike Neale and others:

http://www2.gsu.edu/~mkteer/npdmatri.html

neale
Pattern of data associated with zero likelihoods?

So if you look at the observations with zero likelihoods, are there any notable features? If you table the ordinal variables against a likelihood==0, does it reveal anything?

Even looking at one such case should reveal something, and it pretty much has to be due to the threshold or mean estimates, as we have ruled out the covariance matrix as the source of the error. Either two thresholds are really, really close together (which might happen with a large variance) or really extreme.

Here's a bit of code for standardizing the nth thresholds and obtaining the expected proportions in each cell; it's just for one variable. The guts of it is (thresholds - mean)/sd, and these values should not be too extreme - anything more than about |3| should be cause for concern.

c(as.matrix(pnorm(as.vector(mxEval((c(0,1)-estMean)/sqrt(V),AceFitFixT))),1,nth+1),1) %*% (diag(nth) - rbind( cbind(matrix(0,nth,1),diag(nth)), matrix(0,1,nth+1)))

      [,1]      [,2]       [,3]
[1,] 0.3398691 0.6091453 0.05098562
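
Or, stripped to its core (with the threshold values, estMean and V as in the snippet above):

    # Standardized thresholds; anything beyond about |3| is cause for concern
    z <- (thresholds - estMean) / sqrt(V)
    abs(z) > 3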

Is this a mixed ordinal-continuous analysis? If it is, then computing the predicted thresholds conditional on the observed variables may be a problem. I'd want to look for extreme values of the continuous variables (maybe correlate them with the (zero) likelihoods?)

I'm beginning to think it's a bug, per your thread reference, though - do keep going!

tbates
followup: Can nThresholds differ from nlevels(myVar)?

Hi all,
If OpenMx is examining rows 1:nlevels() for each variable, does that mean that the levels being modelled cannot differ from the mxFactor levels in the data? Previously, I guess, those were independent: you could have a 15-level factor modelled against a binary threshold.

tbrick
nThresholds == nLevels(myvar)-1

That's correct. For FIML, the number of levels must be consistent with the number of thresholds. A 15-level factor would have to be refactored into a two-level factor before a binary threshold could be applied.

This requirement makes sure you know what you're asking for. It's not clear otherwise which factor levels to group together on each side of the threshold, and OpenMx is dedicated to not doing things automagically when it's not clear what you were really asking for. It's part of the OpenMx philosophy.

If you happen to need the information about the original factor level for some other part of the model tree, just include both the original 15-level factor and the binary factor as columns in the data set, and use the appropriate one as needed.
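
A hedged sketch of that refactoring (the column name and cut point are made up):

    # Keep the original 15-level factor and add a binary recode alongside it
    df$Xbin <- mxFactor(ifelse(as.integer(df$X) <= 7, "low", "high"),
                        levels = c("low", "high"))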

Note that you don't need an instance of each level in the data set itself. Consider a multi-group model--measure X might have six levels, but members of group B might only manifest levels 1 and 2. When you call
dataGrpB$X <- mxFactor(dataGrpB$X, levels=c(1:6))
you tell OpenMx that there are six levels of X possible, and that it therefore needs five thresholds. That way you can constrain the five thresholds for group B to be the same as those for group A (where the rest of the categories might actually get used).