Dear all,

I am working with data from a study with a complex sample design. This requires the use of sample weights (which are known). Is this possible in OpenMx?

In a post by neale from 09/24/10 it was suggested to extend the mxFIMLObjective() function so that sample weights, which are part of the data frame, are available as an argument within the function. I think that is exactly what I am looking for, but I could not find any further information. Can you point me to any?

Thank you!

This is possible in OpenMx; Mike was arguing for a shortcut.

To weight objective functions in OpenMx, you'll have to specify two models. The first model is your standard model as though there were no weights, with the "vector=TRUE" option added to the FIML objective. This option makes the first model return not a single -2LL value for the model, but individual likelihoods for each row of the data.

That model is placed in a second MxModel object, which will also contain the data (if you put the data here, you don't have to put it in the first model, but you still can). That second model will contain an algebra that multiplies the likelihoods from the first model by some weight. I'll assume that you want to weight the log likelihoods by a variable in your dataset. Your second model will then look something like this, where "data.weight" is the weight from your data and "firstModel" is the name you assigned to your first model.
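Written out, the quantity the second model minimizes can be sketched as follows. The notation is my own assumption, not taken from the OpenMx documentation: w_i is the weight for row i ("data.weight"), and L_i(theta) is the likelihood of row i returned by the first model when vector=TRUE.

```latex
F_w(\theta) = -2 \sum_{i=1}^{n} w_i \, \log L_i(\theta)
```

With every w_i equal to 1, this reduces to the ordinary -2LL of the unweighted model.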

Edit: corrected the multiplication to a Kronecker product, as it's a 1 x 1 definition variable matrix and a 1 x n likelihood vector.

Umm, I'm not following the previous explanation. The definition variable "data.weight" is a 1 x 1 matrix, and "firstModel.objective" is an n x 1 matrix when vector=TRUE. Another way to specify this is to put the weights in an n x 1 MxMatrix and then multiply the weights by the likelihood vector. I also think a right parenthesis is missing where the model "firstModel" should close.

My fault for writing code without checking it. It should be a Kronecker product in this specification. You're right that a 1 x n matrix of weights may be included in lieu of the definition variable approach. I fixed the parenthesis issue as well.

that was very helpful!

just a quick follow up question: I know Mplus uses a sandwich estimator to correct the standard errors in addition to using the weighted likelihood function. Is there a similar option in OpenMx? Can I trust the standard errors?

Hi,

Following this thread and a previous one [1], I've been trying to implement differential weighting of subjects in a model by building a second model alongside it, which multiplies the vector of fits from the first model by the row weight in the data, and then optimizing the sum.

I thought it was working, but I am getting very different results from a colleague who is using Mplus's weight feature. Just deleting the low-weighted rows gives results more similar to his than to mine...

So,

What (if anything) is wrong with this approach? (or my implementation)

[1] http://openmx.psyc.virginia.edu/thread/445

Is this really doing what we want, if MZ.objective is a vector of all the likelihoods? I'd be inclined to go for:

where mzDataWeight is the name of an mxMatrix explicitly loaded with the weights.

what you say makes sense mike, but

generates the unconformable array error. Unfortunately the error doesn't tell one what the nonconforming arrays' sizes are...

Figuring this might just be a column of weights from the data meeting a row of likelihoods, I tried

Also fail...

It shouldn't be necessary to place the weights into a matrix, should it?

OK.. tried making a matrix and placing the weight data in there. A benefit is that the errors are more informative. But same error. Would be great if the back end kicked back something like I was doing "[300,1] %*% [300,1] when…"

Will look more in the morning.

PS: I assume the row objective is padded with NAs for rows where all(is.na(rows))?

With this example there aren't a lot of arrays available to be non-conformable. It's either -2, MZw.data.weight, or MZ.objective. First, I'd make sure the objective (fit function) for the MZ model has vector=TRUE. Second, is MZw.data.weight a single column vector? The * operator is for elementwise multiplication and, unlike R, it will not recycle either argument if they are different lengths. Use %*% for matrix multiplication. Third, you could put MZw.data.weight into its own mxAlgebra and have a look at it that way.
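The conformability point can be illustrated outside OpenMx. The following is a minimal plain-Python sketch, not OpenMx code: the helper names `shape` and `elementwise` and all numbers are hypothetical. It mimics an elementwise product that refuses to recycle, and it reports both dimensions when the arguments do not match.

```python
import math

def shape(m):
    """Return (rows, cols) of a matrix stored as a list of row lists."""
    return (len(m), len(m[0]))

def elementwise(a, b):
    """Elementwise product that never recycles and reports both shapes on failure."""
    if shape(a) != shape(b):
        raise ValueError(f"non-conformable arrays: {shape(a)} * {shape(b)}")
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

# Hypothetical values: three row likelihoods (n x 1) and three weights (n x 1).
likelihoods = [[0.20], [0.05], [0.10]]
weights = [[1.0], [2.0], [0.5]]

log_lik = [[math.log(row[0])] for row in likelihoods]
weighted = elementwise(weights, log_lik)        # n x 1 times n x 1 conforms
fit = -2.0 * sum(row[0] for row in weighted)    # collapse to a single number

# A 1 x n row of weights against an n x 1 column fails, and the error says why.
weights_row = [[1.0, 2.0, 0.5]]
try:
    elementwise(weights_row, log_lik)
    message = ""
except ValueError as err:
    message = str(err)

print(fit)
print(message)
```

The same check applies to the algebra in the thread: the weights and the likelihood vector must both be n x 1 (or both 1 x n) before they are multiplied with `*`.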

But, it would be SUPER HELPFUL to print out the dimensions of the matrices found not to be conformable along with the error. This is something classic Mx used to do, and it's sorely missed in OpenMx. Pretty please?

So, building a simple play model. After encountering some no-context errors :-(

This is just estimating the covariance of X and Y, which is around .5

So when vector is on, something needs to change in the model to keep it driving toward good estimates?

This simple example reliably causes my R to completely crash. Rolling back a couple of revisions does not fix the problem for me. Going back to r2655 still has the same crashing problem. For r2583, I no longer crash, but I replicate your finding of rather bad estimates only when vector=TRUE. I should also note I haven't observed this problem with other models running vector=TRUE. However, those other models do not optimize the fit function with vector=TRUE; they have that fit function as a component of another fit function. This got me thinking.

I think in this case OpenMx is trying to optimize the likelihood (not even the log likelihood) of the last row of data. Essentially, for the model you specified, the row likelihoods do not collapse to a single number for optimization. The developers should discuss whether we should catch this, and whether it should cause an error, a warning, or something else.
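To make the difference concrete, here is a small plain-Python sketch (not OpenMx, with hypothetical data and a normal model with known standard deviation assumed for simplicity). It contrasts minimizing the collapsed -2 log likelihood with optimizing only the last element of the row-likelihood vector.

```python
import math

def row_likelihoods(mu, data, sd=1.0):
    """Normal density of each observation: the analogue of a vector=TRUE fit."""
    return [math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))
            for x in data]

def minus2_loglik(mu, data):
    """Collapse the vector of row likelihoods to the scalar that gets minimized."""
    return -2.0 * sum(math.log(lik) for lik in row_likelihoods(mu, data))

data = [0.5, 1.5, 2.0, 4.0]                  # hypothetical observations
grid = [i / 100 for i in range(0, 501)]      # candidate means 0.00 to 5.00

# Minimizing the collapsed fit recovers the sample mean.
best_all = min(grid, key=lambda mu: minus2_loglik(mu, data))
# Optimizing only the last row's likelihood chases that single observation.
best_last = max(grid, key=lambda mu: row_likelihoods(mu, data)[-1])

print(best_all, best_last)
```

The first estimate lands on the sample mean, while the second lands on the last data point, which matches the kind of bad estimates described above when a vector-valued fit is optimized directly.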

The following code works like a charm.

Very helpful mike H !

If a side effect of vector = TRUE is that the likelihoods are not optimised on, then perhaps mxFitFunctionML(vector = T) should have a fit = TRUE (default) parameter, which causes the ML values to be fit, rather than just optimising one value in the vector?

ie.

or perhaps a new function, followed by a fit algebra. Perhaps not... the complexity of this is already in the top 1 % of IQ :-)

Although it uses a mixture distribution, the principle for sample weights is very similar (just have a single component weighted mixture, essentially). It uses the technique of pulling variables out of the data frame to use as weights.

http://openmx.psyc.virginia.edu/svn/trunk/models/passing/Acemix2.R

I think this circumvents the issues with dimensionality, but I agree it is a bit clunky that the likelihoods are specified on an individual-wise basis (a la definition variable) but then you get the whole vector of them back. It might be better to have a weight constructor that would compute an algebra (individual-wise) and weight the likelihoods on the way back.

So, I made up a tutorial-type resumé of what I think I gathered here...

https://github.com/tbates/umx/blob/master/developer/examples/weighting/weighted_analyses.md

Does that seem correct?

Also, I noticed that these two give the same result: so the

and this version with only one model containing two fit functions.

Does that make any sense?

I like this. I'm surprised that the one with two fit functions in one model actually works, but pleased it does.

Your github link is incorrect (there's a trailing w).

It would be nice to have an example of selective sampling, and then recovery of population parameters by weighting.
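As a rough illustration of that idea, here is a plain-Python sketch rather than an OpenMx model. The population, strata, and sample sizes are all hypothetical, and it relies on the fact that for a normal model with known variance the weighted ML estimate of the mean is the weighted mean. One stratum is deliberately over-sampled, and inverse-probability weights are used to recover the population mean.

```python
import random

random.seed(1)

# Hypothetical population: 9000 units in stratum A (mean 0), 1000 in stratum B (mean 10).
stratum_a = [random.gauss(0.0, 1.0) for _ in range(9000)]
stratum_b = [random.gauss(10.0, 1.0) for _ in range(1000)]
population_mean = sum(stratum_a + stratum_b) / 10000

# Selective sampling: 100 units from each stratum, so B is heavily over-represented.
sample_a = random.sample(stratum_a, 100)
sample_b = random.sample(stratum_b, 100)

# Sampling weight = 1 / inclusion probability within each stratum.
w_a = 9000 / 100
w_b = 1000 / 100

unweighted_mean = (sum(sample_a) + sum(sample_b)) / 200
weighted_mean = (w_a * sum(sample_a) + w_b * sum(sample_b)) / (w_a * 100 + w_b * 100)

print(population_mean, unweighted_mean, weighted_mean)
```

The unweighted mean is pulled toward stratum B, while the weighted mean should sit close to the population value. An OpenMx version would put the same weights in the data and apply them to the row log likelihoods as discussed earlier in the thread.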