sample weights
Dear all,
I am working with data from a study with a complex sample design. This requires the use of sample weights (which are known). Is this possible in OpenMx?
In a post by neale from 09/24/10 it was suggested to extend the mxFIMLObjective() function so that sample weights, which are part of the data frame, are available as an argument within the function. I think that is exactly what I am looking for, but I could not find any further information. Can you point me to any?
Thank you!
Weighted objectives
This is possible in OpenMx; Mike was arguing for a shortcut (a built-in weights argument).
To weight objective functions in OpenMx, you'll have to specify two models. The first model is your standard model as though there were no weights, with the "vector=TRUE" option added to the FIML objective. This option makes the first model return not a single -2LL value for the model, but individual likelihoods for each row of the data.
That model is placed in a second MxModel object, which will also contain the data (if you put the data here, you don't have to put it in the first model, but you still can). That second model will contain an algebra that multiplies the likelihoods from the first model by some weight. I'll assume that you want to weight the log likelihoods by a variable in your dataset. Your second model will then look something like this, where "data.weight" is the weight from your data and "firstModel" is the name you assigned to your first model.
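A rough sketch of what that second model might look like (untested here; "weightingModel", "myData", and the algebra name "weightedFit" are placeholder names):
# Sketch only: assumes firstModel uses mxFIMLObjective(..., vector = TRUE)
# and that the raw data contain a column named "weight".
weightingModel <- mxModel("weightingModel", firstModel,
    mxData(myData, type = "raw"),
    mxAlgebra(-2 * sum(data.weight %x% log(firstModel.objective)),
              name = "weightedFit"),
    mxAlgebraObjective("weightedFit")
)
weightedResult <- mxRun(weightingModel)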
Edit: corrected the multiplication to a Kronecker product, as it's a 1 x 1 definition variable matrix and a 1 x n likelihood vector.
In reply to Weighted objectives by Ryne
Umm, I'm not following the
Umm, I'm not following the previous explanation. The definition variable "data.weight" is a 1 x 1 matrix, and "firstModel.objective" is an n x 1 matrix when vector=TRUE. Another way to specify this is to put the weights in an n x 1 MxMatrix, and then multiply the weights by the likelihood vector. I think there's a right parenthesis missing that closes the model "firstModel".
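The relevant pieces of that alternative inside the second model would be something like this (a sketch, untested; "weightMatrix" and "myData" are made-up names):
# Sketch: an explicit n x 1 weight matrix, elementwise-multiplied with the
# n x 1 likelihood vector from the first model.
mxMatrix(type = "Full", nrow = nrow(myData), ncol = 1, free = FALSE,
         values = myData$weight, name = "weightMatrix"),
mxAlgebra(-2 * sum(weightMatrix * log(firstModel.objective)), name = "weightedFit"),
mxAlgebraObjective("weightedFit")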
In reply to Umm, I'm not following the by mspiegel
My fault for writing code
My fault for writing code without checking it. It should be a Kronecker product in this specification. You're right that a 1 x n matrix of weights may be included in lieu of the definition variable approach. I fixed the parenthesis issue as well.
In reply to My fault for writing code by Ryne
thank you
that was very helpful!
just a quick follow up question: I know Mplus uses a sandwich estimator to correct the standard errors in addition to using the weighted likelihood function. Is there a similar option in OpenMx? Can I trust the standard errors?
In reply to Weighted objectives by Ryne
Does this work to implement weighted analysis?
Hi,
Following this thread and a previous one [1], I've been trying to implement differential weighting of subjects by building a second model alongside the first, which multiplies the vector of row fits from the first model by the row weights in the data, and then optimizing that sum.
I thought it was working, but I am getting very different results from a colleague who is using MPlus's weight feature. Just deleting the low-weighted rows gives results more similar to his than to mine...
So,
wMZ = data.frame(weight = w[twinData$ZYG == "MZFF"]) # get weights for MZs
wDZ = data.frame(weight = w[twinData$ZYG == "DZFF"]) # and for DZs
...
# An MZ model with vector = T
mxModel(name = "MZ", mxData(mzData, type = "raw"), mxFIMLObjective("top.mzCov", "top.expMean", vector = bVector))
# model "MZw", which simply weights the MZ model
mxModel("MZw",
mxData(wMZ, "raw"),
mxAlgebra(-2 * sum(data.weight %x% log(MZ.objective) ), name = "mzWeightedCov"),
mxAlgebraObjective("mzWeightedCov")
# ...
# In the container model, target an algebra which sums the weighted objectives
# of the MZw and DZw, which in turn target weighted functions of the individual row
# likelihoods of the underlying MZ and DZ ACE models
mxAlgebra(MZw.objective + DZw.objective, name = "twin"), mxAlgebraObjective("twin")
What (if anything) is wrong with this approach? (or my implementation)
[1] http://openmx.psyc.virginia.edu/thread/445
In reply to Does this work to implement weighted analysis? by tbates
Unsure about product
mxAlgebra(-2 * sum(data.weight %x% log(MZ.objective) ), name = "mzWeightedCov"),
Is this really doing what we want, if MZ.objective is a vector of all the likelihoods? I'd be inclined to go for:
mxAlgebra(-2 * sum(mzDataWeight * log(MZ.objective) ), name = "mzWeightedCov"),
where mzDataWeight is the name of an mxMatrix explicitly loaded with the weights.
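For instance (a sketch, untested; it assumes the MZ weights live in wMZ$weight as in the post above):
# Sketch: load the MZ weights into a fixed n x 1 matrix and use it to weight
# the row log-likelihoods from the MZ model.
mxMatrix(type = "Full", nrow = nrow(wMZ), ncol = 1, free = FALSE,
         values = wMZ$weight, name = "mzDataWeight"),
mxAlgebra(-2 * sum(mzDataWeight * log(MZ.objective)), name = "mzWeightedCov"),
mxAlgebraObjective("mzWeightedCov")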
unconformable arrays
What you say makes sense, Mike, but
mxAlgebra(-2 * sum(MZw.data.weight * log(MZ.objective) ), name = "mzWeigh")
generates the unconformable array error. Unfortunately, the error doesn't tell one what the non-conforming arrays' sizes are...
Figuring this might just be a column of weights from the data meeting a row of likelihoods, I tried
mxAlgebra(-2 * sum(t(MZw.data.weight) * log(MZ.objective) ), name = "mzWeigh")
That also fails...
It shouldn't be necessary to place the weights into a matrix, should it?
OK, I tried making a matrix and placing the weight data in there. A benefit is that the errors are more informative, but I get the same error. It would be great if the back end kicked back something like "you were doing [300,1] %*% [300,1]" when…
Will look more in the morning.
PS: I assume the row objective is padded with NAs for rows where all(is.na(row))?
In reply to unconformable arrays by tbates
Three things to try
With this example there aren't a lot of arrays available to be non-conformable: it's either -2, MZw.data.weight, or MZ.objective. First, I'd make sure the objective (fit function) for the MZ model has vector=TRUE. Second, is MZw.data.weight a single column vector? The * operator is for elementwise multiplication and, unlike R, it will not recycle either argument if they have different lengths; use %*% for matrix multiplication. Third, you could put MZw.data.weight into its own mxAlgebra and have a look at it that way.
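For the second point, here is a plain R illustration of the conformability rule (just to show the shapes involved; the values are made up):
# Plain R sketch: elementwise * needs identical dimensions, and mxAlgebra does
# not recycle, so the weights must be n x 1 to match an n x 1 likelihood vector.
w  <- matrix(c(1, 2, 0.5), nrow = 3, ncol = 1)      # stand-in weight column
ll <- matrix(c(0.2, 0.4, 0.1), nrow = 3, ncol = 1)  # stand-in row likelihoods
w * ll        # fine: both are 3 x 1
# w * t(ll)   # 3 x 1 against 1 x 3: non-conformable, like the error above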
In reply to Three things to try by mhunter
True in this case not many sources of non-conformability
But, it would be SUPER HELPFUL to print out the dimensions of the matrices found not to be conformable along with the error. This is something classic Mx used to do, and it's sorely missed in OpenMx. Pretty please?
In reply to unconformable arrays by tbates
trying a simple example: vector alters estimates in model?
So, I built a simple play model, after encountering some no-context errors :-(
This is just estimating the covariance of X and Y, which is around .5
require(OpenMx)
require(MASS)
#Simulate Data
set.seed(200)
rs = .5
nSubs = 1000
selVars <- c('X','Y')
nVar = length(selVars)
xy <- mvrnorm (nSubs, c(0,0), matrix(c(1,rs,rs,1),2,2))
testData <- data.frame(xy)
names(testData) <- selVars
cov(testData)
m1 <- mxModel("vector_is_FALSE",
mxMatrix(name = "expCov", type = "Symm", nrow = nVar, ncol = nVar, free = T, values = var(testData)),
mxMatrix(name = "expMean", type = "Full", nrow = 1, ncol = nVar, free = T, values = 0),
mxExpectationNormal(covariance = "expCov", means = "expMean", dimnames = selVars),
mxFitFunctionML(vector = F),
mxData(observed = testData, type = "raw")
)
m1 <- mxRun(m1)
# summary shows all parameters recovered fine.
# Now: Switch to vector
m1 <- mxModel(m1, mxFitFunctionML(vector = T), name = "vector")
m1 <- mxRun(m1)
umxSummary(m1, showEstimates = "std" )
# what we get
# name Estimate Std.Error
# 1 vector.expCov[1,1] 0.4705 9.0e+07
# 2 vector.expCov[1,2] 0.6148 1.4e+08
# 3 vector.expCov[2,2] 1.0314 2.1e+08
# what it should be...
cov(testData)
# XX 0.9945328
# XY 0.4818317
# YY 1.0102951
So when vector is on, something needs to change in the model to keep it driving toward good estimates?
In reply to trying a simple example: vector alters estimates in model? by tbates
Concerning ... but a solution
This simple example reliably causes my R to completely crash. Rolling back a couple of revisions does not fix the problem for me. Going back to r2655 still has the same crashing problem. For r2583, I no longer crash, but I replicate your finding of rather bad estimates only when vector=TRUE. I should also note I haven't observed this problem with other models running vector=TRUE. However, those other models do not optimize the fit function with vector=TRUE; they have that fit function as a component of another fit function. This got me thinking.
I think in this case OpenMx is trying to optimize the likelihood (not even the log likelihood) of the last row of data. Essentially, for the model you specified, the row likelihoods never collapse to a single number for optimization. The developers should discuss whether we should catch this, and whether it should cause an error, a warning, or something else.
The following code works like a charm.
require(OpenMx)
require(MASS)
#Simulate Data
set.seed(200)
rs = .5
nSubs = 1000
selVars <- c('X','Y')
nVar = length(selVars)
xy <- mvrnorm (nSubs, c(0,0), matrix(c(1,rs,rs,1),2,2))
testData <- data.frame(xy)
names(testData) <- selVars
cov(testData)
m1 <- mxModel("vector_is_FALSE",
mxMatrix(name = "expCov", type = "Symm", nrow = nVar, ncol = nVar, free = T, values = var(testData)),
mxMatrix(name = "expMean", type = "Full", nrow = 1, ncol = nVar, free = T, values = 0, ubound=1),
mxExpectationNormal(covariance = "expCov", means = "expMean", dimnames = selVars),
mxFitFunctionML(vector = FALSE),
mxData(observed = testData, type = "raw")
)
m1 <- mxRun(m1)
# summary shows all parameters recovered fine.
omxGetParameters(m1)
# Now: Switch to vector
m1 <- mxModel(m1, mxFitFunctionML(vector = T), name = "vector")
mc <- mxModel("vector_is_TRUE", m1,
mxAlgebra(name="thefit", -2*sum(log(vector.fitfunction))),
mxFitFunctionAlgebra("thefit"))
mc <- mxRun(mc)
omxGetParameters(mc)
In reply to Concerning ... but a solution by mhunter
to fit or not to fit, that is the parameter
Very helpful mike H !
If a side effect of vector = TRUE is that the likelihoods are not optimized on, then perhaps mxFitFunctionML(vector = TRUE) should have a fit = TRUE (default) parameter, which causes the ML values to be optimized, rather than just one value in the vector?
i.e.
mxFitFunctionML(vector = TRUE, optimiseMinus2log_all = TRUE) # note: I will find ML values based on -2 * sum(log(vector))
or perhaps a new function, followed by a fit algebra. Perhaps not... the complexity of this is already in the top 1 % of IQ :-)
mxLikelihoodsML(vector = T, name = "mylike") # note, you need to fit these...
mxFitFunctionAlgebra("mylike")
In reply to to fit or not to fit, that is the parameter by tbates
Note example script in models/passing
Although it uses a mixture distribution, the principle for sample weights is very similar (essentially, just have a single-component weighted mixture). It uses the technique of pulling variables out of the data frame to use as weights.
http://openmx.psyc.virginia.edu/svn/trunk/models/passing/Acemix2.R
I think this circumvents the issues with dimensionality, but I agree it is a bit clunky that the likelihoods are specified on an individual-wise basis (a la definition variable) but then you get the whole vector of them back. It might be better to have a weight constructor that would compute an algebra (individual-wise) and weight the likelihoods on the way back.
In reply to to fit or not to fit, that is the parameter by tbates
weighted data tutorial
So, I made up a tutorial-style summary of what I think I gathered here...
https://github.com/tbates/umx/blob/master/developer/examples/weighting/weighted_analyses.md
Does that seem correct?
Also, I noticed that these two give the same result. First, the container version:
mc <- mxModel("vector_is_TRUE",
mxModel("vector",
mxMatrix(name = "expCov", type = "Symm", nrow = nVar, ncol = nVar, free = T, labels = c("XX","XY","YY"), values = var(testData)),
mxMatrix(name = "expMean", type = "Full", nrow = 1, ncol = nVar, free = T, labels = c("meanX","meanY"), values = 0, ubound=1),
mxExpectationNormal(covariance = "expCov", means = "expMean", dimnames = selVars),
mxFitFunctionML(vector = TRUE),
mxData(observed = testData, type = "raw")
),
mxAlgebra(name = "minus2LL", -2 * sum(log(vector.fitfunction)) ),
mxFitFunctionAlgebra("minus2LL")
)
and this version, with only one model containing two fit functions:
mxModel("vector",
mxMatrix(name = "expCov", type = "Symm", nrow = nVar, ncol = nVar, free = T, labels = c("XX","XY","YY"), values = var(testData)),
mxMatrix(name = "expMean", type = "Full", nrow = 1, ncol = nVar, free = T, labels = c("meanX","meanY"), values = 0, ubound=1),
mxExpectationNormal(covariance = "expCov", means = "expMean", dimnames = selVars),
mxAlgebra(name = "minus2LL", -2 * sum(log(vector.fitfunction)) ),
mxFitFunctionML(vector = TRUE),
mxFitFunctionAlgebra("minus2LL"),
mxData(observed = testData, type = "raw")
)
Does that make any sense?
In reply to weighted data tutorial by tbates
Cool
I like this. I'm surprised that the one with two fit functions in one model actually works, but pleased it does.
Your github link is incorrect (there's a trailing w).
It would be nice to have an example of selective sampling, and then recovery of population parameters by weighting.