sample weights
Dear all,
I am working with data from a study with a complex sample design. This requires the use of sample weights (which are known). Is this possible in OpenMx?
In a post by neale from 09/24/10 it was suggested to extend the mxFIMLObjective() function so that sample weights, which are part of the data frame, are available as an argument within the function. I think that is exactly what I am looking for, but I could not find any further information. Can you point me to any?
Thank you!
Weighted objectives
This is possible in OpenMx; Mike was arguing for a shortcut (a built-in weights argument).
To weight objective functions in OpenMx, you'll have to specify two models. The first model is your standard model as though there were no weights, with the "vector=TRUE" option added to the FIML objective. This option makes the first model return not a single -2LL value for the model, but individual likelihoods for each row of the data.
That model is placed in a second MxModel object, which will also contain the data (if you put the data here, you don't have to put it in the first model, but you still can). That second model will contain an algebra that multiplies the likelihoods from the first model by some weight. I'll assume that you want to weight the log likelihoods by a variable in your dataset. Your second model will then look something like this, where "data.weight" is the weight from your data and "firstModel" is the name you assigned to your first model.
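A rough sketch of what that second model might look like (untested here; "weightingModel", "myData", and the algebra name "weightedFit" are placeholder names):
# Sketch only: assumes firstModel uses mxFIMLObjective(..., vector = TRUE)
# and that the raw data contain a column named "weight".
weightingModel <- mxModel("weightingModel", firstModel,
    mxData(myData, type = "raw"),
    mxAlgebra(-2 * sum(data.weight %x% log(firstModel.objective)),
              name = "weightedFit"),
    mxAlgebraObjective("weightedFit")
)
weightedResult <- mxRun(weightingModel)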
Edit: corrected the multiplication to a Kronecker product, as it's a 1 x 1 definition variable matrix and a 1 x n likelihood vector.
In reply to Weighted objectives by Ryne
Umm, I'm not following the
Umm, I'm not following the previous explanation. The definition variable "data.weight" is a 1 x 1 matrix, and "firstModel.objective" is an n x 1 matrix when vector=TRUE. Another way to specify this is to put the weights in an n x 1 MxMatrix, and then multiply the weights by the likelihood vector. I think there's a right parenthesis missing that closes the model "firstModel".
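The relevant pieces of that alternative inside the second model would be something like this (a sketch, untested; "weightMatrix" and "myData" are made-up names):
# Sketch: an explicit n x 1 weight matrix, elementwise-multiplied with the
# n x 1 likelihood vector from the first model.
mxMatrix(type = "Full", nrow = nrow(myData), ncol = 1, free = FALSE,
         values = myData$weight, name = "weightMatrix"),
mxAlgebra(-2 * sum(weightMatrix * log(firstModel.objective)), name = "weightedFit"),
mxAlgebraObjective("weightedFit")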
In reply to Umm, I'm not following the by mspiegel
My fault for writing code
My fault for writing code without checking it. It should be a Kronecker product in this specification. You're right that a 1 x n matrix of weights may be included in lieu of the definition variable approach. I fixed the parenthesis issue as well.
In reply to My fault for writing code by Ryne
thank you
that was very helpful!
just a quick follow up question: I know Mplus uses a sandwich estimator to correct the standard errors in addition to using the weighted likelihood function. Is there a similar option in OpenMx? Can I trust the standard errors?
In reply to Weighted objectives by Ryne
Does this work to implement weighted analysis?
Hi,
Following this thread and a previous one [1], I've been trying to implement differential weighting of subjects by building a second model alongside the first, which multiplies the vector of row fits from the first model by the row weights in the data, and then optimizing that sum.
I thought it was working, but I am getting very different results from a colleague who is using MPlus's weight feature. Just deleting the low-weighted rows gives results more similar to his than to mine...
So,
wMZ = data.frame(weight = w[twinData$ZYG == "MZFF"]) # get weights for MZs
wDZ = data.frame(weight = w[twinData$ZYG == "DZFF"]) # and for DZs
...
# An MZ model with vector = T
mxModel(name = "MZ", mxData(mzData, type = "raw"), mxFIMLObjective("top.mzCov", "top.expMean", vector = bVector))
# model "MZw", which simply weights the MZ model
mxModel("MZw",
mxData(wMZ, "raw"),
mxAlgebra(-2 * sum(data.weight %x% log(MZ.objective) ), name = "mzWeightedCov"),
mxAlgebraObjective("mzWeightedCov")
# ...
# In the container model, target an algebra which sums the weighted objectives
# of the MZw and DZw, which in turn target weighted functions of the individual row
# likelihoods of the underlying MZ and DZ ACE models
mxAlgebra(MZw.objective + DZw.objective, name = "twin"), mxAlgebraObjective("twin")
What (if anything) is wrong with this approach? (or my implementation)
[1] http://openmx.psyc.virginia.edu/thread/445
In reply to Does this work to implement weighted analysis? by tbates
Unsure about product
mxAlgebra(-2 * sum(data.weight %x% log(MZ.objective) ), name = "mzWeightedCov"),
Is this really doing what we want, if MZ.objective is a vector of all the likelihoods? I'd be inclined to go for:
mxAlgebra(-2 * sum(mzDataWeight * log(MZ.objective) ), name = "mzWeightedCov"),
where mzDataWeight is the name of an mxMatrix explicitly loaded with the weights.
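For instance (a sketch, untested; it assumes the MZ weights live in wMZ$weight as in the post above):
# Sketch: load the MZ weights into a fixed n x 1 matrix and use it to weight
# the row log-likelihoods from the MZ model.
mxMatrix(type = "Full", nrow = nrow(wMZ), ncol = 1, free = FALSE,
         values = wMZ$weight, name = "mzDataWeight"),
mxAlgebra(-2 * sum(mzDataWeight * log(MZ.objective)), name = "mzWeightedCov"),
mxAlgebraObjective("mzWeightedCov")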
unconformable arrays
What you say makes sense, Mike, but
mxAlgebra(-2 * sum(MZw.data.weight * log(MZ.objective) ), name = "mzWeigh")
generates the unconformable array error. Unfortunately, the error doesn't tell one what the non-conforming arrays' sizes are...
Figuring this might just be a column of weights from the data meeting a row of likelihoods, I tried
mxAlgebra(-2 * sum(t(MZw.data.weight) * log(MZ.objective) ), name = "mzWeigh")
That also fails...
It shouldn't be necessary to place the weights into a matrix, should it?
OK, I tried making a matrix and placing the weight data in there. A benefit is that the errors are more informative, but I get the same error. It would be great if the back end kicked back something like "you were doing [300,1] %*% [300,1]" when…
Will look more in the morning.
PS: I assume the row objective is padded with NAs for rows where all(is.na(row))?
In reply to unconformable arrays by tbates
Three things to try
With this example there aren't a lot of arrays available to be non-conformable: it's either -2, MZw.data.weight, or MZ.objective. First, I'd make sure the objective (fit function) for the MZ model has vector=TRUE. Second, is MZw.data.weight a single column vector? The * operator is for elementwise multiplication and, unlike R, it will not recycle either argument if they have different lengths; use %*% for matrix multiplication. Third, you could put MZw.data.weight into its own mxAlgebra and have a look at it that way.
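For the second point, here is a plain R illustration of the conformability rule (just to show the shapes involved; the values are made up):
# Plain R sketch: elementwise * needs identical dimensions, and mxAlgebra does
# not recycle, so the weights must be n x 1 to match an n x 1 likelihood vector.
w  <- matrix(c(1, 2, 0.5), nrow = 3, ncol = 1)      # stand-in weight column
ll <- matrix(c(0.2, 0.4, 0.1), nrow = 3, ncol = 1)  # stand-in row likelihoods
w * ll        # fine: both are 3 x 1
# w * t(ll)   # 3 x 1 against 1 x 3: non-conformable, like the error above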
In reply to Three things to try by mhunter
True in this case not many sources of non-conformability
But, it would be SUPER HELPFUL to print out the dimensions of the matrices found not to be conformable along with the error. This is something classic Mx used to do, and it's sorely missed in OpenMx. Pretty please?
In reply to unconformable arrays by tbates
trying a simple example: vector alters estimates in model?
So, I built a simple play model, after encountering some no-context errors :-(
This is just estimating the covariance of X and Y, which is around .5
require(OpenMx)
require(MASS)
#Simulate Data
set.seed(200)
rs = .5
nSubs = 1000
selVars <- c('X','Y')
nVar = length(selVars)
xy <- mvrnorm (nSubs, c(0,0), matrix(c(1,rs,rs,1),2,2))
testData <- data.frame(xy)
names(testData) <- selVars
cov(testData)
m1 <- mxModel("vector_is_FALSE",
mxMatrix(name = "expCov", type = "Symm", nrow = nVar, ncol = nVar, free = T, values = var(testData)),
mxMatrix(name = "expMean", type = "Full", nrow = 1, ncol = nVar, free = T, values = 0),
mxExpectationNormal(covariance = "expCov", means = "expMean", dimnames = selVars),
mxFitFunctionML(vector = F),
mxData(observed = testData, type = "raw")
)
m1 <- mxRun(m1)
# summary shows all parameters recovered fine.
# Now: Switch to vector
m1 <- mxModel(m1, mxFitFunctionML(vector = T), name = "vector")
m1 <- mxRun(m1)
umxSummary(m1, showEstimates = "std" )
# what we get
# name Estimate Std.Error
# 1 vector.expCov[1,1] 0.4705 9.0e+07
# 2 vector.expCov[1,2] 0.6148 1.4e+08
# 3 vector.expCov[2,2] 1.0314 2.1e+08
# what it should be...
cov(testData)
# XX 0.9945328
# XY 0.4818317
# YY 1.0102951
So when vector is on, something needs to change in the model to keep it driving toward good estimates?
In reply to trying a simple example: vector alters estimates in model? by tbates
Concerning ... but a solution
This simple example reliably causes my R to completely crash. Rolling back a couple of revisions does not fix the problem for me. Going back to r2655 still has the same crashing problem. For r2583, I no longer crash, but I replicate your finding of rather bad estimates only when vector=TRUE. I should also note I haven't observed this problem with other models running vector=TRUE. However, those other models do not optimize the fit function with vector=TRUE; they have that fit function as a component of another fit function. This got me thinking.
I think in this case OpenMx is trying to optimize the likelihood (not even the log likelihood) of the last row of data. Essentially, for the model you specified, the row likelihoods never collapse to a single number for optimization. The developers should discuss whether we should catch this, and whether it should cause an error, a warning, or something else.
The following code works like a charm.
require(OpenMx)
require(MASS)
#Simulate Data
set.seed(200)
rs = .5
nSubs = 1000
selVars <- c('X','Y')
nVar = length(selVars)
xy <- mvrnorm (nSubs, c(0,0), matrix(c(1,rs,rs,1),2,2))
testData <- data.frame(xy)
names(testData) <- selVars
cov(testData)
m1 <- mxModel("vector_is_FALSE",
mxMatrix(name = "expCov", type = "Symm", nrow = nVar, ncol = nVar, free = T, values = var(testData)),
mxMatrix(name = "expMean", type = "Full", nrow = 1, ncol = nVar, free = T, values = 0, ubound=1),
mxExpectationNormal(covariance = "expCov", means = "expMean", dimnames = selVars),
mxFitFunctionML(vector = FALSE),
mxData(observed = testData, type = "raw")
)
m1 <- mxRun(m1)
# summary shows all parameters recovered fine.
omxGetParameters(m1)
# Now: Switch to vector
m1 <- mxModel(m1, mxFitFunctionML(vector = T), name = "vector")
mc <- mxModel("vector_is_TRUE", m1,
mxAlgebra(name="thefit", -2*sum(log(vector.fitfunction))),
mxFitFunctionAlgebra("thefit"))
mc <- mxRun(mc)
omxGetParameters(mc)
In reply to Concerning ... but a solution by mhunter
to fit or not to fit, that is the parameter
Very helpful mike H !
If a side effect of vector = TRUE is that the likelihoods are not optimized on, then perhaps mxFitFunctionML(vector = TRUE) should have a fit = TRUE (default) parameter, which causes the ML values to be optimized, rather than just one value in the vector?
i.e.
mxFitFunctionML(vector = TRUE, optimiseMinus2log_all = TRUE) # note: I will find ML values based on -2 * sum(log(vector))
or perhaps a new function, followed by a fit algebra. Perhaps not... the complexity of this is already in the top 1 % of IQ :-)
mxLikelihoodsML(vector = T, name = "mylike") # note, you need to fit these...
mxFitFunctionAlgebra("mylike")
In reply to to fit or not to fit, that is the parameter by tbates
Note example script in models/passing
Although it uses a mixture distribution, the principle for sample weights is very similar (essentially, just have a single-component weighted mixture). It uses the technique of pulling variables out of the data frame to use as weights.
http://openmx.psyc.virginia.edu/svn/trunk/models/passing/Acemix2.R
I think this circumvents the issues with dimensionality, but I agree it is a bit clunky that the likelihoods are specified on an individual-wise basis (a la definition variable) but then you get the whole vector of them back. It might be better to have a weight constructor that would compute an algebra (individual-wise) and weight the likelihoods on the way back.
In reply to to fit or not to fit, that is the parameter by tbates
weighted data tutorial
So, I made up a tutorial-style summary of what I think I gathered here...
https://github.com/tbates/umx/blob/master/developer/examples/weighting/weighted_analyses.md
Does that seem correct?
Also, I noticed that these two give the same result. First, the container version:
mc <- mxModel("vector_is_TRUE",
mxModel("vector",
mxMatrix(name = "expCov", type = "Symm", nrow = nVar, ncol = nVar, free = T, labels = c("XX","XY","YY"), values = var(testData)),
mxMatrix(name = "expMean", type = "Full", nrow = 1, ncol = nVar, free = T, labels = c("meanX","meanY"), values = 0, ubound=1),
mxExpectationNormal(covariance = "expCov", means = "expMean", dimnames = selVars),
mxFitFunctionML(vector = TRUE),
mxData(observed = testData, type = "raw")
),
mxAlgebra(name = "minus2LL", -2 * sum(log(vector.fitfunction)) ),
mxFitFunctionAlgebra("minus2LL")
)
and this version, with only one model containing two fit functions:
mxModel("vector",
mxMatrix(name = "expCov", type = "Symm", nrow = nVar, ncol = nVar, free = T, labels = c("XX","XY","YY"), values = var(testData)),
mxMatrix(name = "expMean", type = "Full", nrow = 1, ncol = nVar, free = T, labels = c("meanX","meanY"), values = 0, ubound=1),
mxExpectationNormal(covariance = "expCov", means = "expMean", dimnames = selVars),
mxAlgebra(name = "minus2LL", -2 * sum(log(vector.fitfunction)) ),
mxFitFunctionML(vector = TRUE),
mxFitFunctionAlgebra("minus2LL"),
mxData(observed = testData, type = "raw")
)
Does that make any sense?
In reply to weighted data tutorial by tbates
Cool
I like this. I'm surprised that the one with two fit functions in one model actually works, but pleased it does.
Your github link is incorrect (there's a trailing w).
It would be nice to have an example of selective sampling, and then recovery of population parameters by weighting.