Hi,

Sitting watching R computing a pretty simple set of CIs for the last 4 hrs, and with no progress-bar feedback...

I wonder:

1. Why don't we use the cheap SEs from the 2nd derivative matrix to seed the starts for CI computation? What would be wrong with that?

2. Even with variance in how long it takes per CI, something like this would be helpful to the user:

    nCI <- length(CIs)  # actually want this in cells, so unpack matrices, rows etc. into a vector of cell addresses
    # create progress bar
    pb <- txtProgressBar(min = 0, max = nCI, style = 3)
    for (i in seq_len(nCI)) {
        # compute CI[i] here (pseudocode)
        setTxtProgressBar(pb, i)
    }
    close(pb)

Reporter:

Created: Mon, 07/28/2014 - 20:10

Updated: Thu, 09/15/2016 - 13:47


## Comments

## #1

## #2

Are you suggesting using MLE + 2SE (or something like that) as a starting value for an upper confidence limit, and likewise MLE - 2SE for a lower confidence limit? With boundary conditions, that could lead to start values outside the feasible space. Plus, algebras don't have SEs to begin with.

## #3

> Are you suggesting using MLE + 2SE and -2SE as a starting value for an upper and lower confidence limit?

Yes.

> With boundary conditions, that could lead to start values outside the feasible space.

Yes, they'd sometimes have to be dragged back into the feasible space if they cross a bound. But that's automatic, no?

> Plus, algebras don't have SEs to begin with.

Yes, it wouldn't help everywhere by any means.
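A minimal sketch of the idea discussed above, assuming we already have estimates, SEs, and bounds as plain vectors. `ci_starts` and all numbers are hypothetical illustrations, not part of OpenMx:

```r
# Hypothetical helper: Wald-style start values for the CI search,
# dragged back inside the feasible space when they cross a bound.
ci_starts <- function(est, se, lbound = -Inf, ubound = Inf,
                      z = 1.96, eps = 1e-6) {
  lower <- est - z * se
  upper <- est + z * se
  # Clamp each start to lie just inside [lbound, ubound].
  lower <- pmin(pmax(lower, lbound + eps), ubound - eps)
  upper <- pmin(pmax(upper, lbound + eps), ubound - eps)
  data.frame(est = est, lower_start = lower, upper_start = upper)
}

# Made-up values: a variance bounded below at 0, and an unbounded path.
starts <- ci_starts(est = c(0.10, 0.50), se = c(0.08, 0.05),
                    lbound = c(0, -Inf), ubound = c(Inf, Inf))
print(starts)
```

Here the lower start for the first parameter would fall below zero, so it is moved to just above the bound; the second parameter is left untouched.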

## #4

I think this is generally a good idea. Whenever possible, start CI estimation near where it's probably going to be. The facts that 1. boundary conditions require adjustments, and 2. algebras don't have SEs, shouldn't stop us from making CIs work better in many use cases. When we can easily get good starting values for CIs, let's do it. When we can't, let's keep doing what we've been doing.

## #5

Thanks MikeH!

That made me wonder about two things: good starts for algebras, and whether this is worth doing.

On whether it's worth doing: I wonder how much of those 5 hours is spent getting close to the right value, and how much is spent nudging it by tiny amounts to get the -2ll changed by "exactly" 3.84. That is, if we started with a 1.96*SE guess, how much would that save from the current process? If the optimizer takes 3 steps to get as close as our guess and then spends 6 more getting the precision, then this is a >30% speedup and worth having. But if we're getting into the ballpark in 3 steps and then spending 30 more juggling everything else around the fixed parameter to hit the desired -2ll change, then a big win would require a better optimiser, or running on a 1000-core system so that all the CI answers arrive in a flat run time.
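The arithmetic behind those two scenarios can be sketched as follows. This assumes, purely for illustration, that a good start removes the whole "approach" phase and leaves the refinement phase unchanged; `step_saving` is a hypothetical name:

```r
# Fraction of optimizer steps saved if the start value is already
# as close as the optimizer gets after `approach_steps` steps.
step_saving <- function(approach_steps, refine_steps) {
  approach_steps / (approach_steps + refine_steps)
}

saving_short_refine <- step_saving(3, 6)   # 3 of 9 steps
saving_long_refine  <- step_saving(3, 30)  # 3 of 33 steps
print(c(saving_short_refine, saving_long_refine))
```

The first case saves about a third of the steps; the second saves under a tenth, which is why the answer depends on where the time actually goes.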

On the algebra side, even though algebras don't have SEs, the user might know a lot about the CI.

For instance, I know in this case that they are just standardizations of some underlying matrices, and I bet that's by far the most common use case for running CIs on an algebra.

What if, like mxMultigroup(), we create an mxStandardization(A,B,C) function, which would allow OpenMx to infer the information needed to translate the SEs of the input matrices into a start value for any particular algebra-cell output (given that it now knows that Astd = A/(A+B+C))?
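One way such a translation could work is the delta method, sketched here for a single cell of Astd = A/(A+B+C). This is only an illustration under assumptions: the estimates and the parameter covariance matrix `vc` are made up, and nothing here is an existing OpenMx function.

```r
# Made-up estimates and parameter covariance matrix for A, B, C.
est <- c(A = 0.5, B = 0.2, C = 0.3)
vc  <- matrix(c( 0.010, -0.004, -0.002,
                -0.004,  0.008, -0.001,
                -0.002, -0.001,  0.006),
              nrow = 3, byrow = TRUE,
              dimnames = list(names(est), names(est)))

S    <- sum(est)
astd <- est[["A"]] / S

# Gradient of A / (A + B + C) with respect to (A, B, C).
grad <- c((S - est[["A"]]) / S^2,
          -est[["A"]] / S^2,
          -est[["A"]] / S^2)

# Delta-method SE of the standardized value.
se_astd <- sqrt(drop(t(grad) %*% vc %*% grad))

# Wald-style starts for the CI search, kept inside [0, 1].
astd_starts <- pmin(pmax(astd + c(-1.96, 1.96) * se_astd, 0), 1)
print(c(astd = astd, se = se_astd, lower_start = astd_starts[1],
        upper_start = astd_starts[2]))
```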

PPS: A progress bar using R's built-in progress API would be a lot nicer than checkpointing, although the implementation might be via a wrapper around auto-checkpointing when length(CIs) > 1.

PPPS: There are some nice progress systems that generate event-style notifications, so you could get a "weather report" type notification of "n/N CIs run" in your Android or iOS notification pool.

## #6

Per

> What if, like mxMultigroup(), we create an mxStandardization(A,B,C) function, which would allow OpenMx to infer the information needed to translate the SEs of the input matrices into a start value for any particular algebra-cell output (given that it now knows that Astd = A/(A+B+C))?

It is not clear to me whether one should start at the upper SE of A and the MLEs of B and C. If the parameters correlate, then the estimates of B and C will be different at the upper/lower CI of A than they are at the ML solution. One could, in principle, look at the Hessian and use it to infer values for the B and C parameters. That would be clever, but of course might not work well.
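A rough sketch of what "using the Hessian" could mean, assuming the likelihood is approximately quadratic near the MLE. Under that assumption, moving A by some amount shifts the best-fitting B and C in proportion to their covariance with A. The covariance matrix `vc` (for example, obtained from the inverse Hessian) and all numbers are made up:

```r
# Made-up estimates and parameter covariance matrix.
est <- c(A = 0.5, B = 0.2, C = 0.3)
vc  <- matrix(c( 0.010, -0.004, -0.002,
                -0.004,  0.008, -0.001,
                -0.002, -0.001,  0.006),
              nrow = 3, byrow = TRUE,
              dimnames = list(names(est), names(est)))

target <- "A"
# Move the target parameter to its Wald-style upper limit.
delta <- 1.96 * sqrt(vc[target, target])

# Quadratic approximation: the other parameters follow the target
# along the regression slope cov(param, A) / var(A).
upper_start <- est + vc[, target] / vc[target, target] * delta
print(upper_start)
```

With uncorrelated parameters this reduces to starting B and C at their MLEs; as noted above, it may not work well when the likelihood is far from quadratic.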

If there are many CIs to compute, I recommend bootstrap, which also informs about the covariance between parameters.
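For readers unfamiliar with the approach, here is a generic nonparametric bootstrap in base R. It uses a simple `lm` fit on simulated data as a stand-in for a real model and does not use any OpenMx function; the data, the number of replicates, and the name `fit_once` are illustrative assumptions:

```r
set.seed(1)

# Simulated stand-in data.
n   <- 200
dat <- data.frame(x = rnorm(n))
dat$y <- 0.5 * dat$x + rnorm(n)

# Refit the model on one data set and return its parameter estimates.
fit_once <- function(d) coef(lm(y ~ x, data = d))

# Resample rows with replacement and refit, 500 times.
boot_est <- t(replicate(500, fit_once(dat[sample(n, replace = TRUE), ])))

# Percentile CIs for every parameter from a single set of replicates.
ci <- apply(boot_est, 2, quantile, probs = c(0.025, 0.975))

# Covariance between parameter estimates across replicates.
boot_cov <- cov(boot_est)

print(ci)
print(boot_cov)
```

The cost is one refit per replicate regardless of how many CIs are requested, which is what makes it attractive when there are many of them.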

## #7

Pushing commit soon.