summary() contents in 2.0

It would be good to think about what summary() will show in 2.0.

With the goal of surfacing the top-level detail that users almost always want, and not showing things that are available elsewhere or are not summary data, four elements could perhaps be dropped:
1. compute plan
2. data
3. timestamp
4. OpenMx version

Then perhaps a line of text beneath the summary saying:

"See help(OpenMx_Output) for examples of how to easily access expected and obtained data summaries, elapsed time, packageVersion("OpenMx"), the compute plan and more." [1]

Then the user gets the free parameters, parameter counts, fit stats, etc. that they likely want to see. What do people think?

[1] Happy to write an OpenMx_Output.Rd documenting how to get these other things.

The OpenMx version and which optimizer was used are both very useful pieces of information when users encounter issues. Often users send snippets of output, and they may have run the job on a cluster, so they don't even have the objects to hand any more. I'd vote for hanging onto version & optimizer info. The compute plan for IFA or similar models that use multiple optimizers could, I figure, be hidden from summary.

Very important to have a nice walk-the-whole-model-tree function to summarize the data. User confusion often arises because they only see estimates and not the data. While this is an argument for keeping the data summary section, I agree it's way too voluminous at present, and can easily be obtained by summary() or describe() on the data. Again a walk-the-tree function to get these on all sub-mxModels would be very helpful.
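A rough sketch of what such a tree walker might look like. It uses plain nested lists as a stand-in for mxModels: the fields `name`, `data` and `submodels`, and the function names `make_node` and `walk_data_summaries`, are all made up for illustration. A real version would need to read the corresponding parts of an MxModel instead, which is not shown here.

```r
# Stand-in model tree built from plain lists, NOT real MxModel objects.
# The field names (name, data, submodels) are assumptions for illustration.
make_node <- function(name, data = NULL, submodels = list()) {
  list(name = name, data = data, submodels = submodels)
}

# Recursively collect summary() of every data set in the tree,
# keyed by the dotted path to the (sub)model that owns it.
walk_data_summaries <- function(node, path = node$name) {
  out <- list()
  if (!is.null(node$data)) {
    out[[path]] <- summary(node$data)
  }
  for (sub in node$submodels) {
    out <- c(out, walk_data_summaries(sub, paste(path, sub$name, sep = ".")))
  }
  out
}

# Toy tree: a container model with two submodels, each holding data.
tree <- make_node("top", submodels = list(
  make_node("sub1", data = mtcars[1:16, c("mpg", "disp")]),
  make_node("sub2", data = mtcars[17:32, c("mpg", "disp")])
))
summaries <- walk_data_summaries(tree)
names(summaries)
```

Keying the result by path means a user (or someone helping them) can see at a glance which submodel each data summary belongs to.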

Sounds good: Perhaps when we are out of beta, stop running packageVersion("OpenMx") for people every time.

We currently don't show the optimizer... (list of current output from a simple RAM model below)

Output walker is a good idea. I think also that having a "whatMplusShowsYou()" would be a help(er)ful function to write.

Example output from OpenMx version number: 999.0.0-3521


compute plan:
MxComputeSequence 'compute'
$freeSet : '.'
steps[[ 1 ]] :
MxComputeGradientDescent 'compute'
$freeSet : '.'
$engine : 'CSOLNP'
$fitfunction : 'big_motor_low_mpg.fitfunction'
$verbose : 0
steps[[ 2 ]] :
MxComputeNumericDeriv 'compute'
$freeSet : '.'
$fitfunction : 'big_motor_low_mpg.fitfunction'
$parallel : TRUE
$stepSize : 1e-04
$iterations : 4
$verbose : 0
steps[[ 3 ]] :
MxComputeStandardError 'compute'
$freeSet : '.'
steps[[ 4 ]] :
MxComputeReportDeriv 'compute'
$freeSet : '.'

data:
$big_motor_low_mpg.data
      mpg             disp            gear
 Min.   :10.40   Min.   : 71.1   Min.   :3.000
 1st Qu.:15.43   1st Qu.:120.8   1st Qu.:3.000
 Median :19.20   Median :196.3   Median :4.000
 Mean   :20.09   Mean   :230.7   Mean   :3.688
 3rd Qu.:22.80   3rd Qu.:326.0   3rd Qu.:4.000
 Max.   :33.90   Max.   :472.0   Max.   :5.000

free parameters:
            name matrix  row  col     Estimate    Std.Error lbound ubound
1   mpg_with_mpg      S  mpg  mpg 3.518850e+01    8.7969481
2 disp_with_disp      S disp disp 1.487824e+04 3718.7706007
3 gear_with_gear      S gear gear 5.273663e-01    0.1318468
4     one_to_mpg      M    1  mpg 2.009039e+01    1.0486368
5    one_to_disp      M    1 disp 2.307250e+02   21.5625602
6    one_to_gear      M    1 gear 3.687500e+00    0.1283752

observed statistics: 96
estimated parameters: 6
degrees of freedom: 90
-2 log likelihood: 673.3532
saturated -2 log likelihood: NA
number of observations: 32
chi-square: NA
p: NA
Information Criteria:
     df Penalty Parameters Penalty Sample-Size Adjusted
AIC:   493.3532           685.3532                   NA
BIC:   361.4369           694.1476              675.443
CFI: NA
TLI: NA
RMSEA: NA
timestamp: 2014-06-03 11:21:42
frontend time: 0.07291961 secs
backend time: 0.01539612 secs
independent submodels time: 4.911423e-05 secs
wall clock time: 0.08836484 secs
cpu time: 0.08836484 secs
OpenMx version number: 999.0.0-3521

Er yes, we do show the optimizer in the compute plan:

$engine : 'CSOLNP'

An Mplus-style output function seems reasonable, but could well be a rapidly moving target. I'd name it something else (maybe cover Mplus, LISREL, Amos, whatever). Also, since we don't fit the saturated ML model all the time, the output would be abbreviated unless the user supplies the saturated model fit & df (or npars).
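To make the dependence on the saturated model concrete, here is a minimal sketch (not OpenMx code; `fit_chisq` is a hypothetical helper) of why the chi-square lines can only be filled in when a saturated -2 log likelihood and its df are supplied. The numbers are taken from the two summaries posted in this thread; the saturated df of 0 for the factor model is inferred from the fact that its model df and chi-square df are both 5.

```r
# Hypothetical helper: chi-square is the difference between the model's
# -2 log likelihood and the saturated model's, with df the difference in df.
# Everything stays NA when no saturated fit is supplied.
fit_chisq <- function(m2ll, df, sat_m2ll = NA_real_, sat_df = NA_real_) {
  chisq    <- m2ll - sat_m2ll
  chisq_df <- df - sat_df
  p <- if (is.na(chisq)) NA_real_ else pchisq(chisq, chisq_df, lower.tail = FALSE)
  c(chisq = chisq, df = chisq_df, p = p)
}

# RAM model on raw data, no saturated fit available: all NA.
no_sat <- fit_chisq(m2ll = 673.3532, df = 90)

# Front-page factor model, saturated fit available.
with_sat <- fit_chisq(m2ll = -3648.281, df = 5, sat_m2ll = -3655.665, sat_df = 0)

print(no_sat)
print(with_sat)
```

The second call should reproduce the chi-square and p-value lines of the factor model summary, up to rounding of the printed likelihoods.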

If the compute plan falls out of summary, we'll have to copy $engine into somewhere else in summary. Though I do agree with Mike: engine and version need to be in summary. Compute plan should be accessible, but probably not in summary (though we could get clever and create some type of verbose or include argument to summary in the …). Data summary should probably go away, in part because we don't teach using summary for data, instead pushing towards psych's describe function. Having summaries that are taller than the default R window size always seemed weird to me, at least for simple models.

I also concur that engine and version should be in summary() output.

The issue is complicated (as Josh anticipated) by the possible use of multiple optimizers in one mxRun(). Failure (non-pd matrix or similar) might occur at any stage, and it would be useful to know at which one. Failure is also possible even before any attempt at optimization (Let's see if likelihood or other fit function can be evaluated at the starting values...).

Communication of this sort seems like a nicety, but it's really advantageous to novice and expert users alike, in that it simplifies their communication with each other and identifies the source of the problem.

Perhaps a function might help people get help, maybe a function

mxHelpRequest(model, reveal = c("all", "result", "model", "m_and_simData"), "remarks")

which would take the user's model and create a post on the help forums, a gist and/or attached .Rdata file they can use for bug reports?

kind of nice, and perhaps doable

* Most of the problems I see in giving people help relate to them not providing the code that repros the problem: This would eliminate that
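A minimal sketch of the ".Rdata file" part of that idea, in base R only. `help_request_bundle` is a hypothetical name, not an OpenMx function, and nothing here posts to the forums or creates a gist. What each `reveal` option should include is left open; the sketch only records the choice.

```r
# Hypothetical helper, not part of OpenMx: bundle a model object with the
# user's remarks and session details into one .RData file for a bug report.
help_request_bundle <- function(model,
                                reveal = c("all", "result", "model", "m_and_simData"),
                                remarks = "",
                                file = tempfile(fileext = ".RData")) {
  reveal <- match.arg(reveal)
  bundle <- list(
    remarks = remarks,
    reveal  = reveal,               # recorded only; no filtering is implemented
    session = utils::sessionInfo(), # R version, platform, loaded packages
    model   = model
  )
  save(bundle, file = file)
  invisible(file)
}

# Usage with a stand-in object (any R object can be bundled this way).
path <- help_request_bundle(list(name = "demo"), reveal = "model",
                            remarks = "Fit function fails at starting values")
file.exists(path)
```

Because sessionInfo() lists loaded package versions, a bundle made in a session with OpenMx loaded would carry the version along even if it were dropped from summary().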

The current MxSummary of the Frontpage model is below:


> summary(mxRun(factorModel))
Running One Factor
Summary of model One Factor

free parameters:
                name matrix row col   Estimate   Std.Error
1  One Factor.A[1,6]      A  x1   G 0.39715214 0.015549740
2  One Factor.A[2,6]      A  x2   G 0.50366113 0.018232468
3  One Factor.A[3,6]      A  x3   G 0.57724141 0.020448361
4  One Factor.A[4,6]      A  x4   G 0.70277363 0.024011355
5  One Factor.A[5,6]      A  x5   G 0.79625002 0.026669401
6  One Factor.S[1,1]      S  x1  x1 0.04081419 0.002812718
7  One Factor.S[2,2]      S  x2  x2 0.03801999 0.002805794
8  One Factor.S[3,3]      S  x3  x3 0.04082719 0.003152310
9  One Factor.S[4,4]      S  x4  x4 0.03938706 0.003408875
10 One Factor.S[5,5]      S  x5  x5 0.03628710 0.003678562

observed statistics: 15
estimated parameters: 10
degrees of freedom: 5
-2 log likelihood: -3648.281
saturated -2 log likelihood: -3655.665
number of observations: 500
chi-square: 7.384002
chi-square degrees of freedom: 5
chi-square p-value: 0.1936117
Information Criteria:
     | df Penalty | Parameters Penalty | Sample-Size Adjusted
AIC:    -2.615998             27.38400                     NA
BIC:   -23.689038             69.53008               37.78947
CFI: 0.9993583
TLI: 0.9987166
RMSEA: 0.03088043
RMSEA 95% CI: (0, 0.08139005)
timestamp: 2014-08-01 01:07:37
wall clock time: 0.119 secs
OpenMx version number: 2.0.0.3709
Need help? See help(mxSummary)

Nice, Mike!
A minor, and perhaps not useful, thought: line up the elements of the stats table visually?


observed statistics                  15
estimated parameters                 10
degrees of freedom                    5
-2 log likelihood             -3648.281
saturated -2 log likelihood   -3655.665
number of observations              500
chi-square                     7.384002
chi-square degrees of freedom         5
chi-square p-value            0.1936117
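For what it's worth, this kind of alignment is cheap to do in base R with format(). A small sketch, with the values typed in by hand from the summary above rather than pulled from a real summary object:

```r
# Labels and values copied from the printed summary, kept as strings
# so the printed digits are preserved exactly.
stats <- c(
  "observed statistics"           = "15",
  "estimated parameters"          = "10",
  "degrees of freedom"            = "5",
  "-2 log likelihood"             = "-3648.281",
  "saturated -2 log likelihood"   = "-3655.665",
  "number of observations"        = "500",
  "chi-square"                    = "7.384002",
  "chi-square degrees of freedom" = "5",
  "chi-square p-value"            = "0.1936117"
)

# format() pads character vectors to a common width: labels are
# left-justified (the default) and values right-justified.
aligned <- paste(format(names(stats)),
                 format(unname(stats), justify = "right"))
cat(aligned, sep = "\n")
```

Right-justifying the values is just one choice; aligning on the decimal point would take a little more work.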

For people wanting to get this stuff into Word or elsewhere, I am now embedding R2HTML calls in my reporting functions, so the result gets opened in the web browser as an HTML table. Easy to paste into other places.


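# x is the table to export; file is the path of the HTML file to write
R2HTML::HTML(x, file = file, Border = 0, append = FALSE, sortableDF = TRUE)
system(paste0("open ", file))  # "open" is the macOS command; other platforms need a different opener
print("Table opened in browser")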

Just to say that as of 3750 we are not showing the engine, even when there was only one.

The engine is part of the compute plan and is shown in summary when verbose=TRUE.