There has been some discussion of changing the behavior of mxPath() when arrows='2', all=TRUE, and paths from (a,b) and (b,a) are specified. See here for that discussion. I have become convinced that our current implementation of mxPath()
needs some improvement. These improvements are orthogonal to the discussion of 'all=TRUE' for symmetric paths. I will describe the current state of the implementation, and then suggest several improvements.
Current System
A path specification is stored by OpenMx as an S4 objects. It contains the slots '@from', '@to', '@arrows', '@values', '@excludeself', etc. When the all=TRUE argument is used in the mxPath()
function, then each element of the 'from' argument is replicated N times, where N is the length of the 'to' argument. The vector of replicated arguments is stored in the '@from' slot of the S4 object. See here for an example. I don't remember if this implementation was selected as a matter or convenience or performance (see below for performance considerations).
As a consequence of this implementation, the '@excludeself' slot stores a boolean value indicating whether or not to exclude (x,x) paths. It is not possible with the current implementation to drop the '@exludeself' argument from the S4 object and manually weed out (x,x) paths from the '@from' and '@to' vectors, and still maintain consistency with the '@values', '@labels', '@free', etc. vectors. You can try it for yourself, using the link above as an example.
Problems with Current System
- We are breaking the OpenMx philosophy of WYSIWYS (what-you-say-is-what-you-see). The 'all=TRUE' argument is expanded just enough so that R can internally generate all pairs, but notice that all pairs are not explicitly shown.
- We need to carry around the @excludeself state because of the 'all=TRUE' implementation decision.
- If we decided to explicitly expand all the paths in the S4 object, there may be some performance considerations in the OpenMx front-end. The performance consideration would be the extra garbage collection if we need to store a collection of hundreds of paths in a non-compact representation.
Proposal #1
Eliminate the '@excludeself' slot from the S4 MxPath object. Explicitly store all the path information in the S4 object. This proposal has the benefit of simplicity. When the user generates a path object, they see exactly what paths they have generated. It has the drawback of a runtime performance penalty. Another downside is that it is too verbose a representation. For example, if I have generated 50 paths that all have free parameters, then I would see 50 TRUE
values in the '@free' slot when I try to print the MxPath object.
Proposal #2
Keep the '@excludeself' slot and add an '@all' slot to the S4 MxPath object. Then the user will see exactly what they typed into the mxPath()
function when they inspect the MxPath object. We can add a function omxExpandPaths()
that uses the '@excludeself' and '@all' slots to generates an expanded form of the MxPath object. omxExpandPaths() returns a MxPath object with '@exludeself' and '@all' set to FALSE
, but the same paths are stored in the return value as represented in the input value.
Proposal #2 maintains the current performance of the runtime and restores the WYSIWYS property to the MxPath interface.
Thanks for mentioning 1. I dislike it as much as you do, but good on you for laying it out for discussion. Option 2 is promising, and
omxExpandPaths()
is a nice touch.Most of the issues from the other thread are unaddressed by this proposal, but you stated that up front. I do see a crossover point with regards to exclude self; in the current spec, the 'self' paths are made regardless of the status of the 'includeself' flag, and then ignored when 'includeself' is set to TRUE. Could we instead filter the 'from' and 'to' vectors to remove these paths before we start? We'd have to expand the 'to' vector the same way we do the 'from' vector, then select only the elements of these lists where from!=to. We'd increase runtime, but be even more clear with regards to WYSIWYG.
Yes, I also see a potential problem with the current behavior of
excludeself = TRUE
. In the current behavior, a(x,x)
path will not be placed, but the current element for 'values', 'labels', 'free', etc. will be consumed. Let me use an example:mxPath(from=c('a','b'), to=c('a','b'), all=TRUE, arrows=1, labels=c('ab', 'ba'), excludeself=TRUE)
will generate a path from 'a' to 'b' with the label 'ba' and a path from 'b' to 'a' with the label 'ab'. One might expect the labels to be reversed.Why do we consume these elements when
excludeself = TRUE
is activated? I believe the original intention ofexcludeself = TRUE
was to avoid the need for the user to filter out any paths that have the pattern(x,x)
. In other words, if the user is going to filter out 'values', 'labels', 'free', etc., why doesn't the user just filter out 'from' and 'to' as well?I now agree that this decision is confusing. I'm not sure how to proceed from here.
The real challenge is not screwing up all=TRUE for either one-headed arrows or cases where two-headed arrows are specified for overlapping but not redundant two-headed paths. It's relatively straightforward to design a spec were we to assume that the from and to lists are equal. Tim Bates elsewhere suggested that we have different behavior when the 'to' argument is unspecified, but that creates the case that specifying the optional 'to=X' when 'from=X' is specified changes the behavior of the 'all' argument, which is both a bad idea and a run-on sentence.
I think the fact that mxPath doesn't check for lengths will help us here, such that we can tinker with the 'from' and 'to' lists to our hearts content. Here are two alternative proposals.
Proposal #3
from
andto
vectors whenall=TRUE
. Current practice reps thefrom
vector, with each item repeatedlength(to)
times, which can be achieved byrep(from, each=length(to))
. Expanding theto
vector means replacing the simpleto
vector usingrep(to, length(from))
. This generates equivalent paths as the current spec, but is more explicit by showing all pairs.excludeself
slot.excluderedundant
slot, albeit with a better name. Similar function asexcludeself
, but kills "b with a" when "a with b" has been previously stated. There are a few ways one could do this, but it can be done by making the fullfrom
andto
vectors and removing pairs for whichto[j] %in% from[1:(j-1)]
and visa versa. As a side benefit, specifying this option with arrows=1 can be used to create a Cholesky decomposition.Proposal #4
from
andto
vectors as in Proposal 3, but apply theexcludeself
andexcluderedundant
before populating the finalMxPath
object. That is, return thefrom
andto
vectors that have been filtered by theexcludeself
andexcluderedundant
arguments.excludeself
,excluderedundant
andall
slots. These are used to filter thefrom
andto
vectors, so these slots don't provide any information about future processes.Proposal 5 (with example)
from
andto
lists as above.all
andexcludeself
in favor ofuse
, which will internally assume the function of the above arguments and the proposedexcluderedundant
without argument crawl, much as theuse
argument is utilized incor
. Future additions to the range ofuse
are allowed (our first deprecation, and we're only at version 1.1).use
options, most of which need renaming: NA (equivalent toall=FALSE
), "all.pairs" (equivalent toall=TRUE
,excludeself=FALSE
,excluderedundant=FALSE
), "unique.pairs" (equivalent toall=TRUE
,excludeself=FALSE
,excluderedundant=TRUE
), "all.bivariate" (equivalent toall=TRUE
,excludeself=TRUE
,excluderedundant=FALSE
), and "unique.bivariate" (equivalent toall=TRUE
,excludeself=TRUE
,excluderedundant=TRUE
).free
,values
,labels
,lbound
andubound
refer to the filteredfrom
andlists
after theuse
argument.I've attached a function that will print
MxPath
objects under the proposed spec. Sorry about the name; I needed to name the new thing, and I've been listening to Them Crooked Vultures lately. This is just a proposal, but it was easier to write the function than write the description. Default values for the included arguments are given only because I was getting errors otherwise, and I didn't want to rewrite the entiremxPath
spec. Comments welcome.great solution: I really like how explicit this is:very wysiwyg.
tim
ps: example fun is currently broken for cases where to is missing, but I guess you knew that.
t
Nope, didn't know that. Fixed!
I didn't pay enough attention to this proposal. Now that I've looked at it with proper attention, I like it a great deal. It eliminates the need for bloating the number of arguments to the
mxPath()
, by adding one argument that can accept more values over time.To make sure that everyone is following, here is a very simple example of Proposal #5. Assume the "to" and "from" fields are
c('a','b','c')
.use =
valueI still have a concern over whether we consume the elements of the
values, labels, free
, etc. ifuse = "all.pairs"
oruse = "all.bivariate"
andarrows = 2
. I advocate that we should throw an error whenever "all.pairs" or "all.bivariate" is used and arrows = 2. The error message should instruct people to use "unique.pairs" or "unique.bivariate", respectively.Since we will no longer be consuming elements in any of the four "use" options, I think we should throw an error if
all = TRUE
orexcludeself = TRUE
. We should tell people to employ the "use=" argument, and remind them that we are no longer consuming elements. This change would be in OpenMx 1.2, and then the "all=" and "excludeself=" arguments can be removed from OpenMx 1.3.Thanks!
Has
excludeself
been in an official release yet? It wasn't in 1.0, and I'm OK with deprecation while we're still in the 1.1 beta. If it's going to disappear so quickly, maybe we leave it out of 1.1. Agreed about the error and deprecation note forall
.I like that this proposal allows for more intuitive use of labels, specifically that we don't have to feed in labels to be destroyed. I don't see any reason to consume excess labels save for backwards compatability. Are there other reasons?
Also, I like 'all', 'unique' and 'pairs'. I don't think 'bivariate' is the right word, nor do I have a good name for the standard/no expansion other than NA, which isn't clear. Proposals for alternative names hereby requested. NA makes more sense if we name the proposed argument
expand
rather thanuse
, though I likeuse
better. This is of secondary importance to actually picking a proposal, but something to think about.That all sounds fine to me, and I like the option names: they are easy to remember and distinguish.