You are here

RFC: Fixing implementation of 'all' and 'excludeself' in mxPath()

10 posts / 0 new
Last post
mspiegel's picture
Offline
Joined: 07/31/2009 - 15:24
RFC: Fixing implementation of 'all' and 'excludeself' in mxPath()

There has been some discussion of changing the behavior of mxPath() when arrows='2', all=TRUE, and paths from (a,b) and (b,a) are specified. See here for that discussion. I have become convinced that our current implementation of mxPath() needs some improvement. These improvements are orthogonal to the discussion of 'all=TRUE' for symmetric paths. I will describe the current state of the implementation, and then suggest several improvements.

Current System

A path specification is stored by OpenMx as an S4 objects. It contains the slots '@from', '@to', '@arrows', '@values', '@excludeself', etc. When the all=TRUE argument is used in the mxPath() function, then each element of the 'from' argument is replicated N times, where N is the length of the 'to' argument. The vector of replicated arguments is stored in the '@from' slot of the S4 object. See here for an example. I don't remember if this implementation was selected as a matter or convenience or performance (see below for performance considerations).

As a consequence of this implementation, the '@excludeself' slot stores a boolean value indicating whether or not to exclude (x,x) paths. It is not possible with the current implementation to drop the '@exludeself' argument from the S4 object and manually weed out (x,x) paths from the '@from' and '@to' vectors, and still maintain consistency with the '@values', '@labels', '@free', etc. vectors. You can try it for yourself, using the link above as an example.

Problems with Current System

  • We are breaking the OpenMx philosophy of WYSIWYS (what-you-say-is-what-you-see). The 'all=TRUE' argument is expanded just enough so that R can internally generate all pairs, but notice that all pairs are not explicitly shown.
  • We need to carry around the @excludeself state because of the 'all=TRUE' implementation decision.
  • If we decided to explicitly expand all the paths in the S4 object, there may be some performance considerations in the OpenMx front-end. The performance consideration would be the extra garbage collection if we need to store a collection of hundreds of paths in a non-compact representation.

Proposal #1

Eliminate the '@excludeself' slot from the S4 MxPath object. Explicitly store all the path information in the S4 object. This proposal has the benefit of simplicity. When the user generates a path object, they see exactly what paths they have generated. It has the drawback of a runtime performance penalty. Another downside is that it is too verbose a representation. For example, if I have generated 50 paths that all have free parameters, then I would see 50 TRUE values in the '@free' slot when I try to print the MxPath object.

Proposal #2

Keep the '@excludeself' slot and add an '@all' slot to the S4 MxPath object. Then the user will see exactly what they typed into the mxPath() function when they inspect the MxPath object. We can add a function omxExpandPaths() that uses the '@excludeself' and '@all' slots to generates an expanded form of the MxPath object. omxExpandPaths() returns a MxPath object with '@exludeself' and '@all' set to FALSE, but the same paths are stored in the return value as represented in the input value.

Proposal #2 maintains the current performance of the runtime and restores the WYSIWYS property to the MxPath interface.

Ryne's picture
Offline
Joined: 07/31/2009 - 15:12
Thanks for mentioning 1. I

Thanks for mentioning 1. I dislike it as much as you do, but good on you for laying it out for discussion. Option 2 is promising, and omxExpandPaths() is a nice touch.

Most of the issues from the other thread are unaddressed by this proposal, but you stated that up front. I do see a crossover point with regards to exclude self; in the current spec, the 'self' paths are made regardless of the status of the 'includeself' flag, and then ignored when 'includeself' is set to TRUE. Could we instead filter the 'from' and 'to' vectors to remove these paths before we start? We'd have to expand the 'to' vector the same way we do the 'from' vector, then select only the elements of these lists where from!=to. We'd increase runtime, but be even more clear with regards to WYSIWYG.

mspiegel's picture
Offline
Joined: 07/31/2009 - 15:24
Yes, I also see a potential

Yes, I also see a potential problem with the current behavior of excludeself = TRUE. In the current behavior, a (x,x) path will not be placed, but the current element for 'values', 'labels', 'free', etc. will be consumed. Let me use an example: mxPath(from=c('a','b'), to=c('a','b'), all=TRUE, arrows=1, labels=c('ab', 'ba'), excludeself=TRUE) will generate a path from 'a' to 'b' with the label 'ba' and a path from 'b' to 'a' with the label 'ab'. One might expect the labels to be reversed.

Why do we consume these elements when excludeself = TRUE is activated? I believe the original intention of excludeself = TRUE was to avoid the need for the user to filter out any paths that have the pattern (x,x). In other words, if the user is going to filter out 'values', 'labels', 'free', etc., why doesn't the user just filter out 'from' and 'to' as well?

I now agree that this decision is confusing. I'm not sure how to proceed from here.

Ryne's picture
Offline
Joined: 07/31/2009 - 15:12
The real challenge is not

The real challenge is not screwing up all=TRUE for either one-headed arrows or cases where two-headed arrows are specified for overlapping but not redundant two-headed paths. It's relatively straightforward to design a spec were we to assume that the from and to lists are equal. Tim Bates elsewhere suggested that we have different behavior when the 'to' argument is unspecified, but that creates the case that specifying the optional 'to=X' when 'from=X' is specified changes the behavior of the 'all' argument, which is both a bad idea and a run-on sentence.

I think the fact that mxPath doesn't check for lengths will help us here, such that we can tinker with the 'from' and 'to' lists to our hearts content. Here are two alternative proposals.

Proposal #3

  • Expand both the from and to vectors when all=TRUE. Current practice reps the from vector, with each item repeated length(to) times, which can be achieved by rep(from, each=length(to)). Expanding the to vector means replacing the simple to vector using rep(to, length(from)). This generates equivalent paths as the current spec, but is more explicit by showing all pairs.
  • Keep the excludeself slot.
  • Add a excluderedundant slot, albeit with a better name. Similar function as excludeself, but kills "b with a" when "a with b" has been previously stated. There are a few ways one could do this, but it can be done by making the full from and to vectors and removing pairs for which to[j] %in% from[1:(j-1)] and visa versa. As a side benefit, specifying this option with arrows=1 can be used to create a Cholesky decomposition.
  • Proposal #4

  • Expand the from and to vectors as in Proposal 3, but apply the excludeself and excluderedundant before populating the final MxPath object. That is, return the from and to vectors that have been filtered by the excludeself and excluderedundant arguments.
  • Omit the excludeself, excluderedundant and all slots. These are used to filter the from and to vectors, so these slots don't provide any information about future processes.
  • Ryne's picture
    Offline
    Joined: 07/31/2009 - 15:12
    Proposal 5 (with example)

    Proposal 5 (with example)

  • Internally expand the from and to lists as above.
  • DEPRECATE all and excludeself in favor of use, which will internally assume the function of the above arguments and the proposed excluderedundant without argument crawl, much as the use argument is utilized in cor. Future additions to the range of use are allowed (our first deprecation, and we're only at version 1.1).
  • Initially allow the following use options, most of which need renaming: NA (equivalent to all=FALSE), "all.pairs" (equivalent to all=TRUE, excludeself=FALSE, excluderedundant=FALSE), "unique.pairs" (equivalent to all=TRUE, excludeself=FALSE, excluderedundant=TRUE), "all.bivariate" (equivalent to all=TRUE, excludeself=TRUE, excluderedundant=FALSE), and "unique.bivariate" (equivalent to all=TRUE, excludeself=TRUE, excluderedundant=TRUE).
  • Have free, values, labels, lbound and ubound refer to the filtered from and lists after the use argument.
  • I've attached a function that will print MxPath objects under the proposed spec. Sorry about the name; I needed to name the new thing, and I've been listening to Them Crooked Vultures lately. This is just a proposal, but it was easier to write the function than write the description. Default values for the included arguments are given only because I was getting errors otherwise, and I didn't want to rewrite the entire mxPath spec. Comments welcome.

    tbates's picture
    Offline
    Joined: 07/31/2009 - 14:25
    bump

    great solution: I really like how explicit this is:very wysiwyg.
    tim

    ps: example fun is currently broken for cases where to is missing, but I guess you knew that.
    t

    Ryne's picture
    Offline
    Joined: 07/31/2009 - 15:12
    Nope, didn't know that.

    Nope, didn't know that. Fixed!

    mspiegel's picture
    Offline
    Joined: 07/31/2009 - 15:24
    I didn't pay enough attention

    I didn't pay enough attention to this proposal. Now that I've looked at it with proper attention, I like it a great deal. It eliminates the need for bloating the number of arguments to the mxPath(), by adding one argument that can accept more values over time.

    To make sure that everyone is following, here is a very simple example of Proposal #5. Assume the "to" and "from" fields are c('a','b','c').

    use = value Outcome
    "all.pairs" (a,a), (a,b), (a,c), (b,a), (b,b), (b,c), (c,a), (c,b), (c,c)
    "unique.pairs" (a,a), (a,b), (a,c), (b,b), (b,c), (c,c)
    "all.bivariate" (a,b), (a,c), (b,a), (b,c), (c,a), (c,b)
    "unique.bivariate" (a,b), (a,c), (b,c)

    I still have a concern over whether we consume the elements of the values, labels, free, etc. if use = "all.pairs" or use = "all.bivariate" and arrows = 2. I advocate that we should throw an error whenever "all.pairs" or "all.bivariate" is used and arrows = 2. The error message should instruct people to use "unique.pairs" or "unique.bivariate", respectively.

    Since we will no longer be consuming elements in any of the four "use" options, I think we should throw an error if all = TRUE or excludeself = TRUE. We should tell people to employ the "use=" argument, and remind them that we are no longer consuming elements. This change would be in OpenMx 1.2, and then the "all=" and "excludeself=" arguments can be removed from OpenMx 1.3.

    Ryne's picture
    Offline
    Joined: 07/31/2009 - 15:12
    Thanks! Has excludeself been

    Thanks!

    Has excludeself been in an official release yet? It wasn't in 1.0, and I'm OK with deprecation while we're still in the 1.1 beta. If it's going to disappear so quickly, maybe we leave it out of 1.1. Agreed about the error and deprecation note for all.

    I like that this proposal allows for more intuitive use of labels, specifically that we don't have to feed in labels to be destroyed. I don't see any reason to consume excess labels save for backwards compatability. Are there other reasons?

    Also, I like 'all', 'unique' and 'pairs'. I don't think 'bivariate' is the right word, nor do I have a good name for the standard/no expansion other than NA, which isn't clear. Proposals for alternative names hereby requested. NA makes more sense if we name the proposed argument expand rather than use, though I like use better. This is of secondary importance to actually picking a proposal, but something to think about.

    tbates's picture
    Offline
    Joined: 07/31/2009 - 14:25
    Plus 1

    That all sounds fine to me, and I like the option names: they are easy to remember and distinguish.