shared memory space between R & OpenMx AND between R & xxm

Hi,
I am hoping that the C gurus can help me understand how OpenMx matrices
are stored, updated, and passed around.
Here is the situation:
We are trying to design data structures for the block-sparse BLAS routines
and are struggling with the issue of ownership of "double" matrices. Our
key data structure is called the "blockSparseMatrix". For example, there
may be a huge "factor-loading" matrix where each dense sub-block would
map onto one or more OpenMx parameter matrices. In the ordinary
factor model, each dense sub-block would point to the single
factor-loading matrix. Alternatively, in the growth-curve case with
definition variables, each dense block would point to a person-specific
factor-loading matrix.
The struct blockSparseMatrix has a field **dns, a "pointer to an array
of doubles representing the dense matrix". From a memory and efficiency
perspective, it makes sense for OpenMx to create, update, and destroy the
memory for the dense matrices. A pointer to the dense matrix would be passed
to the back-end at the beginning and would be retained throughout the
estimation process as a member of the blockSparseMatrix structure.
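For concreteness, here is a minimal sketch of the ownership scheme I have
in mind; blockSparseMatrix and dns are our names, while the other field
names are just hypothetical illustrations:

    typedef struct blockSparseMatrix {
        int      nBlocks;   /* number of dense sub-blocks */
        int     *nRows;     /* rows in each sub-block */
        int     *nCols;     /* columns in each sub-block */
        double **dns;       /* dns[i] points at the dense data for block i;
                               that memory would be created, updated, and
                               destroyed by OpenMx, never by our routines */
    } blockSparseMatrix;

In the ordinary factor model every dns[i] would alias the one
factor-loading matrix; with definition variables each dns[i] would point
at a person-specific loading matrix instead.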
From an OO perspective, having the main application write directly to a
struct member seems problematic. However, I see no alternative to
doing so -- without copying huge amounts of data at each iteration.
From OpenMx's design perspective, and given how you envision the
development of other plug-ins, what approach would you recommend?
How does OpenMx deal with R? If I understand correctly, R insists that
all data be "copied" between R and external packages, unless the access
is read-only. Is this what happens at each iteration?
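For reference, my understanding of the two access patterns under .Call()
is roughly the following sketch (scaleVector is a made-up example, not
OpenMx code):

    #include <R.h>
    #include <Rinternals.h>

    SEXP scaleVector(SEXP x, SEXP scale)
    {
        /* Read-only access: REAL(x) aliases R's own buffer; no copy. */
        const double *xr = REAL(x);
        double s = Rf_asReal(scale);
        int n = Rf_length(x);

        /* Write access: duplicate first so R's copy stays untouched,
         * then modify the duplicate. */
        SEXP out = PROTECT(Rf_duplicate(x));
        double *outr = REAL(out);
        for (int i = 0; i < n; i++)
            outr[i] = s * xr[i];

        UNPROTECT(1);
        return out;
    }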
The "Writing R Extensions"
With regard to the decision to keep the front-end matrices dense, here is my perception of the advantages and disadvantages of this decision.

Advantages: it is simpler with respect to interfacing with the front-end, and no changes need to be made in R.

Disadvantages: there are costs associated with storing dense matrices. There is a time cost associated with converting from dense to sparse at the beginning of optimisation and from sparse back to dense at the end, and a memory cost associated with carrying around a huge matrix in the front-end. I believe the memory cost is greater than the time cost.

OTOH, I think designing the system in this fashion is good for a first attempt. We should probably be using something akin to the Matrix package in the front-end anyhow, but we're not.
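To make the conversion cost concrete, a dense-to-sparse pass looks
roughly like the sketch below (cscMatrix and denseToCSC are hypothetical
names, assuming column-major storage and compressed-sparse-column
output); it has to touch all nrow*ncol entries once, which is the time
cost I mean:

    #include <stdlib.h>

    typedef struct {
        int     nrow, ncol, nnz;
        int    *colPtr;   /* ncol+1 entries */
        int    *rowInd;   /* nnz entries    */
        double *values;   /* nnz entries    */
    } cscMatrix;

    static cscMatrix denseToCSC(const double *dense, int nrow, int ncol)
    {
        cscMatrix m = { nrow, ncol, 0, NULL, NULL, NULL };
        m.colPtr = malloc((ncol + 1) * sizeof(int));

        /* First pass: count the nonzeros (the O(nrow*ncol) time cost). */
        for (int j = 0; j < ncol; j++)
            for (int i = 0; i < nrow; i++)
                if (dense[j * nrow + i] != 0.0) m.nnz++;

        m.rowInd = malloc(m.nnz * sizeof(int));
        m.values = malloc(m.nnz * sizeof(double));

        /* Second pass: fill the CSC arrays column by column. */
        int k = 0;
        for (int j = 0; j < ncol; j++) {
            m.colPtr[j] = k;
            for (int i = 0; i < nrow; i++) {
                double v = dense[j * nrow + i];
                if (v != 0.0) { m.rowInd[k] = i; m.values[k] = v; k++; }
            }
        }
        m.colPtr[ncol] = k;
        return m;
    }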
In reply to The "Writing R Extensions" by mspiegel
Some design hints on writing
In reply to Some design hints on writing by mspiegel
Will do. additional
Additional questions/thoughts are in my reply to Tim's response.
In reply to The "Writing R Extensions" by mspiegel
Sparse vs Dense matrices in
I think keeping the front-end matrices dense, along with the flexibility/functionality that OpenMx matrices provide, makes sense. It is probably much faster than sparse matrix operations for the most common SEM problems.
R insists that its own data
If this is a situation where it is preferable to have directly manipulable data, it can be duplicated once at the beginning of the optimization routine, manipulated during each iteration of the optimizer, and then the duplicate destroyed at the end of the routine, all within the back-end. That way the time per iteration isn't increased.
The omxMatrix object at the back-end works this way, using R's memory directly whenever only read access is needed. It is worth noting that the R accessor functions do not guarantee that they will not duplicate data in order to pass it from R to C, though in general they do not.
The whole thing is complicated, of course, by the fact that BLAS tends to operate in-place, and therefore often modifies arguments during computation. As a result, there are many times when duplication of data is necessary at every iteration. When the computations overwrite the original values, though, I'm not sure that there's any way to avoid at least some copying.
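In sketch form, the duplicate-once pattern looks something like this (the
function names are made up; rData stands for the buffer obtained from R,
e.g. via REAL()):

    #include <stdlib.h>
    #include <string.h>

    static double *scratch = NULL;

    void beginOptimization(const double *rData, size_t n)
    {
        /* One copy at the start of the optimization routine. */
        scratch = malloc(n * sizeof(double));
        memcpy(scratch, rData, n * sizeof(double));
    }

    void iterate(size_t n)
    {
        /* BLAS-style in-place work happens on the duplicate, so nothing
         * is copied per iteration and R's data stays untouched. */
        for (size_t i = 0; i < n; i++) scratch[i] *= 0.5;  /* placeholder */
    }

    void endOptimization(void)
    {
        /* Destroy the duplicate once the routine finishes. */
        free(scratch);
        scratch = NULL;
    }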
In reply to R insists that its own data by tbrick
That was helpful. For any
For any direct interface with R, copying will generally be needed, unless the access is read-only. I suppose similar assumptions can be made for the omxModel object passed to the objectiveFunction.
Our sparse-matrix structures and routines will work with pointers to externally allocated dense matrices.
For our back-end, we plan to construct our sparse-matrix objects at the first iteration, essentially grabbing pointers to the numeric parts of all the omxMatrices. Then, at every iteration, we would only need to update those pointers if the locations have changed. If no reallocation has occurred within OpenMx, then our back-end structures will be updated automatically once OpenMx updates the model object for the objectiveFunction. Either of these options would be better than the third: copying the numeric values from the omxMatrices into our back-end structures at every iteration, though even that would not be terribly much worse.
Which of these three scenarios should we assume for our back-end?
In reply to That was helpful. For any by Paras
I would recommend
omxMatrix objects that are passed from the front-end are statically allocated, and will not change their pointer structure, so pointers to them can be used without worrying about changes. omxAlgebras, at present, are not guaranteed to maintain their data pointers, although they should only change if the data matrices grow between iterations, which is not a case I've seen yet.
If you maintain the omxMatrix* pointer to either an omxAlgebra or an omxMatrix, though, you can always access the current location of the data object (or of a specific element in the data object) from that. The function to do so is currently called double* omxLocationOfMatrixElement(omxMatrixObject om, int row, int col), but that is subject to change until the back-end API is finalized.
It's also worth noting that omxObjective objects in the back-end are set up through the call to an initialization function, which is called after all matrices and (hopefully) all dependent algebras have been generated. This is primarily intended to allow objective objects to unpack their arguments, link to their associated data objects, etc., but also provides the opportunity to grab the pointers you need and perform precalculations.
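As a sketch of what I mean (refreshBlocks is a hypothetical helper; the
signature of omxLocationOfMatrixElement is, as noted, subject to change,
and I've written its om argument as an omxMatrix* for concreteness):

    typedef struct omxMatrix omxMatrix;  /* opaque back-end handle */
    double *omxLocationOfMatrixElement(omxMatrix *om, int row, int col);

    /* Re-fetch the data location of every block once per iteration, so
     * any relocation of an omxAlgebra's data is picked up automatically. */
    void refreshBlocks(blockSparseMatrix *bsm, omxMatrix **handles)
    {
        for (int b = 0; b < bsm->nBlocks; b++) {
            /* Element (0,0) marks the start of block b's dense data. */
            bsm->dns[b] = omxLocationOfMatrixElement(handles[b], 0, 0);
        }
    }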
In reply to I would recommend by tbrick
That is what I was hoping.