I am analyzing data from an ambulatory assessment study and I want to specify a latent first-order autoregressive model. Because the time points are randomly selected for each individual, the data have an unbalanced structure: the occasions are not equally spaced, and the time lags differ both between individuals and between occasions.

I think that in order to specify such a model the autoregressive parameter has to have the individual time lag in the exponent: beta^time-lag(individual).

Could such a model be specified in OpenMx or is there another way to solve this problem?

Yes, you can do this with OpenMx.

Take a look at the documentation for "definition variables". These allow you to substitute a value in for each row. If you set your data up so that each occasion/person is on a row, the definition variable can act as a multiplier for your autoregressive coefficient. You might also want to look at Han Oud's work on continuous time autoregressive models, since elapsed time has an exponential relationship to the autoregressive parameter. This too can be accomplished in OpenMx with an mxAlgebra statement. As far as I know, there are no step-by-step instructions for this type of model yet. An opportunity for a methodology article, though!
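To make the exponent idea concrete, here is a minimal base R sketch (not an OpenMx model) of how a unit-lag autoregressive coefficient would translate into lag-specific coefficients. The value of `beta` and the lags are made up for illustration, and the helper name `ar_for_lag` is hypothetical. In an OpenMx model the lag would presumably enter as a definition variable (a label of the form "data.<column>") inside an mxAlgebra, but that part is not shown here.

```r
# Hypothetical unit-lag autoregressive coefficient (must be > 0 for
# non-integer lags to give real-valued results).
beta <- 0.8

# Hypothetical time lags, in units of the reference interval.
lags <- c(0.5, 1, 2, 3.5)

# Discrete-time view: the effective coefficient is beta raised to the lag.
ar_for_lag <- function(beta, lag) beta^lag
phi <- ar_for_lag(beta, lags)

# Continuous-time view: with drift a = log(beta), the same coefficients
# are exp(a * lag). The two parameterizations agree.
a <- log(beta)
phi_ct <- exp(a * lags)

print(round(cbind(lag = lags, phi = phi, phi_ct = phi_ct), 4))
```

Longer lags give coefficients closer to zero, which is the behavior the exponent is meant to capture.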

I have a Latent Curve Model (LCM) defined to describe change in Y over age, but Y has been collected not as a function of age but with respect to occasions of measurement. At each occasion, participants have different ages. By using definition variables it is possible to set up this model. To be clear, this is equivalent to the TSCORES keyword in M+.

However, as soon as participants do not have complete occasion-specific data, the relevant definition variable has missing values. Currently this does not appear to be handled by OpenMx.

Imagine these data:

id   Y1   Y2   Y3   A1   A2   A3
1    10   12   14   20   21   22
2    12   16    .   34   35    .
3    18    .    .   40    .    .

By defining the age variables A1, A2, A3 as the definition variables it is possible to set the LCM up such that the slope loadings are fixed at those values, hence describing change over age and not occasions. This would work fine if all participants had complete data (with respect to occasions).
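As a small illustration of what "slope loadings fixed at the ages" means, here is a base R sketch for participant 1. The latent means are hypothetical values chosen so that the implied trajectory reproduces that participant's scores. In OpenMx, the slope column would presumably carry definition-variable labels such as "data.A1" rather than fixed numbers.

```r
# Ages of participant 1 at the three occasions (from the example data).
ages <- c(A1 = 20, A2 = 21, A3 = 22)

# Person-specific loading matrix: intercept loadings of 1,
# slope loadings equal to the person's age at each occasion.
Lambda <- cbind(intercept = 1, slope = ages)

# Hypothetical latent means: intercept (value at age 0) and slope per year.
eta_mean <- c(intercept = -30, slope = 2)

# Model-implied means of Y1..Y3 for this person.
mu <- as.vector(Lambda %*% eta_mean)
print(mu)
```

Because every person has a different `Lambda`, the model-implied means and covariances differ row by row, which is why definition variables are needed.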

However, because definition variables cannot have missing values, it seems to me that at present the only way to include all 3 patterns of missingness (represented by the 3 participants) is to define a multiple-group model with equality constraints, in which each pattern of missingness represents a different group. For participant 1 there would be 3 definition variables (A1, A2, A3), for participant 2 there would be 2 (A1, A2), and finally just A1 for participant 3.

Is there another, more direct solution to this?

Hi Paolo!

I had the same problem and I'd like to offer my solution. After all, the model I am running is specified according to your initial design!

I set all definition variables that are missing to an arbitrary value, e.g., to "1". Of course, you need to make sure that the patterns of missingness are really identical in the definition variables and the observed data. If I got my math right, the fact that a certain observation is missing renders the associated value of the definition variable meaningless within the evaluation of the fit function. Therefore, the missing definition variable can be replaced by any value.

best,

Andreas

Hi Andreas,

thanks, your tip works beautifully! Simple and logical!

I checked the OpenMx solution of the LCM with definition variables against the lme4 solution (under equivalent LCM conditions), and the two matched perfectly!

Funny, though, the M+ solution (with the TSCORES option) was somewhat different from either, even after fiddling with various estimation criteria... Given the robustness of the lme4 package I know which SEM software to trust for this kind of analysis!

The trick described is the one we have used in classic Mx for years. I recommend using an arbitrarily large value like 999999 instead of 1 for missing definition variables where their missingness is not supposed to have any effect on the model. If the missing-value code is accidentally assigned to a case where the definition variable is actually in use, then the large code will likely cause the model to blow up, and the problem can be traced and corrected.
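For anyone who wants to apply this to the example data above, here is a minimal base R sketch of the recoding step. The column names follow the example table; the variable name `placeholder` and the pattern check are my own additions, not part of any OpenMx API.

```r
# Example data from earlier in the thread ("." coded as NA).
dat <- data.frame(
  id = 1:3,
  Y1 = c(10, 12, 18), Y2 = c(12, 16, NA), Y3 = c(14, NA, NA),
  A1 = c(20, 34, 40), A2 = c(21, 35, NA), A3 = c(22, NA, NA)
)
y_cols <- c("Y1", "Y2", "Y3")
a_cols <- c("A1", "A2", "A3")

# Safety check: an age may only be missing where the matching Y is
# missing too, otherwise the placeholder would enter the likelihood.
same_pattern <- all(is.na(dat[y_cols]) == is.na(dat[a_cols]))
stopifnot(same_pattern)

# Recode missing definition variables to a conspicuous placeholder.
placeholder <- 999999
for (a in a_cols) {
  dat[[a]][is.na(dat[[a]])] <- placeholder
}

print(dat)
```

The Y columns keep their NA values; only the definition variables are recoded.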

Do you have any pointers for using OpenMx to model one time series with many observations? (Observations every day for several hundred days, equally-spaced time points and no missing values.) I need to create a time series model with autoregressive and/or moving average terms where the response is latent (with three indicators at each time point) and I have several explanatory variables at each time point (some simply observed, and others as indicators for other latent variables). A very nice article by du Toit and Browne (Structural equation modeling of multivariate time series) shows path diagrams so I know how to describe the model. But expressing it in OpenMx is a bit puzzling to me. Would all the variables and time points have to be in one very long row in a data.frame handled by mxData?

BTW, Oud's "Continuous time modeling of the cross-lagged panel design (2002)" can be found using this link:

http://members.chello.nl/j.oud7/Oud2002.pdf

Fitting time series models at the manifest or latent level with standard SEM methods often involves a fair amount of cleverness, trickery, and violation of assumptions. Ultimately, we'd like to have special objective/expectation/fit functions for these kinds of models that obviate the need for these complicated model specifications. Currently, I think every SEM program requires some tinkering.

I'm not sure if the method you described would work. It may, but I haven't seen it before.

What I have seen uses a block Toeplitz lagged covariance matrix as data. I've attached an R script that shows a general example of how to do this. The idea is to create a covariance matrix for lag 0 (a standard covariance matrix), 1, 2, and so on up to some "large enough" lag that you decide. You then construct the total covariance matrix that you'll use in the model from these blocks.

The model that you specify then describes relationships at lag 0, 1, 2, and so on.

Relating to the example script attached, you might have variables x, y, and z, and be interested in five lag blocks (lags 0 through 4). You'd create a covariance matrix with names x_t0, y_t0, z_t0, x_t1, y_t1, z_t1, x_t2, y_t2, z_t2, x_t3, y_t3, z_t3, x_t4, y_t4, and z_t4. You might then create a factor for x_t0, y_t0, z_t0, another for x_t1, y_t1, z_t1, and so on. You probably want to use the same labels for the corresponding parameters of each factor, which constrains them to be equal across lags. Finally, you'll need to add appropriate regression paths from one factor and factor residual to another to represent the autoregressive and moving average components.
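Since the attachment may not be visible to everyone, here is a minimal base R sketch of the construction described above. It is not the attached script: the function names `lagged_cov` and `block_toeplitz` are hypothetical, and the data are simulated white noise purely so the example runs.

```r
# Covariance between X[t, ] and X[t + k, ] for a T x p series X.
# Uses the overall column means and an (n - 1) denominator, so that
# k = 0 reproduces cov(X).
lagged_cov <- function(X, k) {
  n <- nrow(X)
  Xc <- scale(X, center = TRUE, scale = FALSE)
  crossprod(Xc[1:(n - k), , drop = FALSE],
            Xc[(1 + k):n, , drop = FALSE]) / (n - 1)
}

# Assemble the block Toeplitz matrix for lags 0 .. n_lags - 1.
block_toeplitz <- function(X, n_lags) {
  p <- ncol(X)
  C <- lapply(0:(n_lags - 1), function(k) lagged_cov(X, k))
  S <- matrix(0, p * n_lags, p * n_lags)
  for (a in seq_len(n_lags)) {
    for (b in seq_len(n_lags)) {
      rows <- ((a - 1) * p + 1):(a * p)
      cols <- ((b - 1) * p + 1):(b * p)
      # Blocks above the diagonal use C_k, blocks below use its transpose.
      S[rows, cols] <- if (b >= a) C[[b - a + 1]] else t(C[[a - b + 1]])
    }
  }
  nm <- as.vector(outer(colnames(X), 0:(n_lags - 1),
                        function(v, l) paste0(v, "_t", l)))
  dimnames(S) <- list(nm, nm)
  S
}

# Simulated stand-in data: 300 time points, variables x, y, z.
set.seed(123)
X <- matrix(rnorm(300 * 3), ncol = 3,
            dimnames = list(NULL, c("x", "y", "z")))
S <- block_toeplitz(X, n_lags = 5)
print(dim(S))
```

The resulting matrix could then be passed to mxData as covariance data (type = "cov"). What to use for the number of observations is a judgment call that this sketch does not settle.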

We should probably have a demo that estimates this kind of model and discusses various aspects of the technique.

Hope this can get you started!

I created an OpenMx script that computes the autoregressive coefficient for a simple 1-lag autoregressive model. It works with 20 time points (taking seconds to minutes) up to 100 time points (taking 2 hours). When I try 200 time points it runs for days, and I have never waited long enough to see whether it ever completes. Obviously, this brute-force approach won't work.

I've attached a version that has n=30 time points. This runs in about 52 seconds. In this model (adapted from the du Toit article), the time series history (labeled "x") before the first observation is treated as a latent factor. (It may be possible to eliminate this part.) Here are the confidence intervals for phi=0.4 (labeled "alpha" in the model):

> round(summary(ar.ts.fit)$CI,2)
                      lbound estimate    ubound
alpha                   0.09     0.45      0.80
m_x               -171248.97    -0.19 149367.61
AR Model.s_s[1,1]       0.10     0.10  23646.70
m_u                    -0.29     0.05      0.39
AR Model.s_u[1,1]       0.70     0.89      1.17

"u" is the error process with mean 0 and sd of 1. You can see that alpha is estimated to be 0.45 (0.09 to 0.80). The estimate is reasonable. The confidence interval is wide due to the small amount of data. (The results for the latent x are not good, but given the sample size that is not unexpected.) The results are much better with 100 time points, but that takes 2 hours to run, and I need to be able to handle multiple time series of 400 to 500 time points at minimum.
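For reference, a quick cross-check of this kind of result is possible with base R's `arima()` on a manifest series. This is only a sketch under my own assumptions: the data are freshly simulated with phi = 0.4 and n = 30 (not the series from the attached script), the interval is a Wald interval rather than a likelihood-based one, and the model has no latent measurement part.

```r
set.seed(1)
n <- 30

# Simulate a stationary AR(1) series with phi = 0.4 and unit-variance errors.
x <- arima.sim(model = list(ar = 0.4), n = n)

# Fit an AR(1) with a mean term by maximum likelihood.
fit <- arima(x, order = c(1, 0, 0), method = "ML")

# Point estimate and approximate 95% Wald interval for the AR coefficient.
est <- coef(fit)["ar1"]
se  <- sqrt(diag(fit$var.coef))["ar1"]
ci  <- est + c(-1, 1) * qnorm(0.975) * se

print(round(c(lbound = ci[1], estimate = unname(est), ubound = ci[2]), 2))
```

This runs almost instantly even for several hundred time points, so it can serve as a benchmark for the autoregressive part while the latent model is being worked out.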

I will study your approach. Thanks.