Two Model Styles - Two Data Styles

In this first detailed example, we introduce the different styles available to specify models and data, which were briefly discussed in the ‘Beginners Guide’. There are two main approaches to specifying models: (i) paths specification and (ii) matrix specification. We will go through all the examples in both approaches, so you can choose which fits your style better, or check them both out to get a sense of their advantage/disadvantages. The ‘path specification’ model style translates path diagrams into OpenMx code; the ‘matrix specification’ model style relies on matrices and matrix algebra to produce OpenMx code. For each of the two approaches, the data may come in (a) summary format, i.e. covariance matrices and possibly means, or (b) raw data format. We will illustrate both, as arguments of functions may differ. Thus, we will here describe the same example four different ways:

  • i.a Path Specification - Covariance Matrices
  • i.b Path Specification - Raw Data
  • ii.a Matrix Specification - Covariance Matrices
  • ii.b Matrix Specification - Raw Data

Our first example is fitting a simple model to one variable to estimate its mean and variance. This is also referred to as fitting a saturated model. We start with a univariate example, and also work through a bivariate example which differs in minor ways from the univariate one, as it forms the basis for later examples.

Univariate Saturated Model

The four versions of univariate example are available in the following files:

The last file includes all four example in one. The bivariate examples are available in the following files:

Note that we have additional version of the matrix-style examples which use a Cholesky decomposition to estimate the expected covariance matrices, which is preferred to directly estimation the symmetric matrices.

Data

To avoid reading in data from an external file, we simulate a simple dataset directly in R, and use some of its great capabilities. As this is not an R manual, we just provide the code here with minimal explanation.

#Simulate Data
set.seed(100)
x <- rnorm (1000, 0, 1)
testData <- as.matrix(x)
selVars <- c("X")
dimnames(testData) <- list(NULL, selVars)
summary(testData)
mean(testData)
var(testData)

The first line is a comment (starting with a #). We set a seed for the simulation so that we generate the same data each time and get a reproducible answer. We then create a variable x for 1000 subjects, with mean of 0 and a variance of 1, using R’s normal distribution function rnorm. We read the data in as a matrix into an object testData and give the variable a name "X" using the dimnames command. We can easily produce some descriptive statistics in R using built-in functions summary, mean and var, just to make sure the data look like what we expect. We also define selVars here with the names of the variable(s) to be analyzed.

Covariance Matrices and Path-style Input

Model Specification

The model estimates the mean and the variance of the variable X. We call this model saturated because there is a free parameter corresponding to each and every observed statistic. Here we have covariance matrix input only, so we can estimate one variance. Below is the path diagram and the complete script:

_images/UnivariateSaturatedModel.png
#example 1: Saturated Model with Cov Matrices and Path-Style Input
univSatModel1 <- mxModel("univSat1",
    manifestVars= selVars,
    mxPath(
        from=c("X"),
        arrows=2,
        free=T,
        values=1,
        lbound=.01,
        labels="vX"
    ),
    mxData(
        observed=var(testData),
        type="cov",
        numObs=1000
    ),
    type="RAM"
)

We will discuss each of the commands separately and repeat the relevant bits of the OpenMx code. We use the mxModel command to specify the model. Its first argument is a name. All arguments are separated by commas.

univSatModel1 <- mxModel("univSat1",

When using the path specification, it is easiest to have a matching path diagram. Assuming you are familiar with path analysis (for those who are not, there are several excellent introductions, see refs), we have a box for the observed/manifest variable x, specified with the manifestVars argument, and one double arrow on the box to represent its variance, specified with the mxPath command. The mxPath command indicates where the path originates: from= and where it ends: to. If the to= argument is omitted, the path ends at the same variable where it started. The arrows argument distinguishes one-head arrows (if arrows=1) or two-headed arrows (if arrows=2). The free command is used to specify which elements are free or fixed with a TRUE or FALSE option. If the mxPath command creates more than one path, a single T implies that all paths created here are free. If some of the paths are free and others fixed, a list is expected. The same applies for values command which is used to assign starting values or fixed final values, depending on the corresponding ‘free’ status. Optionally, lower and upper bounds can be specified (using lbound and ubound, again generally for all the paths or specifically for each path). Labels can also be assigned using the labels command which expects as many labels (in quotes) as there are elements.

manifestVars=selVars ,

mxPath(
    from=c("X"),
    arrows=2,
    free=T,
    values=1,
    lbound=.01,
    labels="vX"
),

We specify which data the model is fitted to with the mxData command. Its first argument, observed=, reads in the data from an R matrix or data.frame, with the type= given in the second argument. Given we read a covariance matrix here, we use the var() function (as there is no covariance for a single variable). When summary statistics are used as input, the number of observations (numObs=) needs to be supplied.

mxData(
    observed=var(testData),
    type="cov",
    numObs=1000
),

With the path specification, the ‘RAM’ objective function is used by default, as indicated by the type argument. Internally, OpenMx translates the paths into RAM notation in the form of the matrices A, S, and F [see refs].

    type="RAM"
)

Model Fitting

So far, we have specified the model, but nothing has been evaluated. We have ‘saved’ the specification in the object univSatModel1. This object is evaluated when we invoke the mxRun command with the object as its argument.

univSatFit1 <- mxRun(univSatModel1)

There are a variety of ways to generate output. We will promote the use of the mxEval command, which takes two arguments: an expression and a model name. The expression can be a matrix or algebra name defined in the model, new calculations using any of these matrices/algebras, the objective function, etc. We can then use any regular R function to generate derived fit statistics, some of which will be built in as standard. When fitting to covariance matrices, the saturated likelihood can be easily obtained and subtracted from the likelihood of the data to obtain a Chi-square goodness-of-fit.

EC1 <- mxEval(S, univSatFit1)   #univSatFit1[['S']]@values
LL1 <- mxEval(objective, univSatFit1)
SL1 <- univSatFit1@output$other$Saturated
Chi1 <- LL1-SL1

The output of these objects like as follows:

> EC1
         [,1]
[1,] 1.062112
> LL1
         [,1]
[1,] 1.060259
> SL1
[1] 1.060259
> Chi1
             [,1]
[1,] 2.220446e-16

In addition to providing a covariance matrix as input data, we could add a means vector. As this requires a few minor changes, lets highlight those. We have one additional mxPath command for the means. In the path diagram, the means are specified by a triangle which as a fixed value of one, reflected in the from="one" argument, with the to= argument referring to the variable which mean is estimated.

mxPath(
    from="one",
    to="X",
    arrows=1,
    free=T,
    values=0,
    labels="mX"
),

The other required change is in the mxData command, which now takes a fourth argument means for the vector of observed means from the data calculated using the R mean command.

mxData(
    observed=matrix(var(testData),1,1),
    type="cov",
    numObs=1000,
    means=mean(testData)
),

When a mean vector is supplied and a parameter added for the estimated mean, the RAM matrices A, S and F are augmented with an M matrix which can be referred to in the output in a similar was as the expected variance before.

EM1m <- mxEval(M, univSatFit1m)

Raw Data and Path-style Input

Instead of fitting models to summary statistics, it is now popular to fit models directly to the raw data and using full information maximum likelihood (FIML). Doing so requires specifying not only a model for the covariances, but also one for the means, just as in the case of fitting to covariance matrices and mean vectors, described above.

The only change required is in the mxData command, which now takes either an R matrix or a data.frame with the observed data as first argument, and the type="raw" as the second argument.

mxData(
    observed=testData,
    type="raw"
)

A nice feature of OpenMx is that an existing model can be modified in any respect. So to change the above univSatModel1 can be effected this way:

univRawModel1 <- mxModel(univSatModel1,
        mxData(
    observed=testData,
    type="raw"
    )
)

This model can be run as usual with an mxRun command:

univRawFit1 <- mxRun(univSatModel1)

Note that the output generated from this model now includes the expected mean, the expected covariance matrix and -2 times the log-likelihood of the data.

> EM2
           [,1]
[1,] 0.01680498
> EC2
         [,1]
[1,] 1.061049
> LL2
         [,1]
[1,] 2897.135

Covariance Matrices and Matrix-style Input

We now specify essentially the same models with matrices. Starting with the model fitted to the summary covariance matrix, we need a specify one matrix for the expected covariance matrix. We use the mxMatrix command for this. The first argument is its type, which is symmetric for a covariance matrix. The second and third arguments are the number of rows (nrow) and columns (ncol). The free and values command work in the same way as in the path specification. If only one element is given, it is applied to all the elements in the matrix. Alternatively, each element can be assigned its free/fixed status and starting value with a list command. Note that in the current example, the matrix is a simple 1x1 matrix, but that will change rapidly in the following examples. The code to specify the model includes four commands, (i) mxModel, (ii) mxMatrix, (iii) mxData and (iv) mxMLObjective. The mxData is the same for paths and matrices specifications. A different objective function is used, namely the mxMLObjective command which takes two arguments, the covariance or the expected covariance matrix, which we specified in an mxMatrix command as expCov, and dimnames which allow the mapping of the observed data to the expected covariance matrix, i.e. the model.

univSatModel3 <- mxModel("univSat3",
        mxMatrix(
                type="Symm",
                nrow=1,
                ncol=1,
                free=T,
                values=1,
                name="expCov"
        ),
        mxData(
                observed=var(testData),
                type="cov",
                numObs=1000
        ),
        mxMLObjective(
                covariance="expCov",
                dimnames=selVars
        ),
)

univSatFit3 <- mxRun(univSatModel3)

A means vector can also be added here as part of the input summary statistics (as the fourth argument of the mxData command). In that case, a second mxMatrix command is used to specify the expected mean vector, which is of type='Full', has 1 row and 1 column, is assigned free=T with start value 0, and the name expMean. The second change is an additional argument mean to the mxMLObjective function for the expected mean, here expMean.

....
        mxMatrix(
            type="Full",
            nrow=1,
            ncol=1,
            free=T,
            values=0,
            name="expMean"
        ),
        mxData(
            observed=var(testData),
            type="cov",
            numObs=1000,
            means=mean(testData)
        ),
        mxMLObjective(
            covariance="expCov",
            means="expMean",
            dimnames=selVars
                )
)

Raw Data and Matrix-style Input

Finally, if we want to use the matrix specification with raw data, we again specify two matrices using the mxMatrix command, one for the expected covariance matrix and one for the expected mean vector, in the same way as before. The mxData command directly read the raw data from a matrix or data.frame and the mxFIMLObjective command is used to evaluate the likelihood of the data using FIML. This function takes three arguments, one for the expected covariance matrix, covariance, one for the expected mean, means, and one for the dimnames.

univSatModel4 <- mxModel("univSat4",
        mxMatrix(
                type="Symm",
                nrow=1,
                ncol=1,
                free=T,
                values=1,
                name="expCov"
        ),
        mxMatrix(
                type="Full",
                nrow=1,
                ncol=1,
                free=T,
                values=0,
                name="expMean"
        ),
        mxData(
                observed=testData,
                type="raw"
        ),
        mxFIMLObjective(
                covariance="expCov",
                means="expMean",
                dimnames=selVars
        )
)

Note that the output generated for the paths and matrices specification are completely equivalent.

Bivariate Saturated Model

Rarely will we analyze a single variable. As soon as a second variable is added, not only can we then estimate two means and two variances, but also a covariance between the two variables, as shown in the following path diagram:

_images/BivariateSaturatedModel.png

The path diagram for our bivariate example includes two boxes for the observed variables ‘X’ and ‘Y’, each with a two-headed arrow for the variance of each variables. We also estimate a covariance between the two variables with the two-headed arrow connecting the two boxes. The optional means are represented as single-headed arrows from a triangle to the two boxes.

Data

The data used for the example were generated using the multivariate normal function (mvrnorm in the R package MASS). We have simulated data on two variables named ‘X’ and ‘Y’ with means of zero, variances of one and a covariance of .5 using the following R code, and saved is as testData. Note that we can now use the R function cov to generate the observed covariance matrix.

#Simulate Data
require(MASS)
set.seed(200)
rs=.5
xy <- mvrnorm (1000, c(0,0), matrix(c(1,rs,rs,1),2,2))
testData <- xy
selVars <- c('X','Y')
dimnames(testData) <- list(NULL, selVars)
summary(testData)
cov(testData)

Model Specification

The mxPath commands look as follows. The first one specifies two-headed arrows from X and Y to themselves. This command now generates two free parameters, each with start value of 1 and lower bound of .01, but with a different label indicating that these are separate free parameters. Note that we could test whether the variances are equal by specifying a model with the same label for the two variances and comparing it with the current one. The second mxPath command specifies a two-headed arrow from X to Y, which is also assigned ‘free’ and given a start value of .2 and a label.

....
mxPath(
    from=c("X", "Y"),
    arrows=2,
    free=T,
    values=1,
    lbound=.01,
    labels=c("varX","varY")
)
mxPath(
    from="X",
    to="Y",
    arrows=2,
    free=T,
    values=.2,
    lbound=.01,
    labels="covXY"
)

When observed means are included in addition to the observed covariance matrix, we add an mxPath command with single-headed arrows from one to the variables to represent the two means.

....
mxPath(
    from="one",
    to=c("X", "Y"),
    arrows=1,
    free=T,
    values=.01,
    labels=c("meanX","meanY")
)

Changes for fitting to raw data just require the mxData command to read in the data directly with type=raw.

Using matrices instead of paths, our mxMatrix command for the expected covariance matrix now specifies a 2x2 matrix with all elements free. Start values have to be given only for the unique elements (diagonal elements plus upper or lower diagonal elements), in this case we provide a list with values of 1 for the variances and .5 for the covariance

....
mxMatrix(
    type="Symm",
    nrow=2,
    ncol=2,
    free=T,
    values=c(1,.5,1),
    name="expCov"
)

The optional expected means command specifies a 1x2 row vector with two free parameters, each given a 0 start value.

....
mxMatrix(
    type="Full",
    nrow=1,
    ncol=2,
    free=T,
    values=c(0,0),
    name="expMean"
)

Combining these two mxMatrix commands with the raw data, specified in the mxData command and the mxFIMLObjective command with the appropriate arguments is all that’s need to fit a saturated bivariate model. So far, we have specified the expected covariance matrix directly as a symmetric matrix. However, this may cause optimization problems as the matrix could become not positive-definite which would prevent the likelihood to be evaluated. To overcome this problem, we can use a Cholesky decomposition of the expected covariance matrix instead, by multiplying a lower triangular matrix with its transpose. To obtain this, we use a mxMatrix command and specify type="Lower". We then use an mxAlgebra command to multiply this matrix, named Chol with its transpose (R function t()).

....
mxMatrix(
    type="Lower",
    nrow=2,
    ncol=2,
    free=T,
    values=.5,
    name="Chol"
)
mxAlgebra(
    Chol %*% t(Chol),
    name="expCov",
)

The following sections will describe OpenMx examples in detail, first in path specification and second in matrix specification. Insofar as relevant, we will also present detailed code when using covariance matrices versus raw data input. So far, we have implemented examples of regression analysis, factor analysis, time series analysis, multiple group analysis, twin analysis and definition variable analysis. We intend to add examples as they are implemented in OpenMx.