You are here

Revision of Ideas for simplified manual for beginners (SMB) from Tue, 03/02/2010 - 11:26

Revisions allow you to track differences between multiple versions of your content, and revert back to older versions.

Some thoughts on how to structure such a manual:

  1. Getting started in R - to be written
    To include: Explain how to get R loaded, set up directory, brief explanation of console vs editor screens, how to escape if program in loop, what the little + signs in left hand column mean, how to comment; Most could be taken from any introduction to R; we can refer to this, but nevertheless, I think a couple of paragraphs of the essentials should be included in SMB

  2. Generating simulated data in R - see below
    Based on Optimization script p 11, but heavily commented.
    User-defined variables to have names such as mydata, mylabel.
    This would avoid difficulties for beginners in working out what bits of a script are variables and what are functions.
    Also revised script shows how to compute covariances, SDs etc in R, how to plot scatterplot of data, and how to write data to a file and read it back.

  3. Matrix operations in R - to be written
    This could be lifted from existing sources. Should be kept pretty simple at this stage, but need to explain basic operations of matrix multiplication, transpose, inverse,etc. An appendix similar to that in Mx Manual, with exercises for doing matrix operations, would be useful.

  4. Introduction to optimization and maximum likelihood - partly done - but got stuck: see below!
    First encounter with OpenMx, in very simplified analysis based on script on p 12-14, but computing likelihoods for fixed values of correlation from .1 to .9, so these can be plotted.
    User encouraged to repeat analysis with different sample size for simulated data, to show how this affects likelihood estimation. Although this introduces user to OpenMx, main aim here is to get them to understand likelihood estimation.

  5. Introduction to SEM using RAM path method - to be done
    Could be based on material on p 4, but with more explanation (based on excellent account in Mx manual), and I think better to have example where one-factor vs two-factor models are compared, and user encouraged to modify data to see how this affects likelihood estimation.

  6. Introduction to SEM approach to analysis of twin data: univariate ACE model. - to be done
    Should be able to use material from Mx manual here. Also worth looking at good didactic material on Shaun Purcell's website. Include path tracing rules to explain how formulae arrived at.

  7. Run ACE model in OpenMx - to be done
    Include how to read in data that start in SPSS or xls format. Plot scatterplots for MZ and DZ pairs.

Here is my simplified version of script re point 2. NB I am totally new to R and so would welcome suggestions for improving this.


Modified version of generate correlated data Script for real beginners (based on script on p 11)


Simulate data on X and Y from 50 cases, with correlation of .5 between them

(NB using smaller N than in original example, so user can see mismatch between obtained and predicted)

require(MASS) # MASS is a R package that you need for generating multivariate normal data

set.seed(2) # A seed is a value used in creating random numbers;
# You don't need to understand how it works
# Keep the seed the same if you want the same random numbers every time you run
# Change the seed to any other number to run the script and get different random numbers

rs=.5 # Correlation between variables
mydata=mvrnorm(50, # Create a matrix of multivariate random normal deviates, called mydata, and specify number of XY pairs to generate
c(0,0), # Mean values for X and Y
# NB. In R, c denotes concatenate
matrix(c(1,rs,rs,1),2,2)) # Covariance matrix to be simulated, 2 rows, 2 columns, i.e.
# [1 .5
# .5 1]

mylabels=c('X','Y') # Put labels for the two variables in a vector
dimnames(mydata)=list(NULL,mylabels) # Allocate the labels to mydata (our created dataset)
# Just accept that NULL is needed here: too complicated to explain
summary(mydata) # Print means for mydata
cov(mydata) # Print covariance for mydata
cor(mydata) # Print correlation for mydata
print('Note that actual values may differ from specified value of .5, especially with small sample size')
sd(mydata) # Print SD for mydata
print('Note that the correlation = covariance/(SD_X * SD_Y)')

plot(mydata) # Plots a scatterplot of X vs Y
write.table(mydata,"myfile") # Saves a copy of mydata in your R directory

If you want to re-read your data another time, you can use a command such as

newdata=read.table("myfile"); this will create a matrix called newdata containing the data


Script for point 4: attempt to write script to show different log likelihood for simulated correlated data, relative to different levels of covariance. Stuck because can't get value of expected covariance to cycle through values as desired. Can someone advise as to what I am doing wrong?


# Modified version of 1.2.2 Optimization Script for real beginners

Based on OpenMxUserGuide, p 13-14.

But with expCov values Fixed. - Idea is to step through different values, but can't work out

how to do this.

require(OpenMx) # This step needed because we are going to use OpenMx routines

bivCorModel=mxModel("Biv corr", #Name given by user to the model


create a matrix called expMean with values [0 0 corresponding to expected means


           mxMatrix(type="Full",     # for a Full matrix, all cells are filled
                    free=TRUE,       # if free=TRUE, then values are estimated, rather than fixed at start values
                    values=c(0,0),   # starting values for matrix
                    name="expMean"), #create a matrix called expMean with values [0 0]
                                     #corresponding to expected means


create a 2 x 2 matrix called expCov with values [1 mycor mycor 1] corresponding to expected covariances


           mxMatrix(type="Full",nrow=2,ncol=2,free=FALSE,values=c(1, mycor,mycor,1),name="temp"),
           mxAlgebra(expression=temp, name="expCov"),  # here just reassign temp to expCov
                 type="raw"),                                    # use the raw data created in our simulation
           mxFIMLObjective(                                      # Function used to obtain likelihood of data
               covariance="expCov",                              # FIML stands for full information maximum likelihood
               means="expMean",                                  # FIML needs to be told which values to use for covariance and means



End of model specification section; this just sets up the model, it does not run it


myLL=matrix(0,nrow=9,ncol=4) # set up a matrix with 10 rows to hold a) correlation; b) LL values; c) AIC, d)BIC
for (myx in c(1:9))
bivCorFit=mxRun(bivCorModel) # Output of running the model is written to bivCovFit
EM = bivCorFit[['expMean']]@values
EC = bivCorFit[['expCov']]
myLL[myx,2] = mxEval(objective,bivCorFit); #Log likelihood

summary(bivCorFit) #How do I access AIC and BIC values, to add to myLL table?