# Degrees of freedom with raw data

Posted on

StavrosUt
Joined: 04/22/2010

I am an amateur in OpenMx. I am trying, for a start, to fit a one-factor model with OpenMx. I am willing to use the path version.

I use type=RAM, and mxData(...type="raw")

I have 1064 observations and 8 items.

I compare the summary() with that of factanal(). I get the same loadings and uniquenesses estimations with one weird problem: When I use "raw" and not "cov", I have a total number of observed statistics = 8512 (=1064*8) instead of the correct = 44. Several statistics are NA (chi-square, RMSA) but the estimates of the loadings and uniquenesses are the same (and means) !!!

I use the code found in the documentation as an example for: Factor Analysis, Path Specification.

Please help!!

Stavros

## You might find it helpful to

It was written for people who haven't got much background in R, Mx, or SEM and I am interested in feedback in how far it is useful.

The short answer is that when fitting a model to raw data, the degrees of freedom will reflect the sample size, being equivalent to the total number of observations, minus the estimated parameters.

Log in or register to post comments

In reply to You might find it helpful to by Dorothy Bishop

## Thank you very much for your

It still does not make sense, the statistics used in any case are the covariances and the means , why does it calculate the total observations? And of course this results in no calculation of Chi-square and what that follows (p-value, RMSEA etc).

I am trying to fit a model with thresholds and I came to a deadend.

mxThresholds does not work for path specification and for the matrix one, I keep getting the message that my thresholds ar not one less than my levels and they really are!

I am looking forward to any kind of response.

Log in or register to post comments

## This is not an error, but a

Most SEM software treats raw data like it is a covariance matrix. They take the raw data (which has n*p pieces of information or df, where p is the number of variables) and project it down into a covariance matrix (p(p+1)/2 df) and a vector of means (p df). This represents information loss, but the covariance matrix retains the linear relations in the data. In SEM, this is likely all you care about most of the time.

OpenMx doesn't do that pre-processing step, instead fitting your model to the raw directly. This reflects OpenMx' status as a general-purpose matrix optimizer rather than as software exclusively for SEM. As such, the data still have n*p df, which is a multivariate regression definition of df rather than an SEM definition. The fit stats you're referencing are undefined because the null model isn't estimated. Whereas the SEM definition of df leads to an obvious null model, the fully-saturated model with n*p df would be very large.

If you think of your data as a covariance matrix, then analyze it as a covariance matrix, and you'll have the df and fit statistics you're looking for.

Good luck!

Log in or register to post comments

In reply to This is not an error, but a by Ryne

## Thanks once again for the

I found in a thread that this can be done with path specification and mxThreshold command, but it does not work. (cant find function mxThreshold).

With matrix specification, I keep getting the error that the number of thresholds is not one less than the levels of my model. All variables in the model have a range from 1-7 (Totally disagree.... fully agree). I attach the code I use in case somebody could help.

Thanks for your time anyway

Log in or register to post comments

In reply to Thanks once again for the by StavrosUt

## Several comments on the

Log in or register to post comments

In reply to Several comments on the by mspiegel

## Thank you. I downloaded the

Error: In model 'THR One Factor' column 'SS227' is not an ordered factor. Use mxFactor() on this column.

(I get this after using mxFactor, SS227 is just my first variable)

If I use as.ordered, I get that my matrix is not of type 'double'.

Log in or register to post comments

In reply to Thank you. I downloaded the by StavrosUt

## It's difficult to debug

Log in or register to post comments

In reply to It's difficult to debug by mspiegel

## I think I do it exactly as

Neg.Self91 is my data sets consisted of 8 items with levels 1 - 7, originally being numeric values, with no NA.

Thanks

Log in or register to post comments

In reply to I think I do it exactly as by StavrosUt

## Ah. I think it's not working

~~I'll add some checking to mxFactor() that will issue an error on this condition.~~Nevermind, the right-hand side of the second expression is an ordered factor, but the left-hand side is not. Instead I'll throw an error if thresholds are specified and the input data is not a data.frame object.`foo <- matrix(1:9, 3, 3)`

foo[,1] <- mxFactor(foo[,1], levels = c(1:3))

Log in or register to post comments

In reply to Ah. I think it's not working by mspiegel

## Thank you so much! Turning my

Log in or register to post comments

In reply to Thank you so much! Turning my by StavrosUt

## There are 48 free parameters

I just finished a problem with an 8x4 matrix of thresholds and 200 obs which took 15 minutes on my Macbook Pro.

Log in or register to post comments

In reply to There are 48 free parameters by Steve

## Yes, many are awaiting

One thought I had for a temporary solution was to use frequency weights in a definition variable. Suppose that one or two patterns of data were relatively common (all zeroes for example). In this case, using a dataset with a data pattern and then a frequency to indicate the number of observations with this pattern might yield a big performance improvement. It would be necessary to use the vector=TRUE argument to mxFIMLObjective(), as used in this script: http://openmx.psyc.virginia.edu/repoview/1/trunk/models/passing/Acemix2.R

in which probabilities of zygosity are used. In our case there would only be the one "model", but the frequency would be applied *after* the logarithm part, so in place of

mxAlgebra(-2*sum(log(pMZ * MZlike.objective + (unit-pMZ) * DZlike.objective)), name="twin"),

we'd use something like

mxAlgebra(-2*sum(frequency * log(MZlike.objective)), name="whatever"),

where frequency is the MxMatrix with the definition variable frequency as its sole element. This approach would avoid calculating the same integral multiple times for each set of parameter estimates.

Log in or register to post comments

## same model, different indicators, different df?

I have also a question on the df in the summary output. I have specified an model (one-factor CFA models) with a background variable that have moderation functions on factor loadings, intercepts and residual variances on each five indicators. When I fit this model to different data, the df have a different number in the summary output. Is this normal? I thought the df are the difference between the number of given information and number of parameter estimates that have to be estimated. Therefore they should be identical or do I miss something? Any help would be appreciated!

Log in or register to post comments

In reply to same model, different indicators, different df? by marijal

## df and statistics

With raw data, OpenMx considers each non-missing value for all of the variables in the analysis to be its own statistic. This is because there may be individual-specific parameters for such data - definition variables make this possible. They also make it possible to have more parameters than there are 'observed' covariances and means. To avoid weirdness such as negative df, we count the statistics differently.

Another thing to think about with raw data that may be missing is that counting df as m(m+1)/2 covariances and m means can also be wrong. For example, suppose we have two variables, X and Y, two columns of data, so you might think 3 observed covariance statistics and two means, a good assumption in most circumstances. However, if Y is missing every time X is present, and if X is missing every time Y is present, then there is no information about the covariance between X and Y, so the m(m+1)/2 formula is wrong in that there is no information on the covariance between X and Y.

Finally note that the df for comparing models with submodels is correct despite the apparently large number of statistics from counting raw data observations this most inclusive way.

Log in or register to post comments

In reply to df and statistics by AdminNeale

## df and statistics

Log in or register to post comments

In reply to df and statistics by AdminNeale

## df and statistics

Thank you again for your reply. I have another question on that topic: In the summary output, there are no statistics on significance for the model parameters. Usually, I could calculate them this way:

1. Calucalting the t-value by the following devision: parameter estimate/standard error of the parameter estimate

2. Determining the dfs

3. Determining the p-value

Would it be right to use the dfs given in the output? If I understand you correctly, I should do that.

Log in or register to post comments

In reply to df and statistics by marijal

## Asymptotically ok

The step 1, calculate t-value is all you really need to determine if the parameter is significantly different from zero, or different from some other value, e.g., 1 (this happens in SEM, not so much in regression). It is a one-liner in R to get the list of estimates and the list of SEs from the fitted model object or, easier still, the summary of it. Suppose the model is called ACFit, and you execute

`ACSum <- summary(ACFit)`

. Thenpt(ACSum$parameters$Estimate/ ACSum$parameters$Std.Error,df=1,lower.tail=F)

would give you the p-values associated with the t-values calculated against zero. We tend to leave these subsequent calculations to the user, because testing hypotheses other than this parameter equals zero becomes straightforward. Note that df=1 for all these comparisons.

In more complicated scenarios, such as small N or working with correlations, one may want to use likelihood-based confidence intervals, which can be asymmetric, unlike CI's calculated from estimate +/-1.96*SE. The idea is to make the SE correspond to a symmetric interval around the Fisher's Z transformed correlation, which gets asymmetric the closer the estimate is to 1 or -1.

HTH

Log in or register to post comments

In reply to Asymptotically ok by AdminNeale

## Thank you but

Log in or register to post comments

## Confidence Intervals

`mxCI()`

function which can be used within an`MxModel`

. See the help page. OpenMx also has Wald type confidence intervals based on the standard errors via the`confint()`

function when applied to an`MxModel`

. For example,`confint(yourFittedModel) # gives CIs for all parameters`

Incidentally, the Wald type confidence intervals assume an asymptotic *normal* distribution not a t-distribution, so the degrees of freedom are irrelevant.

Log in or register to post comments

In reply to Confidence Intervals by mhunter

## Thank you!

params <- summary(fitOpenmxPartial)$parameters

# Define the desired confidence level

confidenceLevel <- 0.95

# Calculate critical value from the desired confidence level

criticalValue <- qnorm((1 + confidenceLevel) / 2)

# Calculate confidence intervals for each parameter

params$CI_lower <- params$Estimate - criticalValue * params$Std.Error

params$CI_upper <- params$Estimate + criticalValue * params$Std.Error

`params`

Your version is more convenient. Thank you again!

Log in or register to post comments