help with SEM

konval
Joined: 04/04/2016 - 21:08
help with SEM
Attachment: Data description.doc (15.89 KB)

Dear all,

I am very new to both OpenMx and structural equation modelling. I am trying to analyse data from educational research (see the attached file for a description). I have tried some of the OpenMx examples but couldn't figure out how to run an SEM on my own data. I would greatly appreciate help on how to run a factor SEM on my data. I have 3 factors (class, subject and student) and 12 measurements that are binary (0 or 1). Thanks in advance.

konval

AdminRobK
Joined: 01/24/2014 - 12:15

Could you say a little bit more about what you're trying to accomplish in your SEM analysis?

konval
Joined: 04/04/2016 - 21:08
help with SEM

Sorry, I should have explained my problem in more detail.
The aim is to identify factors influencing students' activity. For example, is the subject (History, Biology, etc.) a significant factor influencing a student's activity?
I ran a one-factor model with the following:
library(OpenMx)
manifests <- names(myData_a)[1:12]   # the 12 binary activity items a_1 ... a_12
latents <- c("F1")
factorModel <- mxModel("One Factor",
    type="RAM",
    manifestVars=manifests,
    latentVars=latents,
    # free factor loadings from F1 to each of the 12 items
    mxPath(from=latents, to=manifests, arrows=1, free=TRUE),
    # free residual variances of the items
    mxPath(from=manifests, arrows=2),
    # factor variance fixed at 1 to identify the latent scale
    mxPath(from=latents, arrows=2, free=FALSE, values=1.0),
    mxData(observed=cov(myData_a[, manifests]), type="cov", numObs=296))
summary(factorRun <- mxRun(factorModel))

The results are:
Summary of One Factor

free parameters:
name matrix row col Estimate Std.Error A
1 One Factor.A[1,13] A a_1 F1 0.19138966 0.026990833
2 One Factor.A[2,13] A a_2 F1 0.24194364 0.024762326
3 One Factor.A[3,13] A a_3 F1 0.14715512 0.032381955
4 One Factor.A[4,13] A a_4 F1 0.21847060 0.026185996
5 One Factor.A[5,13] A a_5 F1 0.15000079 0.025090427
6 One Factor.A[6,13] A a_6 F1 0.06425752 0.034086120
7 One Factor.A[7,13] A a_7 F1 0.14055586 0.032496649
8 One Factor.A[8,13] A a_8 F1 0.19709460 0.021677667
9 One Factor.A[9,13] A a_9 F1 0.16027545 0.023737734
10 One Factor.A[10,13] A a_10 F1 0.08455701 0.033841160
11 One Factor.A[11,13] A a_11 F1 0.15396803 0.026376935
12 One Factor.A[12,13] A a_12 F1 0.19200881 0.028753720
13 One Factor.S[1,1] S a_1 a_1 0.13662603 0.012496842
14 One Factor.S[2,2] S a_2 a_2 0.09485988 0.010304504
15 One Factor.S[3,3] S a_3 a_3 0.21515154 0.018370394
16 One Factor.S[4,4] S a_4 a_4 0.11980876 0.011628976
17 One Factor.S[5,5] S a_5 a_5 0.11788509 0.010549488
18 One Factor.S[6,6] S a_6 a_6 0.24576808 0.020334279
19 One Factor.S[7,7] S a_7 a_7 0.21927574 0.018632046
20 One Factor.S[8,8] S a_8 a_8 0.07802738 0.007937209
21 One Factor.S[9,9] S a_9 a_9 0.10788329 0.009727832
22 One Factor.S[10,10] S a_10 a_10 0.23915225 0.019893368
23 One Factor.S[11,11] S a_11 a_11 0.13384475 0.011846589
24 One Factor.S[12,12] S a_12 a_12 0.15720670 0.014191727

observed statistics: 78
estimated parameters: 24
degrees of freedom: 54
fit value ( -2lnL units ): -2895.402
saturated fit value ( -2lnL units ): -3060.333
number of observations: 296
chi-square: X2 ( df=54 ) = 164.931, p = 3.66674e-13
Information Criteria:
| df Penalty | Parameters Penalty | Sample-Size Adjusted
AIC: 56.93098 212.9310 NA
BIC: -142.34843 301.4996 225.3879
CFI: 0.7382109
TLI: 0.6800355 (also known as NNFI)
RMSEA: 0.08330742 [95% CI (0.06610627, 0.1006919)]
Prob(RMSEA <= 0.05): 0.0001196532
timestamp: 2016-04-11 10:22:05
Wall clock time (HH:MM:SS.hh): 00:00:00.11
optimizer: SLSQP
OpenMx version number: 2.5.2
Need help? See help(mxSummary)
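As a check on how the numbers in this summary fit together, the base-R sketch below recomputes several of them from the printed fit values alone. The formulas for RMSEA and the information criteria are simply the ones that reproduce the printed figures (in particular, N rather than N - 1 in the RMSEA denominator matches the value shown); treat this as an illustration, not a description of OpenMx's internal code.

```r
# Numbers copied from the printed summary
fit_model     <- -2895.402   # fit value, -2lnL units
fit_saturated <- -3060.333   # saturated fit value, -2lnL units
n_obs    <- 296              # number of observations
n_vars   <- 12               # manifest variables
n_params <- 24               # 12 loadings + 12 residual variances

n_stats <- n_vars * (n_vars + 1) / 2          # unique variances and covariances
df      <- n_stats - n_params
chisq   <- fit_model - fit_saturated          # likelihood-ratio chi-square
p_value <- pchisq(chisq, df, lower.tail = FALSE)

# RMSEA with N (not N - 1) in the denominator, which matches the printed value
rmsea <- sqrt(max(0, (chisq - df) / (df * n_obs)))

# Information criteria in the two forms printed by summary()
aic_df  <- chisq - 2 * df
aic_par <- chisq + 2 * n_params
bic_df  <- chisq - df * log(n_obs)
bic_par <- chisq + n_params * log(n_obs)
bic_adj <- chisq + n_params * log((n_obs + 2) / 24)   # sample-size adjusted

signif(c(stats = n_stats, df = df, chisq = chisq, p = p_value, rmsea = rmsea,
         aic_df = aic_df, aic_par = aic_par, bic_df = bic_df,
         bic_par = bic_par, bic_adj = bic_adj), 6)
```

Note that all of these treat the 0/1 items as if they were continuous, which is exactly the concern raised in the next line of the post.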

Is this the right thing to do? My data are binary, however. What should I do in this case?
Thanks.

AdminRobK
Joined: 01/24/2014 - 12:15

When you say "factors," it sounds like you mean the term in the sense used in ANOVA and experimental design, and not in the sense meant in psychometrics--as in "factor analysis," which is what the front-page example demonstrates. It sounds like the response variables are the 12 dichotomously scored activities, and that you have one fixed effect, subject, and two random effects, student and class.

Does a unique numeral in the "student" column of your dataset always refer to the same student, across subjects and classes? For instance, is the student #1 in History, class #66, the same person as student #1 in Biology, class #62? To put it another way, are there repeated measures on the same students but taking different subjects? I get the impression that there aren't, so I'll assume that's the case for the rest of this post.

It sounds like the 12 activities are items on a scale. You might want to simply sum the items to get a total score for each student, especially if the scale is unidimensional. Do you know anything about the latent dimensionality of the scale? That's something that factor analysis COULD tell you about. A more elegant alternative to creating sum scores would be IRT (Item Response Theory) scoring, but I'm concerned you might not have a large enough sample to adequately calibrate your items.
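A sum score is just the number of items a student endorsed. A minimal base-R sketch, assuming the items are stored as 0/1 columns named a_1 to a_12 (the names used in konval's script); the three response patterns are made up for illustration:

```r
items <- paste0("a_", 1:12)

# Three hypothetical students' responses to the 12 binary items
resp <- as.data.frame(matrix(c(rep(1, 12),          # endorsed every item
                               rep(0, 12),          # endorsed no item
                               rep(c(1, 0), 6)),    # endorsed every other item
                             nrow = 3, byrow = TRUE,
                             dimnames = list(NULL, items)))

# Total score = number of items endorsed (0 to 12)
resp$total <- rowSums(resp[, items])
resp$total   # 12 0 6
```

The resulting total could then serve as the response variable in an ordinary mixed-effects regression, as suggested in the next paragraph.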

If you decide to create sum scores, you wouldn't necessarily need to use OpenMx or any other SEM software. You could just do linear mixed-effects regression in any software that handles it, like the lme4 package in R. If you decide to analyze item-level data, you might be able to use OpenMx's IFA (Item Factor Analysis) module to do an IRT analysis. Alternatively, you could do some kind of multilevel, multivariate probit regression, which would involve recoding the 12 dichotomous variables as MxFactors, and maybe use OpenMx's new multilevel features.

Finally, you could answer your research questions by doing an item-level ANOVA. That's kind of old-fashioned, and you might get criticized for it, but it's not out of the question--see R.J. Shavelson & N.M. Webb's Generalizability Theory: A Primer (1991), published by Sage Publications, Inc.

konval
Joined: 04/04/2016 - 21:08
Help with SEM

Thank you very much for your comments and suggestions. What I meant by “factors” was in the sense meant in psychometrics and not in ANOVA or some sort of linear mixed models.
Let me explain in more detail. The column “class” in the data is a code for different classes. There are 11 classes in total, each with between 25 and 30 students. Student No. 1 in class 1 is not the same as student No. 1 in class 2, so the numeral in the “student” column does not refer to the same student across subjects and classes. Students are nested within subjects and subjects are nested within classes; only class 61 contains 2 subjects (1 and 3).
Student activity was measured on a 12-item scale (energetic, alert, diligent, agile, and so on). Each answer is coded 0 or 1.
Now the question is: can I model a student's activity as a latent variable (factor) measured by the 12 items? How do I account for the effects of class, subject and student? Which OpenMx demo is closest to my problem (if any)?

AdminRobK
Joined: 01/24/2014 - 12:15

I gave this some thought, and I think the simplest way to approach this problem in OpenMx would actually be to treat class as a fixed, not random, effect. The reason is that if you treated class as a random effect, because your observable variables are dichotomous threshold variables, OpenMx would currently offer you no other choice than to format your data in "wide" format, with one class per row, in order to account for clustering of students within classes. Thus, the independent units of observation would be vectors of item responses, with length equal to the number of students in the class times twelve. This would require you to cyclically specify the same within-person structure for blocks of twelve items, as well as a between-person correlation structure, to account for similarity of students clustered within the same class. Since there are only eleven classes, it greatly simplifies the analysis to just estimate 10 additional parameters for "class effects," and leave the data structured with one student (specifically, one student's item responses) per row.
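To make the parameter count concrete: treatment (dummy) coding a categorical variable with 11 levels yields 10 indicator columns beyond the intercept, one per non-reference class. The sketch below uses hypothetical class codes 61 to 71 spread over 296 rows; only the 11 classes, the sample size, and the 25 to 30 students per class come from the thread.

```r
# Hypothetical long-format data: one row per student (296 rows), 11 class codes
students <- data.frame(class = factor(rep(61:71, length.out = 296)))

# Treatment coding: the first class (61) is the reference category
X <- model.matrix(~ class, data = students)
n_class_effects <- ncol(X) - 1     # columns besides the intercept
n_class_effects                    # 10

# For comparison: with 25 to 30 students per class, one wide-format row
# would hold this many item responses
c(smallest = 25, largest = 30) * 12   # 300 and 360
```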

It sounds like you want to operationalize student "activity" as a single latent variable that represents variance common to the 12 items (i.e. as a common factor), and estimate the effects of subject and class on student activity. Is this right? If so, it might be doable in OpenMx's item factor analysis module. I'm certain it can be done as an ordinal-threshold model, though I'm not sure how easy it would be to set it up using path specification.

If I understand you correctly, what you probably want to do is condition the mean of the latent common "activity" factor on subject and class. To identify the scale, fix the factor's intercept to zero. Choose a class and subject to be "reference" categories, and create four "subject" effects and 10 "class" effects, which represent the "effect" of a student being in that class or taking that subject on the model-expected factor mean, relative to the intercept. In other words, you'd be dummy coding class and subject as categorical predictors, as in classical regression, but the regression is being built into a common-factor model because the response variable is a latent factor.
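One way to write down the model described here, assuming a threshold (probit-type) measurement model for each binary item as mentioned in the previous paragraph. The notation is illustrative, not OpenMx's: $y^*_{ij}$ is student $i$'s latent response to item $j$, $\tau_j$ its threshold, $\lambda_j$ its loading, $\eta_i$ the activity factor, $D$ the 0/1 subject and class indicators (subject 1 and class 1 as references, $S$ the number of subjects), and $\gamma_s$, $\delta_c$ the subject and class effects. The variances of $\varepsilon_{ij}$ and $\zeta_i$ would typically be fixed (for example to 1) for identification, alongside $\beta_0 = 0$.

```latex
\begin{aligned}
y^{*}_{ij} &= \lambda_j \eta_i + \varepsilon_{ij},
  \qquad y_{ij} = \begin{cases} 1 & \text{if } y^{*}_{ij} > \tau_j \\ 0 & \text{otherwise} \end{cases},
  \qquad j = 1, \dots, 12 \\
\eta_i &= \beta_0 + \sum_{s=2}^{S} \gamma_s D^{\mathrm{subj}}_{is}
        + \sum_{c=2}^{11} \delta_c D^{\mathrm{class}}_{ic} + \zeta_i,
  \qquad \beta_0 = 0
\end{aligned}
```

Under this formulation, asking whether subject matters amounts to asking whether $\gamma_2, \dots, \gamma_S$ are jointly zero, which could be checked, for instance, with a likelihood-ratio test against a model with those effects fixed at zero.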

Do I understand your objective correctly, and does my post make sense?

konval
Joined: 04/04/2016 - 21:08
Help with SEM

Thank you very, very much for the time you’ve spent on my problem.
“It sounds like you want to operationalize student "activity" as a single latent variable that represents variance common to the 12 items (i.e. as a common factor), and estimate the effects of subject and class on student activity.” Yes, this is exactly what I want to do.
I prepared the data for subject only. The entire dataset is attached. I then ran the attached script and got errors. I do not know how to specify subject as an “effect” in the script. Help would be much appreciated.
On the other hand, I am having trouble reading my data into the PANAS demo model.
PANASItem <- c("Very Slightly or Not at All", "A Little",
"Moderately", "Quite a Bit", "Extremely")
Can I change this to:
PANASItem <- c("Subject") ?

spec <- list()
spec[1:10] <- rpf.grm(outcomes = length(PANASItem)) # grm="graded response model"

# replace with your own data

data <- rpf.sample(750, spec, sapply(spec, rpf.rparam))
What should I do here? Something like:
Data <- read.table("myData", header=TRUE) ?

for (cx in 1:10) levels(data[[cx]]) <- PANASItem # repair level labels
colnames(data) <- c("interested", "excited", "strong", "enthusiastic", "proud",
"alert", "inspired", "determined", "attentive", "active")
How do I force the colnames to be the names for myData?
head(data) # much easier to understand with labels
origData <- data
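In the PANAS demo, PANASItem holds the labels of the response categories (five for the PANAS items), not variable names, and with header=TRUE the column names come from the file itself, so no renaming is needed. A minimal base-R sketch of preparing 0/1 items, assuming a small hypothetical data frame standing in for the read.table() call and item columns named a_1, a_2, ...; for the rpf item specification, the binary analogue would presumably be rpf.grm(outcomes = 2), but check the rpf documentation.

```r
# Hypothetical stand-in for: myData <- read.table("myData", header = TRUE)
# (with header = TRUE the column names come from the file's first line)
myData <- data.frame(subject = c(1, 1, 3),
                     a_1 = c(0, 1, 1),
                     a_2 = c(1, 1, 0))

items <- grep("^a_", names(myData), value = TRUE)   # the item columns

# Analogue of PANASItem for binary items: two response-category labels
binaryItem <- c("0", "1")
myData[items] <- lapply(myData[items], factor, levels = c(0, 1),
                        labels = binaryItem, ordered = TRUE)
str(myData)
```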
I am sorry for the trivial questions, but I am still struggling with the syntax of OpenMx.

Once again thanks.

AdminRobK
Joined: 01/24/2014 - 12:15
some suggestions

I don't really know the IFA module (what the PANAS demo uses), but I have a few suggestions for the path-specified model in your syntax.

One possibility would require dummy-coding the subject variable. Create three new columns in your dataset. If the subject in a row of the dataset is, say, History, then set all three of the new columns to zero for that row. If the subject is Biology, then the first new column should contain a 1, and the other two should contain zero. If the subject is Language, the second new column should contain 1, and the other two zero. If the subject is Math, the third new column should contain 1, and the other two zero. After that, use the three new variables as "definition variables" for the mean of the latent factor. I'm not sure how many demos there are that use definition variables in RAM-type models, though.
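The recoding just described can be sketched in base R as follows. The subject labels and the column names subj_bio, subj_lang and subj_math are hypothetical; if subject is stored as a numeric code in the file, compare against the codes instead. In a RAM model, OpenMx refers to a definition variable through a label of the form data.subj_bio.

```r
# Hypothetical subject labels for five students
subject <- c("History", "Biology", "Language", "Math", "History")

# History is the reference: its rows are 0 in all three new columns
dummies <- data.frame(
  subj_bio  = as.integer(subject == "Biology"),
  subj_lang = as.integer(subject == "Language"),
  subj_math = as.integer(subject == "Math")
)
cbind(subject, dummies)
```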

Alternatively, you could try adding the three new variables to your model as manifest continuous variables. Be sure they have free one-headed paths going to them from "one", and that they have free two-headed paths going among themselves (they need to have freely estimated variances, and they should be strongly negatively correlated with one another). Most importantly, make sure there are free one-headed paths going from them to the latent factor. Those path coefficients will represent the "effect" on the factor of the subject being Biology, Language, or Math, relative to History. Be warned, though, I'm not sure this will work (I don't presently have time to try it myself).

Another possibility is to forget about dummy-coding and divide your dataset into four subsets, one for each subject, put each into its own MxModel, and put the four MxModels into another "container" MxModel. Those four MxModels would be mostly the same as what you have in your syntax. One exception is that they should have different names. The important exception is that the one-headed path from "one" to the latent factor should be free, with different labels, in three of the four models, and should be fixed to zero in the remaining model. Comparing the estimated values of those paths will tell you how the mean of the latent factor differs across the four subjects (it must be fixed to zero for one of them to identify the latent scale). Be sure to give the container MxModel a multigroup fit function (mxFitFunctionMultigroup()).
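Splitting the data by subject takes one line in base R. The small data frame below is hypothetical (subject labels plus two of the twelve items); each element of the resulting list would become the mxData of one group model.

```r
# Hypothetical long-format data: subject plus two of the 12 items
dat <- data.frame(subject = c("History", "Biology", "Language", "Math",
                              "Biology", "Math"),
                  a_1 = c(1, 0, 1, 1, 0, 1),
                  a_2 = c(0, 0, 1, 1, 1, 0))

# One data frame per subject, each destined for its own group model
by_subject <- split(dat, dat$subject)
names(by_subject)          # "Biology" "History" "Language" "Math"
sapply(by_subject, nrow)   # Biology 2, History 1, Language 1, Math 2
```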

Edit: I should mention that eventually, you will need to deal with the fact that your students are clustered within classes. Ignoring it shouldn't bias your point estimates, but it will bias your inferences (standard errors, test statistics, confidence intervals). Elsewhere in this thread, I suggested treating class as a fixed effect like subject, which I think is a reasonable way to proceed in this case.
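A rough way to see why clustering matters for inference is the Kish design effect for a mean under cluster sampling, deff = 1 + (m - 1) * ICC, where m is the cluster size and ICC the intraclass correlation. This is a textbook approximation, not something OpenMx reports, and the ICC values below are hypothetical; only the average class size (296 students over 11 classes) comes from the thread.

```r
# Kish approximation: variance inflation of a mean under cluster sampling
design_effect <- function(m, icc) 1 + (m - 1) * icc

m   <- 296 / 11             # average number of students per class
icc <- c(0.05, 0.10, 0.20)  # hypothetical intraclass correlations
deff <- design_effect(m, icc)

# Factor by which naive standard errors would understate the true ones
se_ratio <- sqrt(deff)
round(rbind(icc, deff, se_ratio), 2)
```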

jpritikin
Joined: 05/24/2012 - 00:35
help with IFA

I have an accepted manuscript that will likely be of assistance. See http://people.virginia.edu/~jnp3bc/pritikin-schmidt-20160223.pdf