Hello,
I have a cross-lagged model that I have run using the code below:
```r
CrossLaggedModel <- mxModel("cross_lagged_model", type="RAM",
    manifestVars=c("f7_bmi","tf4_bmi","f7_cpg","tf4_cpg"),
    mxData(observed=children_data, type="raw"),
    mxPath(from=c("f7_bmi"), to=c("tf4_cpg"), arrows=1, free=TRUE, values=c(.5), labels="AtoB"),
    mxPath(from=c("f7_cpg"), to=c("tf4_bmi"), arrows=1, free=TRUE, values=c(1.5), labels="BtoA"),
    mxPath(from=c("f7_bmi"), to=c("tf4_bmi"), arrows=1, free=TRUE, values=c(.1), labels="AtoA"),
    mxPath(from=c("f7_cpg"), to=c("tf4_cpg"), arrows=1, free=TRUE, values=c(0), labels="BtoB"),
    mxPath(from=c("f7_bmi","tf4_bmi"), arrows=2, free=TRUE, values=c(1, 1), labels=c("residualA1", "residualA2")),
    mxPath(from=c("f7_cpg","tf4_cpg"), arrows=2, free=TRUE, values=c(.5, .5), labels=c("residualB1", "residualB2")),
    mxPath(from=c("f7_bmi","tf4_bmi"), to=c("f7_cpg","tf4_cpg"), arrows=2, free=TRUE, values=c(.05,.05), labels=c("residCovAB1", "residCovAB2")),
    mxPath(from="one", to=c("f7_bmi","tf4_bmi","f7_cpg","tf4_cpg"), free=TRUE, values=0, labels="m")
)
model1 <- mxModel(CrossLaggedModel, mxCI(c("cross_lagged_model.A","cross_lagged_model.S")))
model <- mxTryHard(model1, intervals=TRUE)
```
Here, f7 is one time point and tf4 is a later time point, and BMI and CpG are the two variables measured at each. When I run this model I'm concerned about the results I get (shown below): they suggest effects in the opposite direction to linear regression models, and the residual variance for residualB1 is very large and nowhere near the actual variance of the data. I've tried specifying different starting values for the model, but I still get these large values. I have also tried adding in a latent variable for BMI and CpG, but this just makes the numbers even larger. So I wondered if there is anything wrong with the code I am using, or whether this approach is just not working well for some reason.
free parameters:
name matrix row col Estimate Std.Error A
1 AtoA A tf4_bmi f7_bmi 0.09102847 0.10034782
2 AtoB A tf4_cpg f7_bmi 0.47616253 0.03039210
3 BtoA A tf4_bmi f7_cpg 1.34019586 0.01243257
4 BtoB A tf4_cpg f7_cpg -0.05465275 0.01098387
5 residualA1 S f7_bmi f7_bmi 1.82067088 0.33356059
6 residualA2 S tf4_bmi tf4_bmi 7.49071007 0.38457389
7 residCovAB1 S f7_bmi f7_cpg -13.60428618 2.70038904
8 residualB1 S f7_cpg f7_cpg 238.60375066 12.54453274
9 residCovAB2 S tf4_bmi tf4_cpg 0.26706472 0.09050008
10 residualB2 S tf4_cpg tf4_cpg 0.82379525 0.04027825
11 m M 1 f7_bmi 0.91112011 0.17838277
Thank you!
I don't see anything amiss about your script. I'm surprised you're getting results you don't find trustworthy, because this looks as though it would be a pretty easy optimization problem.
If I'm counting correctly, your model should be just-identified. Does the model-expected covariance matrix at the solution look reasonable? Also, do you have many missing observations? A FIML solution can look different from results obtained by a method that deletes incomplete cases (as is typical in linear regression).
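Both checks suggested above can be done directly in R. A sketch, assuming the fitted MxModel object is `model` and the raw data frame is `children_data`:

```r
# Model-implied covariance matrix and means at the solution
mxGetExpected(model, "covariance")
mxGetExpected(model, "means")

# How much missingness FIML is working with, per variable
colSums(is.na(children_data))
```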
Thanks Rob,
I'm mostly concerned by that large residual value, but if the direction of effect is also different, it's a question of which to trust, I guess.
I've had a look at the expected covariance matrix (I believe this is the correct part, below). I'm not really sure what it should look like, but I'm a little surprised to see such large values in there. Perhaps this isn't unusual?
I also looked at missing cases: yes, there are fewer people in the regression analyses, and there is missing data in the FIML model, so this could explain the difference. It's only about 100 people, though, and I wouldn't expect that to necessarily change things this much, but perhaps it does.
I should also mention that I have this issue with another similar model but in a separate sample (above is children and the other sample is adults).
Thanks
Zoe
I think the issue is with the model for the intercepts. First of all, is zero a good choice of start value for them? Secondly, and more importantly, I just noticed that this line,

mxPath(from="one", to=c("f7_bmi","tf4_bmi","f7_cpg","tf4_cpg"), free=TRUE, values=0, labels="m")

assigns the same label "m" to all four of those paths, meaning that the intercepts of all four manifest variables are constrained to be equal to one another. I doubt that's what you want to do?
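As an aside (a sketch, not from the original posts): OpenMx equates all paths that share a label, so the model as specified has only a single free mean parameter. You can confirm this by listing the free parameters of the model:

```r
# "m" appears only once, even though it is attached to four paths
omxGetParameters(model1)
```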
If you're incorrectly estimating the phenotypic means, then it would be no surprise that the variance is being overestimated.
The same expected mean for every variable is likely the culprit, IMO. Substituting manifestVars for m in this line
to get
should help a lot.
Thank you both,
So am I right in thinking that this will just use what I have already specified as the manifestVars to estimate the expected means? If I make the change above but keep them all within the same mxPath() call, should that work?
Actually, I've just tried making that change and I still get large residuals, even with different starting values. Specifying different starting values for each parameter makes no difference either.
Odd. Do you think you could share simulated data that's similar to your own, made with mxGenerateData()?

I've attached data generated as suggested as a .csv. I've not used that function before, but the data has negatives in it for BMI; I'm guessing that might not matter? If it does, let me know and I'll try to make it so this does not happen.
Thanks!
What did you pass as the value for the model argument to mxGenerateData(): your MxModel object, or a dataframe containing your actual data? I should have specified that you ought to pass it your actual data.

If you did give it a dataframe of your data, then I guess negative BMI scores are OK, because the function generates new data from a multivariate-normal distribution.
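A minimal sketch of the suggested usage, assuming the raw data frame is called `children_data`:

```r
# Passing a data frame makes mxGenerateData() fit a saturated multivariate-normal
# model to the real data and simulate a new sample from it
fakeData <- mxGenerateData(children_data, nrows=nrow(children_data))
write.csv(fakeData, "simulated_children_data.csv", row.names=FALSE)
```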
So I just passed it the model, which explains it. I'm probably missing something obvious, but when I try to pass a normal R data frame to it, it doesn't work and I get the following error. Do you know why this is?
Error: you must specify 'nrow' and 'ncol' arguments in mxMatrix(values = wlsData$thresholds, name = "thresh")
Thanks
That looks like a bug. What's your mxVersion() output?

Thanks, it was an older version. I tried with a newer one and it worked. I've attached the correct data now, so hopefully that is useful.
Wait, hold it! Professor Neale's suggested change won't resolve the issue; it just changes the label of the single intercept parameter. I can't believe I overlooked that! Each of the 4 variables needs its own intercept parameter, like so:
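Judging by the parameter names in the fitted results further down (f7_bmi_int, tf4_bmi_int, f7_cpg_int, tf4_cpg_int), the corrected means path presumably looked something like this sketch:

```r
# One freely estimated intercept per manifest variable (distinct labels)
mxPath(from="one", to=c("f7_bmi","tf4_bmi","f7_cpg","tf4_cpg"),
       free=TRUE, values=0,
       labels=c("f7_bmi_int","tf4_bmi_int","f7_cpg_int","tf4_cpg_int"))
```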
Thanks!
That solves the problem and I get much more realistic residuals, as shown here:
Summary of cross_lagged_model
free parameters:
name matrix row col Estimate Std.Error A
1 AtoA A tf4_bmi f7_bmi 1.300011908 0.04774913
2 AtoB A tf4_cpg f7_bmi 0.053621172 0.01425319
3 BtoA A tf4_bmi f7_cpg 0.094762258 0.10041324
4 BtoB A tf4_cpg f7_cpg 0.452180754 0.02951133
5 residualA1 S f7_bmi f7_bmi 4.311406435 0.20415313
6 residualA2 S tf4_bmi tf4_bmi 7.475740688 0.38337381
7 residCovAB1 S f7_bmi f7_cpg 0.224644372 0.07007160
8 residualB1 S f7_cpg f7_cpg 1.004418842 0.04750762
9 residCovAB2 S tf4_bmi tf4_cpg 0.285473804 0.08713938
10 residualB2 S tf4_cpg tf4_cpg 0.772890949 0.03655699
11 f7_bmi_int M 1 f7_bmi 16.219345821 0.06950395
12 tf4_bmi_int M 1 tf4_bmi 1.566125549 0.77808383
13 f7_cpg_int M 1 f7_cpg 0.007653321 0.03351886
14 tf4_cpg_int M 1 tf4_cpg -0.873267552 0.23301683
Thank you for noticing that!
Good to hear!