In OpenMx, if I run a latent growth model with age definition variables and center my intercept at some age (by subtracting that number from each person's age), will the standard errors of the intercept mean and variance depend on the number of people at or around that age? For example, suppose I have a longitudinal study in which, at the first wave, a cross-section of people aged 10-40y was recruited and then followed every 5 years. Even if my total sample size is reasonable (1000), there are far fewer people at any given age, especially the younger ages. So if I center the intercept at 20y, will the standard errors of the intercept's mean and variance estimates be limited by the number of people with age-20y data? Does the issue get worse as I center at more and more specific ages, such as 20.56y? Thanks

We discussed this at a recent OpenMx dev team meeting. We agreed that the answer to your central question (whether the standard errors of the intercept's mean and variance depend on where the intercept is centered) is "yes". As for the rest, we're not sure!

Sorry I missed this discussion. I see two parts to your question, Eric.

1.) The within-dataset answer: Growth curve model parameters should be insensitive to linear transformations of time when fit to the same data. You can center the age variable wherever you want and get the same fit. But if you center far beyond the range of your data, the intercept's standard error and CI will be much wider than if you center in the middle of your data. When you're comparing models fit to a single dataset, this doesn't matter much: you are effectively rotating your two latent variables, shuffling error and imprecision back and forth between them.
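A minimal sketch of that invariance, using plain OLS on simulated cross-sectional data rather than a full OpenMx growth model (so this only illustrates the principle, not the LGC machinery): the residual fit is identical no matter where age is centered, but the intercept's standard error grows as the centering point moves away from the bulk of the data.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
age = rng.uniform(10, 50, n)                     # ages spread over 10-50
y = 50 - 0.05 * (age - 30) + rng.normal(0, 2, n) # linear trend plus noise

def fit_with_center(center):
    """OLS of y on age centered at `center`; returns (RSS, SE of intercept)."""
    X = np.column_stack([np.ones(n), age - center])
    beta, rss, *_ = np.linalg.lstsq(X, y, rcond=None)
    sigma2 = rss[0] / (n - 2)                    # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)        # covariance of (b0, b1)
    return rss[0], np.sqrt(cov[0, 0])

for c in (10, 30, 50):
    rss, se = fit_with_center(c)
    print(f"center {c}: RSS = {rss:.3f}, SE(intercept) = {se:.4f}")
```

The RSS line is the same for every centering; only the intercept SE changes, and it is smallest when the center sits near the mean age.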

And maybe we should start thinking about CIs around LGC intercepts less as "how confident am I that the interval does or doesn't include zero?" and more as "at what age (and with what CI around that age) does my predicted curve cross zero?"

2.) The cross-dataset answer: when choosing time points to sample, how much does precision improve if I sample at age X? Rast & Hofer have some work on growth curve power showing that you get better mean slope estimates by sampling relatively extreme time points, or simply by maximizing the age range you study. If you go up 50 points over 50 years, you're a lot more confident that the slope is a point a year than you would be after observing a change of 1/365 the next day. Beyond that, I'm not sure there's a better answer than sampling time points that cover the range you want to study.
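A quick Monte Carlo sketch of that last point (again plain OLS rather than an LGC, with arbitrary simulation settings): spreading the same number of observations over a 50-year age range gives a far tighter slope estimate than squeezing them into 5 years, because SE(slope) shrinks as the spread of the time variable grows.

```python
import numpy as np

rng = np.random.default_rng(2)

def slope_se(spread, n=200, reps=500):
    """Monte Carlo SD of the OLS slope when ages span `spread` years."""
    estimates = []
    for _ in range(reps):
        age = rng.uniform(0, spread, n)
        y = 1.0 * age + rng.normal(0, 5, n)  # true slope of 1 point/year
        slope = np.polyfit(age, y, 1)[0]     # leading coefficient is the slope
        estimates.append(slope)
    return np.std(estimates)

print(f"5-year span:  SE = {slope_se(5):.3f}")
print(f"50-year span: SE = {slope_se(50):.3f}")
```

With everything else held fixed, the 50-year design estimates the slope roughly ten times more precisely than the 5-year design.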

I also wrote a quick simulation to test this; feel free to play with it to learn more. I generated five time points of data (ages 10, 20, 30, 40, and 50), but with twice as much data at age 10 (n=1000) as at any other age (I made 1000 people and deleted data from age 20 on for half of them; the attached code can delete any section). Then I centered age at 10, 30, and 50 and looked at what happened. Results for the mean intercept are below, centered at ages 10, 30, and 50, respectively:

centered at 10:  meanI = 48.840, SE = 0.190
centered at 30:  meanI = 50.077, SE = 0.148
centered at 50:  meanI = 51.313, SE = 0.344

As you can guess, I simulated scores with a mean of 50 at age 30, changing by an average of .05/year. Despite there being twice as much data at age 10, the lowest standard error for the mean intercept was at age 30.

This was super helpful; thanks to both of you. I still get myself confused by the use of age definition variables, so see my latest post for additional questions.