Does anybody know a reference in which it is shown that the likelihood ratio test statistic converges to a Chi squared distribution for extended SEMs with definition variables?

By extended SEM I mean RAM models in which the entries of A and S may be arbitrary algebraic expressions of the parameters, in contrast to classical SEM, where each entry of A and S is either a fixed value or a single free parameter. By definition variables I mean that any fixed value in an SEM may be replaced by a person-specific value, so that it can differ across persons.

In textbooks I can only find the proof for classical SEMs without definition variables. I am fairly sure that the switch from classical to extended SEM does not violate any of the assumptions used in standard proofs that the likelihood ratio test statistic is chi-squared distributed. I am less sure about the definition variables.

I don't believe definition variables violate any of the assumptions of the Wilks paper (http://doi.org/10.1214/aoms/1177732360) that established the asymptotic chi-square distribution of the likelihood ratio. Extended SEM vs. classical SEM makes no difference at all. When you add definition variables, SEMs switch from likelihoods to conditional likelihoods. More fully, SEMs use maximum likelihood assuming a multivariate normal (MVN) distribution, whereas SEMs with definition variables use conditional maximum likelihood: the likelihood is conditional on the definition variables, and the data are MVN conditional on the definition variables.

The more common issue with the chi-square distribution of a likelihood ratio is the assumption that the estimated parameters lie at interior points of the parameter space. For example, a hypothesis that a variance is zero violates this assumption, because zero sits on the boundary of the parameter space. This is one reason REML exists.
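To see the boundary problem concretely, here is a small simulation of my own (not from this thread; the model and numbers are made up): it tests H0: tau^2 = 0 in Var(y) = 1 + tau^2 under the constraint tau^2 >= 0. Roughly half the LR statistics come out exactly zero, so the plain chi2(1) reference distribution does not apply at the boundary.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def lr_stat(y):
    """LR statistic for H0: tau2 = 0 in Var(y) = 1 + tau2, tau2 >= 0, mean known to be 0."""
    s2 = np.mean(y ** 2)
    tau2_hat = max(0.0, s2 - 1.0)          # MLE respects the boundary tau2 >= 0
    ll = lambda t2: norm.logpdf(y, scale=np.sqrt(1.0 + t2)).sum()
    return 2.0 * (ll(tau2_hat) - ll(0.0))

n, reps = 200, 2000
stats = np.array([lr_stat(rng.standard_normal(n)) for _ in range(reps)])

# Under H0, about half the mass piles up at exactly 0: the asymptotic distribution
# is a 50:50 mixture of a point mass at zero and chi2(1), not chi2(1) itself.
print(np.mean(stats == 0.0))
```

The fraction of exact zeros hovers around one half, which is what pushes the test away from the interior-point theory.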

Wilks, S. S. (1938). "The Large-Sample Distribution of the Likelihood Ratio for Testing Composite Hypotheses". The Annals of Mathematical Statistics 9: 60–62. doi:10.1214/aoms/1177732360

Thanks a lot for your swift and clear answer. I am glad that you also believe that extended SEM vs SEM makes no difference.

I don't find the Wilks paper very accessible. However, as I understand it, the central assumption is that both the restricted and the unrestricted ML estimators are asymptotically normal (see also http://www.statlect.com/likelihood_ratio_test.htm ).

As far as I understand it, the switch to conditional likelihoods violates one of the assumptions for asymptotic normality. According to statlect (the first assumption listed there), one condition for regular ML estimators to be asymptotically normal is that every individual is a realization of the same distribution (IID). But with definition variables, we assume that the distribution is different for every individual.

However, as you wrote, with definition variables we use conditional maximum likelihood estimators. I found a paper (http://www.jstor.org/stable/2984535 ) showing that, under some regularity conditions, conditional maximum likelihood estimators are also asymptotically normal, which I guess you already knew.

While we are at it: what happens to conditional maximum likelihood estimators when there is missing data?

I just ran across:

Enders, C. K. & Bandalos, D. L. (2001). The relative performance of full information maximum likelihood estimation for missing data in structural equation models. Structural Equation Modeling, 8 (3), 430–457.

Relevant?

Along similar lines, there's this citation classic on missing data: Rubin, D. B. (1976). Inference and missing data. Biometrika 63(3): 581-592.

Also, FWIW, here are the regularity conditions ordinarily assumed for asymptotic maximum-likelihood theory:

They're taken from Greene, W. H. (2003). Econometric Analysis (5th ed.). Upper Saddle River, NJ: Prentice-Hall.

A very mild relaxation of IID is conditionally IID: the modeled data are IID conditional on the definition variables. You switch from f(y | theta) to f(y | theta, x), where f() is the probability density function, y is the data vector, theta is the vector of free parameters, and x is the vector of definition variables. Basically, all the benefits of the former case carry over to the latter.
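To make the switch from f(y | theta) to f(y | theta, x) concrete, here is a minimal conditional-ML sketch (the model, names, and numbers are all hypothetical, chosen only for illustration): each person's mean depends on their own definition variable x_i, and we maximize the likelihood conditional on x.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(1)

# Hypothetical setup: x_i is a definition variable shifting each person's mean.
n = 500
x = rng.uniform(-2, 2, size=n)             # definition variables (fixed, person-specific)
b0_true, b1_true, sd_true = 1.0, 0.5, 1.5
y = b0_true + b1_true * x + rng.normal(0.0, sd_true, size=n)

def neg_cond_loglik(theta):
    """-log f(y | theta, x): the data are normal conditional on the definition variables."""
    b0, b1, log_sd = theta
    return -norm.logpdf(y, loc=b0 + b1 * x, scale=np.exp(log_sd)).sum()

fit = minimize(neg_cond_loglik, x0=np.zeros(3), method="BFGS")
b0_hat, b1_hat, sd_hat = fit.x[0], fit.x[1], np.exp(fit.x[2])
print(b0_hat, b1_hat, sd_hat)
```

Even though no two persons share the same distribution marginally, the rows are IID conditional on x, and the usual ML machinery goes through.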

The article you mentioned seems to tackle a different problem: the Neyman-Scott problem of inconsistent estimators (that is, a situation where standard errors do not shrink as you add more rows of data). Some ways of using definition variables run into this problem; others do not. It depends on whether "nuisance" parameters must be estimated for each new row of data.

In many cases, conditional maximum likelihood estimators behave the same way under missing data as ordinary MLEs do. The paper Joshua mentioned is relevant here. In general, it depends on how you are using the definition variables.

Thanks again for the tremendous help. I think I should be more specific about the problem I want to solve. The class of models for which I want to show that the likelihood ratio test is valid has the form attached. In practice the number of manifest variables will be larger, but the form remains the same.

theta is my parameter vector; the number of parameters may grow with the number of manifest variables, but not with the number of persons. t_i contains the definition variables for person i; the number of definition variables is fixed, and there is no missingness within them. m(t_ij; theta) is an arbitrary real-valued function. k(t_ij, t_ik; theta) is a function such that, for any parameter configuration and any definition variables, the model-implied covariance matrix is a valid covariance matrix. For any person, some (but not all) manifest variables may be missing. Thus the likelihood for the data y_i of person i is f(y_i | theta, t_i).
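For what it's worth, here is how I picture one person's contribution to that likelihood in code. The particular m() and k() below are hypothetical stand-ins (an exponential kernel plus a nugget, chosen only so the implied covariance matrix is always valid); FIML handles the missingness by evaluating the MVN density on each person's observed sub-vector.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical forms for m and k; the real model would supply its own.
def m(t, theta):          # model-implied mean of one manifest variable
    return theta[0] + theta[1] * t

def k(t_j, t_k, theta):   # model-implied covariance between two manifest variables
    return theta[2] * np.exp(-abs(t_j - t_k)) + (theta[3] if t_j == t_k else 0.0)

def person_loglik(y_i, t_i, theta):
    """log f(y_i | theta, t_i): FIML drops each person's missing entries
    and evaluates the MVN density on the observed sub-vector."""
    obs = ~np.isnan(y_i)
    mu = np.array([m(t, theta) for t in t_i])
    Sigma = np.array([[k(tj, tk, theta) for tk in t_i] for tj in t_i])
    return multivariate_normal.logpdf(y_i[obs], mean=mu[obs],
                                      cov=Sigma[np.ix_(obs, obs)])

theta = np.array([0.0, 1.0, 1.0, 0.5])
t_i = np.array([0.0, 0.5, 1.0])            # person-specific definition variables
y_i = np.array([0.1, np.nan, 1.2])         # one manifest variable is missing
ll = person_loglik(y_i, t_i, theta)
print(ll)
```

The total log-likelihood is then the sum of these person-specific terms, each built from that person's own t_i and missingness pattern.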

As far as I understand it, for the likelihood ratio test to be valid I need to show that the FIML estimator, as well as the FIML estimator under the restriction g(theta) = 0 (for a suitable restriction g), is asymptotically normally distributed.
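To check my own understanding of what "valid" should look like in the end, here is a toy Monte Carlo (a hypothetical one-restriction model, not the attached one): testing g(theta) = b1 = 0 in a conditionally normal model where the Gaussian MLEs have closed forms. Under the null, the rejection rate at the chi2(1) critical value should come out close to .05.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(3)

def lr_stat(y, x):
    """LR statistic for g(theta) = b1 = 0 in y_i ~ N(b0 + b1*x_i, sd^2),
    using the closed-form Gaussian MLEs (OLS with and without the slope)."""
    n = len(y)
    ss0 = np.sum((y - y.mean()) ** 2)                      # restricted fit: b1 = 0
    b1 = np.cov(x, y, bias=True)[0, 1] / np.var(x)
    resid = y - (y.mean() + b1 * (x - x.mean()))
    ss1 = np.sum(resid ** 2)                               # unrestricted fit
    return n * np.log(ss0 / ss1)

# Monte Carlo under H0: one restriction => chi2 with 1 df,
# so the rejection rate at alpha = .05 should be near .05.
n, reps, crit = 200, 2000, chi2.ppf(0.95, df=1)
x = rng.uniform(-2, 2, size=n)                             # definition variables, held fixed
rej = np.mean([lr_stat(0.5 + rng.standard_normal(n), x) > crit for _ in range(reps)])
print(rej)
```

If the asymptotic normality of both estimators holds for the attached model class, this is the behavior one would expect there as well.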