select sample
I have a real data set of 6148 students, with six observed variables and two latent variables. I calculated the covariance matrix from all 6148 cases. I want to select the data of 100 students so that their covariance matrix is the same as that of the full data set (the population). I would be very grateful for any help with this.
Any randomly selected
What do I mean by "pretty close"? I mean that the sampling variability of the covariance matrix is not large. A well-known result from undergraduate statistics is that the sampling distribution of the mean has a mean of mu and a variance of v/N, where mu is the population mean, v is the population variance, and N is the sample size. Similarly, for normally distributed data, the sampling distribution of the variance has a mean of v and a variance of 2*v*v/(N-1). If 2*v/(N-1) is less than 1.0, then the variance of the sampling distribution of the variance is smaller than the population variance itself. There are similar results for multiple variables.
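As a rough check on the 2*v*v/(N-1) result, here is a minimal simulation sketch in plain Python. The population variance of 4.0, the number of replications, and the seed are arbitrary illustrative choices, not values from the original data, and the draws are assumed to be normal.

```python
import random

def sample_variance(values):
    """Unbiased sample variance (divides by n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

def simulate_sample_variances(v, n, reps, seed=1):
    """Draw `reps` normal samples of size `n` with population variance `v`
    and return the sample variance of each one."""
    rng = random.Random(seed)
    sd = v ** 0.5
    return [sample_variance([rng.gauss(0.0, sd) for _ in range(n)])
            for _ in range(reps)]

V = 4.0      # population variance (arbitrary illustrative value)
N = 100      # sample size from the question
REPS = 4000  # number of simulated samples (arbitrary)

sample_vars = simulate_sample_variances(V, N, REPS)
mean_of_vars = sum(sample_vars) / REPS     # should be near v
var_of_vars = sample_variance(sample_vars) # should be near 2*v*v/(N-1)
theory = 2 * V * V / (N - 1)

print("mean of sample variances:", round(mean_of_vars, 3))
print("variance of sample variances:", round(var_of_vars, 3))
print("theoretical 2*v*v/(N-1):", round(theory, 3))
```

Put another way, with N = 100 the standard error of a sample variance is sqrt(2/99), or about 14% of the population variance, whatever the value of v.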
The take-home message is that the sample covariance matrix from a random sample is almost always close enough to the population covariance matrix. You have a population covariance matrix, so take any random sample and the covariance should be sufficiently close to the population covariance.
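To make this concrete, the sketch below draws one random sample of 100 from a population of 6148 and compares the two covariance matrices. The population here is a hypothetical stand-in generated from a single common factor with loadings of 0.7 (an assumption made purely for illustration; the real data and its two-factor structure are not reproduced), so only the procedure carries over, not the numbers.

```python
import random

def covariance_matrix(rows):
    """Sample covariance matrix (n - 1 denominator) of a list of rows."""
    n = len(rows)
    p = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(p)]
            for i in range(p)]

rng = random.Random(2024)  # arbitrary seed, for reproducibility only

# Hypothetical population: 6148 cases, six observed variables, each equal to
# 0.7 * (common factor) + residual, scaled so every variance is about 1.
LOADING = 0.7
RESID_SD = (1 - LOADING ** 2) ** 0.5
population = []
for _ in range(6148):
    f = rng.gauss(0.0, 1.0)
    population.append([LOADING * f + rng.gauss(0.0, RESID_SD) for _ in range(6)])

# Simple random sample of 100 cases, drawn without replacement.
sample = rng.sample(population, 100)

pop_cov = covariance_matrix(population)
sam_cov = covariance_matrix(sample)

# Largest element-wise discrepancy between the two covariance matrices.
max_abs_diff = max(abs(pop_cov[i][j] - sam_cov[i][j])
                   for i in range(6) for j in range(6))
print("largest absolute difference:", round(max_abs_diff, 3))
```

Rerunning with different seeds gives a feel for how much the sample covariance matrix moves from one random sample of 100 to the next.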
In reply to Any randomly selected by mhunter
Not any random sample, but why?
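It would be possible to simulate data consistent with the population covariance matrix, using mvrnorm() from the MASS package in R with empirical=TRUE, so that the simulated sample's covariance matrix matches the specified matrix exactly rather than only in expectation.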
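For readers outside R, the idea can be sketched in plain Python: generate normal draws, center and whiten them so their sample covariance is exactly the identity, then rescale with a Cholesky factor of the target matrix. This is a stand-in that makes no claim to mirror mvrnorm()'s internals, and the 3 x 3 target matrix is a hypothetical example, not the covariance matrix from the question. The result is simulated data, not a selection of actual students.

```python
import random

def cholesky(a):
    """Lower-triangular L with L * L^T = a (a must be positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = (a[i][i] - s) ** 0.5
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L

def forward_solve(L, b):
    """Solve L * y = b for y, with L lower triangular."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def covariance_matrix(rows):
    """Sample covariance matrix (n - 1 denominator) of a list of rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(p)] for i in range(p)]

def exact_covariance_sample(target, n, rng):
    """Simulate n rows whose sample covariance equals `target` (zero means)."""
    p = len(target)
    z = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    means = [sum(r[j] for r in z) / n for j in range(p)]
    z = [[r[j] - means[j] for j in range(p)] for r in z]  # center columns
    L_sample = cholesky(covariance_matrix(z))
    L_target = cholesky(target)
    rows = []
    for r in z:
        w = forward_solve(L_sample, r)  # whitened: sample covariance is I
        rows.append([sum(L_target[i][k] * w[k] for k in range(i + 1))
                     for i in range(p)])  # recolored to the target
    return rows

# Hypothetical target covariance matrix (illustrative values only).
TARGET = [[1.0, 0.5, 0.3],
          [0.5, 2.0, 0.4],
          [0.3, 0.4, 1.5]]

data = exact_covariance_sample(TARGET, 100, random.Random(7))
achieved = covariance_matrix(data)
```

Here `achieved` agrees with `TARGET` up to floating-point rounding, which is the property empirical=TRUE is used for.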
However, such efforts raise the question of why one would want to do this. When fitting a model to a covariance matrix, simply changing the sample size supplied along with that matrix would do exactly what is desired.
In reply to Any randomly selected by mhunter
Thanks for your answer.