Note: Rather than creating extra homework problems on the topics from today’s lecture, if there are any problems on the examination from today’s lecture I will use the examples from this lecture (although I may change the numbers).
When strata correspond to one or more domains of interest, stratified random sampling makes additional inferences about those domains straightforward.
With stratified random sampling, the sampling design for obtaining the sample from each stratum is simple random sampling. So inferences concerning a stratum mean or total use the results from simple random sampling. We have \[ \hat\mu_i = \bar{y}_i \ \ \ \text{and} \ \ \ V(\hat\mu_i) = \left(1 - \frac{n_i}{N_i}\right)\frac{\sigma_i^2}{n_i}, \] and \(\hat\tau_i = N_i\bar{y}_i\) and \(V(\hat\tau_i) = N_i^2V(\hat\mu_i)\), noting that if we need to estimate the variance we replace \(\sigma_i^2\) with \(s_i^2\) and \(V\) with \(\hat{V}\).
Example: Suppose we have the following results from a survey that used stratified random sampling.
\(i\) | \(N_i\) | \(n_i\) | \(\bar{y}_i\) | \(s_i\) |
---|---|---|---|---|
1 | 1000 | 100 | 36 | 6 |
2 | 2000 | 200 | 25 | 5 |
3 | 3000 | 300 | 16 | 4 |
What is \(\hat\mu_1\), and what is the estimated variance of that estimator? What is \(\hat\tau_1\), and what is the estimated variance of that estimator?
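The stratum-level formulas above can be checked numerically. The following is a minimal sketch in Python, applied to stratum 1 of the table; the function name `stratum_estimates` and its return layout are hypothetical choices made for illustration, not part of the lecture.

```python
def stratum_estimates(N, n, ybar, s):
    """Estimated mean and total for one stratum sampled by simple random
    sampling, with estimated variances (s^2 replaces sigma^2).
    Hypothetical helper for illustration only."""
    fpc = 1 - n / N              # finite population correction, 1 - n_i/N_i
    v_mean = fpc * s**2 / n      # estimated V(mu_hat_i)
    v_total = N**2 * v_mean      # estimated V(tau_hat_i) = N_i^2 * V(mu_hat_i)
    return {"mean": ybar, "v_mean": v_mean,
            "total": N * ybar, "v_total": v_total}

# Stratum 1 from the table: N_1 = 1000, n_1 = 100, ybar_1 = 36, s_1 = 6
est1 = stratum_estimates(N=1000, n=100, ybar=36, s=6)
print(est1)
```

Here the estimated mean is 36 with estimated variance of about 0.324, and the estimated total is 36000 with estimated variance of about 324000.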
\(i\) | \(N_i\) | \(n_i\) | \(\bar{y}_i\) | \(s_i\) |
---|---|---|---|---|
1 | 1000 | 100 | 36 | 6 |
2 | 2000 | 200 | 25 | 5 |
3 | 3000 | 300 | 16 | 4 |
What is the estimate of \(\mu_{2,3}\), the mean of strata 2 and 3 combined, and what is the estimated variance of that estimator?
More generally, we can do this for any number of strata. For example, to estimate the mean of strata \(i\), \(j\), and \(k\) combined, we can use the estimator \[ \hat\mu_{i,j,k} = \frac{N_i}{N_i+N_j+N_k}\bar{y}_i + \frac{N_j}{N_i+N_j+N_k}\bar{y}_j + \frac{N_k}{N_i + N_j + N_k}\bar{y}_k, \] which has variance \[ V(\hat\mu_{i,j,k}) = \left(\frac{N_i}{N_i+N_j+N_k}\right)^2 V(\hat\mu_i) + \left(\frac{N_j}{N_i+N_j+N_k}\right)^2V(\hat\mu_j) + \left(\frac{N_k}{N_i+N_j+N_k}\right)^2V(\hat\mu_k). \] And the total for those strata combined, \(\tau_{i,j,k}\), is estimated as \[ \hat\tau_{i,j,k} = \hat\tau_i + \hat\tau_j + \hat\tau_k, \] which has variance \[ V(\hat\tau_{i,j,k}) = V(\hat\tau_i) + V(\hat\tau_j) + V(\hat\tau_k). \]
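As a rough numerical illustration of the combined-strata estimators, the sketch below applies them to strata 2 and 3 of the example table. The helper names (`srs_var_mean`, `combined_mean`, `combined_total`) and the tuple layout `(N, n, ybar, s)` are assumptions made for this example only.

```python
def srs_var_mean(N, n, s):
    """Estimated variance of a stratum sample mean under simple random sampling."""
    return (1 - n / N) * s**2 / n

def combined_mean(strata):
    """Estimate the mean of several strata combined.
    strata: list of (N, n, ybar, s) tuples (hypothetical layout)."""
    N_total = sum(N for N, _, _, _ in strata)
    # Weight each stratum mean by its share of the combined population size
    mean = sum(N / N_total * ybar for N, _, ybar, _ in strata)
    # Squared weights times the stratum-level variances
    var = sum((N / N_total) ** 2 * srs_var_mean(N, n, s) for N, n, _, s in strata)
    return mean, var

def combined_total(strata):
    """Estimate the total of several strata combined."""
    total = sum(N * ybar for N, _, ybar, _ in strata)
    var = sum(N**2 * srs_var_mean(N, n, s) for N, n, _, s in strata)
    return total, var

# Strata 2 and 3 from the table
strata_2_3 = [(2000, 200, 25, 5), (3000, 300, 16, 4)]
mu_23, v_mu_23 = combined_mean(strata_2_3)
tau_23, v_tau_23 = combined_total(strata_2_3)
print(mu_23, v_mu_23, tau_23, v_tau_23)
```

With these inputs the estimated combined mean is about 19.6 with estimated variance of about 0.03528, and the estimated combined total is 98000 with estimated variance of about 882000.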
Suppose we want to estimate \(\mu_i - \mu_j\). The estimator is simply \[ \hat\mu_i - \hat\mu_j = \bar{y}_i - \bar{y}_j, \] which has variance \[ V(\hat\mu_i - \hat\mu_j) = V(\hat\mu_i) + V(\hat\mu_j). \] Similarly, if we want to estimate \(\tau_i - \tau_j\) the estimator is \[ \hat\tau_i - \hat\tau_j = N_i\bar{y}_i - N_j\bar{y}_j, \] which has variance \[ V(\hat\tau_i - \hat\tau_j) = V(\hat\tau_i) + V(\hat\tau_j). \] Note that although we are subtracting estimators, the variances are still additive, because the samples from different strata are selected independently.
Example: Suppose we have the following results from a survey that used stratified random sampling.
\(i\) | \(N_i\) | \(n_i\) | \(\bar{y}_i\) | \(s_i\) |
---|---|---|---|---|
1 | 1000 | 100 | 36 | 6 |
2 | 2000 | 200 | 25 | 5 |
3 | 3000 | 300 | 16 | 4 |
What is the estimate of \(\mu_1 - \mu_3\) and what is the variance of the estimator \(\hat\mu_1 - \hat\mu_3\)?
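One way to work through this kind of comparison is sketched below in Python for strata 1 and 3, covering both the difference of means and the difference of totals. The function names and the `(N, n, ybar, s)` tuple layout are hypothetical, chosen only for this illustration.

```python
def srs_var_mean(N, n, s):
    """Estimated variance of a stratum sample mean under simple random sampling."""
    return (1 - n / N) * s**2 / n

def difference_of_means(a, b):
    """Estimate mu_a - mu_b; a and b are (N, n, ybar, s) tuples (hypothetical layout)."""
    (Na, na, ya, sa), (Nb, nb, yb, sb) = a, b
    # Variances add even though the estimators are subtracted
    return ya - yb, srs_var_mean(Na, na, sa) + srs_var_mean(Nb, nb, sb)

def difference_of_totals(a, b):
    """Estimate tau_a - tau_b."""
    (Na, na, ya, sa), (Nb, nb, yb, sb) = a, b
    var = Na**2 * srs_var_mean(Na, na, sa) + Nb**2 * srs_var_mean(Nb, nb, sb)
    return Na * ya - Nb * yb, var

stratum_1 = (1000, 100, 36, 6)
stratum_3 = (3000, 300, 16, 4)
d_mean, v_d_mean = difference_of_means(stratum_1, stratum_3)
d_total, v_d_total = difference_of_totals(stratum_1, stratum_3)
print(d_mean, v_d_mean, d_total, v_d_total)
```

For these strata the estimated difference in means is 20 with estimated variance of about 0.372. The estimated difference in totals is -12000 with estimated variance of about 756000.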
Note: We can also do these kinds of inferences with post-stratification, or for stratified random sampling where the domains of interest do not correspond to the strata. The estimators are the same but the variances are different.