Level | \(N_j\) | \(n_j\) | \(\bar{y}_j\) | \(s_j\) |
---|---|---|---|---|
High | 300 | 50 | 80 | 10 |
Medium | 500 | 100 | 70 | 15 |
Low | 200 | 50 | 50 | 20 |
Use this information to answer the following questions.
Confirm that the estimate of the mean test score for all of the students at the school is 69, that the estimate of the variance of the estimator used to obtain this estimate is approximately 0.84, that the standard error is approximately 0.92, that the bound on the error of estimation is approximately 1.83, and that the confidence interval for the mean test score for all students at the school is therefore approximately 69 \(\pm\) 1.83.
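These figures can be checked numerically. The following is a minimal sketch, assuming the usual stratified estimator of the mean, a finite population correction within each stratum, and a bound of two standard errors. The function name `stratified_mean` is illustrative only.

```python
from math import sqrt

def stratified_mean(N, n, ybar, s):
    """Return (estimate, variance, standard error, bound) for a stratified mean.

    Assumes the estimated variance
        sum_j (N_j / N)^2 * (1 - n_j / N_j) * s_j^2 / n_j
    and a bound on the error of estimation of two standard errors.
    """
    total = sum(N)
    est = sum(Nj * yj for Nj, yj in zip(N, ybar)) / total
    var = sum((Nj / total) ** 2 * (1 - nj / Nj) * sj ** 2 / nj
              for Nj, nj, sj in zip(N, n, s))
    se = sqrt(var)
    return est, var, se, 2 * se

# Strata in the order high, medium, low (values from the table above).
est, var, se, bound = stratified_mean(
    N=[300, 500, 200], n=[50, 100, 50], ybar=[80, 70, 50], s=[10, 15, 20])
print(round(est, 2), round(var, 2), round(se, 2), round(bound, 2))
```

The three strata contribute 0.15, 0.45, and 0.24 to the estimated variance, so the medium-achieving stratum accounts for more than half of it.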
The survey described above was conducted several years ago. The educational psychologists are planning a new survey this year. The composition of the school has changed slightly. The school has grown to a total of 1300 students. Of these, 400 are classified as high-achieving, 600 are classified as medium-achieving, and 300 are classified as low-achieving students. Assume that it costs $20 to test one student, regardless of their achievement level. Confirm that the optimum allocation for a fixed bound on the error of estimation of one point (i.e., \(B\) = 1) would be to sample approximately 106 high-achieving students, approximately 238 medium-achieving students, and approximately 159 low-achieving students.
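One way to check this allocation is sketched below. It assumes that the sample standard deviations from the earlier survey (10, 15, and 20) are used as planning values for the stratum standard deviations, and that with equal costs the optimum allocation reduces to Neyman allocation. The function name is hypothetical.

```python
def neyman_allocation_for_bound(N, sigma, B):
    """Stratum sample sizes for estimating a mean with bound B and equal costs.

    Assumes n = (sum_j N_j sigma_j)^2 / (N^2 D + sum_j N_j sigma_j^2),
    with D = B^2 / 4, allocated in proportion to N_j * sigma_j.
    """
    total = sum(N)
    D = B ** 2 / 4
    weights = [Nj * sj for Nj, sj in zip(N, sigma)]
    n = sum(weights) ** 2 / (
        total ** 2 * D + sum(Nj * sj ** 2 for Nj, sj in zip(N, sigma)))
    return [n * w / sum(weights) for w in weights]

# Strata in the order high, medium, low; sigma values are planning
# values taken from the earlier survey (an assumption).
alloc = neyman_allocation_for_bound(N=[400, 600, 300], sigma=[10, 15, 20], B=1)
print([round(a) for a in alloc])
```

The total sample size before rounding is about 503 students, which at $20 per student is why the next problem treats this design as too expensive.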
It was determined that the allocation found in the previous problem was too expensive. In addition to a cost of $20 per tested student, there is an overhead cost of $2000 for the specialists administering the test. Confirm that the optimum allocation for a fixed total cost of $10000 would be to sample approximately 84 high-achieving students, approximately 189 medium-achieving students, and approximately 126 low-achieving students.
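With equal per-student costs, the budget fixes the total sample size and the allocation fractions are the same as under Neyman allocation. A minimal sketch, again assuming the earlier standard deviations serve as planning values (the function name is illustrative):

```python
def allocation_for_budget(N, sigma, total_cost, overhead, cost_per_unit):
    """Stratum sample sizes when every unit costs the same to sample.

    The affordable sample size is (total_cost - overhead) / cost_per_unit,
    split across strata in proportion to N_j * sigma_j.
    """
    n = (total_cost - overhead) / cost_per_unit
    weights = [Nj * sj for Nj, sj in zip(N, sigma)]
    return [n * w / sum(weights) for w in weights]

# Strata in the order high, medium, low; sigma values are assumed
# planning values from the earlier survey.
alloc = allocation_for_budget(
    N=[400, 600, 300], sigma=[10, 15, 20],
    total_cost=10000, overhead=2000, cost_per_unit=20)
print([round(a) for a in alloc])
```

Rounding each stratum to the nearest student gives 399 students in total, one fewer than the 400 the budget allows.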
Forestry researchers conducted a survey to estimate the number of trees of a particular species in a region of forest. They used a stratified random sampling design. The region had been divided into 500 units, each with an area of one hectare. Each unit had also been classified as either “low-elevation” or “high-elevation” based on the average elevation within each unit. A total of 400 units were classified as low-elevation, and 100 units were classified as high-elevation. The researchers selected 50 low-elevation units using simple random sampling, and sent teams out to count the number of trees in each sampled unit. The average number of trees per sampled low-elevation unit was 100 trees, and the standard deviation was 10 trees. The researchers also selected 50 high-elevation units using simple random sampling, and sent teams out to count the number of trees in each of these sampled units. The average number of trees per sampled high-elevation unit was 25 trees, and the standard deviation was 5 trees.
Confirm that the estimate of the total number of trees in the region is 42500 trees, that the estimate of the variance of the estimator used to obtain this estimate is 282500, that the standard error is approximately 532 trees, that the bound on the error of estimation is approximately 1063 trees, and that the confidence interval for the total number of trees in the region is approximately 42500 \(\pm\) 1063 trees.
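The same kind of check works for a total. The sketch below assumes the estimator of the total is the sum of \(N_j\bar{y}_j\) over strata, with a finite population correction in each stratum and a bound of two standard errors. The function name `stratified_total` is illustrative.

```python
from math import sqrt

def stratified_total(N, n, ybar, s):
    """Return (estimate, variance, standard error, bound) for a stratified total.

    Assumes the estimated variance
        sum_j N_j^2 * (1 - n_j / N_j) * s_j^2 / n_j.
    """
    est = sum(Nj * yj for Nj, yj in zip(N, ybar))
    var = sum(Nj ** 2 * (1 - nj / Nj) * sj ** 2 / nj
              for Nj, nj, sj in zip(N, n, s))
    se = sqrt(var)
    return est, var, se, 2 * se

# Strata in the order low-elevation, high-elevation.
est, var, se, bound = stratified_total(
    N=[400, 100], n=[50, 50], ybar=[100, 25], s=[10, 5])
print(est, round(var), round(se), round(bound))
```

Nearly all of the variance (280000 of 282500) comes from the low-elevation stratum, which is both larger and more variable.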
The forestry researchers are planning another survey in a similar forest region using stratified random sampling. This region has a total of 1000 one-hectare units, of which 300 are low-elevation and 700 are high-elevation. They estimate that the cost to survey a low-elevation unit is $20 and the cost to survey a high-elevation unit is $10 (high-elevation units are cheaper despite being less accessible because their flora tends to be sparser, which makes them faster to survey). Confirm that the optimum allocation for a fixed bound on the error of estimation of 1000 trees is to sample approximately 55 low-elevation units and approximately 91 high-elevation units.
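A sketch of this calculation follows. It assumes the standard deviations from the earlier forest survey (10 for low-elevation, 5 for high-elevation) are used as planning values, and that the bound applies to the estimated total so that \(D = B^2/(4N^2)\). The function name is hypothetical.

```python
from math import sqrt

def optimum_allocation_for_total_bound(N, sigma, cost, B):
    """Stratum sample sizes for estimating a total with bound B and unequal costs.

    Assumes
        n = [sum_j N_j sigma_j / sqrt(c_j)] * [sum_j N_j sigma_j sqrt(c_j)]
            / (N^2 D + sum_j N_j sigma_j^2),   D = B^2 / (4 N^2),
    allocated in proportion to N_j * sigma_j / sqrt(c_j).
    """
    total = sum(N)
    D = B ** 2 / (4 * total ** 2)
    a = [Nj * sj / sqrt(cj) for Nj, sj, cj in zip(N, sigma, cost)]
    b = [Nj * sj * sqrt(cj) for Nj, sj, cj in zip(N, sigma, cost)]
    n = sum(a) * sum(b) / (
        total ** 2 * D + sum(Nj * sj ** 2 for Nj, sj in zip(N, sigma)))
    return [n * aj / sum(a) for aj in a]

# Strata in the order low-elevation, high-elevation; sigma values are
# assumed planning values from the earlier survey.
alloc = optimum_allocation_for_total_bound(
    N=[300, 700], sigma=[10, 5], cost=[20, 10], B=1000)
print([round(a) for a in alloc])
```

The allocation fractions depend on \(N_j\sigma_j/\sqrt{c_j}\), so the low-elevation stratum's higher cost reduces its share relative to Neyman allocation.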
Consider the previous problem where the forestry researchers are planning a new survey. Assume that the researchers can spend $5000 on the survey, but they only need to pay for surveying the units so there is no overhead cost (any overhead cost will be paid for out of a separate budget). Confirm that the optimum allocation for a fixed total cost of $5000 is to sample approximately 137 low-elevation units and approximately 226 high-elevation units.
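For a fixed budget with unequal costs, the stratum sample sizes can be written directly in terms of the budget. The sketch below makes the same planning-value assumption as before (standard deviations of 10 and 5 from the earlier survey); the function name is illustrative.

```python
from math import sqrt

def optimum_allocation_for_budget(N, sigma, cost, budget):
    """Stratum sample sizes for a fixed budget with no overhead.

    Assumes
        n_j = budget * (N_j sigma_j / sqrt(c_j)) / sum_k (N_k sigma_k sqrt(c_k)),
    which spends the whole budget: sum_j c_j n_j = budget.
    """
    denom = sum(Nk * sk * sqrt(ck) for Nk, sk, ck in zip(N, sigma, cost))
    return [budget * Nj * sj / sqrt(cj) / denom
            for Nj, sj, cj in zip(N, sigma, cost)]

# Strata in the order low-elevation, high-elevation.
costs = [20, 10]
alloc = optimum_allocation_for_budget(
    N=[300, 700], sigma=[10, 5], cost=costs, budget=5000)
spent = sum(a * c for a, c in zip(alloc, costs))
print([round(a) for a in alloc], round(spent))
```

As a sanity check, 137 low-elevation units at $20 plus 226 high-elevation units at $10 comes to exactly $5000.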
Consider the problem of estimating the parameter \(\mu\) using one of three designs: simple random sampling, stratified random sampling, and cluster sampling (which we will talk about later in the course). Each is based on a total sample size of \(n\) = 1000 elements. Assume we can compute (or at least estimate) the variance of the estimator of \(\mu\) for each of the three designs. The variances of the estimator under simple random sampling, stratified random sampling, and cluster sampling are \(V_{\tiny\mbox{srs}}(\hat\mu)\) = 10, \(V_{\tiny\mbox{strat}}(\hat\mu)\) = 8, and \(V_{\tiny\mbox{clus}}(\hat\mu)\) = 16, respectively. The stratified and cluster sampling designs are the complex sampling designs. Confirm that the design effects of the stratified and cluster sampling designs are 0.8 and 1.6, respectively. Also confirm that the effective sample sizes of the stratified and cluster sampling designs are 1250 and 625, respectively.
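Both quantities are simple ratios. A minimal sketch, assuming the design effect is the variance under the complex design divided by the variance under simple random sampling with the same \(n\), and the effective sample size is \(n\) divided by the design effect (function names are illustrative):

```python
def design_effect(var_complex, var_srs):
    """Ratio of the complex-design variance to the SRS variance (same n)."""
    return var_complex / var_srs

def effective_sample_size(n, deff):
    """SRS sample size that would give the same variance as the complex design."""
    return n / deff

n = 1000
var_srs = 10
for name, var_complex in [("stratified", 8), ("cluster", 16)]:
    deff = design_effect(var_complex, var_srs)
    print(name, deff, round(effective_sample_size(n, deff)))
```

A design effect below 1 means the design is more efficient than simple random sampling at the same sample size, and a value above 1 means it is less efficient.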
Level | \(n_j\) | \(\bar{y}_j\) | \(s_j\) |
---|---|---|---|
High | 58 | 78 | 11 |
Medium | 98 | 72 | 14 |
Low | 44 | 55 | 22 |
Confirm that the estimate of the mean score on the cognitive test of all students at the school is approximately 70.
Consider the previous example where forestry researchers were conducting a survey to estimate the number of trees of a particular species in a region of forest divided into a total of 500 units. Recall that they used elevation to stratify the units of forest. Now suppose that the elevation data were not available, so the researchers used a double sampling design to observe the elevation data in the field. First, a simple random sample of 100 units was selected, and field crews were sent to each unit to quickly inspect it and classify it as low-elevation or high-elevation. They classified 77 units as low-elevation and 23 as high-elevation. They then obtained a stratified random sample of these sampled units by obtaining a simple random sample of 40 of the low-elevation units and a simple random sample of 20 of the high-elevation units. For these 60 sampled units, field crews counted the number of trees. The mean number of trees per unit was 96 trees in the low-elevation units and 28 trees in the high-elevation units. Confirm that the estimate of the total number of trees in the region is then 40180 trees.
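The estimate can be checked with the short sketch below. It assumes the stratum weights are estimated by the first-phase sample proportions (77/100 and 23/100) and that the estimated total is the number of units in the region times the weighted mean. The function name is hypothetical.

```python
def double_sampling_total(N, first_phase_counts, ybar):
    """Estimated total under double sampling for stratification.

    Assumes weights w_j = n'_j / n' from the first-phase sample and the
    estimator N * sum_j w_j * ybar_j.
    """
    n_first = sum(first_phase_counts)
    mean = sum(c / n_first * y for c, y in zip(first_phase_counts, ybar))
    return N * mean

# Strata in the order low-elevation, high-elevation.
total = double_sampling_total(N=500, first_phase_counts=[77, 23], ybar=[96, 28])
print(round(total))
```

The second-phase sample sizes (40 and 20) do not enter the point estimate; they matter only for its variance.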