Friday, December 12

Information about the Final Examination

The final examination is on Monday, December 15th, from 12:45 to 2:45 PM. It will be in the same room as lecture.
The format of the final examination is the same as that of the other examinations. The only differences are that it is a “cumulative” examination (i.e., it includes material covered over the entire semester) and that you will have 120 minutes rather than 50 minutes to complete it. It will not have more questions than the other examinations, and it is worth the same amount (25%) as the other examinations. As usual, the final examination is open-notes.
The final examination will focus more on concepts. There will be fewer computational problems.

Foundations

Understand the distinction between a population and a sample, between an element and a sampling unit, between a parameter and an estimator, and between a target variable and an auxiliary variable.
What do we mean by a sampling frame?
What do we mean by a (probability) sampling design?
What are inclusion probabilities and what are selection probabilities? What are first-order versus second-order inclusion probabilities?
What do we mean by the sampling distribution of an estimator?
How do we talk about the “accuracy” of an estimator in terms of its bias, variance, and mean squared error?
What is the relationship between the standard error, the variance of an estimator, the bound on the error of estimation, and a confidence interval?
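To make the bias, variance, MSE, and standard-error questions concrete, here is a small Python sketch. It enumerates every possible simple random sample of size \(n = 2\) from a hypothetical population of \(N = 5\) values, so the sampling distribution is exact rather than simulated, and it checks that MSE = variance + bias². The population, the shrunken estimator \(0.9\bar{y}\), and the convention of taking the bound on the error of estimation as two estimated standard errors are assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance

# Hypothetical population of N = 5 target-variable values (illustrative only).
population = [2.0, 4.0, 6.0, 8.0, 10.0]
N, n = len(population), 2
mu = mean(population)  # the parameter being estimated

def sampling_distribution(estimator):
    """Value of the estimator for every possible simple random sample (without
    replacement) of size n; under SRS all C(N, n) samples are equally likely."""
    return [estimator(s) for s in combinations(population, n)]

def accuracy(values, parameter):
    """Bias, variance, and MSE of an estimator from its equally likely sampling distribution."""
    expected = mean(values)
    bias = expected - parameter
    var = mean([(v - expected) ** 2 for v in values])
    mse = mean([(v - parameter) ** 2 for v in values])
    return bias, var, mse

def bound_and_ci(sample, N):
    """Estimated standard error of the sample mean (with the finite population
    correction), the bound B = 2 * SE (assumed convention), and ybar +/- B."""
    ybar = mean(sample)
    se = sqrt((1 - len(sample) / N) * variance(sample) / len(sample))
    bound = 2 * se
    return ybar, se, bound, (ybar - bound, ybar + bound)

unbiased = accuracy(sampling_distribution(mean), mu)
# Hypothetical biased estimator: shrink the sample mean toward zero.
shrunk = accuracy(sampling_distribution(lambda s: 0.9 * mean(s)), mu)
print("sample mean       (bias, var, mse):", unbiased)
print("0.9 * sample mean (bias, var, mse):", shrunk)

# From a single observed sample, e.g. (4, 10):
print("ybar, SE, B, CI:", bound_and_ci([4.0, 10.0], N))
```

In this toy population, the biased estimator \(0.9\bar{y}\) has a smaller MSE than the unbiased \(\bar{y}\), because its variance drops by more than its squared bias adds.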

Sampling Designs

From a short description of a sampling design (assuming you are given adequate information) you should be able to identify whether the design is a simple random sampling design, a stratified random sampling design, a one-stage cluster sampling design, or a two-stage cluster sampling design (assuming it is one of these). Think about the defining characteristics of each design.
Relative to a simple random sampling design with the same sample size (i.e., the same number of sampled elements), what advantages and disadvantages (if any) do we have by using stratified random sampling or cluster sampling? Are there any advantages to two-stage cluster sampling over one-stage cluster sampling? If so, what are they?
Stratified random sampling and cluster sampling (one-stage and two-stage) both use the fact that the elements in a population have been partitioned into groups that we call either strata or clusters. What then distinguishes a stratified random sampling design from a cluster sampling design, and what distinguishes a one-stage cluster sampling design from a two-stage cluster sampling design?
If the goal is to reduce the variance of an estimator of \(\tau\) or \(\mu\), what is the best way to assign elements to strata? What is the best way to assign elements to clusters? Think about when the variance will be relatively small in terms of how elements are stratified/clustered.
How can simple random sampling be used as a component of various complex sampling designs, including stratified random sampling and cluster sampling designs? Also, what kinds of sampling designs might we use to sample clusters in a cluster sampling design?
What is meant by optimum allocation in the context of stratified random sampling? What is the basic approach? Which strata should get the largest/smallest allocation?
What is the distinction between sampling with versus without replacement?
Have some understanding of the specialized sampling designs that we discussed, including ranked set sampling, systematic sampling, Poisson sampling, line-intercept sampling, fixed-area plot sampling, adaptive cluster sampling, and balanced sampling.
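As a sketch of the optimum-allocation question, the Python function below uses the usual textbook result that, for a linear cost function, the variance of the stratified estimator of \(\mu\) is minimized when \(n_h \propto N_h \sigma_h / \sqrt{c_h}\). With equal costs this reduces to Neyman allocation. The stratum sizes, standard deviations, and costs are hypothetical, and in practice \(\sigma_h\) must itself be guessed or estimated; the course may parameterize the formula differently.

```python
from math import sqrt

def optimum_allocation(n, sizes, sds, costs=None):
    """Split a total sample size n across strata with n_h proportional to
    N_h * sigma_h / sqrt(c_h). With equal costs this is Neyman allocation.
    Returns unrounded allocations; rounding to integers is left to the user."""
    if costs is None:
        costs = [1.0] * len(sizes)  # equal per-element costs -> Neyman allocation
    scores = [N_h * s_h / sqrt(c_h) for N_h, s_h, c_h in zip(sizes, sds, costs)]
    total = sum(scores)
    return [n * score / total for score in scores]

# Hypothetical strata: sizes N_h, standard deviations sigma_h, per-element costs c_h.
sizes, sds, costs = [100, 200, 300], [5.0, 10.0, 2.0], [1.0, 1.0, 4.0]
print("optimum:", optimum_allocation(56, sizes, sds, costs))
print("Neyman: ", optimum_allocation(56, sizes, sds))
```

Strata that are large, highly variable, and cheap to sample receive the most elements. Comparing the two calls shows that making the third stratum more expensive shifts elements away from it.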

Estimation/Inference

Several times we have seen that more than one estimator could be used to estimate a total or mean. What factors would we consider in evaluating the advantages/disadvantages of each estimator relative to the other? Given a pair of such estimators, you should be able to list their relative advantages and disadvantages (if any).
What do we mean by variance estimation?
In what situations is a ratio or regression estimator best suited?
In one-stage and two-stage cluster sampling designs, what is the auxiliary variable that we are using when we use the “ratio estimator” of \(\tau\) or \(\mu\)?
How do we specify the target variable if we want to estimate abundance (i.e., the number of elements in a population)? How do we specify it if we want to estimate the abundance for a domain (i.e., the number of elements in a particular sub-population)?
How do we specify the target variable if we want to estimate the mean or total in a given domain?
What does it mean to say that a sample is calibrated?
What is meant by post-stratification? How is it different from stratified random sampling?
Many estimators used in survey sampling (and in statistics in general) are biased. Why is this sometimes tolerable?
Given the values of the target variable and the inclusion probabilities of the elements in a sample, how do we compute the Horvitz-Thompson estimator and the Hájek estimator for \(\tau\) and \(\mu\)? For \(\mu\), how does this computation differ depending on whether the number of elements in the population is known?
How/when do we use a Hansen-Hurwitz estimator?
If we can choose our own selection probabilities when sampling with replacement, or our own inclusion probabilities when sampling without replacement, how should we specify them so as to reduce the variance of an estimator of \(\tau\) or \(\mu\)?
What determines survey weights? How are survey weights used in estimation?
Be familiar with why and how we would use double sampling (a) to estimate \(\mu\) or \(\tau\) when we have strata and (b) when we are using a ratio estimator of \(\mu\) or \(\tau\).
How do detectability and non-response affect inferences? How do we conceptualize these issues in terms of inclusion probabilities?
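The estimators in these questions can be sketched in a few lines of Python. This sketch assumes basic design weights \(w_i = 1/\pi_i\), with no calibration or non-response adjustment, and uses made-up sample values. The function names are hypothetical. Setting \(y_i = 1\) for every sampled element turns the Horvitz-Thompson total into an estimate of abundance \(N\), which is the denominator of the Hájek mean. Using a 0/1 domain indicator instead estimates abundance in that domain.

```python
from math import fsum

def design_weights(pi):
    """Basic survey (design) weights: w_i = 1 / pi_i (no calibration assumed)."""
    return [1.0 / p for p in pi]

def ht_total(y, pi):
    """Horvitz-Thompson estimator of tau: sum of y_i / pi_i over the sample."""
    return fsum(w * yi for w, yi in zip(design_weights(pi), y))

def ht_mean(y, pi, N):
    """Horvitz-Thompson estimator of mu; requires the population size N."""
    return ht_total(y, pi) / N

def hajek_mean(y, pi):
    """Hajek estimator of mu: the HT total divided by the HT estimate of N
    (the HT total of y_i = 1), so N need not be known."""
    return ht_total(y, pi) / ht_total([1.0] * len(y), pi)

def hajek_total(y, pi, N):
    """Hajek estimator of tau when N is known: N times the Hajek mean."""
    return N * hajek_mean(y, pi)

def hansen_hurwitz_total(y, p):
    """Hansen-Hurwitz estimator of tau for n draws WITH replacement, where p_k is
    the per-draw selection probability: the average of y_k / p_k over draws."""
    return fsum(yk / pk for yk, pk in zip(y, p)) / len(y)

# Hypothetical without-replacement sample and inclusion probabilities.
y, pi = [10, 20, 30, 40], [0.1, 0.2, 0.25, 0.5]
print(ht_total(y, pi), ht_mean(y, pi, N=20), hajek_mean(y, pi), hajek_total(y, pi, N=20))
print(ht_total([1, 0, 1, 0], pi))  # abundance in a domain via a 0/1 indicator
# Hypothetical with-replacement draws and selection probabilities.
print(hansen_hurwitz_total([12, 40, 10], [0.05, 0.2, 0.05]))
```

In the Hansen-Hurwitz case, if the selection probabilities were exactly proportional to \(y\), every ratio \(y_k/p_k\) would equal \(\tau\) and the estimator would have zero variance. For this reason, probabilities roughly proportional to the target variable, or to an auxiliary variable strongly related to it, tend to reduce variance. The same reasoning applies to \(\pi_i\) under sampling without replacement.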