How is two-stage cluster sampling different from one-stage cluster sampling?
In what sense is a one-stage cluster sampling design a special case of a two-stage cluster sampling design?
In what sense is a stratified random sampling design a special case of a two-stage cluster sampling design?
What are the advantages and disadvantages of two-stage cluster sampling relative to simple random sampling?
What advantage does a two-stage cluster sampling design have relative to a one-stage cluster sampling design?
Like many complex sampling designs, a two-stage cluster sampling design can use simple random sampling as a part of the design. How might a two-stage cluster sampling design use simple random sampling?
For a two-stage cluster sampling design with simple random sampling at both stages, there are two estimators of \(\tau\). Why might you use one estimator over the other? That is, what are the advantages and/or disadvantages of each estimator relative to the other?
For a two-stage cluster sampling design with simple random sampling at both stages, there are two estimators of \(\mu\). Why might you use one estimator over the other? That is, what are the advantages and/or disadvantages of each estimator relative to the other?
What is meant by primary and secondary sampling units in the context of two-stage cluster sampling?
How do the between-groups and within-groups mean squares affect the optimum number of clusters to sample and the optimum number of elements per cluster to sample?
In terms of the between-groups mean square, or the within-groups mean square, when would it be optimal to use one-stage cluster sampling? When would it be optimal to use two-stage cluster sampling with one element sampled from each cluster?
What do the between-groups and within-groups mean squares reflect about how elements are clustered? That is, what kinds of variability do these mean squares measure?
What is multi-stage cluster sampling? How is it an extension of two-stage cluster sampling?
Let \(y_i\) be the value of the target variable in a random sampling design (e.g., simple random sampling). How do we define \(y_{i}\) if we want to estimate the number of elements in the population? How do we define \(y_i\) if we want to estimate the number of elements in the population in a given domain?
To estimate the number of elements in a population based on a one-stage cluster sampling design with simple random sampling of clusters, we discussed two viable estimators. Why/when might we use one estimator over the other?
To estimate the number of elements in a domain based on a one-stage cluster sampling design with simple random sampling of clusters, we discussed three viable estimators. Why/when might we use one estimator over the others?
What is a mark-recapture design? How is it done?
The Lincoln-Petersen estimator is basically a ratio estimator. How do we define the value of the target variable (\(y_i\)) for this estimator? How do we define the value of the auxiliary variable (\(x_i\)) for this estimator?
Why would one use the Chapman estimator rather than the Lincoln-Petersen estimator of \(N\)?
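As a rough illustration of the two estimators named above, here is a minimal Python sketch. It assumes the usual textbook forms, with `n1` elements marked in the first sample, `n2` elements captured in the second sample, and `m` marked elements among the second sample. The function names and counts are hypothetical, and the notation used in lecture may differ.

```python
def lincoln_petersen(n1, n2, m):
    """Estimate N as n1 * n2 / m (undefined when there are no recaptures)."""
    if m == 0:
        raise ValueError("estimator is undefined when m = 0")
    return n1 * n2 / m


def chapman(n1, n2, m):
    """Estimate N as (n1 + 1)(n2 + 1) / (m + 1) - 1 (defined even when m = 0)."""
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


# Hypothetical counts: 50 marked, 40 captured later, 10 of those marked.
lp_estimate = lincoln_petersen(50, 40, 10)
chapman_estimate = chapman(50, 40, 10)
print(lp_estimate, chapman_estimate)
```

Trying the two functions with `m = 0` shows one practical difference: the first raises an error, while the second still returns a finite value.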
Understand how a mark-recapture design can be viewed in terms of the inclusion or exclusion of each element in the population in each of two samples with the multinomial model.
What is the independence assumption that we are implicitly making when estimating abundance using a mark-recapture design as discussed in the context of the multinomial model? How can this assumption be violated?
How do we compute the estimate of \(N\) using a maximum likelihood estimator based on the likelihood function from the hypergeometric model for a mark-recapture design? Note that this can be done two ways: using a formula, or using the function itself.
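The two routes mentioned above can be sketched in Python as follows. This assumes the hypergeometric likelihood \(L(N) = \binom{n_1}{m}\binom{N-n_1}{n_2-m}/\binom{N}{n_2}\), where \(n_1\) is the number marked, \(n_2\) the size of the second sample, and \(m\) the number of recaptures; the closed form used here is the integer part of \(n_1 n_2/m\). The function names, the search bound `upper`, and the counts are hypothetical.

```python
from fractions import Fraction
from math import comb


def hypergeom_likelihood(N, n1, n2, m):
    """Exact probability of m recaptures when the population size is N."""
    return Fraction(comb(n1, m) * comb(N - n1, n2 - m), comb(N, n2))


def mle_by_search(n1, n2, m, upper):
    """Evaluate the likelihood over candidate N and return the maximizer."""
    smallest = n1 + n2 - m  # at least this many distinct elements were seen
    candidates = range(smallest, upper + 1)
    return max(candidates, key=lambda N: hypergeom_likelihood(N, n1, n2, m))


def mle_by_formula(n1, n2, m):
    """Closed form: the integer part of n1 * n2 / m."""
    return (n1 * n2) // m


# Hypothetical counts: 50 marked, 40 captured later, 11 of those marked.
print(mle_by_search(50, 40, 11, upper=1000), mle_by_formula(50, 40, 11))
```

Exact fractions are used so that the comparison between neighboring values of \(N\), where the likelihood is nearly flat, is not affected by rounding.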
We discussed how we could use something like mark-recapture with clustered elements. The number of “marked” elements in each cluster becomes a kind of auxiliary variable for each cluster. This provides us two ways to estimate the number of elements: the “unbiased” estimator \(\hat\tau = \frac{N}{n}\sum_{i \in \mathcal{S}} y_i\) and the “ratio” estimator \[\hat\tau = \tau_x\frac{\sum_{i \in \mathcal{S}} y_i}{\sum_{i \in \mathcal{S}} x_i},\] where \(y_i\) is the number of elements in the \(i\)-th cluster, and \(x_i\) is the number of marked elements in the \(i\)-th cluster. Why might the ratio estimator be preferred here?
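The two formulas in the preceding question can be computed as in the following minimal Python sketch. The data are made up: `N` is the number of clusters in the population, `tau_x` is the known total number of marked elements, and the sampled counts `y` and `x` are hypothetical.

```python
def unbiased_total(y, N):
    """(N / n) times the sum of y_i over the sampled clusters."""
    return N / len(y) * sum(y)


def ratio_total(y, x, tau_x):
    """tau_x times (sum of y_i) / (sum of x_i) over the sampled clusters."""
    return tau_x * sum(y) / sum(x)


# Hypothetical sample of n = 4 clusters from N = 10 clusters.
y = [20, 35, 10, 45]  # elements counted in each sampled cluster
x = [4, 7, 2, 9]      # marked elements found in each sampled cluster
N = 10
tau_x = 60            # total marked elements in the population (assumed known)

tau_unbiased = unbiased_total(y, N)
tau_ratio = ratio_total(y, x, tau_x)
print(tau_unbiased, tau_ratio)
```

In this made-up sample \(y_i\) is exactly proportional to \(x_i\), which is the situation in which a ratio estimator tends to do well.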
What is meant by an inclusion probability?
How can you determine the inclusion probability of an element based on its selection probability for a sampling design using sampling with replacement?
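One common way to express this relationship is sketched below in Python. It assumes `n` independent draws with replacement and a per-draw selection probability `p` for the element, so the element is included unless it is missed on every draw. The function name is hypothetical.

```python
def inclusion_probability(p, n):
    """Probability that an element with per-draw selection probability p
    appears at least once in n independent draws with replacement."""
    return 1 - (1 - p) ** n


# Hypothetical element with selection probability 0.1 and n = 2 draws.
print(inclusion_probability(0.1, 2))
```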
How can the Horvitz-Thompson estimator be used to estimate \(\mu\) when (a) the number of elements in the population is known and (b) the number of elements in the population is unknown?
How is the (expected) number of elements in the sample related to the inclusion probabilities?
Given an auxiliary variable \(x_i\) with known total \(\tau_x\) that is approximately proportional to the target variable, how might we specify the inclusion probabilities so that the Horvitz-Thompson estimator will have relatively small variance?
The Hansen-Hurwitz and Horvitz-Thompson estimators can be viewed as “general purpose” estimators. But they differ in a couple of important ways. For what kinds of designs can we use the Hansen-Hurwitz estimator? And how does the summation over the sample \(\mathcal{S}\) differ in the calculation of each estimator?
What are the Hájek estimators of \(\tau\) and \(\mu\)? When might a Hájek estimator have lower variance than the alternative estimator (discussed in lecture)? How does knowledge of the number of elements (or lack thereof) affect whether or not we might use a Hájek estimator?
What are the inclusion probabilities for the elements for each of the following designs: simple random sampling, stratified random sampling, one-stage cluster sampling, and two-stage cluster sampling? Be able to compute the inclusion probability of a given element for each design given sufficient information about the design.
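For reference when working through the Horvitz-Thompson and Hájek questions, here is a minimal Python sketch. It assumes the standard forms: the Horvitz-Thompson total is the sum of \(y_i/\pi_i\) over the distinct sampled elements, and the Hájek mean divides that total by the sum of \(1/\pi_i\). The function names, data, and population size `N` are hypothetical.

```python
def horvitz_thompson_total(y, pi):
    """Sum of y_i / pi_i over the distinct elements in the sample."""
    return sum(yi / p for yi, p in zip(y, pi))


def hajek_mean(y, pi):
    """Horvitz-Thompson total divided by the estimated population size,
    where the size is estimated by the sum of 1 / pi_i."""
    return horvitz_thompson_total(y, pi) / sum(1 / p for p in pi)


# Hypothetical sample of three distinct elements.
y = [10.0, 20.0, 30.0]
pi = [0.1, 0.2, 0.5]
N = 20  # population size, assumed known for mu_ht and tau_hajek

tau_ht = horvitz_thompson_total(y, pi)
mu_ht = tau_ht / N           # requires N to be known
mu_hajek = hajek_mean(y, pi)  # does not require N
tau_hajek = N * mu_hajek      # requires N to be known
print(tau_ht, mu_ht, mu_hajek, tau_hajek)
```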
What is a first-order versus a second-order inclusion probability? Which kind of inclusion probabilities are used in the calculation of an estimate using the Horvitz-Thompson estimator? Which kind of inclusion probabilities are used in the calculation of the variance of the Horvitz-Thompson estimator?
What do I mean by the “direct” versus “indirect” specification of inclusion probabilities?
What is Poisson sampling and what are its properties?
What is balanced sampling? What does it try to accomplish?
What is spatially balanced sampling? What does it try to accomplish?
Based on a single line or a single fixed area plot, how do we determine the inclusion probability of each element intersected by the line or included within the plot? Also, generally speaking, how does Bitterlich sampling determine the inclusion probability of a tree?
For what situations is adaptive cluster sampling best suited?
What is meant by detectability and non-response?
How does the inclusion probability depend on the probability of detection/response?
Assuming a constant detection/response probability, how do we adjust an estimate of \(\tau\) to account for undetected or non-responding elements if we know or have an estimate of \(\pi_D\)?
Assuming a constant detection/response probability, what happens to our estimator of \(\tau\) if we fail to account for undetected or non-responding elements and implicitly assume that \(\pi_D = 1\)?
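A minimal Python sketch of the adjustment is given below. It assumes detection is constant and independent of the sampling design, so that an estimate computed from detected elements only targets \(\pi_D \tau\) rather than \(\tau\), and dividing by \(\pi_D\) corrects for this. The function name and numbers are hypothetical.

```python
def adjust_for_detection(tau_hat_unadjusted, pi_d):
    """Divide an estimate based on detected elements only by the constant
    detection (or response) probability pi_d."""
    if not 0 < pi_d <= 1:
        raise ValueError("pi_d must be in (0, 1]")
    return tau_hat_unadjusted / pi_d


# Hypothetical unadjusted estimate of 120 with detection probability 0.8.
tau_hat_adjusted = adjust_for_detection(120.0, 0.8)
print(tau_hat_adjusted)
```

Setting `pi_d = 1` returns the unadjusted estimate unchanged, which corresponds to ignoring undetected or non-responding elements.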
What are some ways we can estimate detectability in one-stage cluster sampling when it is constant?
Assuming a constant detection/response probability and a one-stage cluster sampling design, how does \(\pi_D\) affect the variance of an estimator of \(\tau\)?
Assuming a constant detection/response probability and a one-stage cluster sampling design, how does having to estimate \(\pi_D\) affect the variance of an estimator of \(\tau\)?
When using distance sampling, how do you use the distribution of the distances of detected elements to estimate detectability?
When is the bias due to non-response large in absolute value?
Have a rough idea of the different methods of dealing with bias due to non-response.
What are the unadjusted survey weights (i.e., design weights) for the elements for each of the following designs: simple random sampling, stratified random sampling, one-stage cluster sampling, and two-stage cluster sampling? Be able to compute the weight of a given element for each design given sufficient information about the design.
Given the values of the target variable and the corresponding survey weights, how can we compute an estimate of \(\tau\) and how can we compute an estimate of \(\mu\)?
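The weighted calculations can be sketched in Python as follows. This assumes each weight is the reciprocal of the element's inclusion probability, that the total is estimated by the weighted sum of the target variable, and that the mean is estimated by dividing that sum by the sum of the weights. The function names and data are hypothetical.

```python
def weighted_total(y, w):
    """Estimate tau as the sum of w_i * y_i over the sample."""
    return sum(wi * yi for yi, wi in zip(y, w))


def weighted_mean(y, w):
    """Estimate mu as the weighted total divided by the sum of the weights."""
    return weighted_total(y, w) / sum(w)


# Hypothetical simple random sample of n = 4 from N = 20, so each weight is 5.
y = [2.0, 4.0, 6.0, 8.0]
w = [5.0, 5.0, 5.0, 5.0]

tau_hat = weighted_total(y, w)
mu_hat = weighted_mean(y, w)
print(tau_hat, mu_hat)
```

When the population size is known, dividing the weighted total by that size instead of by the sum of the weights gives an alternative estimate of the mean.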
In addition to the above, be comfortable understanding the calculations used in the homework problems as well as all notation.