Know how to conduct the three tests we discussed that use the \(X^2\) test statistic: the goodness-of-fit test, the test of independence, and McNemar’s test. For each test, this includes stating the null and alternative hypotheses, calculating the test statistic, calculating the \(p\)-value, and making the decision.
Know how to compute expected counts for a goodness-of-fit test and a test of independence.
Know how to compute the \(X^2\) test statistic using observed and expected counts.
Know how to compute the \(p\)-value using the \(\chi^2\) sampling distribution.
What does it mean to say that two variables are independent?
The test statistic \[ z = \frac{\hat{p}-p}{\sqrt{p(1-p)/n}} \] can sometimes be used instead of the \(X^2\) test statistic for a goodness-of-fit test. What is the relationship between the \(z\) and \(X^2\) test statistics? What is the limitation of the \(z\) test statistic?
The test statistic \[ z = \frac{\hat{p}_1-\hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})(1/n_1 + 1/n_2)}} \] can sometimes be used instead of the \(X^2\) test statistic for a test of independence. What is the relationship between the \(z\) and \(X^2\) test statistics? What is the limitation of the \(z\) test statistic?
What is the purpose of a mark-recapture study?
How do you compute the Lincoln-Petersen estimator? What does it estimate?
Understand the importance of independence of inclusion/exclusion within the two samples in the context of a mark-recapture study, and how this assumption might be violated.
What are direct sampling and inverse sampling in the context of a mark-recapture study? Why is the distinction between these two kinds of sampling important?
Understand the purpose and calculation of Cramér’s V.
What are Simpson’s paradox, Berkson’s paradox, the ecological fallacy, a suppressor variable, and a spurious relationship?
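As a self-check on the item comparing the one-proportion \(z\) statistic with the goodness-of-fit \(X^2\) statistic, the following minimal sketch computes both for a two-category case. The function names and the counts (60 and 40 out of 100, null probability 0.5) are hypothetical and chosen only for illustration. With two categories, \(z^2\) and \(X^2\) should agree, and so should the two-sided \(z\) \(p\)-value and the \(\chi^2\) \(p\)-value with 1 degree of freedom.

```python
import math


def gof_chi_square(observed, probabilities):
    """X^2 for a goodness-of-fit test: expected count = probability * n."""
    n = sum(observed)
    expected = [p * n for p in probabilities]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def one_proportion_z(successes, n, p0):
    """z statistic for a single proportion under the null value p0."""
    p_hat = successes / n
    return (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)


# Hypothetical data: 60 "successes" and 40 "failures", null p = 0.5.
x2 = gof_chi_square([60, 40], [0.5, 0.5])
z = one_proportion_z(60, 100, 0.5)

# Two-sided p-value from the standard normal distribution.
p_from_z = math.erfc(abs(z) / math.sqrt(2))
# p-value from the chi-square distribution with 1 degree of freedom
# (this closed form is valid only for df = 1).
p_from_x2 = math.erfc(math.sqrt(x2 / 2))

print(f"z^2 = {z ** 2:.4f}, X^2 = {x2:.4f}")
print(f"p (z, two-sided) = {p_from_z:.4f}, p (X^2, df = 1) = {p_from_x2:.4f}")
```

The sketch only handles two categories on the \(z\) side, which is worth keeping in mind when thinking about the limitation of the \(z\) statistic.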
Formulas and expressions you should know when and how to use:
\[ X^2 = \sum \frac{(\text{observed count} - \text{expected count})^2}{\text{expected count}} \]
\[ \text{expected count} = \text{probability} \times n \]
\[ \text{expected count} = \frac{R \times C}{T} \]
\[ \text{df} = \text{number of categories} - 1 \]
\[ \text{df} = (r-1)(c-1) \]
\[ \hat{N} = \frac{n_1n_2}{m} \]
\[ \text{standard error} = \sqrt{\frac{n_1n_2(n_1-m)(n_2-m)}{m^3}} \]
\[ \text{standard error} = \sqrt{\frac{n_1^2n_2(n_2-m)}{m^2(m+1)}} \]
\[ V = \sqrt{\frac{X^2/n}{\min(r-1,c-1)}} \]
\[ X^2 = \frac{(O_{bl}-O_{tr})^2}{O_{bl} + O_{tr}} \]
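For practice, here is a minimal sketch that applies several of the formulas above to small made-up examples. All function names and numbers are hypothetical. It assumes that \(O_{tr}\) and \(O_{bl}\) are the top-right and bottom-left cells of a 2×2 table, and it reports the two standard error formulas in the order they are listed without saying which sampling design each belongs to, since that is left for you to work out.

```python
import math


def independence_test(table):
    """Return X^2, df, and Cramer's V for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    x2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # expected count = (row total * column total) / grand total
            expected = row_totals[i] * col_totals[j] / total
            x2 += (observed - expected) ** 2 / expected
    r, c = len(table), len(table[0])
    df = (r - 1) * (c - 1)
    v = math.sqrt((x2 / total) / min(r - 1, c - 1))
    return x2, df, v


def mcnemar_x2(table):
    """McNemar's X^2 from the off-diagonal cells of a 2 x 2 table."""
    o_tr = table[0][1]  # top-right cell (assumed meaning of O_tr)
    o_bl = table[1][0]  # bottom-left cell (assumed meaning of O_bl)
    return (o_bl - o_tr) ** 2 / (o_bl + o_tr)


def lincoln_petersen(n1, n2, m):
    """Return N-hat and the two listed standard errors, in listed order."""
    n_hat = n1 * n2 / m
    se_first = math.sqrt(n1 * n2 * (n1 - m) * (n2 - m) / m ** 3)
    se_second = math.sqrt(n1 ** 2 * n2 * (n2 - m) / (m ** 2 * (m + 1)))
    return n_hat, se_first, se_second


# Hypothetical 2 x 2 table of observed counts.
counts = [[10, 20], [30, 40]]
x2, df, v = independence_test(counts)
print(f"X^2 = {x2:.4f}, df = {df}, V = {v:.4f}")
print(f"McNemar X^2 = {mcnemar_x2(counts):.4f}")

# Hypothetical mark-recapture data: 100 marked, 50 in second sample, 10 recaptured.
n_hat, se_first, se_second = lincoln_petersen(100, 50, 10)
print(f"N-hat = {n_hat:.0f}, SE (first) = {se_first:.2f}, SE (second) = {se_second:.2f}")
```

Working the same numbers by hand and comparing against the printed values is a reasonable way to check your own calculations.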