We can derive the probability distribution of a statistic (i.e., a sampling distribution) given the probability distribution of a single observation (i.e., a population distribution) by using the following steps:

1. Create the sample space, which consists of all possible samples.
2. Compute the probability of each sample in the sample space.
3. Compute the value of the statistic for each sample in the sample space.
4. Create a table of the possible values of the statistic.
5. Compute the probability of each value of the statistic.
A sample space is the set of all possible samples of observations.
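To make these steps concrete, here is a minimal Python sketch (not part of the lecture; the function name `sampling_distribution` and the dictionary representation of the population distribution are just illustrative choices). It assumes we sample with replacement, so the individual draws are independent.

```python
from itertools import product

def sampling_distribution(population, statistic, n):
    """Tabulate P(statistic) over all samples of size n drawn with replacement.

    population -- dict mapping each possible value of x to P(x)
    statistic  -- function that maps a sample (a tuple) to a single number
    """
    dist = {}
    # Step 1: the sample space is every ordered sample of n observations.
    for sample in product(population, repeat=n):
        # Step 2: independent draws, so the sample probability is a product.
        prob = 1.0
        for x in sample:
            prob *= population[x]
        # Step 3: the value of the statistic for this sample.
        value = statistic(sample)
        # Steps 4 and 5: different samples are mutually exclusive, so their
        # probabilities add for each value of the statistic.
        dist[value] = dist.get(value, 0.0) + prob
    return dist
```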
Example: Suppose we have a forest of trees of which 60% have a volume of 20 cubic feet, and 40% have a volume of 30 cubic feet. What would be the sampling distribution of the mean volume based on a random sample of \(n\) = 2 trees? (Note: We are going to assume for now that we are sampling with replacement, meaning that we could select the same tree more than once.)
\(x\) | \(P(x)\) |
---|---|
20 | 0.6 |
30 | 0.4 |
Sample | Probability | \(\bar{x}\) |
---|---|---|
20, 20 | 0.6 \(\times\) 0.6 = 0.36 | 20 |
20, 30 | 0.6 \(\times\) 0.4 = 0.24 | 25 |
30, 20 | 0.4 \(\times\) 0.6 = 0.24 | 25 |
30, 30 | 0.4 \(\times\) 0.4 = 0.16 | 30 |
\(\bar{x}\) | \(P(\bar{x})\) |
---|---|
20 | 0.36 |
25 | 0.24 + 0.24 = 0.48 |
30 | 0.16 |
Note: We are using two properties from probability theory here.
The probability of two or more events happening together (e.g., A and B) equals the product of their probabilities if the events are independent, meaning that the probability of each event does not change depending on whether the other events have occurred.
The probability that at least one of two or more events happens (e.g., A or B) equals the sum of their probabilities if the events are mutually exclusive, meaning that the events cannot occur together.
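As a quick check (again purely an illustration), the `sampling_distribution` sketch above reproduces the tree-volume table when the sample mean is used as the statistic; the results agree with the table up to floating-point rounding.

```python
trees = {20: 0.6, 30: 0.4}  # population distribution of tree volume x
xbar_dist = sampling_distribution(trees, lambda s: sum(s) / len(s), n=2)
# xbar_dist is approximately {20.0: 0.36, 25.0: 0.48, 30.0: 0.16}
```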
Example: What is the sampling distribution of the mean of a sample of \(n\) = 2 observations of the throwing distance of the trebuchet?
\(x\) | \(P(x)\) |
---|---|
1 | 0.1 |
2 | 0.3 |
3 | 0.6 |
Sample | Probability | \(\bar{x}\) |
---|---|---|
1, 1 | 0.01 | 1.0 |
1, 2 | 0.03 | 1.5 |
1, 3 | 0.06 | 2.0 |
2, 1 | 0.03 | 1.5 |
2, 2 | 0.09 | 2.0 |
2, 3 | 0.18 | 2.5 |
3, 1 | 0.06 | 2.0 |
3, 2 | 0.18 | 2.5 |
3, 3 | 0.36 | 3.0 |
\(\bar{x}\) | \(P(\bar{x})\) |
---|---|
1.0 | 0.01 |
1.5 | 0.06 |
2.0 | 0.21 |
2.5 | 0.36 |
3.0 | 0.36 |
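The same illustrative sketch, reused with the trebuchet population, reproduces this table as well (up to floating-point rounding).

```python
trebuchet = {1: 0.1, 2: 0.3, 3: 0.6}  # population distribution of throwing distance
xbar_dist = sampling_distribution(trebuchet, lambda s: sum(s) / len(s), n=2)
# approximately {1.0: 0.01, 1.5: 0.06, 2.0: 0.21, 2.5: 0.36, 3.0: 0.36}
```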
Example: We can find the sampling distribution of any statistic in the same way. What is the sampling distribution of the sample variance (\(s^2\)) based on a sample of \(n\) = 2 observations?
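One way to see this is to reuse the illustrative `sampling_distribution` sketch from above with `statistics.variance` (which computes \(s^2\) with the \(n-1\) denominator) as the statistic. The question does not name a particular population, so the tree-volume population from the first example is assumed here purely for illustration.

```python
from statistics import variance  # sample variance, n - 1 denominator

trees = {20: 0.6, 30: 0.4}  # assumed population, for illustration only
s2_dist = sampling_distribution(trees, variance, n=2)
# approximately {0: 0.52, 50: 0.48}: s^2 = 0 when both trees match, s^2 = 50 otherwise
```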
Example: What is the sampling distribution of the proportion of female platies preferring the yellow-tailed male from a sample of \(n\) = 3 observations?
\(x\) | \(P(x)\) |
---|---|
C | 0.3 |
Y | 0.7 |
Note: We will denote a proportion from a sample as \(\hat{p}\).
In general, suppose each observation is either a success (\(S\)), with probability \(p\), or a failure (\(F\)), with probability \(1-p\).
\(x\) | \(P(x)\) |
---|---|
\(S\) | \(p\) |
\(F\) | \(1-p\) |
\(x\) | \(P(x)\) |
---|---|
Y | 0.7 |
C | 0.3 |
Here we are defining \(Y\) as a success and \(C\) as a failure, so \(p\) = 0.7 and \(1-p\) = 0.3.
The sampling distribution of the number of successes (\(s\)) in a sample will be a binomial distribution, which is given by the following equation.1 \[ P(s) = \frac{n!}{s!(n-s)!}p^s(1-p)^{n-s}. \] There are two mathematical details to remember when using this formula:
The \(!\) symbol is the factorial operation. For example, \[\begin{align*} 5! & = 5 \times 4 \times 3 \times 2 \times 1 = 120, \\ 4! & = 4 \times 3 \times 2 \times 1 = 24, \\ 3! & = 3 \times 2 \times 1 = 6, \\ 2! & = 2 \times 1 = 2, \\ 1! & = 1, \\ 0! & = 1. \end{align*}\] Note that \(0! = 1\), which is perhaps not intuitive.
For powers remember that any number raised to the power of 1 is that number (i.e., \(p^1 = p\) and \((1-p)^1 = 1-p\)), and any number raised to the power of zero is one (i.e., \(p^0 = 1\) and \((1-p)^0 = 1\)).
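As an illustrative check of the formula (not from the lecture), Python's `math.comb(n, s)` computes \(\frac{n!}{s!(n-s)!}\) directly; the helper name `binomial_prob` is just a hypothetical choice.

```python
from math import comb

def binomial_prob(s, n, p):
    """P(s successes in n observations), following the formula above."""
    return comb(n, s) * p**s * (1 - p)**(n - s)

for s in range(4):
    print(s, round(binomial_prob(s, n=3, p=0.7), 3))
# prints 0.027, 0.189, 0.441, 0.343 for s = 0, 1, 2, 3
```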
Example (continued): We can now use the binomial formula to answer the question posed above. With \(n\) = 3 observations and \(p\) = 0.7, the sampling distribution of \(\hat{p}\) is:
\(s\) | \(\hat{p}\) | \(P(\hat{p})\) |
---|---|---|
0 | 0 | 0.027 |
1 | 1/3 | 0.189 |
2 | 2/3 | 0.441 |
3 | 1 | 0.343 |
Here is how we can compute the probabilities in the sampling distribution of \(\hat{p}\). \[\begin{align*} P(0) & = \underbrace{\frac{3!}{0!(3-0)!}}_{1}\underbrace{0.7^0(1-0.7)^{3-0}}_{0.027} = 1 \times 0.027 = 0.027 \\ P(1) & = \underbrace{\frac{3!}{1!(3-1)!}}_{3}\underbrace{0.7^1(1-0.7)^{3-1}}_{0.063} = 3 \times 0.063 = 0.189 \\ P(2) & = \underbrace{\frac{3!}{2!(3-2)!}}_{3}\underbrace{0.7^2(1-0.7)^{3-2}}_{0.147} = 3 \times 0.147 = 0.441 \\ P(3) & = \underbrace{\frac{3!}{3!(3-3)!}}_{1}\underbrace{0.7^3(1-0.7)^{3-3}}_{0.343} = 1 \times 0.343 = 0.343 \end{align*}\] Note that the formula computes two parts: the number of samples that produce \(s\) successes out of \(n\) observations, and the probability of each sample. These can be seen when we look at the sample space.

Sample | Probability | \(s\) | \(\hat{p}\) |
---|---|---|---|
Y, Y, Y | 0.7 \(\times\) 0.7 \(\times\) 0.7 = 0.343 | 3 | 1 |
C, Y, Y | 0.3 \(\times\) 0.7 \(\times\) 0.7 = 0.147 | 2 | 2/3 |
Y, C, Y | 0.7 \(\times\) 0.3 \(\times\) 0.7 = 0.147 | 2 | 2/3 |
Y, Y, C | 0.7 \(\times\) 0.7 \(\times\) 0.3 = 0.147 | 2 | 2/3 |
Y, C, C | 0.7 \(\times\) 0.3 \(\times\) 0.3 = 0.063 | 1 | 1/3 |
C, Y, C | 0.3 \(\times\) 0.7 \(\times\) 0.3 = 0.063 | 1 | 1/3 |
C, C, Y | 0.3 \(\times\) 0.3 \(\times\) 0.7 = 0.063 | 1 | 1/3 |
C, C, C | 0.3 \(\times\) 0.3 \(\times\) 0.3 = 0.027 | 0 | 0 |
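Enumerating this sample space programmatically gives the same probabilities; this again reuses the illustrative `sampling_distribution` sketch from earlier, with the proportion of successes as the statistic.

```python
platies = {"Y": 0.7, "C": 0.3}  # population distribution of a single choice
phat_dist = sampling_distribution(platies, lambda s: s.count("Y") / len(s), n=3)
# approximately {1.0: 0.343, 0.667: 0.441, 0.333: 0.189, 0.0: 0.027}
```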
Sometimes we write this as \[ P(s) = \binom{n}{s}p^s(1-p)^{n-s}, \] because \(\binom{n}{s} = \frac{n!}{s!(n-s)!}\). The \(\binom{n}{s}\) is called the binomial coefficient. Also, this formula is usually written with \(x\) in place of \(s\), but I have used \(s\) to emphasize that the formula computes the probability of the number of successes.↩︎