The linear regression model has the form \[ \mu_y = \alpha + \beta x, \] where \(\mu_y\) is the mean of the population distribution of the response variable \(y\) (e.g., mean tree volume), and \(x\) is the value of the explanatory variable (e.g., tree girth). The quantities \(\alpha\) and \(\beta\) are the intercept and slope parameters, respectively.
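To make the roles of \(\alpha\) and \(\beta\) concrete, here is a minimal Python sketch that evaluates the model at a few values of \(x\). The function name `mean_response` and the parameter values are hypothetical, chosen only for illustration; they are not estimates from real tree data.

```python
def mean_response(alpha, beta, x):
    """Mean of the response at x under the model mu_y = alpha + beta * x."""
    return alpha + beta * x


# Hypothetical parameter values (assumptions, not estimates from real data).
alpha = -37.0  # intercept: the value of mu_y when x = 0
beta = 5.0     # slope: change in mu_y for a one-unit increase in x

for x in [10.0, 11.0, 12.0]:
    print(x, mean_response(alpha, beta, x))
```

Under these assumed values, each one-unit increase in \(x\) changes \(\mu_y\) by exactly \(\beta\), which is what makes \(\beta\) the slope.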
Study Question: What do the four symbols in \(\mu_y = \alpha + \beta x\) represent?
Example: The plot below shows the data from a study
of the relationship between the mean number of chromosomal abnormalities per
cell (\(\mu_y\)) and the rate of
exposure to gamma radiation (\(x\)).
Because this relationship was studied at three different total dose amounts,
three linear regression models are used here, one for each dose amount.
The multiple linear regression model has the form \[ \mu_y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k, \] where \(x_1, x_2, \dots, x_k\) are the values of \(k\) explanatory variables. For example, we might have \[ \mu_y = \alpha + \beta_1 x_1 + \beta_2 x_2, \] where \(\mu_y\) is the mean of the population distribution of the response variable \(y\) (e.g., mean tree volume), \(x_1\) is the value of one explanatory variable (e.g., tree girth), and \(x_2\) is the value of a second explanatory variable (e.g., tree height).
The generic term linear regression usually refers to the case where there are one or more explanatory variables. The case where there is only one explanatory variable is sometimes referred to as simple linear regression.
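The following Python sketch shows how the multiple linear regression mean is computed from \(k\) explanatory variables, and that simple linear regression is just the case \(k = 1\). The function name `mean_response` and all parameter values are hypothetical and serve only as an illustration.

```python
def mean_response(alpha, betas, xs):
    """Mean response under mu_y = alpha + beta_1*x_1 + ... + beta_k*x_k."""
    if len(betas) != len(xs):
        raise ValueError("need exactly one slope per explanatory variable")
    return alpha + sum(b * x for b, x in zip(betas, xs))


# Hypothetical values (assumptions, not estimates from real data).
alpha = -58.0
betas = [4.5, 0.5]   # beta_1 (e.g., girth), beta_2 (e.g., height)
xs = [12.0, 76.0]    # x_1, x_2

# Multiple linear regression with k = 2 explanatory variables.
print(mean_response(alpha, betas, xs))

# Simple linear regression is the special case k = 1.
print(mean_response(-37.0, [5.0], [10.0]))
```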
Study Question: How is multiple linear regression different from simple linear regression?
A nonlinear regression model is any regression model that cannot be written as \[ \mu_y = \alpha + \beta x \] or \[ \mu_y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k. \]
Study Question: What is nonlinear regression?
Example: In biochemistry, the relationship between
the mean reaction rate (\(\mu_y\)) and
the concentration of a substrate (\(x\)) is often modeled as \[
\mu_y = \frac{\delta x}{\gamma + x}.
\] Here \(\delta\) is the
maximum achievable mean reaction rate, and \(\gamma\) is the substrate concentration
that yields a mean reaction rate halfway between 0 and \(\delta\) (that is, \(\delta/2\)).
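The interpretation of \(\delta\) and \(\gamma\) can be checked numerically. The Python sketch below evaluates the model at \(x = \gamma\) and at a very large \(x\); the function name `mean_rate` and the parameter values are hypothetical and are not taken from any experiment.

```python
def mean_rate(delta, gamma, x):
    """Mean reaction rate under mu_y = delta * x / (gamma + x)."""
    return delta * x / (gamma + x)


# Hypothetical parameter values (assumptions, for illustration only).
delta = 2.0  # maximum achievable mean reaction rate
gamma = 0.5  # substrate concentration at which mu_y equals delta / 2

print(mean_rate(delta, gamma, 0.0))    # no substrate: mean rate is 0
print(mean_rate(delta, gamma, gamma))  # at x = gamma: half of delta
print(mean_rate(delta, gamma, 1e9))    # very large x: close to delta
```

Note that \(\mu_y\) never exceeds \(\delta\); it only approaches \(\delta\) as \(x\) grows.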
Example: In fisheries science, a nonlinear
regression model (the von Bertalanffy model) is used to model
the relationship between mean length (\(\mu_y\)) and age (\(x\)) of fish. This model can be written as
\[
\mu_y = \alpha + (\delta - \alpha)e^{-x\log(2)/\gamma}.
\] Here \(\alpha\) is the
maximum value of \(\mu_y\) that we
approach as fish age, \(\delta\) is the
value of \(\mu_y\) before they reach
one year of age, and \(\gamma\) is how
many years it takes for \(\mu_y\) to be
halfway between \(\delta\) and \(\alpha\).
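The roles of the three parameters can be verified by evaluating the model at \(x = 0\), at \(x = \gamma\), and at a large \(x\). This is a minimal Python sketch that assumes \(\log\) denotes the natural logarithm; the function name `mean_length` and the parameter values are hypothetical.

```python
import math


def mean_length(alpha, delta, gamma, x):
    """Mean length under mu_y = alpha + (delta - alpha) * exp(-x * log(2) / gamma)."""
    return alpha + (delta - alpha) * math.exp(-x * math.log(2) / gamma)


# Hypothetical parameter values (assumptions, for illustration only).
alpha = 60.0  # limiting mean length approached as fish age
delta = 10.0  # mean length at age x = 0
gamma = 2.0   # years needed to get halfway from delta to alpha

print(mean_length(alpha, delta, gamma, 0.0))    # equals delta
print(mean_length(alpha, delta, gamma, gamma))  # (delta + alpha) / 2
print(mean_length(alpha, delta, gamma, 200.0))  # very close to alpha
```

At \(x = \gamma\) the exponential term equals \(1/2\), so \(\mu_y = \alpha + (\delta - \alpha)/2 = (\delta + \alpha)/2\), which is the halfway point described above.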
What if we have one or more categorical explanatory variables? Regression can accommodate categorical explanatory variables by coding them as indicator (dummy) variables, but the statistical methodology for this case is often described as the analysis of variance (ANOVA).
Example: The dot plots below show four samples of
observations of the variable anger reduction. The four samples
correspond to four levels of a categorical treatment variable of
anger management exercises (none, physical, behavioral, and
both physical and behavioral).
Here are some descriptive statistics for each group.
Group | n | mean | sd |
---|---|---|---|
None | 10 | -0.2 | 1.5 |
Physical | 10 | 0.8 | 1.0 |
Behavioral | 10 | 3.1 | 2.6 |
Both | 10 | 4.1 | 2.1 |
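One common way for a regression model to represent a categorical variable like this one is with 0/1 indicator variables, one for each level other than a baseline. The Python sketch below illustrates the idea using the sample means from the table as stand-ins for the population means, which is an assumption made purely for illustration. The names `indicators` and `mean_response`, and the choice of "None" as the baseline, are hypothetical.

```python
LEVELS = ["None", "Physical", "Behavioral", "Both"]
BASELINE = "None"


def indicators(group):
    """One 0/1 indicator per non-baseline level of the treatment variable."""
    return [1 if group == level else 0 for level in LEVELS if level != BASELINE]


def mean_response(alpha, betas, xs):
    """Mean response under mu_y = alpha + beta_1*x_1 + ... + beta_k*x_k."""
    return alpha + sum(b * x for b, x in zip(betas, xs))


# Sample means from the table, treated as if they were population means
# (an assumption for illustration; the true means are unknown).
group_means = {"None": -0.2, "Physical": 0.8, "Behavioral": 3.1, "Both": 4.1}

# The intercept is the baseline mean; each slope is a difference from baseline.
alpha = group_means[BASELINE]
betas = [group_means[level] - alpha for level in LEVELS if level != BASELINE]

for group in LEVELS:
    xs = indicators(group)
    print(group, xs, round(mean_response(alpha, betas, xs), 1))
```

With this coding, \(\alpha\) is the mean for the baseline group and each \(\beta_j\) is the difference between one group's mean and the baseline mean, so the model reproduces all four group means.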
Possible research questions to address using statistical inference:
Example: The histograms below show data from an
observational study of the effect of warning signs on car speed. Here
there are two categorical explanatory variables: warning (yes
or no) and period (before, short, and long).
Here are some descriptive statistics for each group.
warning | period | n | mean | sd |
---|---|---|---|---|
yes | before | 1,400 | 36.5 | 6.0 |
yes | short | 1,400 | 35.8 | 6.1 |
yes | long | 1,362 | 37.7 | 6.4 |
no | before | 1,400 | 38.2 | 6.6 |
no | short | 1,400 | 39.2 | 6.8 |
no | long | 1,475 | 39.5 | 6.4 |
Possible research questions to address using statistical inference:
Study Question: When would a researcher use an analysis of variance?