A significance test can be used to derive a confidence interval, and a confidence interval can be used to conduct a significance test. If we have hypotheses for a two-sided test like \[ H_0\!: \beta_j = c \ \ \text{and} \ \ H_a\!: \beta_j \neq c, \] then we reject \(H_0\) if and only if the confidence interval for \(\beta_j\) does not contain \(c\), with a couple of caveats.
The confidence level must be \((1-\alpha)100\)%, where \(\alpha\) is the significance level.
The test must be two-sided (one-sided tests correspond to one-sided confidence intervals in an analogous way).
A confidence interval with confidence level \((1-\alpha)100\)% effectively defines all values of the parameter that would not be rejected in a two-sided test with significance level \(\alpha\).
Note that this also applies to a linear function of model parameters (\(\ell\)). So if we have the hypotheses \[ H_0\!: \ell = c \ \ \text{and} \ \ H_a\!: \ell \neq c, \] then we reject \(H_0\) if and only if the confidence interval for \(\ell\) does not contain \(c\).
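As a quick illustration of this duality, here is a sketch using the anorexia example introduced below: the p-value for testing \(H_0\!: \beta_j = c\) falls below \(\alpha\) exactly when the \((1-\alpha)100\)% confidence interval excludes \(c\).
library(MASS) # for the anorexia data
anorexia$change <- anorexia$Postwt - anorexia$Prewt
m <- lm(change ~ Treat, data = anorexia)
b <- unname(coef(m)["TreatFT"]) # estimate of one coefficient
se <- unname(sqrt(diag(vcov(m)))["TreatFT"]) # its standard error
cval <- 0 # hypothesized value c
tval <- (b - cval) / se
pval <- 2 * pt(abs(tval), df.residual(m), lower.tail = FALSE)
ci <- b + c(-1, 1) * qt(0.975, df.residual(m)) * se
c(p = pval, lower = ci[1], upper = ci[2]) # p < 0.05 if and only if cval is outside the interval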
Example: Consider again the model for the anorexia data, but parameterized to compare the two treatment conditions against the control so that the model is \[ E(Y_i) = \begin{cases} \beta_0, & \text{if the $i$-th observation is under the control condition}, \\ \beta_0 + \beta_1, & \text{if the $i$-th observation is under cognitive behavioral therapy}, \\ \beta_0 + \beta_2, & \text{if the $i$-th observation is under family therapy}. \end{cases} \]
library(MASS) # for anorexia data
anorexia$change <- anorexia$Postwt - anorexia$Prewt # weight change
anorexia$Treat <- relevel(anorexia$Treat, ref = "Cont") # control condition as reference level
m <- lm(change ~ Treat, data = anorexia)
cbind(summary(m)$coefficients, confint(m)) # estimates, tests, and confidence intervals
Estimate Std. Error t value Pr(>|t|) 2.5 % 97.5 %
(Intercept) -0.450 1.476 -0.3048 0.761447 -3.3954 2.495
TreatCBT 3.457 2.033 1.7001 0.093608 -0.5994 7.513
TreatFT 7.715 2.348 3.2854 0.001602 3.0302 12.399
We can produce the same inferences using the contrast function from the trtools package.
library(trtools)
contrast(m,
a = list(Treat = c("CBT","FT")),
b = list(Treat = "Cont"),
cnames = c("Cognitive vs Control", "Family vs Control"))
estimate se lower upper tvalue df pvalue
Cognitive vs Control 3.457 2.033 -0.5994 7.513 1.700 69 0.093608
Family vs Control 7.715 2.348 3.0302 12.399 3.285 69 0.001602
Example: Consider the following model and hypotheses for the anorexia data.
library(MASS) # for anorexia data
anorexia$change <- anorexia$Postwt - anorexia$Prewt
m.anorexia <- lm(change ~ Treat, data = anorexia) # default reference level (CBT)
summary(m.anorexia)$coefficients
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.007 1.398 2.151 0.03499
TreatCont -3.457 2.033 -1.700 0.09361
TreatFT 4.258 2.300 1.852 0.06838
The model is therefore \[ E(Y_i) = \begin{cases} \beta_0, & \text{if the $i$-th observation is under cognitive behavioral therapy}, \\ \beta_0 + \beta_1, & \text{if the $i$-th observation is under the control condition}, \\ \beta_0 + \beta_2, & \text{if the $i$-th observation is under family therapy}. \end{cases} \] In some cases we might be testing hypotheses like \(H_0: \beta_2 = 0\) or \(H_0: \beta_1 - \beta_2 = 0\). But in other cases we might be testing what is sometimes called a joint hypothesis such as \[ H_0\!: \beta_1 = 0 \text{ and } \beta_2 = 0 \ \ \text{versus} \ \ H_a\!: \text{not both $\beta_1 = 0$ and $\beta_2 = 0$}. \] What does it imply if both \(\beta_1 = 0\) and \(\beta_2 = 0\)?
Example: Consider the following model for the whiteside data (also from the MASS package).
m.insulation <- lm(Gas ~ Insul + Temp + Insul:Temp, data = whiteside)
summary(m.insulation)$coefficients
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.8538 0.13596 50.409 7.997e-46
InsulAfter -2.1300 0.18009 -11.827 2.316e-16
Temp -0.3932 0.02249 -17.487 1.976e-23
InsulAfter:Temp 0.1153 0.03211 3.591 7.307e-04
The model is therefore \[ E(Y_i) = \begin{cases} \beta_0 + \beta_2 t_i, & \text{if the $i$-th observation is before insulation}, \\ \beta_0 + \beta_1 + (\beta_2 + \beta_3) t_i, & \text{if the $i$-th observation is after insulation}. \end{cases} \] We might test a single null hypothesis that the rate of change in expected gas consumption with respect to temperature is the same before and after insulation (i.e., \(H_0: \beta_3 = 0\)). But consider the joint hypothesis \[ H_0\!: \beta_1 = 0 \text{ and } \beta_3 = 0 \ \ \text{versus} \ \ H_a\!: \text{not both $\beta_1 = 0$ and $\beta_3 = 0$}. \] What does it imply if both \(\beta_1 = 0\) and \(\beta_3 = 0\)?
Calculations for inference for linear models are often based on the sums of squares decomposition \[ \underbrace{\sum_{i=1}^n (y_i - \bar{y})^2}_{\text{total}} = \underbrace{\sum_{i=1}^n (\hat{y}_i - \bar{y})^2}_{\text{model/regression}} + \underbrace{\sum_{i=1}^n (y_i - \hat{y}_i)^2}_{\text{error/residual}}, \] where \(\hat{y}_i = \hat\beta_0 + \hat\beta_1x_{i1} + \cdots + \hat\beta_kx_{ik}\), and the degrees of freedom decomposition \[ \underbrace{n-1}_{\text{total}} = \underbrace{p - 1}_{\text{model/regression}} + \underbrace{n - p}_{\text{error/residual}}, \] where \(p\) is the number of \(\beta_j\) parameters, and \(p = k + 1\) if the model includes a \(\beta_0\). (Note: If the \(\beta_0\) parameter is omitted from the model, the total degrees of freedom becomes \(n\) and the model/regression degrees of freedom becomes \(p\).)
A mean square is a variance-like quantity that is a sum of squares divided by its corresponding degrees of freedom.
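As a quick numerical check of these decompositions (a sketch assuming m.anorexia from the example above is still in the workspace), we can compute the sums of squares, degrees of freedom, and mean squares directly.
y <- anorexia$change
yhat <- fitted(m.anorexia)
ss <- c(total = sum((y - mean(y))^2),
        model = sum((yhat - mean(y))^2),
        error = sum((y - yhat)^2))
degf <- c(total = length(y) - 1,
          model = length(coef(m.anorexia)) - 1,
          error = df.residual(m.anorexia))
ss # total equals model plus error
degf # total equals model plus error
ss / degf # the mean squares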
Tests can be conducted using the \(F\) test statistic, which can be written as \[ F = \frac{(\text{RSS}_{\text{null}} - \text{RSS}_{\text{full}})/(\text{RDF}_{\text{null}} - \text{RDF}_{\text{full}})}{\text{RSS}_{\text{full}}/\text{RDF}_{\text{full}}}, \] where \(\text{RSS}\) and \(\text{RDF}\) refer to the residual sum of squares and residual degrees of freedom, respectively. The degrees of freedom for the \(F\) distribution are \(\text{RDF}_{\text{null}} - \text{RDF}_{\text{full}}\) (numerator) and \(\text{RDF}_{\text{full}}\) (denominator). The full model is the model we are using, and the null (aka “reduced”) model is what the full model reduces to if the null hypothesis is true. The \(F\) test statistic can be used for tests of individual and joint hypotheses in linear models.
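To make the formula concrete, here is a sketch that computes \(F\) and its p-value “by hand” for the anorexia data, testing the joint null hypothesis from the earlier example.
m.full <- lm(change ~ Treat, data = anorexia)
m.null <- lm(change ~ 1, data = anorexia) # the model if the null hypothesis is true
rss.full <- sum(residuals(m.full)^2); rdf.full <- df.residual(m.full)
rss.null <- sum(residuals(m.null)^2); rdf.null <- df.residual(m.null)
f <- ((rss.null - rss.full) / (rdf.null - rdf.full)) / (rss.full / rdf.full)
p <- pf(f, rdf.null - rdf.full, rdf.full, lower.tail = FALSE)
c(F = f, p = p) # should match the anova output below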
The anova Function
The anova function is particularly useful for testing joint hypotheses, although it can also be used to test a hypothesis about a single parameter. Applying anova to a single model will produce the RSS and RDF in the Residuals row.
anova(m.anorexia)
Analysis of Variance Table
Response: change
Df Sum Sq Mean Sq F value Pr(>F)
Treat 2 615 307.3 5.42 0.0065 **
Residuals 69 3911 56.7
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
To conduct a test, the recommended approach is to apply anova to the null model and the full model.
m.full <- lm(change ~ Treat, data = anorexia)
m.null <- lm(change ~ 1, data = anorexia) # use ~ 1 if no explanatory variables
anova(m.null, m.full)
Analysis of Variance Table
Model 1: change ~ 1
Model 2: change ~ Treat
Res.Df RSS Df Sum of Sq F Pr(>F)
1 71 4525
2 69 3911 2 615 5.42 0.0065 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We can do the same for the whiteside model, testing the joint null hypothesis that \(\beta_1 = 0\) and \(\beta_3 = 0\).
m.full <- lm(Gas ~ Insul + Temp + Insul:Temp, data = whiteside)
m.null <- lm(Gas ~ Temp, data = whiteside)
anova(m.null, m.full)
Analysis of Variance Table
Model 1: Gas ~ Temp
Model 2: Gas ~ Insul + Temp + Insul:Temp
Res.Df RSS Df Sum of Sq F Pr(>F)
1 54 40.0
2 52 5.4 2 34.6 166 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The anova function can also do a test concerning a single parameter. Here are two approaches to testing the null hypothesis that \(\beta_3 = 0\) in the model \[ E(Y_i) = \begin{cases} \beta_0 + \beta_2 t_i, & \text{if the $i$-th observation is before insulation}, \\ \beta_0 + \beta_1 + (\beta_2 + \beta_3) t_i, & \text{if the $i$-th observation is after insulation}. \end{cases} \]
m.full <- lm(Gas ~ Insul + Temp + Insul:Temp, data = whiteside)
m.null <- lm(Gas ~ Insul + Temp, data = whiteside)
anova(m.null, m.full)
Analysis of Variance Table
Model 1: Gas ~ Insul + Temp
Model 2: Gas ~ Insul + Temp + Insul:Temp
Res.Df RSS Df Sum of Sq F Pr(>F)
1 53 6.77
2 52 5.43 1 1.34 12.9 0.00073 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(m.full)$coefficients
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.8538 0.13596 50.409 7.997e-46
InsulAfter -2.1300 0.18009 -11.827 2.316e-16
Temp -0.3932 0.02249 -17.487 1.976e-23
InsulAfter:Temp 0.1153 0.03211 3.591 7.307e-04
Comment: When conducting a test concerning one parameter (or a single linear function of the model parameters), the \(F\) and \(t\) test statistics have the relationship \(t^2 = F\) and produce the same p-values.
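We can confirm this numerically for the test of \(\beta_3 = 0\) above (a sketch assuming m.full and m.null from the previous example are still in the workspace).
tval <- summary(m.full)$coefficients["InsulAfter:Temp", "t value"]
fval <- anova(m.null, m.full)$F[2]
c(t.squared = tval^2, F = fval) # both approximately 12.9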
Consider again the model for the anorexia data, but suppose we parameterize the model differently.
anorexia$Treat <- relevel(anorexia$Treat, ref = "Cont")
m.anorexia <- lm(change ~ Treat, data = anorexia)
summary(m.anorexia)$coefficients
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.450 1.476 -0.3048 0.761447
TreatCBT 3.457 2.033 1.7001 0.093608
TreatFT 7.715 2.348 3.2854 0.001602
The model is therefore \[ E(Y_i) = \begin{cases} \beta_0, & \text{if the $i$-th observation is from the control group}, \\ \beta_0 + \beta_1, & \text{if the $i$-th observation is from the cognitive-behavioral therapy group}, \\ \beta_0 + \beta_2, & \text{if the $i$-th observation is from the family therapy group}. \end{cases} \] Now consider a test of the null hypothesis that the expected weight change is the same regardless of which of the two therapies (i.e., cognitive-behavioral or family) is used. This is the null hypothesis that \(\beta_1 = \beta_2\) or, equivalently, \(\beta_1 - \beta_2 = 0\).
Using the lincon function from the trtools package, we can test this null hypothesis as follows.
m <- lm(change ~ Treat, data = anorexia)
trtools::lincon(m, a = c(0, 1, -1))
estimate se lower upper tvalue df pvalue
(0,1,-1),0 -4.258 2.3 -8.845 0.3299 -1.852 69 0.06838
This is because the null hypothesis can be written as \[ \ell = 0 \times \beta_0 + 1 \times \beta_1 + (-1) \times \beta_2 = \beta_1 - \beta_2. \]
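For comparison, here is a sketch of the underlying computation using the coefficient estimates and their estimated covariance matrix directly.
a <- c(0, 1, -1) # weights for (beta0, beta1, beta2)
est <- sum(a * coef(m)) # estimate of beta1 - beta2
se <- drop(sqrt(t(a) %*% vcov(m) %*% a)) # standard error of the estimate
tval <- est / se
pval <- 2 * pt(abs(tval), df.residual(m), lower.tail = FALSE)
c(estimate = est, se = se, t = tval, p = pval)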
Using the contrast function, we can test this null hypothesis as follows.
m <- lm(change ~ Treat, data = anorexia)
trtools::contrast(m, a = list(Treat = "CBT"), b = list(Treat = "FT"))
estimate se lower upper tvalue df pvalue
-4.258 2.3 -8.845 0.3299 -1.852 69 0.06838
Using anova, we can test this null hypothesis as follows.
anorexia$therapy <- ifelse(anorexia$Treat == "Cont", "control", "therapy")
head(anorexia)
Treat Prewt Postwt change therapy
1 Cont 80.7 80.2 -0.5 control
2 Cont 89.4 80.1 -9.3 control
3 Cont 91.8 86.4 -5.4 control
4 Cont 74.0 86.3 12.3 control
5 Cont 78.1 76.1 -2.0 control
6 Cont 88.3 78.1 -10.2 control
tail(anorexia)
Treat Prewt Postwt change therapy
67 FT 82.1 95.5 13.4 therapy
68 FT 77.6 90.7 13.1 therapy
69 FT 83.5 92.5 9.0 therapy
70 FT 89.9 93.8 3.9 therapy
71 FT 86.0 91.7 5.7 therapy
72 FT 87.3 98.0 10.7 therapy
m.full <- lm(change ~ Treat, data = anorexia)
m.null <- lm(change ~ therapy, data = anorexia)
anova(m.null, m.full)
Analysis of Variance Table
Model 1: change ~ therapy
Model 2: change ~ Treat
Res.Df RSS Df Sum of Sq F Pr(>F)
1 70 4105
2 69 3911 1 194 3.43 0.068 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Note that the null model can be written as \[ E(Y_i) = \begin{cases} \beta_0, & \text{if the $i$-th observation is from the control group}, \\ \beta_0 + \beta_1, & \text{if the $i$-th observation is from the therapy group}, \end{cases} \] or \[ E(Y_i) = \begin{cases} \beta_0, & \text{if the $i$-th observation is from the control group}, \\ \beta_0 + \beta_1, & \text{if the $i$-th observation is from the cognitive-behavioral therapy group}, \\ \beta_0 + \beta_1, & \text{if the $i$-th observation is from the family therapy group}. \end{cases} \] So this model is effectively equivalent to the full model with \(\beta_1 = \beta_2\).
I do not recommend trying to produce tests by applying anova to a single model object. While it can produce the desired tests in some cases if used correctly, it often produces confusing results. For example, the following produces a test of the null hypothesis \(H_0: \beta_1 = 0, \beta_2 = 0\) for the anorexia model.
m <- lm(change ~ Treat, data = anorexia)
anova(m)
Analysis of Variance Table
Response: change
Df Sum Sq Mean Sq F value Pr(>F)
Treat 2 615 307.3 5.42 0.0065 **
Residuals 69 3911 56.7
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
But the tests shown here may not be what you think they are. Applied to a single model, anova produces sequential tests: each term is tested in sequence, taking into account only the terms listed before it in the model formula, so the results can depend on term order.
m <- lm(Gas ~ Insul + Temp + Insul:Temp, data = whiteside)
anova(m)
Analysis of Variance Table
Response: Gas
Df Sum Sq Mean Sq F value Pr(>F)
Insul 1 22.3 22.3 214.2 < 2e-16 ***
Temp 1 45.9 45.9 439.9 < 2e-16 ***
Insul:Temp 1 1.3 1.3 12.9 0.00073 ***
Residuals 52 5.4 0.1
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
If you know what you are doing, there are alternatives to anova that are perhaps better (e.g., the Anova function from the car package), but there is often a clearer approach using two models in anova, using contrast or lincon, or using the emmeans package (which we will discuss later).
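For example, assuming the car package is installed, its Anova function produces tests that do not depend on the order of the terms in the model formula (so-called Type II tests, by default).
library(car) # assumes the car package is installed
m <- lm(Gas ~ Insul + Temp + Insul:Temp, data = whiteside)
Anova(m) # Type II tests, unlike the sequential tests from anova(m)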
Note: Another potentially confusing test is one that appears at the bottom of the output of summary. It tests the null hypothesis that all \(\beta_j\) (except \(\beta_0\)) equal zero. For the model for the anorexia data it is the same as the test conducted earlier.
m <- lm(change ~ Treat, data = anorexia)
summary(m)
Call:
lm(formula = change ~ Treat, data = anorexia)
Residuals:
Min 1Q Median 3Q Max
-12.56 -4.54 -1.01 3.85 17.89
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.45 1.48 -0.30 0.7614
TreatCBT 3.46 2.03 1.70 0.0936 .
TreatFT 7.71 2.35 3.29 0.0016 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 7.53 on 69 degrees of freedom
Multiple R-squared: 0.136, Adjusted R-squared: 0.111
F-statistic: 5.42 on 2 and 69 DF, p-value: 0.0065
But for the model for the whiteside data the utility of this test is questionable.
m <- lm(Gas ~ Insul + Temp + Insul:Temp, data = whiteside)
summary(m)
Call:
lm(formula = Gas ~ Insul + Temp + Insul:Temp, data = whiteside)
Residuals:
Min 1Q Median 3Q Max
-0.9780 -0.1801 0.0376 0.2093 0.6380
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.8538 0.1360 50.41 < 2e-16 ***
InsulAfter -2.1300 0.1801 -11.83 2.3e-16 ***
Temp -0.3932 0.0225 -17.49 < 2e-16 ***
InsulAfter:Temp 0.1153 0.0321 3.59 0.00073 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.323 on 52 degrees of freedom
Multiple R-squared: 0.928, Adjusted R-squared: 0.924
F-statistic: 222 on 3 and 52 DF, p-value: <2e-16
Just because R gives you output does not mean it is useful!
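Note that this overall test is itself just a null-versus-full comparison where the null model has only an intercept, which we can verify with a quick sketch.
m <- lm(Gas ~ Insul + Temp + Insul:Temp, data = whiteside)
anova(update(m, . ~ 1), m) # should reproduce F = 222 on 3 and 52 df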
Note: The Residual standard error shown by summary is the square root of the residual/error mean square (i.e., the square root of the ratio of the residual sum of squares to the residual degrees of freedom). The degrees of freedom shown after Residual standard error is the residual degrees of freedom.
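As a quick check, this quantity can be computed directly or extracted with the sigma function.
m <- lm(change ~ Treat, data = anorexia)
sqrt(sum(residuals(m)^2) / df.residual(m)) # square root of the error mean square
sigma(m) # the same value, about 7.53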