
Rate Ratios (Quantitative Explanatory Variable)

Consider the model \[ \log E(Y) = \beta_0 + \beta_1 x, \] and let \[ \log E(Y_a) = \beta_0 + \beta_1 (x+1) \ \ \ \text{and} \ \ \ \log E(Y_b) = \beta_0 + \beta_1 x \] for an arbitrary value of \(x\). Then the difference in the log of the expected values is \[ \log E(Y_a) - \log E(Y_b) = \underbrace{\beta_0 + \beta_1 (x+1)}_{\log E(Y_a)} - \underbrace{(\beta_0 + \beta_1 x)}_{\log E(Y_b)} = \beta_1, \] meaning that \(\beta_1\) is the additive change in \(\log E(Y)\) per unit increase in \(x\).

Now consider the same model written as \[ E(Y) = e^{\beta_0}e^{\beta_1 x}, \] and let \[ E(Y_a) = e^{\beta_0}e^{\beta_1 (x+1)} \ \ \ \text{and} \ \ \ E(Y_b) = e^{\beta_0}e^{\beta_1 x} \] for an arbitrary value of \(x\). Then the ratio of the expected values is \[ \frac{E(Y_a)}{E(Y_b)} = \frac{\overbrace{e^{\beta_0}e^{\beta_1 (x+1)}}^{E(Y_a)}}{\underbrace{e^{\beta_0}e^{\beta_1 x}}_{E(Y_b)}} = \frac{e^{\beta_0}e^{\beta_1 x}e^{\beta_1}}{e^{\beta_0}e^{\beta_1 x}} = e^{\beta_1} \Rightarrow E(Y_a) = E(Y_b)e^{\beta_1}, \] so that \(E(Y)\) changes by a factor of \(e^{\beta_1}\) per unit increase in \(x\). The “exponentiated” parameter, \(e^{\beta_1}\), is sometimes called a “rate ratio” because it is often the ratio of two rates when the counts are per unit space, time, or something else.
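
The result above can be checked numerically. The following is a minimal sketch in base R; the values of `beta0` and `beta1` and the helper function `expected` are hypothetical and chosen only for illustration. It shows that the ratio of expected responses for a one-unit increase in \(x\) does not depend on the starting value of \(x\).

```r
# Hypothetical parameter values, chosen only for illustration.
beta0 <- 1.5
beta1 <- 0.3

# Expected response under the model E(Y) = exp(beta0 + beta1 * x).
expected <- function(x) exp(beta0 + beta1 * x)

# Ratio of expected responses for a one-unit increase, at several values of x.
x <- c(-2, 0, 1, 10)
ratio <- expected(x + 1) / expected(x)

ratio        # the same value at every x
exp(beta1)   # equal to each element of ratio
```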

Example: Consider again the ceriodaphniastrain data and model.

library(trtools)
ceriodaphniastrain$strainf <- factor(ceriodaphniastrain$strain, 
  labels = c("a","b"))
m <- glm(count ~ concentration + strainf, 
  family = poisson, data = ceriodaphniastrain) # log link is default
cbind(summary(m)$coefficients, confint(m))
              Estimate Std. Error z value   Pr(>|z|)  2.5 %  97.5 %
(Intercept)      4.455    0.03914 113.819  0.000e+00  4.377  4.5306
concentration   -1.543    0.04660 -33.111 2.057e-240 -1.635 -1.4522
strainfb        -0.275    0.04837  -5.684  1.313e-08 -0.370 -0.1803
exp(cbind(coef(m), confint(m))) # coef extracts the parameter estimates only
                        2.5 % 97.5 %
(Intercept)   86.0252 79.6152 92.817
concentration  0.2137  0.1950  0.234
strainfb       0.7596  0.6907  0.835

Note: It only makes sense to apply the exponential function to the point estimates and the endpoints of the confidence interval. A standard error of \(e^{\hat\beta_1}\) could be obtained, but it is not equal to the exponentiated standard error of \(\hat\beta_1\). A test concerning \(e^{\beta_1}\) can be done either by using the confidence interval or by stating the hypotheses in terms of \(\beta_1\) (e.g., the null hypothesis that \(e^{\beta_1} = 1\) is the same as the null hypothesis that \(\beta_1 = 0\)).
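
To see why the standard error should not be exponentiated, here is a minimal sketch that uses the delta method, which is one common approximation and an assumption here rather than something this lecture prescribes. Under that approximation the standard error of \(e^{\hat\beta_1}\) is about \(e^{\hat\beta_1}\) times the standard error of \(\hat\beta_1\). The numbers are the estimate and standard error for concentration from the output above.

```r
# Estimate and standard error for concentration, copied from the output above.
b  <- -1.543
se <- 0.04660

# Delta method approximation (an assumption, not the lecture's stated method):
# the standard error of exp(b) is approximately exp(b) * se.
se_rr <- exp(b) * se

se_rr     # approximate standard error of the rate ratio
exp(se)   # exponentiated standard error, which is a different quantity
```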

Another approach is to use lincon and the tf (transformation function) argument.

lincon(m, tf = exp)
              estimate   lower   upper
(Intercept)    86.0252 79.6730 92.8838
concentration   0.2137  0.1951  0.2342
strainfb        0.7596  0.6909  0.8351

Note that the confidence interval endpoints are not quite the same as what we obtained using confint. This is because confint and lincon use different approaches to confidence intervals (more on that later).
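
As a rough check on where the difference comes from, the following sketch computes a Wald interval by hand (estimate plus or minus a normal quantile times the standard error) and then exponentiates the endpoints. That lincon computes its interval this way is an inference from the printed numbers, not something stated here. The estimate and standard error for concentration are copied from the output above.

```r
# Estimate and standard error for concentration, copied from the output above.
b  <- -1.543
se <- 0.04660

# Wald interval on the log scale, then exponentiated.
z <- qnorm(0.975)
wald <- exp(b + c(-1, 1) * z * se)

round(wald, 4)   # compare with the lower and upper values reported by lincon
```

To the precision shown, these endpoints agree with the lincon output (0.1951 and 0.2342) rather than with the confint output.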

Example: Consider a model for the expected number of matings of African elephants as a function of age.

library(Sleuth3)
head(case2201)
  Age Matings
1  27       0
2  28       1
3  28       1
4  28       1
5  28       3
6  29       0
m <- glm(Matings ~ Age, family = poisson, data = case2201)
cbind(summary(m)$coefficients, confint(m))
            Estimate Std. Error z value  Pr(>|z|)    2.5 %   97.5 %
(Intercept) -1.58201    0.54462  -2.905 3.675e-03 -2.66670 -0.52893
Age          0.06869    0.01375   4.997 5.812e-07  0.04168  0.09564
exp(cbind(m$coefficients, confint(m))) 
                     2.5 % 97.5 %
(Intercept) 0.2056 0.06948 0.5892
Age         1.0711 1.04256 1.1004
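
The same argument used for a one-unit increase shows that increasing \(x\) by \(c\) units changes \(E(Y)\) by a factor of \(e^{c\beta_1}\). The following sketch applies this to the estimate and confidence interval for Age from the output above; the ten-year increment is an illustrative choice and not part of the original example.

```r
# Estimate and confidence interval for the Age coefficient, from the output above.
b  <- 0.06869
ci <- c(lower = 0.04168, upper = 0.09564)

# Illustrative increment in years (a hypothetical choice).
years <- 10

rr10    <- exp(years * b)    # estimated rate ratio for a ten-year increase in Age
rr10_ci <- exp(years * ci)   # interval endpoints on the same scale

rr10
rr10_ci
```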

Percent Change (Quantitative Explanatory Variable)

The percent change in the expected response is \[ 100\% \times \left[\frac{E(Y_a)-E(Y_b)}{E(Y_b)}\right] = 100\% \times \left[E(Y_a)/E(Y_b) - 1\right], \]
where \(E(Y_a)\) and \(E(Y_b)\) are the expected responses at two different points (\(a\) and \(b\)) defined in terms of the explanatory variable(s).

  1. Note that if this is positive then it is a percent increase, whereas if it is negative then it is a percent decrease.

  2. The ratio \(E(Y_a)/E(Y_b)\) is the rate ratio.
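
The conversion from a rate ratio to a percent change is simple enough to wrap in a small function. This is a minimal sketch; the function name `percent_change` is hypothetical and not part of any package used in this lecture.

```r
# Convert a rate ratio E(Y_a)/E(Y_b) to a percent change.
# Positive values are percent increases and negative values are percent decreases.
percent_change <- function(rate_ratio) 100 * (rate_ratio - 1)

# Rate ratios above one, below one, and equal to one.
pc <- percent_change(c(1.25, 0.8, 1))
pc
```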

Example: Suppose we have the model \(\log E(Y) = \beta_0 + \beta_1 x\) where \(x\) is a quantitative variable and \(\beta_1 = 0.22\). Then \(e^{\beta_1} \approx 1.25\). So when \(x\) increases by one unit (i.e., to \(x + 1\)), the expected response goes from \(E(Y_b) = e^{\beta_0}e^{\beta_1x}\) to \(E(Y_a) = e^{\beta_0}e^{\beta_1(x+1)}\), which is an increase by a factor of \[ E(Y_a)/E(Y_b) = e^{\beta_1} \approx 1.25, \] and because \[ 100\% \times \left[1.25 - 1\right] = 25\%, \]
we can say that it increases by about 25%.

Example: Consider again the model for the elephant mating data.

m <- glm(Matings ~ Age, family = poisson, data = case2201)
exp(cbind(m$coefficients, confint(m))) 
                     2.5 % 97.5 %
(Intercept) 0.2056 0.06948 0.5892
Age         1.0711 1.04256 1.1004

The percent change in the expected count per unit (year) increase in Age is approximately 100%(1.07 - 1) = 7% (i.e., a 7% increase).

Example: Suppose we have the model \(\log E(Y) = \beta_0 + \beta_1 x\) where \(x\) is a quantitative variable and \(\beta_1 = -0.22\). Then \(e^{\beta_1} \approx 0.8\). So when \(x\) increases by one unit (i.e., to \(x + 1\)), the expected response goes from \(E(Y_b) = e^{\beta_0}e^{\beta_1x}\) to \(E(Y_a) = e^{\beta_0}e^{\beta_1(x+1)}\), which is a change by a factor of \[ E(Y_a)/E(Y_b) = e^{\beta_1} \approx 0.8, \] and because \[ 100\% \times \left[0.8 - 1\right] = -20\%, \]
we can say that it decreases by about 20%.

Example: Consider again the model for the ceriodaphniastrain data.

m <- glm(count ~ concentration + strainf, family = poisson, data = ceriodaphniastrain) 
exp(cbind(coef(m), confint(m)))
                        2.5 % 97.5 %
(Intercept)   86.0252 79.6152 92.817
concentration  0.2137  0.1950  0.234
strainfb       0.7596  0.6907  0.835

The percent change in the expected count per unit increase in concentration is approximately 100%(0.21 - 1) = -79% (i.e., a 79% decrease or reduction).
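
Because \(100\% \times (r - 1)\) is an increasing function of the rate ratio \(r\), applying it to the endpoints of a confidence interval for the rate ratio gives a confidence interval for the percent change. The following sketch does this for concentration, using the values copied from the output above.

```r
# Rate ratio and interval endpoints for concentration, copied from the output above.
rr <- c(estimate = 0.2137, lower = 0.1950, upper = 0.234)

# The transformation is increasing in the ratio, so the endpoints keep their order.
pct <- 100 * (rr - 1)

round(pct, 1)   # negative values are percent decreases
```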

Rate Ratios (Categorical Explanatory Variable)

Consider the model \[ \log E(Y) = \beta_0 + \beta_1 x, \ \ \text{or, equivalently,} \ \ E(Y) = e^{\beta_0}e^{\beta_1 x}, \] where \[ x = \begin{cases} 1, & \text{if the observation is in group $a$}, \\ 0, & \text{if the observation is in group $b$}. \end{cases} \] Then \[ E(Y) = \begin{cases} e^{\beta_0}e^{\beta_1}, & \text{if the observation is in group $a$}, \\ e^{\beta_0}, & \text{if the observation is in group $b$}. \end{cases} \] Let \[ E(Y_a) = e^{\beta_0}e^{\beta_1} \ \ \ \text{and} \ \ \ E(Y_b) = e^{\beta_0}. \] Then the ratio of the expected values is \[ \frac{E(Y_a)}{E(Y_b)} = \frac{e^{\beta_0}e^{\beta_1}}{e^{\beta_0}} = e^{\beta_1} \Leftrightarrow E(Y_a) = E(Y_b)e^{\beta_1}, \] so that \(E(Y_a)\) is \(e^{\beta_1}\) times \(E(Y_b)\). Also \[ \frac{E(Y_b)}{E(Y_a)} = \frac{e^{\beta_0}}{e^{\beta_0}e^{\beta_1}} = \frac{1}{e^{\beta_1}} = e^{-\beta_1}, \] so that \(E(Y_b)\) is \(1/e^{\beta_1}\) times \(E(Y_a)\).

Example: Consider again the ceriodaphniastrain data and model.

m <- glm(count ~ concentration + strainf, 
  family = poisson, data = ceriodaphniastrain) 
cbind(summary(m)$coefficients, confint(m))
              Estimate Std. Error z value   Pr(>|z|)  2.5 %  97.5 %
(Intercept)      4.455    0.03914 113.819  0.000e+00  4.377  4.5306
concentration   -1.543    0.04660 -33.111 2.057e-240 -1.635 -1.4522
strainfb        -0.275    0.04837  -5.684  1.313e-08 -0.370 -0.1803
exp(cbind(coef(m), confint(m)))
                        2.5 % 97.5 %
(Intercept)   86.0252 79.6152 92.817
concentration  0.2137  0.1950  0.234
strainfb       0.7596  0.6907  0.835

Alternatively we can reparameterize the model so that strain b, rather than strain a, is the reference level.

ceriodaphniastrain$strainf <- relevel(ceriodaphniastrain$strainf, ref = "b")
m <- glm(count ~ concentration + strainf, 
  family = poisson, data = ceriodaphniastrain) 
cbind(summary(m)$coefficients, confint(m))
              Estimate Std. Error z value   Pr(>|z|)   2.5 % 97.5 %
(Intercept)      4.180    0.04303  97.137  0.000e+00  4.0945  4.263
concentration   -1.543    0.04660 -33.111 2.057e-240 -1.6349 -1.452
strainfa         0.275    0.04837   5.684  1.313e-08  0.1803  0.370
exp(cbind(coef(m), confint(m)))
                       2.5 % 97.5 %
(Intercept)   65.3444 60.008 71.034
concentration  0.2137  0.195  0.234
strainfa       1.3165  1.198  1.448
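
The two parameterizations give reciprocal rate ratios for strain. The following sketch checks this using only the values printed for strainfb in the earlier output; note that the interval endpoints swap roles because taking reciprocals reverses their order. The object names are hypothetical.

```r
# Rate ratio for strain b relative to strain a, copied from the earlier output.
rr_b_vs_a <- c(estimate = 0.7596, lower = 0.6907, upper = 0.835)

# Rate ratio for strain a relative to strain b: the reciprocal of the estimate,
# with the lower and upper endpoints exchanged.
rr_a_vs_b <- c(estimate = 1 / rr_b_vs_a[["estimate"]],
               lower    = 1 / rr_b_vs_a[["upper"]],
               upper    = 1 / rr_b_vs_a[["lower"]])

round(rr_a_vs_b, 3)   # compare with the strainfa row above
```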

Example: Consider these data from a stratified random sampling design and a Poisson regression model.

library(trtools)
library(ggplot2) 
p <- ggplot(daphniastrat, aes(x = layer, y = count)) + 
  geom_dotplot(binaxis = "y", binwidth = 1, stackdir = "center") + 
  labs(x = "Layer", y = "Number of Daphnia") + theme_minimal()
plot(p)

daphniastrat$layer <- relevel(daphniastrat$layer, ref = "thermocline")
m <- glm(count ~ layer, family = poisson, data = daphniastrat)
summary(m)$coefficients
                 Estimate Std. Error z value   Pr(>|z|)
(Intercept)        2.4248    0.09407  25.776 1.648e-146
layerepilimnion    0.5456    0.10683   5.107  3.272e-07
layerhypolimnion  -1.8748    0.21751  -8.619  6.745e-18
exp(cbind(coef(m), confint(m)))
                           2.5 %  97.5 %
(Intercept)      11.3000 9.34251 13.5134
layerepilimnion   1.7257 1.40501  2.1367
layerhypolimnion  0.1534 0.09808  0.2309
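
With thermocline as the reference level, the exponentiated intercept is the estimated expected count in that layer, and the expected counts in the other layers follow by adding the corresponding coefficient before exponentiating. The following sketch reconstructs them from the estimates printed above; the object names are hypothetical.

```r
# Estimated coefficients, copied from the output above.
b <- c(intercept = 2.4248, epilimnion = 0.5456, hypolimnion = -1.8748)

# Estimated expected count in each layer.
expected <- c(thermocline = exp(b[["intercept"]]),
              epilimnion  = exp(b[["intercept"]] + b[["epilimnion"]]),
              hypolimnion = exp(b[["intercept"]] + b[["hypolimnion"]]))
expected

# The rate ratio for epilimnion relative to thermocline equals exp(0.5456).
ratio_epi <- expected[["epilimnion"]] / expected[["thermocline"]]
ratio_epi
```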

Percent Larger/Smaller (Categorical Explanatory Variable)

The percent change in the expected response is \[ 100\% \times \left[\frac{E(Y_a)-E(Y_b)}{E(Y_b)}\right] = 100\% \times \left[E(Y_a)/E(Y_b) - 1\right], \]
where \(E(Y_a)\) and \(E(Y_b)\) are the expected responses at two different points (\(a\) and \(b\)) defined in terms of the explanatory variable(s).

  1. Note that if this is positive then \(E(Y_a)\) is that percent larger than \(E(Y_b)\), whereas if this is negative then \(E(Y_a)\) is that percent smaller than \(E(Y_b)\).

  2. The ratio \(E(Y_a)/E(Y_b)\) is the rate ratio.

Example: Suppose we have the model \(\log E(Y) = \beta_0 + \beta_1 x\) where \(x\) is an indicator variable for category \(a\) and \(\beta_1 = 0.22\). Then \(e^{\beta_1} \approx 1.25\), \(E(Y_a) = e^{\beta_0}e^{\beta_1}\) and \(E(Y_b) = e^{\beta_0}\), and \(E(Y_a)\) is about 1.25 times as large as \(E(Y_b)\) because \[ E(Y_a)/E(Y_b) = e^{\beta_1} \approx 1.25, \] and because \[ 100\% \times \left[1.25 - 1\right] = 25\%, \]
we can say that \(E(Y_a)\) is about 25% larger than \(E(Y_b)\).

Example: Suppose we have the model \(\log E(Y) = \beta_0 + \beta_1 x\) where \(x\) is an indicator variable for category \(a\) and \(\beta_1 = -0.22\). Then \(e^{\beta_1} \approx 0.8\), \(E(Y_a) = e^{\beta_0}e^{\beta_1}\) and \(E(Y_b) = e^{\beta_0}\), and \(E(Y_a)\) is about 0.8 times as large as \(E(Y_b)\) because \[ E(Y_a)/E(Y_b) = e^{\beta_1} \approx 0.8, \] and because \[ 100\% \times \left[0.8 - 1\right] = -20\%, \]
we can say that \(E(Y_a)\) is about 20% smaller than \(E(Y_b)\).

Example: Consider again the model for the daphnia data.

exp(cbind(coef(m), confint(m)))
                           2.5 %  97.5 %
(Intercept)      11.3000 9.34251 13.5134
layerepilimnion   1.7257 1.40501  2.1367
layerhypolimnion  0.1534 0.09808  0.2309

The expected number of daphnia per liter in the epilimnion layer is estimated to be about 100%(1.73 - 1) = 73% more than in the thermocline layer. And because 100%(0.15 - 1) = -85%, we estimate that the expected number of daphnia per liter in the hypolimnion layer is about 85% less than it is in the thermocline layer.