ST411 Weeks 7-8: Models for count responses

36 Terms

New cards

POISSON AND NEGATIVE BINOMIAL MODELS FOR SINGLE COUNTS

New cards

Poisson example: RAND Health Insurance Experiment

New cards

What are the key properties of the Poisson distribution?

It models count data where Y = 0, 1, 2, \dots with parameter \mu

Probability: P(Y = y) = \frac{e^{-\mu} \mu^y}{y!}

Mean and variance are both equal to \mu

It belongs to the two-parameter exponential family used for GLMs, with the dispersion parameter fixed at \phi = 1 (so there is no free dispersion parameter to estimate)
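
The properties above can be checked numerically. This is a minimal Python sketch (standard library only); the value of mu is an illustrative assumption, not a number from the course data.

```python
import math

def poisson_pmf(y: int, mu: float) -> float:
    """P(Y = y) for a Poisson distribution with mean mu."""
    return math.exp(-mu) * mu**y / math.factorial(y)

mu = 2.5  # illustrative value, not from the course data

# Truncate the infinite support where the remaining tail is negligible
support = range(100)
probs = [poisson_pmf(y, mu) for y in support]

mean = sum(y * p for y, p in zip(support, probs))
variance = sum((y - mean) ** 2 * p for y, p in zip(support, probs))

# Total probability, mean and variance; the last two should both equal mu
print(round(sum(probs), 6), round(mean, 6), round(variance, 6))
```

Both the mean and the variance come out equal to mu, which is the equality that overdispersed data violate.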

New cards

What is the probability formula for the Poisson distribution?

Probability: P(Y = y) = \frac{e^{-\mu} \mu^y}{y!}

New cards

What is the mean and variance of a Poisson distribution?

Mean and variance are both equal to \mu

New cards

What is a Poisson log-linear model for counts?

This is a regression model using the Poisson distribution. It is a GLM where Y_i \sim \text{Poisson}(\mu_i) and

\log(\mu_i) = x_i' \beta = \beta_0 + \beta_1 x_{1i} + \cdots + \beta_p x_{pi}, where the log link is the canonical link

This implies \mu_i = \exp(x_i' \beta) = \exp(\beta_0 + \beta_1 x_{1i} + \cdots + \beta_p x_{pi})

Estimated using maximum likelihood, with likelihood-based inference

New cards

What are we modelling in the R command?

Poisson log-linear model, with response variable MDU - number of outpatient visits to a medical doctor in a year

New cards

Interpret the coefficient on IDP (having a deductible)

Controlling for age and education, the expected number of visits to a doctor for a person on a health insurance plan with an individual deductible is 0.833 times the number for a person on a plan without an individual deductible.

In short, being on a plan with an IDP reduces the expected annual number of visits to a doctor by about 16.7%, controlling for age and education.

New cards

Interpret the coefficient on education (years)

Controlling for deductible and age, each additional year of education multiplies the expected number of outpatient visits by exp(0.04150) = 1.042, an increase of about 4.2%.

New cards

Interpretation of regression coefficients in Poisson model

\exp(\beta_k) is the rate ratio — it gives the multiplicative change in the expected count when x_k increases by 1 unit, holding all other variables constant

If \beta_k > 0, expected count increases; if \beta_k < 0, it decreases
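
The conversion from a coefficient to a rate ratio and a percentage change can be sketched in Python. The coefficient 0.04150 is the education estimate quoted in the Poisson example; the function name rate_ratio is only an illustrative choice.

```python
import math

def rate_ratio(beta: float, delta: float = 1.0) -> float:
    """Multiplicative change in the expected count when x_k rises by delta units."""
    return math.exp(beta * delta)

beta_educ = 0.04150  # education coefficient quoted in the Poisson example
rr = rate_ratio(beta_educ)
pct_change = 100 * (rr - 1)  # percentage change in the expected count

print(f"rate ratio = {rr:.3f}, percent change = {pct_change:.1f}%")
# prints: rate ratio = 1.042, percent change = 4.2%
```

The percentage change is always 100 * (exp(beta) - 1), so a rate ratio of 3 corresponds to a 200% increase, not 300%.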

New cards

How do you compute fitted counts in a Poisson regression model?

Use \hat{\mu} = \exp(x' \hat{\beta})

You can compare fitted counts across scenarios by plugging in different values of x

New cards

How do you compute fitted probabilities in a Poisson regression model?

Use the Poisson formula:

\hat{P}(Y = y \mid x) = \frac{e^{-\hat{\mu}} \hat{\mu}^y}{y!}

Where \hat{\mu} = \exp(x' \hat{\beta}) , obtained from the fitted count.
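
A minimal Python sketch of the two steps (fitted mean, then fitted probabilities). The coefficient values and the covariate vector are hypothetical, chosen only for illustration; they are not the estimates from the course model.

```python
import math

# Hypothetical coefficients for illustration only (not the fitted course model):
# intercept, deductible indicator, years of education
beta_hat = [0.5, -0.2, 0.04]

def fitted_mean(x, beta):
    """mu_hat = exp(x' beta_hat); x must start with 1 for the intercept."""
    return math.exp(sum(xj * bj for xj, bj in zip(x, beta)))

def fitted_prob(y, mu_hat):
    """Poisson probability P(Y = y | x) evaluated at the fitted mean."""
    return math.exp(-mu_hat) * mu_hat**y / math.factorial(y)

x = [1, 1, 12]  # hypothetical person: has a deductible, 12 years of education
mu_hat = fitted_mean(x, beta_hat)
probs = {y: fitted_prob(y, mu_hat) for y in range(4)}

print(round(mu_hat, 3), {y: round(p, 3) for y, p in probs.items()})
```

Changing x and recomputing mu_hat gives the fitted counts and probabilities for a different scenario.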

New cards

How do you compute the change in fitted counts when covariates change?

Change in fitted counts:

\hat{\mu}_b - \hat{\mu}_a = \exp(x_b' \hat{\beta}) - \exp(x_a' \hat{\beta})

Compare fitted means for two different covariate values x_b and x_a
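
Because the model is multiplicative, the difference can also be written in terms of the rate ratio between the two scenarios. This is a short worked step that follows directly from the formula above:

\frac{\hat{\mu}_b}{\hat{\mu}_a} = \exp\{(x_b - x_a)' \hat{\beta}\}

\hat{\mu}_b - \hat{\mu}_a = \hat{\mu}_a \left[ \exp\{(x_b - x_a)' \hat{\beta}\} - 1 \right]

So the ratio of fitted counts depends only on the change in covariates, while the absolute change also depends on the baseline level \hat{\mu}_a.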

New cards

How do you calculate the numbers in muhat ? And P(Y=0), P(Y=1), etc?

New cards

Average fitted probability formula (over all persons in the observed data)
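
A standard way to write this, assuming a Poisson model with fitted means \hat{\mu}_i = \exp(x_i' \hat{\beta}) as on the earlier cards and n persons in the data:

\bar{P}(Y = y) = \frac{1}{n} \sum_{i=1}^{n} \hat{P}(Y = y \mid x_i) = \frac{1}{n} \sum_{i=1}^{n} \frac{e^{-\hat{\mu}_i} \hat{\mu}_i^y}{y!}

These averages can be compared with the observed proportions of persons with Y = 0, 1, 2, \dots to check how well the model reproduces the data.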

New cards

What is overdispersion in count data models?

When the observed variance is greater than the mean — violating the Poisson assumption that \text{E}(Y_i) = \text{Var}(Y_i)

A common symptom is too many zeros in the data compared to what the Poisson model expects

New cards

Why might overdispersion persist even after fitting a Poisson regression model?

Because of unaccounted heterogeneity — the Poisson model assumes that all units with the same x share the same mean \mu

But some variability (e.g. between individuals) may remain even after adjusting for observed covariates

This can lead to continued overdispersion in the data

New cards

How does Negative Binomial regression handle overdispersion?

It adds a dispersion parameter (often \alpha = 1/k) so that:

\text{E}(Y) = \mu (same as Poisson)

\text{Var}(Y) = \mu(1 + \alpha \mu), allowing \text{Var}(Y) > \text{E}(Y)

This extra flexibility helps model count data with overdispersion

New cards

What is the probability mass function of the Negative Binomial distribution?

P(Y = y) = \frac{\Gamma(y + k)}{\Gamma(k)\, \Gamma(y + 1)} \left( \frac{k}{\mu + k} \right)^k \left( \frac{\mu}{\mu + k} \right)^y

Used for count data with overdispersion, where \mu is the mean and k is the dispersion parameter
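
The formula can be evaluated on the log scale with the log-gamma function, which avoids overflow for large y. This is a minimal Python sketch: the mean mu = 3.0 is an illustrative assumption, while k = 0.6742 is the estimate quoted later in this deck.

```python
import math

def nb_pmf(y: int, mu: float, k: float) -> float:
    """Negative binomial P(Y = y) with mean mu and dispersion k (alpha = 1/k)."""
    log_p = (
        math.lgamma(y + k) - math.lgamma(k) - math.lgamma(y + 1)
        + k * math.log(k / (mu + k))
        + y * math.log(mu / (mu + k))
    )
    return math.exp(log_p)

mu, k = 3.0, 0.6742  # mu is illustrative; k is the estimate quoted in this deck

# Truncate the infinite support where the remaining tail is negligible
support = range(2000)
probs = [nb_pmf(y, mu, k) for y in support]

mean = sum(y * p for y, p in zip(support, probs))
variance = sum((y - mean) ** 2 * p for y, p in zip(support, probs))

# Numerical mean and variance next to the theoretical variance mu + mu^2 / k
print(round(mean, 4), round(variance, 4), round(mu + mu**2 / k, 4))
```

The numerical variance matches \mu + \mu^2 / k and is well above the mean, unlike the Poisson case.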

New cards

Mean of the negative binomial distribution

E(Y) = \mu

New cards

What is the variance of the negative binomial distribution, and how does it compare to the mean?

\text{Var}(Y) = \mu + \frac{\mu^2}{k} \quad \text{or} \quad \mu (1 + \alpha \mu)

Where \alpha = \frac{1}{k}

Var(Y) > E(Y)

New cards

What does R refer to k as?

theta, where theta = k = 1/\alpha

New cards

Is negative binomial part of the two-param exponential family?

No, except when k is known and fixed.

New cards

How are fitted probabilities computed in Negative Binomial regression?

They are computed using the Negative Binomial probability function

The mean is modeled as \mu_i = \exp(x_i' \beta) just like in Poisson regression

The interpretation of \beta is the same, but the fitted \hat{P}(Y = y \mid x) uses the NB distribution to allow for overdispersion

New cards

How do you test for overdispersion in a Poisson model?

Use a likelihood ratio test comparing the Poisson model to a Negative Binomial model

Test H_0: \alpha = 0 (no overdispersion) vs. H_1: \alpha > 0

Because \alpha = 0 is on the boundary of the parameter space, divide the usual chi-squared p-value by 2

Alternatively, use the unhalved chi-squared p-value as a conservative check: if it is already significant, the halved p-value is too
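
The test can be sketched in Python with the standard library only, using the fact that the upper tail of a chi-squared distribution with 1 degree of freedom is erfc(sqrt(x/2)). The two log-likelihood values below are hypothetical; in practice they would come from the fitted Poisson and Negative Binomial models.

```python
import math

def chi2_sf_df1(x: float) -> float:
    """Upper-tail probability P(X > x) for a chi-squared variable with 1 df."""
    return math.erfc(math.sqrt(x / 2))

def lr_overdispersion_test(loglik_poisson: float, loglik_nb: float):
    """Return the LR statistic, the usual p-value and the boundary-corrected p-value."""
    lr = max(2 * (loglik_nb - loglik_poisson), 0.0)
    p_usual = chi2_sf_df1(lr)
    # alpha = 0 lies on the boundary, so the usual p-value is halved when lr > 0
    p_halved = p_usual / 2 if lr > 0 else 1.0
    return lr, p_usual, p_halved

# Hypothetical log-likelihoods, for illustration only
lr, p_usual, p_halved = lr_overdispersion_test(-1050.0, -1047.5)
print(lr, round(p_usual, 4), round(p_halved, 4))
```

Since the halved p-value is never larger than the usual one, rejecting H_0 with the usual p-value always implies rejecting it with the corrected one.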

New cards

What model are we fitting in this R command?

Negative binomial count model.

New cards

What happens when overdispersion is present and a Negative Binomial model is used?

If \alpha > 0 (e.g., \hat{\alpha} = 1.48 with significant p-value), there is strong overdispersion

Estimated \hat{\beta} and fitted means \hat{\mu} may still look similar to Poisson

But Poisson underestimates standard errors of \hat{\beta}

Fitted probabilities from the NB model differ and better match observed data

New cards

Negative binomial: interpret the coefficient on EDUCEC

Controlling for IDP and age, for a 1-unit increase in EDUC, the expected number of doctor visits is multiplied by exp(0.04617) = 1.047, or a 4.7% increase.

New cards

What is k? What is alpha?

k = 0.6742

alpha = 1/k = 1/0.6742 = 1.48

New cards

Should we account for overdispersion?

Yes. The LR test of H_0: \alpha = 0 is statistically significant, so the Negative Binomial model is preferred to the Poisson model.

New cards

Why might count outcomes y_i not be directly comparable across subjects?

Because subjects may have different levels or durations of exposure to risk

E.g., longer follow-up time (person-years) or more vehicles on a road increases the chance of events occurring

Without accounting for exposure, we may misinterpret the effect of explanatory variables

New cards

How do we account for differing exposure times in Poisson or Negative Binomial models?

Include \log(t_i) as an offset term in the log-linear model

We model the rate \mu_i / t_i so that:

\log(\mu_i) = \log(t_i) + x_i' \beta

The offset \log(t_i) has no coefficient and adjusts for different durations or exposure levels

If t_i is constant across units, the offset isn’t needed
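
A minimal Python sketch of how the offset works: the expected count scales with exposure, while the rate per unit of exposure depends only on the covariates. The coefficients and exposure times are hypothetical values chosen for illustration.

```python
import math

def expected_count(t: float, x, beta) -> float:
    """mu = exp(log(t) + x' beta), i.e. the offset log(t) has coefficient 1."""
    linear_predictor = sum(xj * bj for xj, bj in zip(x, beta))
    return math.exp(math.log(t) + linear_predictor)

beta = [-2.0, 0.5]  # hypothetical intercept and slope, for illustration only
x = [1, 1]          # same covariates for both units

mu_short = expected_count(10.0, x, beta)  # unit observed for 10 time units
mu_long = expected_count(40.0, x, beta)   # unit observed for 40 time units

rate_short = mu_short / 10.0
rate_long = mu_long / 40.0

# Expected counts differ by the exposure ratio; rates per unit time are equal
print(round(mu_long / mu_short, 6), round(rate_short, 6), round(rate_long, 6))
```

With identical covariates, quadrupling the exposure quadruples the expected count but leaves the rate \mu_i / t_i unchanged, which is why \beta is interpreted in terms of rates.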

New cards

Exposure example: admissions to intensive care unit

New cards

What are we modelling in this R command?

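A Poisson log-linear model for the number of ICU admissions, with log exposure time included as an offset so that the coefficients describe admission rates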

New cards

Interpret the coefficient on `_novjan`

Controlling for time of day and day of the week, and adjusting for the number of hours observed (exposure), the expected number of ICU admissions per hour in November to January is about exp(0.574) = 1.78 times the number in February to October, i.e. about 78% higher.

New cards

Interpret the coeff on h_22_04

Controlling for month and day of the week, the expected number of ICU admissions per hour between 22:00 and 04:00 is about exp(1.099) = 3.00 times the number in the reference hours, i.e. about 200% higher.