ST411 Weeks 7-8: Models for count responses


36 Terms

1

POISSON AND NEGATIVE BINOMIAL MODELS FOR SINGLE COUNTS

2

Poisson example: RAND Health Insurance Experiment

[Image]

3

What are the key properties of the Poisson distribution?

It models count data where Y = 0, 1, 2, \dots with parameter \mu

-

Probability: P(Y = y) = \frac{e^{-\mu} \mu^y}{y!}

-

Mean and variance are both equal to \mu

-

It belongs to the two-parameter exponential family, with the dispersion parameter fixed at \phi = 1 (so there is no separate dispersion parameter to estimate)
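
As a quick numerical check of these properties, the sketch below evaluates the Poisson pmf and sums it over a truncated range to recover the mean and variance. The function names and the value \mu = 2.5 are illustrative assumptions, not part of the course material.

```python
import math

def poisson_pmf(y: int, mu: float) -> float:
    """P(Y = y) = exp(-mu) * mu**y / y!, computed on the log scale to avoid overflow."""
    return math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))

def moments(pmf, y_max: int = 200):
    """Mean and variance of a count distribution, truncating the infinite sum at y_max."""
    probs = [pmf(y) for y in range(y_max + 1)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var

mu = 2.5  # illustrative value
mean, var = moments(lambda y: poisson_pmf(y, mu))
print(round(mean, 6), round(var, 6))  # 2.5 2.5
```
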

4

What is the probability formula for the Poisson distribution?

Probability: P(Y = y) = \frac{e^{-\mu} \mu^y}{y!}

5

What is the mean and variance of a Poisson distribution?

Mean and variance are both equal to \mu

6

What is a Poisson log-linear model for counts?

This is a regression model using the Poisson distribution. It is a GLM where Y_i \sim \text{Poisson}(\mu_i) and

\log(\mu_i) = x_i' \beta = \beta_0 + \beta_1 x_{1i} + \cdots + \beta_p x_{pi}

The log link is the canonical link for the Poisson distribution.

-

This implies \mu_i = \exp(x_i' \beta) = \exp(\beta_0 + \beta_1 x_{1i} + \cdots + \beta_p x_{pi})

-

Estimated using maximum likelihood, with likelihood-based inference
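
To see what maximum likelihood does here, the sketch below fits \log(\mu_i) = \beta_0 + \beta_1 x_i by Newton-Raphson, which for the canonical log link coincides with Fisher scoring (IRLS). It is a minimal stdlib-only illustration with one covariate and made-up data (the names `fit_poisson_loglinear`, `x`, `y` are hypothetical), not a substitute for `glm(..., family = poisson)` in R.

```python
import math

def fit_poisson_loglinear(x, y, iters=25):
    """ML fit of log(mu_i) = b0 + b1 * x_i by Newton-Raphson (= Fisher scoring for the log link)."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start from the intercept-only fit
    for _ in range(iters):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector U = X'(y - mu)
        u0 = sum(yi - mi for yi, mi in zip(y, mu))
        u1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Information matrix I = X' diag(mu) X (symmetric 2x2)
        i00 = sum(mu)
        i01 = sum(xi * mi for xi, mi in zip(x, mu))
        i11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = i00 * i11 - i01 * i01
        # Newton step: beta <- beta + I^{-1} U, with the 2x2 inverse written out
        b0 += (i11 * u0 - i01 * u1) / det
        b1 += (-i01 * u0 + i00 * u1) / det
    return b0, b1

x = [0, 0, 0, 1, 1, 1]  # made-up binary covariate
y = [1, 2, 3, 4, 5, 6]  # made-up counts
b0, b1 = fit_poisson_loglinear(x, y)
print(math.exp(b0), math.exp(b1))  # approximately 2.0 and 2.5
```

With a single binary covariate the ML fit reproduces the group means: \exp(\hat{\beta}_0) is the mean count when x = 0 (here 2) and \exp(\hat{\beta}_1) is the ratio of group means (here 5/2).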

7

What are we modelling in the R command?

A Poisson log-linear model with response variable MDU, the number of outpatient visits to a medical doctor in a year

8

Interpret the coefficient on IDP (having a deductible)

Controlling for age and education, the expected number of visits to a doctor for a person on a health insurance plan with an individual deductible is 0.833 times the number for a person on a plan without an individual deductible.

In short, being on a plan with an IDP reduces the expected annual number of visits to a doctor by about 16.7%, controlling for age and education.

9

Interpret the coefficient on education (years)

Controlling for deductible and age, a 1-year increase in education multiplies the expected number of outpatient visits by exp(0.04150) = 1.042, an increase of about 4%.

10

Interpretation of regression coefficients in Poisson model

\exp(\beta_k) is the rate ratio — it gives the multiplicative change in the expected count when x_k increases by 1 unit, holding all other variables constant

-

If \beta_k > 0, expected count increases; if \beta_k < 0, it decreases
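
The conversion from a coefficient to a rate ratio and a percentage change can be sketched as below, using the two numbers that appear on the RAND cards (the education coefficient 0.04150 and the IDP rate ratio 0.833). The helper names and the optional `delta` argument for a change of more than one unit are illustrative assumptions.

```python
import math

def rate_ratio(beta_k: float, delta: float = 1.0) -> float:
    """Multiplicative change in E(Y) when x_k increases by `delta` units, others held fixed."""
    return math.exp(beta_k * delta)

def percent_change(ratio: float) -> float:
    """Convert a rate ratio into a percentage change in the expected count."""
    return 100.0 * (ratio - 1.0)

rr_educ = rate_ratio(0.04150)  # education coefficient from the RAND Poisson fit
print(f"{rr_educ:.3f}, {percent_change(rr_educ):+.1f}%")  # 1.042, +4.2%
print(f"{percent_change(0.833):+.1f}%")  # IDP rate ratio 0.833 -> -16.7%
```
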

11

How do you compute fitted counts in a Poisson regression model?

Use \hat{\mu} = \exp(x' \hat{\beta})

-

You can compare fitted counts across scenarios by plugging in different values of x
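
A minimal sketch of computing and comparing fitted counts for two covariate profiles. The intercept 0.5 is HYPOTHETICAL, the IDP coefficient is derived from the reported rate ratio as \log(0.833), and 0.04150 is the education coefficient; the covariate ordering (intercept, IDP, years of education) is an assumption for illustration.

```python
import math

def fitted_count(x, beta):
    """mu_hat = exp(x' beta_hat); x includes a leading 1 for the intercept."""
    eta = sum(xj * bj for xj, bj in zip(x, beta))  # linear predictor x' beta_hat
    return math.exp(eta)

# (intercept, idp, educ): intercept is HYPOTHETICAL, for illustration only
beta_hat = [0.5, math.log(0.833), 0.04150]
mu_no_idp = fitted_count([1, 0, 12], beta_hat)  # no deductible, 12 years of education
mu_idp = fitted_count([1, 1, 12], beta_hat)     # deductible, 12 years of education
print(round(mu_idp / mu_no_idp, 3))  # 0.833: the ratio of fitted counts is exp(beta_idp)
```
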

12

How do you compute fitted probabilities in a Poisson regression model?

Use the Poisson formula:

\hat{P}(Y = y \mid x) = \frac{e^{-\hat{\mu}} \hat{\mu}^y}{y!}

-

Where \hat{\mu} = \exp(x' \hat{\beta}) is the fitted count.
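
Once \hat{\mu} is known, the fitted probabilities follow directly from the Poisson formula. The sketch below assumes a HYPOTHETICAL fitted count \hat{\mu} = 2 for one covariate profile.

```python
import math

def fitted_prob(y: int, mu_hat: float) -> float:
    """P_hat(Y = y | x) = exp(-mu_hat) * mu_hat**y / y!."""
    return math.exp(-mu_hat) * mu_hat ** y / math.factorial(y)

mu_hat = 2.0  # HYPOTHETICAL fitted count exp(x' beta_hat)
probs = [fitted_prob(y, mu_hat) for y in range(4)]
print([round(p, 4) for p in probs])  # [0.1353, 0.2707, 0.2707, 0.1804]
```
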

13

How do you compute the change in fitted counts when covariates change?

Change in fitted counts:

\hat{\mu}_b - \hat{\mu}_a = \exp(x_b' \hat{\beta}) - \exp(x_a' \hat{\beta})

-

Compare fitted means for two different covariate values x_b and x_a
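
The sketch below, with HYPOTHETICAL coefficients, shows why the change in fitted counts has to be computed at specific covariate values: the same 1-unit increase doubles the mean each time (a constant ratio), but the additive change depends on where you start.

```python
import math

def change_in_fitted_count(x_a, x_b, beta):
    """mu_hat_b - mu_hat_a = exp(x_b' beta) - exp(x_a' beta)."""
    eta_a = sum(x * b for x, b in zip(x_a, beta))
    eta_b = sum(x * b for x, b in zip(x_b, beta))
    return math.exp(eta_b) - math.exp(eta_a)

# (intercept, slope): HYPOTHETICAL values, for illustration only
beta_hat = [0.0, math.log(2)]
d1 = change_in_fitted_count([1, 0], [1, 1], beta_hat)  # mean goes 1 -> 2
d2 = change_in_fitted_count([1, 1], [1, 2], beta_hat)  # mean goes 2 -> 4
print(round(d1, 6), round(d2, 6))  # 1.0 2.0
```
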

14

How do you calculate the numbers in muhat, and P(Y = 0), P(Y = 1), etc.?

[Image]

15

Average fitted probability formula (over all persons in the observed data)

[Image]
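
The formula on this card is only in the image. Assuming the usual definition, the average fitted probability of a count y is \bar{p}_y = \frac{1}{n} \sum_{i=1}^n \hat{P}(Y_i = y \mid x_i), which can then be compared with the observed proportion of y's. The sketch uses HYPOTHETICAL fitted counts for four people and also shows that plugging the average \hat{\mu} into the pmf gives a different quantity.

```python
import math

def poisson_pmf(y: int, mu: float) -> float:
    """Poisson probability P(Y = y) for mean mu."""
    return math.exp(-mu) * mu ** y / math.factorial(y)

def average_fitted_prob(y: int, mu_hats) -> float:
    """(1/n) * sum_i P_hat(Y_i = y | x_i): average each person's own fitted probability."""
    return sum(poisson_pmf(y, m) for m in mu_hats) / len(mu_hats)

mu_hats = [0.5, 1.0, 2.0, 4.0]  # HYPOTHETICAL fitted counts exp(x_i' beta_hat)
p0_bar = average_fitted_prob(0, mu_hats)
# Evaluating the pmf at the average fitted count is NOT the same thing
p0_at_mean = poisson_pmf(0, sum(mu_hats) / len(mu_hats))
print(round(p0_bar, 4), round(p0_at_mean, 4))  # roughly 0.28 vs 0.15
```
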
16

What is overdispersion in count data models?

When the observed variance is greater than the mean — violating the Poisson assumption that \text{E}(Y_i) = \text{Var}(Y_i)

-

A common symptom is too many zeros in the data compared to what the Poisson model expects
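
A rough marginal check for overdispersion compares the sample mean with the sample variance, and the observed share of zeros with what a single Poisson with the same mean predicts. The counts below are HYPOTHETICAL, and this check ignores covariates; in a regression the comparison should be made conditional on x (e.g. via average fitted probabilities or the LR test).

```python
import math
from statistics import mean, variance

counts = [0, 0, 0, 0, 0, 1, 1, 2, 3, 8]  # HYPOTHETICAL counts, for illustration only
ybar, s2 = mean(counts), variance(counts)
obs_zero = counts.count(0) / len(counts)
pois_zero = math.exp(-ybar)  # P(Y = 0) under a single Poisson(ybar)
print(ybar, round(s2, 2), obs_zero, round(pois_zero, 3))  # 1.5 6.28 0.5 0.223
```
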

17

Why might overdispersion persist even after fitting a Poisson regression model?

Because of unaccounted heterogeneity — the Poisson model assumes that all units with the same x share the same mean \mu

-

But some variability (e.g. between individuals) may remain even after adjusting for observed covariates

-

This can lead to continued overdispersion in the data
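
The simulation below illustrates this: everyone has the same x (so the same \mu = 2), but in the second sample each unit's mean is multiplied by an unobserved factor. Assuming that factor is Gamma with shape k and mean 1 (a modelling assumption, which gives the Poisson-gamma mixture behind the negative binomial), the variance becomes \mu + \mu^2/k rather than \mu. The seed, sample size, and k = 0.5 are arbitrary choices.

```python
import math
import random
from statistics import mean, variance

def rpois(mu: float, rng: random.Random) -> int:
    """One Poisson(mu) draw by Knuth's multiplication method (adequate for small mu)."""
    limit, n, p = math.exp(-mu), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return n
        n += 1

rng = random.Random(411)
mu, k = 2.0, 0.5
homog = [rpois(mu, rng) for _ in range(20000)]
# Unobserved heterogeneity: nu_i ~ Gamma(shape=k, scale=1/k), so E(nu_i) = 1
hetero = [rpois(mu * rng.gammavariate(k, 1 / k), rng) for _ in range(20000)]
print(round(mean(homog), 2), round(variance(homog), 2))    # both close to 2
print(round(mean(hetero), 2), round(variance(hetero), 2))  # mean near 2, variance near 2 + 4/0.5 = 10
```
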

18

How does Negative Binomial regression handle overdispersion?

It adds a dispersion parameter (often \alpha = 1/k) so that:

-

\text{E}(Y) = \mu (same as Poisson)

-

\text{Var}(Y) = \mu(1 + \alpha \mu), allowing \text{Var}(Y) > \text{E}(Y)

-

This extra flexibility helps model count data with overdispersion

19

What is the probability mass function of the Negative Binomial distribution?

P(Y = y) = \frac{\Gamma(y + k)}{\Gamma(k)\, \Gamma(y + 1)} \left( \frac{k}{\mu + k} \right)^k \left( \frac{\mu}{\mu + k} \right)^y

-

Used for count data with overdispersion, where \mu is the mean and k is the dispersion parameter
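
The sketch below codes this pmf (via `math.lgamma` to avoid overflow) and checks numerically that it sums to 1, has mean \mu, and has variance \mu(1 + \alpha\mu) with \alpha = 1/k. The value k = 0.6742 is taken from the RAND negative binomial card; \mu = 2 is a HYPOTHETICAL choice.

```python
import math

def nb_pmf(y: int, mu: float, k: float) -> float:
    """NB pmf in the (mu, k) parameterisation, computed on the log scale."""
    log_p = (math.lgamma(y + k) - math.lgamma(k) - math.lgamma(y + 1)
             + k * math.log(k / (mu + k)) + y * math.log(mu / (mu + k)))
    return math.exp(log_p)

mu, k = 2.0, 0.6742  # k from the RAND fit; mu is HYPOTHETICAL
alpha = 1 / k
ys = range(2000)     # truncate the infinite sum; the tail is negligible here
probs = [nb_pmf(y, mu, k) for y in ys]
m = sum(y * p for y, p in zip(ys, probs))
v = sum((y - m) ** 2 * p for y, p in zip(ys, probs))
# Expect: total ~ 1, mean ~ mu, variance ~ mu * (1 + alpha * mu), which exceeds mu
print(round(sum(probs), 6), round(m, 4), round(v, 4), round(mu * (1 + alpha * mu), 4))
```
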

20

Mean of the negative binomial distribution

\text{E}(Y) = \mu (the same as for the Poisson)

21

What is the variance of the negative binomial distribution, and how does it compare to the mean?

\text{Var}(Y) = \mu + \frac{\mu^2}{k} \quad \text{or} \quad \mu (1 + \alpha \mu)

Where \alpha = \frac{1}{k}

\text{Var}(Y) > \text{E}(Y) whenever \alpha > 0, because the extra term \mu^2/k is positive

22

What does R refer to k as?

theta, so \theta = k = 1/\alpha

23

Is the negative binomial distribution part of the two-parameter exponential family?

No, except when k is known and fixed.

24

How are fitted probabilities computed in Negative Binomial regression?

They are computed using the Negative Binomial probability function

-

The mean is modeled as \mu_i = \exp(x_i' \beta) just like in Poisson regression

-

The interpretation of \beta is the same, but the fitted \hat{P}(Y = y \mid x) uses the NB distribution to allow for overdispersion

25

How do you test for overdispersion in a Poisson model?

Use a likelihood ratio test comparing the Poisson model to a Negative Binomial model

-

Test H_0: \alpha = 0 (no overdispersion) vs. H_1: \alpha > 0

-

Because \alpha = 0 is on the boundary of the parameter space, divide the usual chi-squared p-value by 2

-

Alternatively, use the unhalved \chi^2_1 p-value, which gives a conservative test
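
A minimal sketch of the boundary-corrected LR test, assuming the two maximised log-likelihoods are available from the fitted models. It uses the identity P(\chi^2_1 > x) = \text{erfc}(\sqrt{x/2}). The log-likelihood values are HYPOTHETICAL and were chosen to show a case where halving the p-value changes the 5% conclusion.

```python
import math

def lr_test_overdispersion(loglik_poisson: float, loglik_nb: float):
    """LR test of H0: alpha = 0 (Poisson) vs H1: alpha > 0 (negative binomial).

    alpha = 0 lies on the boundary, so the null distribution is a 50:50 mixture of
    chi^2_0 and chi^2_1, and the usual chi^2_1 p-value is halved.
    """
    lr = max(0.0, 2.0 * (loglik_nb - loglik_poisson))
    p_chi2_1 = math.erfc(math.sqrt(lr / 2.0))  # P(chi^2_1 > lr)
    return lr, p_chi2_1, p_chi2_1 / 2.0

# HYPOTHETICAL log-likelihoods, for illustration only
lr, p_naive, p_boundary = lr_test_overdispersion(-1000.0, -998.5)
print(lr, round(p_naive, 4), round(p_boundary, 4))  # 3.0, then about 0.083 and 0.042
```
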

26

What model are we fitting in this R command?

Negative binomial count model.

27

What happens when overdispersion is present and a Negative Binomial model is used?

If \hat{\alpha} is clearly above 0 (e.g., \hat{\alpha} = 1.48 with a significant LR test p-value), there is strong overdispersion

-

Estimated \hat{\beta} and fitted means \hat{\mu} may still look similar to Poisson

-

But the Poisson model underestimates the standard errors of \hat{\beta}

-

Fitted probabilities from the NB model differ and better match observed data

28

Negative binomial: interpret the coefficient on EDUCEC

Controlling for IDP and age, a 1-year increase in education multiplies the expected number of doctor visits by exp(0.04617) = 1.047, a 4.7% increase.

29

What is k? What is alpha?

k = 0.6742

alpha = 1/k = 1/0.6742 = 1.48
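
The conversion from R's theta to \alpha is one line of arithmetic. The loop also shows how the implied variance-to-mean ratio 1 + \alpha\mu grows with the mean; the \mu values there are illustrative.

```python
theta = 0.6742          # R's "theta" = k from the negative binomial fit
alpha = 1 / theta       # alpha = 1/k
print(round(alpha, 2))  # 1.48

# Var(Y)/E(Y) = 1 + alpha * mu; the mu values are illustrative
for mu in (0.5, 1.0, 2.0):
    print(mu, round(1 + alpha * mu, 2))
```
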

30

Should we account for overdispersion?

Yes. The LR test of the Poisson model against the negative binomial model shows statistically significant overdispersion.

31

Why might count outcomes y_i not be directly comparable across subjects?

Because subjects may have different levels or durations of exposure to risk

-

E.g., longer follow-up time (person-years) or more vehicles on a road increases the chance of events occurring

-

Without accounting for exposure, we may misinterpret the effect of explanatory variables

32

How do we account for differing exposure times in Poisson or Negative Binomial models?

Include \log(t_i) as an offset term in the log-linear model

-

We model the rate \mu_i / t_i so that:

\log(\mu_i) = \log(t_i) + x_i' \beta, \quad \text{equivalently} \quad \log(\mu_i / t_i) = x_i' \beta

-

The offset \log(t_i) has no coefficient and adjusts for different durations or exposure levels

-

If t_i is the same for all units, the offset is not needed: a constant \log(t) is absorbed into the intercept \beta_0
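
A minimal sketch of how the offset enters the likelihood, for an intercept-only Poisson model \log(\mu_i) = \log(t_i) + \beta_0 fitted by Newton-Raphson. The offset multiplies the mean with its coefficient fixed at 1, and the fitted rate \exp(\hat{\beta}_0) equals total events divided by total exposure. The counts and exposure times are HYPOTHETICAL.

```python
import math

def fit_rate_with_offset(y, t, iters=50):
    """Intercept-only Poisson model log(mu_i) = log(t_i) + b0, fitted by Newton-Raphson."""
    b0 = 0.0
    for _ in range(iters):
        mu = [ti * math.exp(b0) for ti in t]  # offset log(t_i) enters with coefficient 1
        score = sum(yi - mi for yi, mi in zip(y, mu))
        info = sum(mu)
        b0 += score / info
    return b0

y = [2, 5, 9]  # HYPOTHETICAL event counts
t = [1, 2, 3]  # HYPOTHETICAL exposure times (e.g. hours observed)
b0 = fit_rate_with_offset(y, t)
# Rate per unit of exposure vs. naive mean count per subject
print(round(math.exp(b0), 4), round(sum(y) / len(y), 4))  # 2.6667 5.3333
```
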

33

Exposure example: admissions to intensive care unit

[Image]

34

What are we modelling in this R command?

A Poisson log-linear model for the number of ICU admissions, with log exposure time (hours observed) included as an offset

35

Interpret the coefficient on `_novjan`

Controlling for time of day and day of the week, the ICU admission rate in November–January is about exp(0.574) = 1.78 times the rate in the other months, i.e. about 78% higher.

The admission rate is 77.6% higher in Nov–Jan compared to Feb–Oct, adjusting for day of week, hour of day, and exposure time.

After adjusting for day of week, time of day, and the number of hours observed, the expected number of ICU admissions per hour is about 1.78 times higher in November–January than in February–October.
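
The arithmetic behind this interpretation, using the rounded coefficient 0.574 from the card. With the rounded value the percentage comes out at about 77.5%, so a figure such as 77.6% presumably reflects the unrounded coefficient.

```python
import math

beta_novjan = 0.574        # coefficient on _novjan (rounded, as reported)
rr = math.exp(beta_novjan) # rate ratio, Nov-Jan vs Feb-Oct
print(f"{rr:.2f} times the rate, {100 * (rr - 1):.1f}% higher")  # 1.78 times the rate, 77.5% higher
```
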

36

Interpret the coefficient on h_22_04

Controlling for month and day of the week, the expected number of ICU admissions per hour during 22:00–04:00 is about exp(1.099) = 3.00 times that in the reference time-of-day category, i.e. about 200% higher.
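
A quick check of the arithmetic, which also shows why "3 times as high" corresponds to a 200% increase rather than 300%.

```python
import math

beta_h22_04 = 1.099         # coefficient on h_22_04, as reported
rr = math.exp(beta_h22_04)  # rate ratio vs. the reference time-of-day category
# A ratio of 3 means +200%, since percent change = 100 * (ratio - 1)
print(f"{rr:.2f} times as high, {100 * (rr - 1):.0f}% higher")  # 3.00 times as high, 200% higher
```
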
