MarketingDistillery.com is about web analytics, data science and marketing strategy

# Linear Regression vs Logistic Regression vs Poisson Regression

Generalized Linear Models (GLMs) extend ordinary Linear Regression by allowing the response variable y to follow an error distribution other than the normal distribution. GLMs are great because:

1. They are easy to understand.
2. Most statistical packages contain functions to fit and interpret the resulting model.
3. Linear regression or logistic regression is sufficient for a lot of real-life applications.

# Linear Regression

Simple Linear Regression models are the workhorse behind standard econometric modeling. Typical marketing applications include the Marketing Mix Model (predicting the response to marketing efforts, market share etc.) and estimating the total Customer Lifetime Value.

In Linear Regression models, a set of input variables is used to predict a continuous response variable. For example, an ROI model in which total revenue is regressed on spend by marketing channel, or a model in which an individual customer’s total spend is regressed on socio-demographic factors from a customer database.

The standard Linear Regression model has the following form:

$$y = \alpha_0 + \sum_{i=1}^N \alpha_i x_i$$

It is usually fitted using the Ordinary Least Squares algorithm; in R this is done with the lm function.
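To make the fitting step concrete, here is a minimal sketch of Ordinary Least Squares for the single-input case, written in plain Python rather than R. The function name `fit_simple_ols` and the spend/revenue figures are made up for illustration; a real model would use lm (or an equivalent library routine) and several inputs.

```python
from statistics import mean

def fit_simple_ols(x, y):
    """Return (alpha_0, alpha_1) minimising the sum of squared residuals."""
    x_bar, y_bar = mean(x), mean(y)
    # Slope = sum of cross-deviations / sum of squared deviations of x
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    alpha_1 = sxy / sxx
    # The fitted line passes through the point of means
    alpha_0 = y_bar - alpha_1 * x_bar
    return alpha_0, alpha_1

# Hypothetical data: ad spend and revenue, constructed as revenue = 3 + 2 * spend
spend = [1.0, 2.0, 3.0, 4.0, 5.0]
revenue = [5.0, 7.0, 9.0, 11.0, 13.0]

alpha_0, alpha_1 = fit_simple_ols(spend, revenue)
print(alpha_0, alpha_1)
```

Because the toy data lie exactly on a line, the fit recovers an intercept of 3 and a slope of 2: each extra unit of spend adds two units of predicted revenue.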

The key benefit of using simple additive models is the ease with which the resulting parameters can be interpreted. Each coefficient $$\alpha_i$$ gives the increase in the response (customer value, total revenue etc.) corresponding to a one-unit increase in the input $$x_i$$, with the other inputs held fixed. Note that some care needs to be taken with factor variables (e.g. sex, city), which are typically encoded as indicator variables and interpreted relative to a baseline level.

# Logistic Regression

Logistic Regression is often used when the response variable encodes a binary outcome, e.g. campaign clicked, email opened, product viewed/purchased etc. Typical applications include a Customer Choice Model, Click-through Rate and Conversion Rate modeling, Credit Scoring or Churn prediction.

In principle, Logistic Regression uses a logistic function to squash the output of the linear model into the (0, 1) interval, so that it can be read as the probability of a positive outcome.

$$y = \frac{1}{1 + e^{-z}} \\ z = \alpha_0 + \sum_{i=1}^N \alpha_i x_i$$
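The two formulas above can be sketched in a few lines of Python. The helper names and the coefficient values below are hypothetical, chosen only so that the linear part comes out to zero; this shows prediction with known coefficients, not how the coefficients are estimated.

```python
import math

def logistic(z):
    """Map a real-valued linear predictor z to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_probability(alpha_0, alphas, xs):
    """Compute z = alpha_0 + sum(alpha_i * x_i), then squash it with the logistic function."""
    z = alpha_0 + sum(a * x for a, x in zip(alphas, xs))
    return logistic(z)

# Hypothetical coefficients and inputs: z = -1.0 + 0.5 * 2.0 + 0.25 * 0.0 = 0.0
p = predict_probability(-1.0, [0.5, 0.25], [2.0, 0.0])
print(p)  # z = 0 sits at the midpoint of the curve, so p is 0.5
```

Large positive values of z push the output towards 1 and large negative values push it towards 0. For very large negative z this naive form can overflow in `math.exp`, so production code usually relies on a library implementation.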

In R, use the glm function to fit all Generalized Linear Models. Simply pass the distribution family through the family argument. For logistic regression use binomial() as the family function.

Parameter interpretation is slightly more complex in Logistic Regression. Instead of representing the slope of the response as in linear regression, each coefficient $$\alpha_i$$ now gives the increase in the log odds of a positive outcome for a one-unit increase in $$x_i$$. Equivalently, the odds are multiplied by $$e^{\alpha_i}$$. See this great article for an in-depth explanation.
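The log-odds reading can be checked numerically. In the sketch below the coefficients (-2.0 and 0.7) are invented for illustration, and the input is raised from 1 to 2 to see what happens to the odds.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def odds(p):
    """Odds of the positive outcome: p / (1 - p)."""
    return p / (1.0 - p)

# Hypothetical fitted coefficients for a single input x
alpha_0, alpha_1 = -2.0, 0.7

p_before = logistic(alpha_0 + alpha_1 * 1.0)  # x = 1
p_after = logistic(alpha_0 + alpha_1 * 2.0)   # x = 2, one unit higher

# The log odds rise by alpha_1, so the odds are multiplied by exp(alpha_1)
log_odds_change = math.log(odds(p_after)) - math.log(odds(p_before))
odds_ratio = odds(p_after) / odds(p_before)

print(log_odds_change, odds_ratio, math.exp(alpha_1))
```

Note that the change in probability itself is not constant: it depends on where on the logistic curve the starting point lies, which is why coefficients are read on the odds scale.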

# Poisson Regression

This one is used less frequently, but nevertheless should be part of your Data Science toolbox. Poisson Regression is used to model counts of events. Typical usage scenarios are around Customer Lifetime Value modeling. Think – number of orders placed by a customer in her lifetime, number of visits to the website by an individual user etc.

Here, the linear part of the model fits the log-mean of a Poisson distribution:

$$y \sim \mathrm{Poisson}(\lambda) \\ \ln \lambda = \alpha_0 + \sum_{i=1}^N \alpha_i x_i$$

To fit the Poisson Regression in R use the glm function and pass poisson() as the family argument.

A one-unit increase in an input $$x_i$$ results in the expected value of the response variable being multiplied by $$e^{\alpha_i}$$.
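A short Python sketch illustrates this multiplicative effect. The coefficients (0.5 and 0.3) and the helper names are hypothetical; the code evaluates a model with known coefficients and does not perform the fitting that glm would do.

```python
import math

def expected_count(alpha_0, alphas, xs):
    """Poisson mean: lambda = exp(alpha_0 + sum(alpha_i * x_i))."""
    return math.exp(alpha_0 + sum(a * x for a, x in zip(alphas, xs)))

def poisson_pmf(k, lam):
    """Probability of observing exactly k events when the mean is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

# Hypothetical coefficients for a single input, e.g. months since signup
alpha_0, alphas = 0.5, [0.3]

lam_before = expected_count(alpha_0, alphas, [1.0])
lam_after = expected_count(alpha_0, alphas, [2.0])  # input raised by one unit

# The expected count is multiplied by exp(alpha_1), whatever the starting point
ratio = lam_after / lam_before
print(ratio, math.exp(alphas[0]))

# Probability that a customer with mean lam_after places no orders at all
print(poisson_pmf(0, lam_after))
```

With a coefficient of 0.3, each additional unit of the input scales the expected number of events by roughly 1.35, i.e. an increase of about 35%.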
