Customer behavior models are central to a successful CRM campaign in e-commerce. Let’s look at the very basic question of *"How many orders is this customer expected to make?"*. We start with basic building blocks.

# Continuous buying behavior

Our model will be in a continuous setting, where customers are free to make a purchase at any time. Standard scenarios in e-commerce and retail are for products like groceries, books, movies, hotel rooms etc.

When you look at your order history database, customers might appear to be buying randomly. A deeper analysis will usually reveal a constant average purchase rate that is specific to each customer – one book approximately every 20 days, one movie every weekend, groceries every 5-8 days, etc. **If all purchases are independent and follow a constant average rate, then a customer's buying behavior can be modeled with a Poisson Process.** It counts the number of events (orders) and the time that passes between them.

Imagine a cohort of customers observed over a period of 6 months. Assuming they all remain active during the 6 month period, the number of orders made by a single customer will follow a Poisson distribution with a parameter \(\lambda_i\). This \(\lambda_i\) is exactly the average ordering rate that is unique for each customer. If we use \(X_i\) to denote the number of orders made by the i-th customer in one time unit (month, 6 months, a year etc.), then:

\[X_i \sim Poisson(\lambda_i)\]
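For reference, the Poisson probability mass function gives the probability of observing exactly \(k\) orders in a unit of time:

\[P(X_i = k) = \frac{\lambda_i^k e^{-\lambda_i}}{k!}, \quad k = 0, 1, 2, \dots\]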

For example, when \(\lambda_i = 4.2\), the Poisson distribution of the number of orders made in a unit of time has the following probability mass plot:

```r
plot(seq(0, 10, 1), dpois(seq(0, 10, 1), 4.2), type = "h",
     xlab = "Orders", ylab = "Prob")
```

The probability of making exactly one order in a time unit is about 0.06, of making exactly 2 orders about 0.13, and so on.
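You can verify these values directly in R; `dpois` evaluates the Poisson probability mass function:

```r
dpois(1, 4.2)  # probability of exactly 1 order, approx. 0.063
dpois(2, 4.2)  # probability of exactly 2 orders, approx. 0.132
```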

In most cases, your customer base will be part of some larger group. It would be unrealistic to assume that average purchase rates are totally independent and random (uniformly distributed). Social and economic similarities create an underlying structure and behavior patterns. For example, all customers might generally order their groceries once a week.

To capture this effect we add a layer to the model: we assume that the individual customers' \(\lambda_i\) are drawn from a common group-level distribution. To put it differently, we **use a single population-wide distribution from which we draw individual customer purchase rates**. A good candidate for the population-level distribution is the Gamma Distribution. It is defined for positive values (purchase rates can only be positive) and has a flexible shape. More advanced readers will notice that it is also a conjugate prior to the Poisson distribution. See below for a sample density plot of a Gamma Distribution.

```r
plot(seq(0, 5, 0.1), dgamma(seq(0, 5, 0.1), 2, 2), type = "l",
     xlab = "X", ylab = "Prob")
```

By combining a customer-level Poisson Distribution for the number of orders with a population-level Gamma Distribution of purchase rates, we get a two-level hierarchical model:

\[X_i \sim Poisson(\lambda_i)\]

\[\lambda_i \sim Gamma(\alpha, \beta)\]
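For completeness, integrating \(\lambda_i\) out of the hierarchy yields the marginal distribution of the order counts:

\[P(X_i = k) = \frac{\Gamma(\alpha + k)}{\Gamma(\alpha)\, k!} \left(\frac{\beta}{\beta + 1}\right)^{\alpha} \left(\frac{1}{\beta + 1}\right)^{k}\]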

To simulate observations from this model, you first draw a purchase rate \(\lambda_i\) from the Gamma Distribution and then draw the number of orders from \(Poisson(\lambda_i)\). The resulting marginal distribution of the order counts is known as the **Negative Binomial Distribution** (NBD).
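The two-step simulation can be sketched in R as follows (the parameter values here are illustrative). The result matches a direct draw via `rnbinom`, which parameterizes the NBD by `size` \(= \alpha\) and `prob` \(= \beta / (\beta + 1)\):

```r
set.seed(1)
alpha <- 2; beta <- 3
n <- 100000

# Two-step draw: a rate from the Gamma, then an order count from the Poisson.
lambda <- rgamma(n, shape = alpha, rate = beta)
x_twostep <- rpois(n, lambda)

# Direct draw from the equivalent Negative Binomial.
x_direct <- rnbinom(n, size = alpha, prob = beta / (beta + 1))

# Both samples should have a mean close to alpha / beta = 0.667.
mean(x_twostep)
mean(x_direct)
```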

# Modeling Negative Binomial Distribution in Stan

Luckily, the theory behind the NBD model is not overly complicated. Now let's fit the model to order history data, starting with some synthetic data:

```r
customerLambda <- rgamma(10000, 2, 3)  # Generate 10000 customer purchase rates.
X <- sapply(customerLambda, function(l) rpois(1, l))
hist(customerLambda, 50)
```

Note that all customers were active throughout the period and that we do not model a churn process (at least not in this post).

We will use RStan to fit the model, although there are packages available that would be much faster. Stan (the engine behind RStan) is a powerful piece of software – maybe even too powerful for such a simple task. Reading through the Stan manual you will notice that the NBD distribution is readily available. Used directly, it only gives you the population-level parameters, but this is what you need initially. Define the model in R like this:

```r
library(rstan)

stanData <- list(N = length(X), X = X)
stanInit <- list(list(alpha = 1.0, beta = 1.0))
modelCode <- "
data {
  int<lower=0> N;
  int<lower=0> X[N];
}
parameters {
  real<lower=0> alpha;
  real<lower=0> beta;
}
model {
  alpha ~ uniform(0.0, 10.0); // Prior on alpha
  beta ~ uniform(0.0, 10.0);  // Prior on beta
  X ~ neg_binomial(alpha, beta);
}
"
testModel <- stan(model_code = modelCode, data = stanData,
                  iter = 100, chains = 1, init = stanInit)
```

This R code will transform the Stan model into a C++ program, compile and run it automatically (note that on Windows you need to install some additional tools to make this work – check Stan’s manual). To see the summary of the model you call:

```
> print(testModel, pars = c("alpha", "beta"))
Inference for Stan model: modelCode.
1 chains, each with iter=100; warmup=50; thin=1;
post-warmup draws per chain=50, total post-warmup draws=50.

      mean se_mean   sd 2.5%  25%  50%  75% 97.5% n_eff Rhat
alpha 2.10    0.02 0.11 1.92 2.02 2.09 2.18  2.26    20 1.05
beta  3.09    0.04 0.17 2.84 2.96 3.11 3.25  3.34    20 1.09
```

The estimated \(\alpha\) and \(\beta\) parameters are very close to the true values (2 and 3) we used when generating the data.
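As a quick sanity check on the printed estimates, the implied population mean purchase rate \(\alpha/\beta\) is close to the true value \(2/3\) used in the simulation:

```r
# Posterior means as reported by print(testModel) above.
alpha_hat <- 2.10
beta_hat <- 3.09

alpha_hat / beta_hat  # roughly 0.68, close to the true mean 2/3
```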

In a more complex scenario you would also want to obtain the distributions of individual customers' purchase rates. This requires describing the model explicitly in Stan. Note that it will have thousands of parameters: the population-wide \(\alpha\) and \(\beta\) plus one \(\lambda_i\) per customer. We will also need more data, so let's modify the data generation code to produce multiple observations per customer.

```r
periods <- 24
X <- t(sapply(customerLambda, function(l) rpois(periods, l)))
```

Fitting the model will now take significantly longer.

```r
stanData <- list(P = ncol(X), N = nrow(X), X = X)
stanInit <- list(list(alpha = 1.0, beta = 1.0, lambda = rep(1.0, nrow(X))))
modelCode <- "
data {
  int<lower=0> P;
  int<lower=0> N;
  int<lower=0> X[N,P];
}
parameters {
  real<lower=0> alpha;
  real<lower=0> beta;
  real<lower=0> lambda[N];
}
model {
  alpha ~ gamma(1.0, 1.0); // Prior on alpha
  beta ~ gamma(1.0, 1.0);  // Prior on beta
  lambda ~ gamma(alpha, beta);
  for (i in 1:N) {
    for (j in 1:P) {
      X[i,j] ~ poisson(lambda[i]);
    }
  }
}
"
testModel <- stan(model_code = modelCode, data = stanData,
                  iter = 1000, chains = 1, init = stanInit)
```

To inspect the population parameters and the first three purchase rates:

```
> print(testModel, pars = c("alpha", "beta", "lambda[1]", "lambda[2]", "lambda[3]"))
Inference for Stan model: modelCode.
1 chains, each with iter=1000; warmup=500; thin=1;
post-warmup draws per chain=500, total post-warmup draws=500.

          mean se_mean   sd 2.5%  25%  50%  75% 97.5% n_eff Rhat
alpha     2.00    0.01 0.10 1.81 1.93 1.99 2.06  2.20   160 1.01
beta      3.06    0.01 0.17 2.74 2.95 3.06 3.17  3.41   162 1.00
lambda[1] 0.45    0.01 0.12 0.25 0.36 0.43 0.52  0.73   417 1.00
lambda[2] 1.63    0.01 0.24 1.16 1.48 1.63 1.78  2.08   478 1.00
lambda[3] 0.63    0.01 0.16 0.35 0.52 0.61 0.72  1.00   500 1.00
```

Compare with the values we used to generate the data sample:

```
> customerLambda[1:3]
[1] 0.412453 1.764508 0.637515
```

This is great! If you **add a churn model and an order value model you will be able to calculate the predicted Customer Lifetime Value**. We will look into this soon.