Customer segmentation is the process of splitting your customer database into smaller groups. By focusing on specific customer types, you can maximize customer lifetime value and better understand who your customers are and what they need. Customers typically differ in terms of:

- products they are interested in,
- marketing channels they interact with (e.g. offline media like TV and press, social networks etc.),
- the maximum amount they can pay for a product (willingness to pay),
- types of promotions and benefits they expect (discounts, free shipping),
- buying patterns and frequency.

Customer segmentation can help other parts of your business. It will allow you to:

- **improve customer retention** by providing products tailored for specific segments,
- **increase profits** by leveraging disposable incomes and willingness to spend,
- **grow your business quicker** by focusing marketing campaigns on segments with a higher propensity to buy,
- **improve customer lifetime value** by identifying purchasing patterns and targeting customers when they are in the market,
- **retain customers** by appearing relevant and responsive,
- **identify new product opportunities** and improve the products you already have,
- **optimize operations** by focusing on the geographies, age groups etc. with the most value,
- **increase sales** by offering free shipping to high-frequency buyers,
- **offer improved customer support** to VIP customers,
- **gain brand evangelists** by incentivising them to comment, review or talk about your product with free gifts or discounts,
- **reactivate customers** who have churned and no longer interact with you.

Key variables to use for customer segmentation are:

- **geographical location** – knowing where customers live can give you a good idea of their income and lifestyle (you can also incorporate databases like Experian Mosaic),
- **age and gender** – younger customers are often more impulsive and frequent buyers, while female customers might have a higher long-term value,
- **acquisition channel** – e.g. customers from social media are often less valuable than customers navigating to your site directly,
- **first product purchased** – pay close attention to the transaction value and product category to differentiate between price-focused and quality-focused customers,
- **device type** – e.g. customers using a mobile device typically spend less than customers on a desktop PC,
- **Recency, Frequency and Monetary value** of customer transactions – this triplet is a complete segmentation strategy on its own (see my Introduction to RFM Segmentation deck for more details).

- Data is never clean.
- You will spend most of your time cleaning and preparing data.
- 95% of tasks do not require deep learning.
- In 90% of cases generalized linear regression will do the trick.
- Big Data is just a tool.
- You should embrace the Bayesian approach.
- No one cares how you did it.
- Academia and business are two different worlds.
- Presentation is key – be a master of PowerPoint.
- All models are false, but some are useful.
- There is no fully automated Data Science. You need to get your hands dirty.

A Marketing Mix Model is a powerful tool in the hands of a CMO or a digital strategist. It can answer key questions like:

- What is my return on investment in digital marketing?
- When should I invest in specific channels?
- How should I allocate my budget?
- What is causing the uplift or downfall of my metrics?
- What will my numbers look like if I double my investment in Google?

Digital marketing channels are great for marketing mix modeling because there is often a direct relationship between cost and effect (visits, sales etc.). Things are not that easy with offline media like TV, press and outdoor. Parameters in the model often become insignificant or indicate no impact of offline campaigns.

The main reason behind this phenomenon is that the **regression model fails to capture the long-term effect of offline media** like TV. When you see a cool ad on TV you do not rush immediately to your PC or grab your mobile phone to navigate to the website (BTW: this might be changing with second-screen interaction).

The TV campaign may have an impact on your customers a day, a week or maybe even a month later.

There is a simple trick to capture that effect in standard marketing mix models. If offline media has a delayed effect on customer behaviour, why not model it directly? Imagine a bucket of water with a small hole in its bottom. The amount of water in the bucket symbolizes the brand awareness you have in customers’ minds. Each week (or month etc.) some water leaks out of the bucket as your brand slowly fades into the noise. You can prevent it by pouring more water into the bucket. This is exactly what offline media is doing – each time someone sees your ad, your image is reinforced.

In a Marketing Mix Model, **instead of plugging in costs or reach metrics for offline media, you should use a “leaking bucket” model called Advertising Adstock**. The maths is very simple: the Adstock at time t is a fraction of the previous period’s Adstock plus today’s reach. For TV the usual reach metric is GRP (TVR is more popular in the UK). Mathematically:

$$A_t = \alpha \cdot A_{t-1} + x_t$$

The alpha parameter represents the “stickiness” of your campaign (or how much water stays in your leaking bucket from one week to the next). **The higher the parameter, the longer-lasting the effect of your campaign**. Actual values are selected through repeated experiments and monitoring the significance of the model. Great TV campaigns often have a very high alpha parameter, e.g. 0.95.

Adstocks are easily calculated in R with the *filter* function:

```r
library(ggplot2)

# Create some synthetic TVRs
data <- data.frame(Week=1:150, TVR=rep(0,150))
data[30:35, "TVR"] <- 50
data[60:63, "TVR"] <- 30
data[64:70, "TVR"] <- 67
ggplot(data, aes(x=Week, y=TVR)) + geom_line()

# Calculate Adstock (this is stats::filter, not dplyr's filter)
data$Adstock <- filter(data$TVR, filter=0.9, method="recursive")
ggplot(data, aes(x=Week, y=Adstock)) + geom_line()
```

It is a good idea to compute Adstocks at multiple levels of alpha and include all of them in the data. Of course, only one level of Adstock is selected for the actual linear model.
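A grid of candidate alphas can be computed in one pass. A small sketch (the TVR series and the decay values below are illustrative):

```r
# Compute Adstock columns for several candidate decay rates at once.
tvr <- c(rep(0, 10), rep(50, 5), rep(0, 15))   # synthetic weekly TVRs
alphas <- c(0.5, 0.7, 0.9)
adstocks <- sapply(alphas, function(a) {
  as.numeric(stats::filter(tvr, filter = a, method = "recursive"))
})
colnames(adstocks) <- paste0("adstock_", alphas)
```

Each column can then be tried in the regression and the alpha with the most significant, best-fitting model kept.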

RFM Segmentation is the easiest and most frequently used form of database segmentation. It is based on three key metrics: Recency, Frequency and Monetary Value of customer activity. RFM is often used with transactional history in e-commerce, but can also work for Social Media interactions, online gaming or discussion boards. Based on the calculated segments a marketer can prepare cross-sell, up-sell, retention and reactivation campaigns. This deck provides a simple introduction to the RFM Segmentation methodology.

H,T,H,T,T,T,T,T,H,T

In a more complex game, you may want to toss multiple coins in each round. For example, 3 rounds and 10 coin tosses in each:

TTTHHHHHHT,HHHTHTHHHT,HHTHHTHHHT

Count the number of Heads (wins/conversions) in each round. In the example above it is 6, 7 and 7 successes respectively. The probability of observing \(x\) successes (Heads) in \(n\) trials (tosses) when the probability of Heads is \(p\) can be calculated with the binomial distribution:

\[P(x,n,p) = {n \choose x} p^x (1-p)^{n-x}\]

It can be easily calculated in Excel. For example, to calculate the probability of 3 successes in 10 trials with 0.5 probability of success use this formula:

=BINOM.DIST(3, 10, 0.5, FALSE)

The result is a roughly 12% chance of seeing exactly 3 Heads.
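The same number can be checked in R with dbinom:

```r
# R equivalent of =BINOM.DIST(3, 10, 0.5, FALSE)
p3 <- dbinom(3, size = 10, prob = 0.5)
round(p3, 4)  # 0.1172
```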

Conversion Rates (and Click-through Rates etc.) can be easily modeled with a coin toss game. Each impression of an ad will be a coin toss and each click will be a success (the coin landing Heads). The probability of landing Heads is then the Conversion Rate.

Let us get back to our game. Imagine you do not know if the coin is fair or not. To play the game we need to somehow estimate the probability of it landing Heads (you would not play a game where the odds are against you). The simplest way to estimate the probability of success would be to make a lot of trials (tosses) and calculate the proportion of Heads to Tails.

\[P(H) = \frac{H}{n}\]

Every marketer knows the formula for Conversion Rate as the number of clicks divided by the number of ad impressions. OK, let us get back to our coin. We perform the experiment and in 10 coin tosses we observe:

TTTHTTTTTT

Just a single success! The probability of this outcome assuming the coin is fair can be calculated in Excel:

=BINOM.DIST(1,10,0.5,FALSE)

The result is about 1%, which is not really convincing that the coin is fair.

OK – I will make things a bit more difficult. What if you could only toss the coin once? Just one go to determine if it is fair or not. You toss it and it lands Tails. The standard equation for Conversion Rate would give you an estimate of 0% chance for Heads. Is this correct? OK, what if you had two trials and observe **HH**. Does this mean a 100% Conversion Rate? No – this does not feel right. The problem is in the sample size. The natural way of estimating Conversion Rate with a proportion works only **provided** you have enough samples/trials. The same principle applies to marketing campaigns and ads. If you had just a single visitor and no sales you would not assume the Conversion Rate is 0%. You need a larger sample.

Now, if you are a marketer pause here and go check your campaigns. Notice that the smaller the number of ad impressions the higher the estimated Conversion Rate. You now know the reason why.

Imagine you have 1000 completely independent campaigns, each with a true Conversion Rate of 4% and each with just 10 impressions. From the Binomial Distribution, the probability of observing no conversions for a campaign is 66%, of observing 1 conversion 28%, of observing 2 conversions 5% etc. This means that from the 1000 campaigns you can on average expect 665 campaigns with no conversions (your campaigning software may show a 0% estimated Conversion Rate), 277 campaigns with 1 conversion (for a 10% estimated Conversion Rate) and 52 campaigns with 2 conversions (for a 20% estimated Conversion Rate). None of these estimates is even close to the actual 4%.
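You can verify these numbers with a quick simulation in R (the seed is arbitrary):

```r
# Simulate 1000 independent campaigns, each with 10 impressions and a
# true Conversion Rate of 4%, then look at the naive per-campaign estimates.
set.seed(42)
conversions <- rbinom(1000, size = 10, prob = 0.04)
estimates <- conversions / 10
table(conversions)  # most campaigns show 0 or 1 conversion
mean(estimates)     # the average is close to 4%, individual estimates are not
```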

Luckily there is a solution – Confidence Intervals. Instead of saying that the Conversion Rate is equal to a single number, we provide an interval in which it lies, together with an accepted chance of being wrong. For example, instead of "The Conversion Rate for this campaign is 20%" you will say "With a 5% error chance, the Conversion Rate lies between 0.1% and 25%".

Exact Confidence Intervals for Binomial Proportion can be calculated easily in Excel. No need to use R or SAS. I will assume you store the desired error chance in cell B1, the number of ad impressions in column A starting from row 5 and the number of clicks in column B starting from row 5 (see the screenshot below). The formula for the lower bound for Conversion Rate Confidence Interval is:

=IF(B5=0,0,BETA.INV($B$1/2,B5,A5-B5+1))

For the upper bound:

=IF(B5=A5,1,BETA.INV(1-$B$1/2,B5+1,A5-B5))

I use the IF function to take care of edge cases.

For 1 click after 10 impressions, with an error chance of 5%, our Conversion Rate lies between 0.25% and 44.50%. The more impressions you have, the more accurate the estimate. After 1000 impressions and 10 clicks the Confidence Interval narrows down to the 0.48% to 1.83% range.
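If you prefer R over Excel, the same exact interval can be obtained with qbeta. A direct translation of the spreadsheet formulas above:

```r
# Exact (Clopper-Pearson) Confidence Interval for a Binomial proportion.
binom_ci <- function(clicks, impressions, error = 0.05) {
  lower <- if (clicks == 0) 0 else
    qbeta(error / 2, clicks, impressions - clicks + 1)
  upper <- if (clicks == impressions) 1 else
    qbeta(1 - error / 2, clicks + 1, impressions - clicks)
  c(lower = lower, upper = upper)
}
round(binom_ci(1, 10), 4)     # roughly 0.25% to 44.50%
round(binom_ci(10, 1000), 4)  # roughly 0.48% to 1.83%
```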

In day-to-day analysis, it is often better to exclude campaigns with low impression numbers. A good rule of thumb is to only analyze ads with more than 1000 impressions.

Customer behavior models are central to a successful CRM campaign in e-commerce. Let’s look at the very basic question of *"How many orders is this customer expected to make?"*. We start with basic building blocks.

Our model will be in a continuous setting, where customers are free to make a purchase at any time. Standard scenarios in e-commerce and retail are for products like groceries, books, movies, hotel rooms etc.

When you look at your order history database, customers might appear to be buying randomly. A deeper analysis will usually reveal a constant average purchase rate that is specific to individual customers – one book approximately every 20 days, one movie every weekend, groceries every 5-8 days etc. **If all purchases are independent and follow a constant average rate, then customer’s buying behavior can be modeled with a Poisson Process.** It counts the number of events (orders) and the time that passes between them.

Imagine a cohort of customers observed over a period of 6 months. Assuming they all remain active during the 6 month period, the number of orders made by a single customer will follow a Poisson distribution with a parameter \(\lambda_i\). This \(\lambda_i\) is exactly the average ordering rate that is unique for each customer. If we use \(X_i\) to denote the number of orders made by the i-th customer in one time unit (month, 6 months, a year etc.), then:

\[X_i \sim Poisson(\lambda_i)\]

For example, when \(\lambda_i = 4.2\) then the Poisson distribution of the number of orders made in a unit of time has the following density plot:

```r
plot(seq(0,10,1), dpois(seq(0,10,1), 4.2), type="h", xlab="Orders", ylab="Prob")
```

The probability of making a single order in a time unit is 0.06, of making 2 orders is 0.13 and so on.

In most cases, your customer base will be part of some larger group. It would be unrealistic to assume that average purchase rates are totally independent and random (uniformly distributed). Social and economic similarities will create an underlying structure and behavior patterns. For example, all customers might generally order their groceries once a week etc.

To capture this effect we add another layer to our model, assuming that the individual customer rates \(\lambda_i\) are drawn from a common group-level distribution. To put it differently, we **use a single population-wide distribution from which we draw individual customer purchase rates**. A good candidate for the population-level distribution is the Gamma Distribution. It is defined for positive values (purchase rates can only be positive) and has a flexible shape. More advanced readers will notice that it is also a conjugate prior to the Poisson distribution. See the figure below for a sample density plot of a Gamma Distribution.

```r
plot(seq(0,5,0.1), dgamma(seq(0,5,0.1), 2, 2), type="l", xlab="X", ylab="Prob")
```

By combining a customer-level Poisson Distribution for the number of orders with a population-level Gamma Distribution of purchase rates, we get a two-level hierarchical model:

\[X_i \sim Poisson(\lambda_i)\] \[\lambda_i \sim Gamma(\alpha, \beta)\]

To simulate observations from this model you would first draw a purchase rate \(\lambda\) from the Gamma Distribution and then draw the number of orders from \(Poisson(\lambda)\). This model is called a **Negative Binomial Distribution** (NBD).
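You can check this equivalence numerically: the two-level Gamma-Poisson draw and R's built-in rnbinom produce the same distribution when size = \(\alpha\) and prob = \(\beta/(\beta+1)\). A quick sketch with illustrative parameters:

```r
# Draws from the two-level model vs. draws from the Negative Binomial.
set.seed(1)
alpha <- 2; beta <- 3
lambda <- rgamma(100000, shape = alpha, rate = beta)
x_hier <- rpois(100000, lambda)
x_nbd  <- rnbinom(100000, size = alpha, prob = beta / (beta + 1))
c(mean(x_hier), mean(x_nbd))  # both close to alpha / beta
```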

Luckily the theory behind the NBD model is not overly complicated. Now, let’s look at fitting the model to order history data. Let’s generate some synthetic data:

```r
customerLambda <- rgamma(10000, 2, 3)  # Generate 10000 customers.
X <- sapply(customerLambda, function(l) rpois(1, l))
hist(customerLambda, 50)
```

Note that all customers were active throughout the period and that we do not model a churn process (at least not in this post).

We will use RStan to fit the model, but there are packages available that would be much faster. Stan (the underlying tool behind RStan) is a powerful piece of software… maybe even too powerful to deal with such simple tasks. Reading through the Stan manual you will notice that the NBD distribution is readily available. It will only give you the population-level parameters, but this is what you need initially. Define the model in R like this:

```r
library(rstan)

stanData <- list(N=length(X), X=X)
stanInit <- list(list(alpha=1.0, beta=1.0))

modelCode <- "
data {
  int<lower=0> N;
  int<lower=0> X[N];
}
parameters {
  real<lower=0> alpha;
  real<lower=0> beta;
}
model {
  alpha ~ uniform(0.0, 10.0); # Prior on alpha
  beta ~ uniform(0.0, 10.0);  # Prior on beta
  X ~ neg_binomial(alpha, beta);
}
"

testModel <- stan(model_code = modelCode, data = stanData,
                  iter = 100, chains = 1, init = stanInit)
```

This R code will transform the Stan model into a C++ program, compile and run it automatically (note that on Windows you need to install some additional tools to make this work – check Stan’s manual). To see the summary of the model you call:

```
> print(testModel, pars=c("alpha", "beta"))
Inference for Stan model: modelCode.
1 chains, each with iter=100; warmup=50; thin=1;
post-warmup draws per chain=50, total post-warmup draws=50.

      mean se_mean   sd 2.5%  25%  50%  75% 97.5% n_eff Rhat
alpha 2.10    0.02 0.11 1.92 2.02 2.09 2.18  2.26    20 1.05
beta  3.09    0.04 0.17 2.84 2.96 3.11 3.25  3.34    20 1.09
```

The \(\alpha\) and \(\beta\) parameters are very close to what we used when generating the data.

In a more complex scenario you would also want to obtain the distributions of individual customers’ purchase rates. This requires that you explicitly describe the model in Stan. Note that it will have thousands of parameters (population-wide \(\alpha\) and \(\beta\) plus one parameter per customer). We will need more data. Let’s modify the data generation code to have multiple observations per customer.

```r
periods <- 24
X <- t(sapply(customerLambda, function(l) rpois(periods, l)))
```

Fitting the model will now take significantly longer.

```r
stanData <- list(P=ncol(X), N=nrow(X), X=X)
stanInit <- list(list(alpha=1.0, beta=1.0, lambda=rep(1.0, nrow(X))))

modelCode <- "
data {
  int<lower=0> P;
  int<lower=0> N;
  int<lower=0> X[N,P];
}
parameters {
  real<lower=0> alpha;
  real<lower=0> beta;
  real<lower=0> lambda[N];
}
model {
  alpha ~ gamma(1.0, 1.0); # Prior on alpha
  beta ~ gamma(1.0, 1.0);  # Prior on beta
  lambda ~ gamma(alpha, beta);
  for (i in 1:N) {
    for (j in 1:P) {
      X[i,j] ~ poisson(lambda[i]);
    }
  }
}
"

testModel <- stan(model_code = modelCode, data = stanData,
                  iter = 1000, chains = 1, init = stanInit)
```

To inspect the population parameters and the first three purchase rates:

```
> print(testModel, pars=c("alpha", "beta", "lambda[1]", "lambda[2]", "lambda[3]"))
Inference for Stan model: modelCode.
1 chains, each with iter=1000; warmup=500; thin=1;
post-warmup draws per chain=500, total post-warmup draws=500.

          mean se_mean   sd 2.5%  25%  50%  75% 97.5% n_eff Rhat
alpha     2.00    0.01 0.10 1.81 1.93 1.99 2.06  2.20   160 1.01
beta      3.06    0.01 0.17 2.74 2.95 3.06 3.17  3.41   162 1.00
lambda[1] 0.45    0.01 0.12 0.25 0.36 0.43 0.52  0.73   417 1.00
lambda[2] 1.63    0.01 0.24 1.16 1.48 1.63 1.78  2.08   478 1.00
lambda[3] 0.63    0.01 0.16 0.35 0.52 0.61 0.72  1.00   500 1.00
```

Compare with the values we used to generate the data sample:

```
> customerLambda[1:3]
[1] 0.412453 1.764508 0.637515
```

This is great! If you **add a churn model and an order value model you will be able to calculate the predicted Customer Lifetime Value**. We will look into this soon.

**Estimating Customer Lifetime Value allows you to say how much you can spend on customer acquisition and retention.** This is especially important for subscription-based products like games, online services, information marketing, loyalty programs etc. In this article we will use a classical approach to estimate CLV with a simple equation and customer cohort analysis. There are more accurate probabilistic models available and I will look into them in the following posts. If you are just starting with CLV calculation for your business, I strongly recommend going for the simple solution first and adding sophistication as necessary.

The classical equation for CLV uses a **discounted sum of monthly income multiplied by the probability of survival** to a given month:

$$CLV = \sum^T_{t=0} m \frac{r_t}{(1+d)^t}$$

Where:

- \(m\) – monthly cash flow (subscription fee).
- \(r_t\) – retention rate for month t (probability of being a customer for t consecutive months).
- \(d\) – discount rate to calculate the current value of future money.

To further simplify our model we assume that customers always subscribe for the full month, i.e. all payments (even for new customers) are made on the 1st day of the month.

To **calculate the retention rate for the t-th month, we use a simple cohort analysis**. Prepare your customer base so that for each customer you know the month when she joined and for how many months she stayed a subscriber. Next, decide how far back into the past you want to go with your analysis. Let us assume you have decided to use the past 2 years of data, from Jan 2013 till Dec 2014. Create a square table with one row and one column per month of historical data. Next, add 1 to each cell whose row indicates when the customer subscribed and whose column indicates a month in which she had an active subscription. If a customer joined in Apr 2013 and remained a subscriber for 3 months, we should add 1 to the cells in row Apr 2013 and columns Apr 2013, May 2013 and Jun 2013.

If you do this for all of your customers, you will end up with an upper-triangular table of customer counts similar to this one:

You read this table as follows: “100 customers joined in Jan 2013, of which 63 extended their subscription to Feb 2013. From the 100 customers that joined in Jan 2013, 47 continued their subscription to Mar 2013…”. If you sum numbers in columns you will get the total number of active subscriptions in a given month. This is a very basic example of a cohort analysis.

From the above cohort table, we can now easily calculate the retention rate from month 1-to-2, 1-to-3 etc. For example, retention rate from 1-to-2 is 63%, from 1-to-3 is 47%, from 1-to-4 is 38% and so on.
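The cohort table itself is easy to build programmatically. A minimal R sketch, assuming for each customer we know the join month and the subscription length in months (the data below is synthetic, not the table from the example):

```r
# Build an upper-triangular cohort table: rows = join month,
# columns = months with an active subscription.
set.seed(7)
months <- 24
n      <- 500
join   <- sample(1:months, n, replace = TRUE)  # month each customer joined
tenure <- rgeom(n, prob = 0.3) + 1             # subscribed for at least 1 month
cohort <- matrix(0, months, months)
for (i in 1:n) {
  last <- min(join[i] + tenure[i] - 1, months)
  cohort[join[i], join[i]:last] <- cohort[join[i], join[i]:last] + 1
}
# Retention rates for the first cohort: month 1-to-2, 1-to-3, ...
cohort[1, 2:5] / cohort[1, 1]
```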

The only piece of the equation left to untangle is the discount rate. The UK government has published a good article explaining discount rates and net present value. In the UK, **the recommended discount rate for such calculations is 3.5% per annum**. We divide it by 12 to get a monthly discount rate.

Plugging all of the above into a simple spreadsheet, we can easily do the calculation for any set of parameters:

A randomly picked customer has a Customer Lifetime Value of £406 in a 1 year time frame. This is the **amount we can spend on customer acquisition and loyalty programs** to break even. You can find the above spreadsheet here.
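The spreadsheet calculation is straightforward to reproduce in code. A minimal R sketch of the CLV equation (the subscription fee and retention rates below are illustrative, not the values behind the £406 figure):

```r
# Discounted sum of monthly income weighted by survival probability.
clv <- function(m, retention, d_annual = 0.035) {
  d <- d_annual / 12                # monthly discount rate
  t <- seq_along(retention) - 1     # t = 0, 1, 2, ...
  sum(m * retention / (1 + d)^t)
}
# Month 0 is the joining month, so retention starts at 1.
clv(m = 50, retention = c(1, 0.63, 0.47, 0.38))
```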


Anyone interested in CPU history will remember the great MHz race of Pentium processors: 1 GHz beaten by the Pentium III in early 2000, 2 GHz by the Pentium 4 in mid 2001, and finally 3 GHz in 2003 on a Northwood HT unit. At that time it became clear that, for various reasons (e.g. heat and power consumption), this was not a path that could be pushed much further. The next step was to add a second core and sell 2 separate cores in one CPU (Pentium D in 2005). Almost 10 years later, two cores are considered low-end in modern processors, 4 is the standard and 4 GHz is starting to appear again (at least in Turbo mode). But are we truly using all that power when writing our R code?

One also needs to remember that multiple cores are not a solution to every problem. To fully use them, a few conditions must be fulfilled:

- the task should be splittable into smaller, independent subtasks. If you calculate a sum, the data can be divided into 2 independent sets, partial sums calculated on each and the results added. Multiple cores will not help with sequential tasks (where each step depends on the previous one)
- the amount of data required to perform each task shouldn’t overwhelm the efficiency gain. Summing a huge data frame is easier on one core, as the cost of distributing the data might simply be too high

If it is reasonable to run our task on multiple cores, we can use the R parallel package. It provides a handful of tools that require as few changes as possible to run existing code on multiple cores.

Let us first analyze a purely artificial example. Although completely useless in real life, it will give us a view of the package’s performance.

First, we load all the packages required:

```r
library('ggplot2')
library('parallel')
library('data.table')
```

and define the task sizes to try, as well as a function to test. In our example, each task draws `size` normal random numbers and sums them; repeated many times, this estimates the expected value of a sum of normal distributions:

```r
sizes = c(1000, 5000, 10000, 50000, 1e5, 5e5, 1e6)

testFun <- function(x, size) {
  set.seed(1)
  sum(rnorm(size))
}

results = data.frame(size=c(), type=c(), time=c())
```

We set the seed so that the same numbers are drawn at each function evaluation (for comparability).

Next, we need to tell R how many cores we want to use (4 in this case):

```r
cl <- makeCluster(4)
```

Here, as the name suggests, we create a cluster of R worker processes. The package offers far more than just that, but let us analyze the concept with a simple example.
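As a side note, the hard-coded 4 can be replaced by detecting the core count at run time (a common convention is to leave one core free so the machine stays responsive):

```r
library(parallel)

# Detect the number of cores available on this machine;
# the guard handles platforms where detectCores() returns NA.
nCores <- max(1, detectCores() - 1, na.rm = TRUE)
nCores  # pass this to makeCluster() instead of a hard-coded 4
```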

Next, we would like to run our function 1000 times. In theory, each core should do 250 evaluations; however, the package also offers load balancing, so that might not always be the case (e.g. one core can be busy with us playing a game while waiting for results).

```r
for (size in sizes) {
  ## parallel computing
  beg = Sys.time()
  z = clusterApply(cl, 1:1000, testFun, size=size)
  td = as.numeric(Sys.time() - beg, "secs")
  results = rbind(results, data.frame(size=size, type="quad_core", time=td))

  ## single threaded computing (to compare times and code)
  beg = Sys.time()
  z = lapply(1:1000, testFun, size=size)
  td = as.numeric(Sys.time() - beg, "secs")
  results = rbind(results, data.frame(size=size, type="single_core", time=td))
}
```

As one can notice, there is practically no difference in implementation between the standard lapply and clusterApply. There is a subtle thing to remember though: in lapply, our function can see all the variables defined earlier in the R session. clusterApply launches the function in a brand new, empty session, so all required data must be passed to it as arguments.

A nice thing to do (but not entirely necessary) is to stop the cluster when all is finished:

```r
stopCluster(cl)
```

It will be terminated anyway when we close R, but stopping it explicitly signals that we are done with the task.

Let us see the time gains:

```r
ggplot(results, aes(x=size, y=time, group=type, color=type)) +
  geom_point() +
  scale_x_log10()

dt = data.table(results)
results_ratio = as.data.frame(
  dt[, list(ratio = time[type == "single_core"] / time[type == "quad_core"]),
     by = c("size")])

ggplot(results_ratio, aes(x=size, y=ratio, group=1)) +
  geom_point() +
  scale_y_continuous(limits=c(0,4)) +
  scale_x_log10()
```

As expected, some time is lost when the tasks are small. But as they grow, the efficiency gain is quite good (3-3.25x faster instead of the theoretical 4x). All computations were made on an Intel Core i5 3550K CPU (4 cores, no overclocking).

Now, make sure YOUR code is parallelized if applicable!

“I think data-scientist is a sexed up term for a statistician…. Statistics is a branch of science. Data scientist is slightly redundant in some way and people shouldn’t berate the term statistician.”

Nate Silver (applied statistician)

According to Gartner, Data Science is reaching its peak of inflated expectations and is considered to be only 2-5 years away from reaching the plateau. It is quite a new entry on the Gartner Hype Cycle, introduced in August 2014.

It’s also interesting to note what Gartner says about big data, which has already passed the peak, with an estimated plateau in 5 to 10 years: “While interest in big data remains undiminished, it has moved beyond the peak because the market has settled into a reasonable set of approaches, and the new technologies and practices are additive to existing solutions.”

Is Data Science still a buzzword without a clear definition? I tried to define it and understand who the Modern Data Scientist is. Have a look below at the new version of the Modern Data Scientist infographic and let’s see how Data Science translates into real skills:


Modern data scientist by MarketingDistillery.com is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

10. On the Consultant Approach to Web Analytics by René Dechamps Otamendi, Uploaded on Apr 20, 2007

9. Web Analytics Comparison -Sitecatalyst vs Google Analytics vs Webtrends by Aman Sandhu, Uploaded on Mar 11, 2013

8. Combining Methods: Web Analytics and User Research by User Intelligence, Uploaded on Sep 30, 2009

7. Introduction to Web Analytics for Journalists by Patrick Glinski, Head of Social Innovation at Idea Couture, Uploaded on Jan 30, 2010

6. How To Mesure And Optimise Your Roi Using Web Analytics Google, by 2tique, Uploaded on Jun 17, 2007

5. An Introduction to Web Analytics by iexpertsforum, Uploaded on Jun 03, 2009

4. Web Analytics 101 & Career Advice by Alex Cohen, Customer Acquisition Marketing at DigitalAlex, Uploaded on Dec 05, 2008

3. The Web Analytics Business Process by Marco Derksen, Entrepreneur at Koneksa Mondo, Uploaded on Oct 24, 2006

2. Web Analytics Maturity Model by Stephane Hamel, Director, Strategic Services at Cardinal Path, Uploaded on May 13, 2009

And the most popular is…

1. Web Analytics Tools Comparison by Tim Wilson, Digital Analyst at Web Analytics Demystified, Uploaded on Feb 27, 2011