In the previous article, we outlined the most common mistakes that lie in wait when you try to assess performance, which can be any variable in your marketing department that you want or need to report on. Here, let’s focus on something constructive. Let’s learn a bit about a simple yet powerful tool that can measure various relationships and, by doing so, separate out the various effects your data is exposed to.

## Get the basics right – why we shouldn’t reject simple explanations

When building a relationship between analyzed quantities, we usually want to explain one variable (or many at the same time, but for simplicity let’s focus on one), the so-called dependent variable, using some other factors (explanatory variables). As an example, when trying to predict the number of web searches for your brand at a given time \(t\) (\(y_{t}\)), we might want to relate it to some other measurable variables: industry performance (\(x_{t}\)), your brand awareness from the polls you run (\(v_{t}\)), competitors’ above-the-line spend in the previous time period (\(z_{t-1}\)), etc. We always need to remember that **the dependent variable should have a logical connection with the explanatory variables**; otherwise we move towards GIGO modelling (garbage in, garbage out). What sense would the model above make if we tried to add the number of sterilized cats in northern Wales to it? In an ideal world, which is a world without uncertainty, we would know which explanatory variables to take and what the relationship looks like:

$$y_{t} = f(x_{t},v_{t},z_{t-1}).$$

However, in the world we live in, nothing is certain (except death and taxes). So the puzzle is: what functional form binds together the explanatory variables and the dependent variable? To answer this question, we might try various models and compare the results, but in most cases the first step should look similar: try Taylor series expansions of different orders. What does that mean? We don’t have to dwell on it now; all we need to know is that this expansion lets us move closer to any sufficiently regular unknown function, at the expense of model simplicity. The higher the expansion order, the higher the precision, but also the more complex the model:

- 1st order (\(n=1\)): \(f(x) \approx a+b \cdot x + error\)

- 2nd order (\(n=2\)): \(f(x) \approx a+b \cdot x + c \cdot x^{2}+ error\)

- and so on.
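To make the list above concrete, here is a minimal sketch in Python. The choice of \(f(x) = e^{x}\), expanded around zero and evaluated at \(x = 0.5\), is an assumption made purely for illustration; it is not tied to any marketing data.

```python
import math

def taylor_exp(x, order):
    """Taylor expansion of exp(x) around 0, truncated at the given order."""
    return sum(x**k / math.factorial(k) for k in range(order + 1))

x = 0.5                 # illustrative evaluation point (an assumption)
exact = math.exp(x)

# Absolute approximation error for the 1st, 2nd and 3rd order expansions.
errors = {n: abs(exact - taylor_exp(x, n)) for n in (1, 2, 3)}
for n, err in errors.items():
    print(f"order {n}: approximation error = {err:.5f}")
```

For this function, each extra term shrinks the error, while the formula gets one term longer every time.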

In the examples above, \(a\), \(b\) and \(c\) are constants which, knowing \(f\), we would calculate from its derivatives. There is no need to introduce the maths here; you can easily find more details, e.g. on Wikipedia. What we need to remember is that the error gets smaller as we increase \(n\). If we stop at the first order, we get the so-called linear regression. That is the real reason why we use this model. You will often hear that it is oversimplified and outdated. But if you know why you use it, it is still a powerful tool. We can then build a model like this:

$$y_{t} = \beta_{0}+ \beta_{1} x_{t} + \beta_{2} v_{t} + \beta_{3}z_{t-1} + \varepsilon_{t},$$

where \(\varepsilon_{t}\) stands for the error term. **As we are exposed to an environment that we neither fully control nor fully understand, the error term contains things like the variables we don’t know about,** the effects of an insufficient approximation of \(f\), or simply errors in measuring the output.
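As a rough sketch of how such a model can be estimated in the classical way, the Python snippet below simulates data from the equation above and recovers the coefficients with ordinary least squares. Everything numeric here is an assumption for illustration: the "true" coefficients, the value ranges and the noise level are made up, and the small `solve` helper is a hypothetical stand-in for what a statistics package would do for you.

```python
import random

random.seed(0)

# Hypothetical "true" coefficients, used only to simulate data.
TRUE_BETA = [50.0, 2.0, 1.5, -0.8]   # beta_0, beta_1, beta_2, beta_3
n = 500

# Competitors' spend; one extra value so that z[t - 1] exists for every t.
z = [random.uniform(0, 10) for _ in range(n + 1)]
rows, y = [], []
for t in range(1, n + 1):
    x_t = random.uniform(0, 10)          # industry performance
    v_t = random.uniform(0, 10)          # brand awareness
    row = [1.0, x_t, v_t, z[t - 1]]      # 1.0 multiplies the baseline beta_0
    eps = random.gauss(0, 1.0)           # error term
    rows.append(row)
    y.append(sum(b * r for b, r in zip(TRUE_BETA, row)) + eps)

def solve(A, b):
    """Solve A w = b by Gauss-Jordan elimination with partial pivoting."""
    m = len(b)
    M = [A[i][:] + [b[i]] for i in range(m)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(m):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[i][m] / M[i][i] for i in range(m)]

# Normal equations of least squares: (X'X) beta = X'y
k = len(TRUE_BETA)
XtX = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
beta_hat = solve(XtX, Xty)
print([round(b, 2) for b in beta_hat])
```

The estimates land close to the simulated coefficients but never exactly on them, because the error term is part of the data.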

## Beware of dogmatic answers that omit uncertainty

What we are now interested in are the values of the parameters \(\beta_{1}\), \(\beta_{2}\), \(\beta_{3}\) and the baseline \(\beta_{0}\). But do we really want to know single values? Why should we think that these are constants? We are not so sure about them. It’s a similar story to the average: **the value is meaningless without a representation of the error around it**. If you hear that, on average, customers rate your brand 3/5, the first thing you should ask is: what was the sample size and the shape of the distribution? Is our brand so mediocre that everyone rates it 3/5, or did we have some very dissatisfied customers who rated it 1/5 while the rest gave 5/5? A single-number answer usually only raises more questions.
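The rating example can be checked in a few lines of Python. The two samples below are invented for illustration (100 customers each); they share the same average but tell opposite stories.

```python
from collections import Counter
from statistics import mean, pstdev

# Two made-up samples of 100 ratings each, both averaging 3/5.
everyone_lukewarm = [3] * 100
polarised = [1] * 50 + [5] * 50

for name, ratings in [("lukewarm", everyone_lukewarm), ("polarised", polarised)]:
    print(
        name,
        "| mean:", mean(ratings),
        "| std dev:", pstdev(ratings),
        "| counts:", dict(Counter(ratings)),
    )
```

The mean alone cannot tell these two situations apart; the spread and the counts can.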

If we are not satisfied with a single-number answer, **we should move towards a distribution**. We will then see what has been averaged out. This opens the wide door to so-called **Bayesian statistics**, where we speak in the language of distributions. Again, let’s leave the maths aside and focus on the key ideas.

## The intuition behind a sandbox full of distributions

The first one is: how do you rate your parameters \(\beta_{1}\), \(\beta_{2}\), \(\beta_{3}\) and \(\beta_{0}\) before looking at the data? What do you expect them to be? Maybe you did some previous research at another company? Give it some thought. **Bayesian statistics encourages thinking.** As an example: prior to looking at the data, we suspect that an increase in competitors’ offline spend will have a negative impact on demand for our products. However, we can hardly say how strong the effect is. So let’s take a Normal distribution centered at -5 with a standard deviation of 20. Is it a good prior? We agree to disagree here: everyone can hold different beliefs and we should be tolerant, in real life as well as in statistics when it comes to priors.
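To see what this prior actually claims, here is a small sketch using Python’s standard library. The distribution parameters (-5 and 20) come from the text; the questions we ask of it (probability of a negative effect, central 95% range) are just one reasonable way to inspect a prior.

```python
from statistics import NormalDist

# Prior belief about the effect of competitors' spend: Normal(-5, 20).
prior = NormalDist(mu=-5, sigma=20)

p_negative = prior.cdf(0)                               # P(effect < 0)
low, high = prior.inv_cdf(0.025), prior.inv_cdf(0.975)  # central 95% range

print(f"P(effect < 0) under the prior: {p_negative:.2f}")
print(f"central 95% prior range: {low:.1f} to {high:.1f}")
```

With such a wide standard deviation, the prior leans only mildly towards a negative effect (roughly a 60% chance) and leaves plenty of room for the data to disagree.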

Second step: collecting the data! We need to take care in this step. Keep an eye on data quality, sources and reliability. It’s better to remove a variable if we don’t trust it. The GIGO trap is always ready to catch us.

Third step: perform the calculations. If you’re a marketer, there is plenty of software and tooling available on the web. Please be careful and use only trusted sources, such as R packages or spreadsheets from statisticians or econometricians. Personally, I recommend R. The nuts and bolts of the calculations are not outlined here; however, they are not scary. Really!

The final and, for the marketer, most interesting step: analyze the results, the so-called posterior distributions. The most important thing to look at is not the location or dispersion of our posteriors. It is the way they moved from the priors.
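The four steps can be sketched end to end in Python under heavy simplifying assumptions: a single explanatory variable (competitors’ spend), no baseline, a noise standard deviation that is treated as known, and simulated data in place of real measurements. Under these assumptions a Normal prior leads to a Normal posterior with a closed-form update, so no special package is needed. The "true" effect of -12 and the noise level of 5 are invented for the simulation.

```python
import random
from statistics import NormalDist

random.seed(1)

# Step 1: the prior from the text.
prior = NormalDist(mu=-5, sigma=20)

# Step 2: "collect" data. Simulated here; all numbers are hypothetical.
NOISE_SD = 5.0        # assumed known noise standard deviation
TRUE_EFFECT = -12.0   # used only to generate the fake data
spend = [random.uniform(0, 10) for _ in range(50)]
demand_change = [TRUE_EFFECT * z + random.gauss(0, NOISE_SD) for z in spend]

# Step 3: conjugate Normal update (precision = 1 / variance).
prior_precision = 1 / prior.stdev**2
data_precision = sum(z * z for z in spend) / NOISE_SD**2
post_precision = prior_precision + data_precision
weighted_data = sum(z * d for z, d in zip(spend, demand_change)) / NOISE_SD**2
post_mean = (prior.mean * prior_precision + weighted_data) / post_precision
posterior = NormalDist(mu=post_mean, sigma=post_precision ** -0.5)

# Step 4: compare the posterior with the prior.
print(f"prior:     mean {prior.mean:.1f}, std dev {prior.stdev:.1f}")
print(f"posterior: mean {posterior.mean:.1f}, std dev {posterior.stdev:.2f}")
print(f"shift in mean: {posterior.mean - prior.mean:.1f}")
```

A real analysis with several variables and an unknown noise level would normally rely on a dedicated package rather than this hand-written update.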

## Let the data speak!

This is when the **data speak**: if your belief is right, the data will make the uncertainty smaller but won’t move the distribution far left or right. If your guess wasn’t accurate, the data will move the distribution away from yours. It looks like in the example we underestimated the effect of competitors’ spend: it is more relevant (the distribution is spikier) and stronger (more distant from zero than assumed).

However, if the variable is inconclusive, the data will move the distribution towards zero and/or make it flatter. We can interpret the averages in the classical way of interpreting linear regression models, i.e. if the competitors spend one unit more in a given period, we can expect demand for our brand to drop by 12 units on average. Isn’t it easy yet beautiful, and far deeper than reporting everything as year-over-year (YoY) changes?
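Reading a posterior can also be sketched in a few lines of Python. The posterior mean of -12 comes from the example in the text; the standard deviation of 2 and the extra spend of 3 units are hypothetical values chosen only to make the arithmetic visible.

```python
from statistics import NormalDist

# Posterior for the effect of competitors' spend.
# mu = -12 is from the text; sigma = 2 is a hypothetical value.
posterior = NormalDist(mu=-12, sigma=2)

# Central 95% range of plausible effect sizes.
low, high = posterior.inv_cdf(0.025), posterior.inv_cdf(0.975)

# Classical reading of the average: expected change in demand
# if competitors spend 3 more units (a hypothetical scenario).
extra_spend = 3
expected_change = posterior.mean * extra_spend

print(f"95% range for the effect: {low:.1f} to {high:.1f}")
print(f"expected change in demand for +{extra_spend} units of spend: {expected_change:.0f}")
```

If the whole range stays below zero, as it does under these assumed numbers, the negative effect is more than a guess; a range straddling zero would mark the variable as inconclusive.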