# It’s The Math, Stupid

Are you data driven? Do you live by the numbers? If you are, then you’re probably wasting an enormous amount of time and energy. Even worse, you’re probably getting a lot wrong.

In the neverending quest for substantiation and certainty in corporate life, we have used numbers as a panacea. All too often, more numbers are considered better and fewer are considered worse. This is, for lack of a better term, really stupid.

In truth, what we really need is fewer numbers and a whole lot more math. Math is what the ancients invented when they ran out of fingers and toes, because they realized that they needed to start thinking about abstract relationships in order to advance. The good news is that math is much simpler than numbers, more elegant and more likely to be right.

**The Tyranny of Averages**

Imagine that you’re standing in a room with a group of people, each earning a typical US income of around $50,000. Then Bill Gates walks in. If you calculate the average the way your grade school teacher showed you, you will conclude that the average income in the room just shot up by millions. But no one is actually any richer, so what gives?

Average is a word that gets thrown around a lot, but most people don’t know what it really means. In school, we were taught to calculate an arithmetic mean by adding up all of the numbers in a set and then dividing by the number of entities. However, the term is often used to denote a median, which is a “middle value.” These are often very different.
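The difference is easy to see in a few lines of Python (the $50,000 salaries and the billion-dollar income are invented round numbers, purely for illustration):

```python
import statistics

# Ten people in a room, each earning about $50,000
incomes = [50_000] * 10
print(statistics.mean(incomes), statistics.median(incomes))  # 50000 50000

# An extreme value walks in
incomes.append(1_000_000_000)
mean_after = statistics.mean(incomes)
median_after = statistics.median(incomes)
print(mean_after)    # roughly 90.9 million -- the mean is dragged far upward
print(median_after)  # still 50000 -- the median barely notices
```

The median stays put because it only cares about rank order, not magnitude, which is exactly what makes it a robust statistic.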

A mean can easily be thrown off by extreme values if the data is skewed. A median is what mathematicians call a robust statistic. It doesn’t move around much even when there are extreme values, because it merely sets the point at which 50% of the data points are below and 50% are above. To see what I mean, look at the chart below:

A few unusual values can ruin the whole concept of an average as a central tendency by moving the mean far away from the most common value (i.e. the mode). That’s why most statistics that we see reported are actually medians.

There is a special case in which the mean, median and mode are all equal. It is known as a “normal” or Gaussian distribution (after Carl Friedrich Gauss, one of the first people to use it effectively), but is more commonly called a “bell curve.” It looks like this:

We see these types of curves when data are randomly distributed. Although true randomness is relatively rare, statisticians often assume errors in data are random (for reasons that will become clear soon) and therefore “average out.”

**Accounting For Deviance**

In my blissful youth, many people considered me to be a deviant (and to some extent still do), meaning that I very rarely did what was expected of me. In any data set, you can expect to find entities that are a lot like me, ones that refuse to conform to the average. Mathematicians have a way to account for this called standard deviation.

There’s a complicated-looking formula for it, but you can find it pretty easily by subtracting the mean from each value, squaring each result (to get rid of negative numbers) and then averaging those squares to arrive at the variance. After that, you can just take the square root to arrive at the standard deviation.
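That recipe translates almost word for word into Python (this computes the population standard deviation, dividing by the full count, which matches the steps as described):

```python
import math

def std_dev(values):
    """Deviations from the mean, squared, averaged, then square-rooted."""
    mean = sum(values) / len(values)
    variance = sum((x - mean) ** 2 for x in values) / len(values)
    return math.sqrt(variance)

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(std_dev(data))  # 2.0
```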

The chart above shows how useful this value is. If we can assume that the data is normally distributed (meaning errors are random and therefore average out), then roughly 68% of values will fall within 1 standard deviation of the mean, 95% within 2 standard deviations and 99.7% within 3 standard deviations.
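You don’t have to take the 68–95–99.7 rule on faith; simulating normally distributed data shows it directly:

```python
import random

random.seed(42)
samples = [random.gauss(0, 1) for _ in range(100_000)]

shares = {}
for k in (1, 2, 3):
    shares[k] = sum(abs(x) <= k for x in samples) / len(samples)
    print(f"within {k} standard deviation(s): {shares[k]:.1%}")
```

With 100,000 samples the printed shares land very close to 68.3%, 95.4% and 99.7%.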

It is from this concept that we get the idea of standard error, because we can predict roughly what proportion of values will fall outside a given confidence interval. For instance, we can be 95% confident that any particular value will fall within two standard deviations of the mean, 99.7% confident that it will fall within three, and so on.

So if the variation within that range is something we can live with, we call the result statistically significant. Again, there are some complicated formulas that they use to torture kids in school but, as usual, there is an easier way: for a simple survey, you can approximate the margin of error by dividing 1 by the square root of the sample size, 1/√n.

So for a sample size of 100, that works out to a margin of error of roughly ±10%; to get down to ±5%, you need a sample of about 400. For some reason, 100 is what many people consider to be the minimal “proper” sample size, but there is nothing magic about it. You can decide for yourself how much error you’re willing to live with.
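Treating 1/√n as the approximate 95% margin of error for a survey proportion (a common rule-of-thumb reading, not an exact formula), the trade-off between sample size and error looks like this:

```python
import math

def rough_margin_of_error(n):
    """Rule of thumb: approximate 95% margin of error for a proportion."""
    return 1 / math.sqrt(n)

for n in (100, 400, 1_000):
    print(n, f"±{rough_margin_of_error(n):.1%}")
# 100  ±10.0%
# 400  ±5.0%
# 1000 ±3.2%
```

Note the diminishing returns: quadrupling the sample only halves the error.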

**Progress and Regress**

There was a reason why people spent so much time figuring all this stuff out. Guys like Gauss were trying to understand how planets and other celestial bodies moved, but they knew that their measurements weren’t very good. The data was usually messy and looked like this:

You can imagine how frustrating it was to try to delve into the mysteries of the universe with messy data, so Gauss came up with a workaround. If he simply assumed that the errors were random, then they would be normally distributed and all of the stuff about standard deviations would apply.

So using the same concepts, he developed the method of least squares, in which he would find the line with the smallest sum of squared residuals (i.e. the errors between the line and the data points), which effectively minimizes the variance. He could then even tell how well his line fit the data by calculating the R-squared value. This is now known as regression analysis.
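The whole method fits in a few lines of Python. Here is a minimal sketch of simple linear regression (the data points are invented to be roughly linear with a little noise):

```python
def least_squares(xs, ys):
    """Fit y = slope * x + intercept by minimizing the sum of
    squared residuals, and report the R-squared of the fit."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot

# Invented "messy" data: roughly y = 2x plus a little noise
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept, r2 = least_squares(xs, ys)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, R2={r2:.3f}")
```

An R-squared near 1 means the line explains almost all of the variance in the data; near 0 means it explains almost none.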

There is a little twist here, because many people think that R-squared and correlation are the same thing. In fact, they’re not (correlation measures the strength and direction of the linear relationship between two variables), but by a strange quirk, correlation is “r,” so as long as the model is a simple straight line you can get from R-squared to correlation just by taking the square root.
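You can check that relationship directly: compute Pearson’s r from scratch and square it (the near-linear data points here are invented for illustration, and r² will match the R-squared of a straight-line fit to the same data):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / math.sqrt(sum((x - mx) ** 2 for x in xs)
                           * sum((y - my) ** 2 for y in ys))

xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
r = pearson_r(xs, ys)
print(f"r = {r:.4f}, r^2 = {r**2:.4f}")
```

Note that squaring throws away the sign, which is why you can recover r from R-squared only up to direction.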

Unfortunately, many things don’t follow a straight line, but a curve. If they do, we can still use the method of least squares to “fit” a model. However, there is no such thing as “non-linear correlation.” Some people unfortunately use that term, but they are profoundly mistaken about the basic concepts of data analysis.

One last problem is overfitting, where people make the model curve around just to get a good fit (i.e. a high R-squared value). This is probably the best example of people losing the math in the numbers. Every model should tell a clear story and, if your story is too complicated, chances are your model is wrong no matter how well the numbers work out.
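One quick way to see overfitting, sketched in Python: a polynomial with as many parameters as data points threads through every point perfectly (R-squared of exactly 1), yet its predictions away from the data go wild. The roughly-linear data here is invented for illustration:

```python
def lagrange_predict(xs, ys, x):
    """Evaluate the degree-(n-1) polynomial through every data point:
    a 'perfect' fit with zero residuals."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

# Invented data, roughly y = 2x plus noise
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

print(lagrange_predict(xs, ys, 3))  # 6.2 -- hits the data exactly
print(lagrange_predict(xs, ys, 7))  # wildly far from the ~14 a line predicts
```

The simple line tells a clear story (y grows by about 2 per unit of x); the perfect-fit polynomial tells no story at all.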

Always, always, use the simplest model that fits the data.

**When Chaos Erupts**

You might have noticed by this point that I’ve used the word “assume” a lot. More specifically, everything we’ve discussed to this point assumes that data is random, meaning that there is no interaction between entities and therefore no feedback.

But what if that assumption isn’t true? What if some people simply liked Justin Bieber because other people like Justin Bieber and that convinced even more people to like him as well? Or what if people tended to buy stock in companies when they were going up, but would sell them when they went down?

The chart above shows what happens: we end up with far more extreme values (also known as outliers) than conventional models would predict. So, for instance, if financial traders evaluate risk based on the assumption of random, normal distributions, they will drastically undercount volatility and can cause a lot of damage.
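To put a number on that, here is an illustrative simulation comparing “four-sigma” events under a normal distribution against a heavy-tailed Student-t distribution (built from normals below, and standing in for feedback-driven data purely as an assumption for this sketch):

```python
import math
import random

random.seed(7)
N = 200_000

def student_t(df):
    """Heavy-tailed draw: a standard normal divided by a chi-square factor."""
    z = random.gauss(0, 1)
    chi2 = sum(random.gauss(0, 1) ** 2 for _ in range(df))
    return z / math.sqrt(chi2 / df)

# Count events beyond 4 standard deviations under each assumption
normal_extremes = sum(abs(random.gauss(0, 1)) > 4 for _ in range(N))
fat_extremes = sum(abs(student_t(3)) > 4 for _ in range(N))
print(normal_extremes, fat_extremes)
```

Under the normal assumption a four-sigma event is a once-in-a-career rarity; under the fat-tailed distribution, thousands show up in the same number of draws. That gap is exactly the risk the normal model hides.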

And that’s the problem with numbers. They tell us a lot about normal situations, but very little about extreme ones. After all, it’s the outliers that are really interesting. We’d much rather hear about Justin Bieber than the “average” teenager singing in the shower, just like we are fascinated by companies like Apple, but most firms bore us to death.

**Beauty in Patterns**

We live in a technological age where computers juggle numbers at the speed of light, far greater than the relatively feeble 200 MPH that our brains tend to work at. They spit out numbers far faster than we can figure out what to do with them. Over-quantification is the chronic disease of the digital age.

We humans do have a secret weapon though. We recognize patterns very well, far better than computers can (at least for the next decade or two anyway). The great mathematician G.H. Hardy put it this way:

> A mathematician, like a painter or poet, is a maker of patterns. If his patterns are more permanent than theirs, it is because they are made with ideas.

So for all the confusion about numbers, math is pretty straightforward. You look for important patterns that tell a good story and you keep that story as simple as possible. We should always strive to explain the maximum amount of variance in the fewest possible statements. That is what is meant by mathematical beauty and elegance.

Numbers often lie. Math never does.

– Greg

Greg,

This is one reason that, having done a lot of forecasting, I’m not a big fan of spreadsheets loaded down with “Macros” and overly complicated formulas. They obscure the “thinking” process that allows you to understand patterns and see those outliers.

Excellent post!

Thanks Roger!

btw. I don’t think people pay nearly enough attention to outliers. They are often the most interesting part of the data set!

– Greg

Fully agree with your prognosis here. Very similar points made in the new Nate Silver book, ‘The Signal and the Noise’ – you should check it out.

Thanks Tony!

I’m a big Nate Silver fan (followed his blog since before it was on NYTimes) and covered his book in a post on Bayesian analysis – “Why Our Numbers Are Always Wrong” https://digitaltonto.com/2012/why-our-numbers-are-always-wrong/

– Greg

When this popped up in my RSS feed I thought “what the heck is Greg on about, has he lost it?” I’m glad to see that you haven’t.

Well, some people would disagree…

😉

– Greg

Great post!

I also love Math and it represents an important part in my profession.

Yet I also think that the capacity of the human brain to deal with complex problems is unrivaled.

So, computers can help us to make fast calculations and iterations, but a mathematical model has to be at the same time simple enough to permit to us “problem solvers” to extract (and to abstract) from the numbers the patterns in order to have a better understanding of the problem to solve.

That’s actually a subject I’ll be posting quite a bit about over the next few months. Computers are getting scary good at pattern recognition and, if you believe Ray Kurzweil, our advantage will disappear by 2030.

It is a very big problem and it’s coming fast.

– Greg

As a popular blog post, this is OK. But there are a few things in the post professional statisticians would think are wrong, misleading, or too narrowly defined.

Thanks for sharing your input.

– Greg

Excellent article with well articulated points. However, there is something that most of us don’t pay enough attention to. If you really do understand how to use the numbers that well thought and well applied research can give you, then you need to pay attention to the outliers as indicators of social change. You might be interested in a blog post I did on the subject: http://www.colemanmgt.com/oddballs-outliers-and-marketing-change/ I’d be interested in your comments.

Emily,

Yes. I also believe that outliers have been overlooked. As I said in the post, they’re the data points that are really interesting.

However, I think what is probably the more important issue is when outliers aren’t really outliers at all. Many people (most disastrously the financial community) assume a random, normal distribution when there is no reason to do so and they therefore undercount extreme values.

Since the financial crisis, there’s been much more attention paid to “fat tailed” models but, as Nassim Taleb points out, even that doesn’t fully insulate you.

– Greg

Greg, you are right, of course. The financial community has become so hooked on automation and gross numbers that they have lost sight (if they ever had it) of where the numbers come from. For all that they like to believe they are sophisticated and highly talented, most of them are pretty average, at best, and not as numerically astute as they think. Unfortunately, IMHO, the same thing is true for business in general.

Thanks for your input Emily.

– Greg

I am still reading “Thinking, Fast and Slow” by Daniel Kahneman. Cannot recommend it highly enough; it includes many points that this post alludes to. He and Taleb are very complementary. But one of his main points is that humans tend to have a much higher risk tolerance when there is a big gain but a very low probability of it actually happening. Interesting to think about that in this context.

Yes, that was an excellent book. I posted about it a few times:

https://digitaltonto.com/2012/the-problem-with-pundits/

https://digitaltonto.com/2012/when-should-we-go-with-our-gut-and-when-should-we-look-before-we-leap/

https://digitaltonto.com/2012/irrational-expectations/

– Greg