
The 3 Things About Data You Probably Don’t Know, But Need To

2017 January 18
by Greg Satell

In 1977, at the Xerox World Conference in Boca Raton, FL, the company’s senior executives got a glimpse of the future. On display was a new kind of computer, the Alto, that was designed for a single person to use with nothing more than a keyboard and a small device called a “mouse” that you operated with one hand.

They were not impressed. The machine mainly handled writing and document tasks, secretarial work in other words, which did not excite them. And as executives who measured their performance by how many copies they generated, they didn’t see how the thing could make money.

Times have changed, of course, and today it’s hard to imagine any executive functioning without a computer. We’re now going through a transformation similar to that of the 1970s. Today, every manager needs to work with data effectively. The problem is that most are as ill-equipped as those Xerox executives were. Here’s what you most need to know:

1. Data Are Often Subject To Human Error

Numbers that show up on a computer screen take on a special air of authority. Data are pulled in through massive databases and analyzed through complex analytics software. Eventually, they make their way to Excel workbooks, where they are massaged further into clear metrics for decision making.

Yet where does all that data come from? In many cases, from low-paid, poorly trained front-line employees recording data on clipboards as part of their daily drudgery. Data, as it’s been said, is the plural of anecdote and is subject to error. We can, and should, try to minimize these errors whenever we can, but we will likely never eliminate them entirely.

As MIT’s Zeynep Ton explains in her book The Good Jobs Strategy, which focuses on the retail industry, even the most powerful systems require human input and judgment. Cashiers need to ring up products with the right codes, personnel in the back room need to place items where they can be found and shelves need to be stocked with the right products.

Errors in any of these places can corrupt the data and cause tangible problems, like phantom stockouts, which in turn can lead to poor decisions higher up in the organization, in functions like purchasing and marketing. These seemingly small mistakes can be incredibly pervasive. In fact, one study found that 65% of a retailer’s inventory data was inaccurate.

Lapses like these aren’t confined to low-level employees either. Consider the case of two Harvard economists who published a working paper that warned that US debt was approaching a critical level. Their work caused a political firestorm but, as it turned out, they had made a simple Excel error that caused them to overstate the effect that debt had on GDP.

2. Your Numbers Are Always Wrong

Our access to data is always limited. We may look at a day of sales, or a week or even a year, but that is just a small slice of reality. If we look at a typical marketing survey, what we see is almost always a small sample. Studies are supposed to be controlled to make the sample representative, but the methods are far from perfect.

The upshot is that our numbers are always wrong. Sometimes they are off by just a little and sometimes by a lot, but they never perfectly reflect reality at the moment. That may be the result of controls being overlooked or data mishandled or just plain bad luck, but whatever the reason, we should never take data at face value.
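To make the point about small samples concrete, here is a minimal sketch of how much uncertainty a survey result carries at different sample sizes. It assumes a simple random sample and uses the standard normal approximation for a proportion. The 40% survey result and the sample sizes are hypothetical figures chosen for illustration.

```python
import math


def margin_of_error(p_hat, n, z=1.96):
    """Approximate 95% margin of error for a sample proportion.

    Uses the normal approximation z * sqrt(p(1-p)/n), which assumes a
    simple random sample and a reasonably large n.
    """
    return z * math.sqrt(p_hat * (1 - p_hat) / n)


# Hypothetical survey: 40% of respondents say they prefer our brand.
for n in (100, 1000, 10000):
    moe = margin_of_error(0.40, n)
    print(f"n={n:>6}: 40% +/- {moe * 100:.1f} points")
```

Even in this idealized case, a survey of 100 people is uncertain by roughly ten points either way, and it takes a hundred times as many respondents to shrink that to about one point. Real surveys, with imperfect controls, tend to be worse.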

One alternative to traditional statistical methods that has been gaining traction in recent years is Bayesian analysis, which allows us to continually update our judgements as new information comes in. In effect, Bayesian methods don’t assume the data we have is right, but allow us to become “less wrong over time.”
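As a rough illustration of what “less wrong over time” can look like, here is a minimal sketch of Bayesian updating using a Beta-Binomial model, one of the simplest textbook cases. The scenario (estimating a conversion rate from weekly batches) and all the numbers are hypothetical, and the uniform starting prior is an assumption, not a recommendation.

```python
def update_beta(alpha, beta, successes, failures):
    """Bayesian update of a Beta(alpha, beta) belief about a rate
    after observing new binomial data."""
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution: our current best estimate."""
    return alpha / (alpha + beta)


# Assumption: start with a uniform prior, Beta(1, 1), i.e. no strong opinion.
alpha, beta = 1.0, 1.0

# Hypothetical weekly data: (conversions, non-conversions).
weekly_batches = [(3, 47), (6, 44), (4, 46)]

estimates = []
for week, (successes, failures) in enumerate(weekly_batches, start=1):
    alpha, beta = update_beta(alpha, beta, successes, failures)
    estimates.append(beta_mean(alpha, beta))
    print(f"week {week}: estimated rate = {estimates[-1]:.3f}")
```

Each week’s estimate is treated as provisional. New data shifts it rather than replacing it, and the estimate moves less as evidence accumulates.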

That’s also why big data matters. Today, digital technology enables us to collect and access massive amounts of information continuously. Open-source data platforms, such as Hadoop and Spark, also allow us to store that data and combine it with other sources so that we can revisit earlier conclusions and correct mistakes.

3. Your Logic Is Flawed

Let’s return to the retail industry for a moment. Labor costs for a typical retail operation are about 15% of sales, so controlling payroll is key to maintaining profitability. Keep too many employees in the store and your costs will skyrocket, but leave too few and customer service will suffer, causing you to lose sales.

So it is natural to tie staffing to sales. Using past data, you can build a predictive model that will determine what your staffing needs should be on any given day for each location. Store managers can then use software to schedule the number of salespeople to have on the floor that will maximize sales while minimizing payroll costs.

But what happens when an unforeseen event, like a heavy storm or a traffic accident, reduces sales on a particular day? Most probably, that will lead to understaffing on future days, which will depress sales further and validate the idea that you need less staff in the store. Before long, a vicious cycle ensues, where fewer staff lead to reduced sales, which in turn seem to justify even fewer staff.
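The loop can be sketched with a toy simulation. This is a deliberately simplified model, not how any real scheduling software works: it assumes each salesperson can handle a fixed amount of sales per day, and that tomorrow’s staffing is set purely from today’s sales. The function name, the demand figures and the per-person capacity are all hypothetical.

```python
def simulate(demand_by_day, initial_staff, sales_per_staff=1000):
    """Toy model: sales are capped by staff capacity, and the next day's
    staffing is derived from the current day's sales."""
    staff = initial_staff
    history = []
    for demand in demand_by_day:
        # Sales cannot exceed customer demand or what the staff can serve.
        sales = min(demand, staff * sales_per_staff)
        history.append((staff, sales))
        # Naive rule: schedule tomorrow's staff from today's sales.
        staff = max(1, round(sales / sales_per_staff))
    return history


# Hypothetical demand: steady at 10,000, except a storm on day 3.
demand = [10000, 10000, 6000, 10000, 10000, 10000]

for day, (staff, sales) in enumerate(simulate(demand, initial_staff=10), start=1):
    print(f"day {day}: staff={staff}, sales={sales}")
```

In this sketch, demand recovers the day after the storm, but sales never do. The store stays at six staff and 6,000 in sales, and the data appears to confirm that six is the right number. Lost sales from understaffing never show up in the model’s inputs.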

That’s why data scientists suggest using data that is mutually exclusive and collectively exhaustive (MECE) to avoid these kinds of feedback loops and to continually test your conclusions with sources outside your model. In the example above, for instance, good communication with store managers can help to identify and fix the problem.

These types of errors arise from something psychologists call availability bias. We tend to base our judgments on the information that is most available, such as store sales and payroll costs, and neglect other factors that don’t fit as neatly into data models, such as bad weather and sales lost through poor service.

Manage For Mission Not For Metrics

Clearly, data literacy, much like basic computer skills, is becoming an essential skill for every executive. Moreover, there is increasing evidence of a growing divide between businesses that use data effectively and those that don’t. Much like those Xerox executives in the 1970s, we can’t simply turn our heads because we think data analysis is someone else’s job.

We also shouldn’t lose sight of the fact that data is a means to an end and not an end in itself. We need to manage for mission, not for metrics. The purpose of an enterprise is to serve customers, employees and other stakeholders well, not merely to keep score. That’s why we need to see data as a tool for discovery as well as evaluation.

As Steve Hillion, Chief Product Officer at Alpine Data, told me, “We need to take a more exploratory approach and think more about how we can design systems to impact businesses, rather than just evaluate operational activity. We can no longer separate analysis and action, we need to integrate data science teams with line managers.”

Data has been called “the new oil” and for good reason. Like energy, talent, finance or any other resource, managing it effectively is becoming essential to competing in today’s marketplace. Managers need to take data analysis as seriously as they would any other crucial function.

– Greg


An earlier version of this article first appeared in

2 Responses
  1. January 22, 2017

    By co-incidence I am currently editing a paper on this kind of science.

    I have passed a copy of this to the student who is writing it.

    And Greg, thank you, what you have written does give the science a good perspective.


  2. January 22, 2017

    Thanks Edward. I very much appreciate that.

    – Greg
