Why Big Data Matters
Terabytes, Petabytes, Exabytes. Who can keep track? These strange terms have just begun to enter the business lexicon, but the hype surrounding them has reached a fever pitch. We have undoubtedly entered the age of big data.
Yet it’s hard for many to take it seriously. While the blogosphere buzzes and millennials preach, most serious business people are focused on their jobs. They have partners, customers and employees to keep happy and all the techy mumbo jumbo just doesn’t seem relevant.
But big data is important because it will transform how we manage our enterprises. For most of the 20th century, business leaders relied on “scientific” studies and “statistical significance” to determine what information they could trust. Now, technology is making those assumptions obsolete and the practice of management will never be the same.
A Big Discovery of a Small Planet
In 1801, the astronomer Giuseppe Piazzi noticed a small body in the night sky. At first, he thought it was a comet, but it soon became clear that it was, in actuality, the dwarf planet now known as Ceres. He tracked it for 40 days before it disappeared behind the sun and astronomers were unsure how to find it again.
It was then that a young man named Carl Friedrich Gauss applied his method of least squares and was able to predict the orbit of Ceres. Shortly after, as if by magic, it appeared exactly where Gauss said it would be. He went on to become the most influential mathematician of his time.
In business, we often run into very similar problems. We need to make decisions based on incomplete information in a rapidly changing context. So, not surprisingly, Gauss’s work has formed the basis of many of the statistical techniques, such as regression analysis, that modern-day management employs to make sense of a messy world.
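To make the idea concrete, here is a minimal sketch of least squares for a straight line, the simplest form of regression analysis. The figures (ad spend and sales) are invented for illustration and the function name `fit_line` is my own, not anything from Gauss or a particular library.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean, and how x and y move together
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: monthly ad spend and sales, both in thousands
spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.1, 4.9, 7.2, 8.8, 11.0]

intercept, slope = fit_line(spend, sales)

# Predict sales for a spend level we have not observed yet
forecast = intercept + slope * 6.0
print(f"sales = {intercept:.2f} + {slope:.2f} * spend; forecast at 6.0: {forecast:.2f}")
```

The line is chosen so that the sum of squared gaps between the observed points and the line is as small as possible, which is the same principle Gauss applied to a handful of sightings of Ceres.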
Rules For Control
For nearly a century, Gauss’s ideas were mainly a subject for academics, but in the 1920s Ronald A. Fisher burst onto the scene. He set out rules for the design of experiments, confidence intervals and statistical significance, among other things. Underlying his methods was an emphasis on controls. Put good data in and you would get good answers out.
Before long Fisher’s methods were adopted by business, culminating in the Six Sigma movement that purported to achieve stable and predictable results. Much like Fisher’s earlier efforts, it was thought that by controlling every aspect of the process, uncertainty could be tamed and management could be transformed from an art into a science.
Yet all was not well. Many, Nassim Taleb in particular, argued that control was a dangerous illusion. Anything that met a basic standard of statistical significance (usually 95% confidence) was treated as fact. False certainty led managers to discard inconvenient information as “outliers,” often with disastrous results.
An Alternative Approach
Hovering in the background all this time was an alternative approach called Bayesian inference, which allowed you to simply make a guess and then revise your judgement as new information came in. It was, in many ways, the polar opposite of Fisher’s approach. No specific controls, no rules about significance, just an updating of probabilities.
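Here is a rough sketch of what “make a guess and then revise” can look like in practice. It assumes a simple yes/no outcome (say, whether a visitor converts) and uses a Beta prior, which is one common textbook choice rather than the only way to do Bayesian inference. The batch figures and function names are invented for illustration.

```python
def update(alpha, beta, successes, failures):
    """Beta-Binomial update: add the observed counts to the prior counts."""
    return alpha + successes, beta + failures


def posterior_mean(alpha, beta):
    """Current best estimate of the underlying rate."""
    return alpha / (alpha + beta)


# Start with a uniform prior, Beta(1, 1): an initial guess of 50%
alpha, beta = 1.0, 1.0
estimates = [posterior_mean(alpha, beta)]

# Hypothetical batches of (conversions, non-conversions) arriving over time
batches = [(3, 7), (12, 38), (45, 155)]

for successes, failures in batches:
    alpha, beta = update(alpha, beta, successes, failures)
    estimates.append(posterior_mean(alpha, beta))

for step, estimate in enumerate(estimates):
    print(f"after {step} batches: estimated rate = {estimate:.3f}")
```

Notice that there is no significance threshold anywhere. The estimate starts as a guess and simply shifts as each batch arrives, and each new batch moves it less as the evidence piles up.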
Although Bayesian methods were successful in some important cases where controlled studies weren’t an option, such as hunting German subs during World War II, they weren’t widely deployed. Part of the reason was that Fisher and his followers fought hard against them, but mostly it was because they were impractical. It was hard to gather enough data to make them work.
That’s what big data is starting to change. The combination of accelerating returns in storage and processing power, a sea of data from the Web of Things and increasingly efficient algorithms is making Bayesian methods not only practical, but faster, cheaper and more accurate than the traditional approach.
A Fundamental Shift
As with any new technology, there is a lot of confusion surrounding big data. There are endless debates about what is and isn’t big data, armies of consultants eager to muddy the waters in return for a hefty retainer, and the usual hype and alphabet soup of acronyms and buzzwords.
But what you really need to know about big data is this: It represents a fundamental shift in how we do things. In effect, big data opens the door to a Bayesian approach to strategy where we no longer try to be “right” based on controlled research and small samples, but rather become less wrong over time as real world information floods in.
The truth is that even the old mantra of “failing fast and cheap” is becoming too slow and expensive. That’s why there is a growing divide between businesses that use data effectively and those that don’t. Big data means much more than a change in technology. It represents a structural transformation in how we will manage our enterprises.
– Greg
Whew! Greg! What a wonderful post! Thanks for writing this. What a coincidence – I was planning to write to you this week, to get some understanding of Big data – your post addresses the philosophy and answers some unasked questions too! Thanks once again Greg!
Thanks Ajoy! I’m glad it was helpful.
– Greg
Thanks for clarifying the foresighting role of “big data”.
A well-constructed and well-argued post, Greg. On the face of it, we all agree that Big Data will radically change the way we manage our businesses going forward and, as you say, will require a structural change, among other things.
However, as people ponder this, it remains a concern for me that we could end up with businesses paralysed into not making decisions, instead wanting more and more information to justify a decision. Worse, for many businesses (SMEs), the sheer volume of data becomes overwhelming, making it difficult to identify the information most relevant to a business’s success and to separate the ‘wheat from the chaff’.
I guess I am saying all advancements come with health warnings, and should be treated thus?
Glad it was helpful:-)
Kevin,
I hear what you’re saying, but from what I can see SMEs are actually better off. Most of this stuff can be done very cheaply in the cloud.
– Greg
Another fascinating example….
“Schadt got a job at the pharmaceutical giant Merck and, availing himself of the Merck supercomputer, became one of the leading exponents of the medical use of what became known as Big Data. He also had amazing success coming up with new drugs for Merck, to the extent that at one point half the drugs in development started in Schadt’s lab. Then he told Merck that they wouldn’t work. What data had taught him was that the underlying faith of molecular biology—of all biology, since Watson and Crick had elucidated the structure of the DNA molecule—was false. Untold billions had been spent in the hope that we could understand disease one gene at a time, or one genetic pathway at a time; by targeting the gene or the pathway “for” Alzheimer’s disease, say, we could target Alzheimer’s disease itself. Schadt told Merck that this was a strategy doomed to fail, because disease arose not from single genes or pathways but rather out of vast networks of genes and pathways whose interactions could be understood only by supercomputers guided by abstruse algorithms.
He wound up at a company that made advanced gene sequencers, Pacific Biosciences. There he tested his network model by resolving to become the “hub” of networks of collaborators. He did his supercomputing with Amazon; he put forth an idea of mapping pathogens in public places that attracted the attention of Google…”
http://www.esquire.com/features/patient-zero-1213