IS BIG DATA TOO . . . BIG? LEARN HOW TO AVOID DECISION WHIPLASH AND SORT THROUGH THE NUMBERS IN THE MOST EFFICIENT WAY--THE NEXT BIG BREAKTHROUGH MIGHT DEPEND ON IT.
BY: GURJEET SINGH
A highly anticipated drug trial fails to produce the desired results, costing a pharmaceutical company $500 million and 10 years of wasted research.
An energy company finds out too late that several major drilling bets are coming up dry, forcing them to take a $2 billion write-off.
Investigators follow leads for years to uncover a planned terrorist attack on a major city, only to make a fatal error in determining the location.
The pressure to discover breakthroughs is tremendous. In large organizations, it is often the difference between market success and market exit. Today, organizations need to deliver solid results within shorter innovation cycles. Research groups need to show a return on strategic investments. This “need for speed” heaps greater risk on an already risk-laden process and can sometimes make misleading discoveries look like real ones. Luck may shine once in a blue moon to produce an accidental discovery like Herceptin, but all too often misinterpretations can lead to disaster.
The intense pressure to uncover the “Next Big Thing” is our collective reality and it’s here to stay. That has led businesses and government agencies to ramp up Big Data efforts--a Gold Rush of sorts--to compile the richest and most comprehensive treasure troves of data they can to help make the best decisions possible. With greater computing power and collection technology, we now have vast datasets that hold the promise to solve some of our most challenging problems.
Few, if any, big decisions today are as simple as A vs. B, nor are the relative opportunity costs clear. Decisions related to drug discovery, energy exploration, fraud detection, and other critical problems can generate a variety of impacts downstream. And today, misinformed decisions can cause a company more damage, and cause it faster, than ever before.
This reality recalls the “Bullwhip Effect,” a concept popularized by Stanford University Professor Hau Lee to describe the oscillating effects caused by incorrect signal data in forecast-driven supply chains. When a person cracks a bullwhip, small movements at the wrist produce huge waves at the other end of the whip. In the same way, information becomes exaggerated and distorted as it moves up the supply chain, driving up costs and hurting efficiency. Great advancements have been made over the last 30 years to help companies deal with the perils of the Bullwhip Effect.
But there are no such countermeasures to minimize the bullwhip arc if decision makers do not understand what the data is telling them. When trying to develop a new drug, prevent terrorism, or identify fraud, the inputs and attributes are far more diverse, creating a complex puzzle to distinguish leading and misleading indicators.
Despite a wealth of data, decision-making today is harder, not easier. The issue is not the size of the data, but its complexity. While data-crunching tools have become faster and better able to deal with large volumes of data over the years, they all still begin with an Analyst and a query.
On the surface, we seem to have everything that we need to solve these problems. We have relatively inexpensive computation. We have a burgeoning discipline of Data Scientists and Analysts to build sophisticated models. We have faster data-crunching tools than ever before. And, we have large investments earmarked for addressing expensive problems. So why is this still so hard?
Put simply, while IBM’s Watson kills at Jeopardy, we are still confounded by the Jeopardy issue: What is the right question to ask? Every Big Data exploration starts with human assumptions and biases that amount to an educated guess in the form of a query.
With larger and more complex datasets, it is simply too difficult for the brain to make the connections that lead to the optimal query. Instead, we spend months or years building models that examine only slices of the data, a highly unlikely path to uncovering critical discoveries or actionable insights. When it looks like we’re failing, we pile more humans on the problem. The simple truth is that--with the exponential growth of data--we’ll never have enough trained talent, or enough time to write all of the possible queries, to find the answers that we’re all looking for.
The complexity of today’s datasets--and so many investments in flawed insights--has forced decision makers to question the methods that they use for analysis. Just as in the case of the Bullwhip Effect, research teams need to go back to the start, to fix the fundamental problem that generates sub-optimal or just plain bad decisions.
In the world of Big Data, there is a wide spectrum of interplay between the human brain and machine learning systems. Think of it like a slider. Right now, the slider sits too far toward the human side: we rely too heavily on people to ask the right questions and identify the important connections between millions of data points. Machine learning systems have made tremendous strides over the last few years, and it’s time that we move that slider over and let systems do more of the heavy lifting, particularly at the beginning of the data analysis process. When presented with a holistic view of the data, Data Scientists can then examine valuable data in an agnostic manner and identify relationships within it in a way they could not before. They can start by finding the answers to questions that they didn’t know to ask in the first place.
Let’s use both humans and machines to their best advantage. Computers do more of the computing over complex datasets and analysts do more of the analyzing. Instead of trying to ask the right question, we let those who best understand the problem--biologists researching cancers, geologists searching for energy sources, intelligence officers working to prevent terrorist attacks, and other domain experts--find the right insights that inform sound investments to catalyze growth and save lives. After all, isn’t this the true promise of Big Data that we all dream of?
--Gurjeet Singh is cofounder and CEO of Ayasdi, an enterprise software company specializing in big data analytics. Follow them on Twitter at @ayasdi.