Many companies have already invested in “big data” analysis, and many that haven’t have declared that they plan to go that route in 2016. These companies invest thousands in creating data infrastructure, hiring specialists, and much more. Many people in the IT field have even gone back to school to become data analysts because it has become a very lucrative field.
Why do we do this? The whole point of data, big or small, is to predict outcomes or highlight areas that need improvement. If big data doesn’t help us do that any better than small data (or personal experience), it’s worse than useless — it’s an expensive distraction from more appropriate paths to improving decision making. That’s why it’s troubling that the foundation of what we know about big data may actually be incorrect!
Questions about Big Data
There are a number of different questions to ask about big data:
- How much big data exists?
- What is a company’s average amount of big data?
- How much noise (useless information) is, on average, in big data sets?
We think we have answers to these questions, but we don’t necessarily.
How Much Big Data Exists?
How much big data is out there? It would be very helpful to have an estimate of this, especially for those who are considering going into the big data industry. It also seems to be an easy question to answer: if big data is so important, shouldn’t we have a rough estimate of how much data is out there and at what rate it’s growing? Unfortunately, a search through academic journals, reports, books, and news stories won’t necessarily reveal the answer. No one really knows how much big data exists, and finding out is more difficult than you might think. It would require an in-depth study using data provided by many different companies, and some of them simply do not want to release that information. Without this key piece of information, there’s no way to figure out the average amount of big data a company holds.
Do Companies Really Benefit?
Another question is how many companies are really using and benefiting from big data. A survey done in April 2015 by Vanson Bourne states that 90 percent of all companies using big data benefit from it. It also states that most of these businesses have more than one petabyte of data. However, a number of other reports and surveys contradict this. Several, in fact, state that fewer than 30 percent of all companies that use big data are using the information to make market predictions. A survey done by KDnuggets reveals that 70 percent of respondents said that their largest data set was no more than one terabyte in size, a thousandth of a petabyte.
Don’t get me wrong, I love data in all of its forms, but let’s call it what it is (and not over-invest in shiny-object technologies).
One problem is that studies and articles almost always end up talking about how companies are investing in big data research, but they rarely go into detail about that data. Some even go so far as to try to predict how industries will change thanks to big data, but again, they never state how much data exists or give any specifics on how that data has been used. Without these important numbers, not much can be stated as fact about big data. Some of the predictions may be useful to a small number of companies or people, but if only a third of those companies are actually using big data, how useful is it really? These questions can’t currently be answered, but perhaps somewhere down the line, we’ll have the information we need.
What should you do? Most importantly, stop feeling like “everybody else is doing it”! Next time you read a breathless report about the latest innovation out of Google, or how the modern American is generating as much information as the Library of Alexandria with every tweet, sit back and remember that everyone loves to talk about the outliers. When it comes to your business decisions, objectively evaluate the costs and benefits of investment in new infrastructure, and remember that it’s not an either-or thing. Maybe your server logs could use a distributed filesystem for long-term storage.
Whatever you decide, make sure it’s your decision.