We all understand that big data will “become normal” and expected sometime near 2018. The disruption of the new processing style and the resulting infrastructure now has to mature.
The new vendors that have emerged are proclaiming their victory over tradition.
At the same time they are scurrying to build out the robust administrative, productivity, deployment and optimisation framework that already exists in traditional data management practices. I use the word “scurrying” quite on purpose.
There are some heavy traditional boots stomping around in the data management kitchen and some mice will be ducking and running to hide in the corners—and starve.
A few brave mice will run to the centre of the kitchen, steal some of the data management cheese and avoid the spring-loaded trap waiting for them.
With this new phase of big data adoption and evolving technology, two questions loom.
First, when will big data technology and techniques become part of normal data management? We have done big data many times before. Client/server and relational databases did it.
Remember, flat files were bloated, processing code was overly complex and difficult to manage and the centralized mainframe computing system was costly.
Voilà, we fixed it. And it took almost twenty years to mature. Remember the first relational databases mounted on client/server infrastructure? Ugh. Then it happened again.
Too many applications were creating their own copies of data, so how could we reconcile this mess? So, we built these things called data warehouses (but not before we tried Executive Information Systems).
And giant data warehouses grew to billions of rows of data, and hundreds if not thousands of users began demanding more and more from them.
And it took more than twenty years for everyone to figure out the data warehouse (remember, Sabre was 1976, Frito Lay and Coke had their data warehouses LONG before Kimball and Inmon popularized the terminology). XML was going to fix data transfer rates. We are approaching twenty years there too.
Almost everyone equates Hadoop with Big Data, so let’s trot out that timeline too. Hadoop was open-sourced in 2005, so we are now ten years into the maturity cycle. Right on schedule about three years ago, in the sixth year of emergence, the hype machine cranked up.
Here we are in year ten, and everyone is now demanding more robust development, deployment management and optimization consistency. We like to think that IT is cranking out new technology faster and faster, but it isn’t. It’s twenty years or bust. Looking at the current information management market, who will succeed faster?
The traditional, already mature and robust environment that needs to add new forms of processing and information types into its existing management approach; or, the new processing and asset management system that has to build twenty years of maturity in the next three years or get caught in the light in the middle of the kitchen?
Second, how will big data technology and techniques mature and take their rightful role in the information infrastructure and processing world? What is that rightful role?
I argue that in the traditional analytics world, getting the requirements correct was the key. In the new world, letting the analysts determine the requirements through usage is the key. Why not take advantage of all of this hyper-fast hardware and networks?
Let’s face facts: the only reason ANYONE captures information is to share it at some point. Otherwise, why capture it? So, the users are the analysts and operations teams who need to share what was done in their part of the business process with someone or something else, or to see what was done somewhere else by someone else.
Users fit into four big categories:
* Casual users who want clean data that they don’t have to think too much about because someone else already thought of what it should look like
* Analysts who believe they can manipulate data, but who really only manipulate the data they have within a business process model they are familiar with
* Data miners who understand data, sourcing, processing logic and more, and of whom there are very few in any organisation (although many analysts think they are miners)
* Data scientists who are, well, different than miners because they can geek speak about mathematics, business processes and data simultaneously while they are creating graph analytics in their heads.
This gets to the role that big data technologies will play in the new world of information infrastructure and management. It will be the job of this technology to render and evaluate new candidate models of analysis. The miners and scientists will play in this space and have a field day.
But they will use their tools and the data to develop all viable uses of data, and then the scientists will become bored or otherwise engaged. So the miner who helped develop all of these wonderful candidates will think, “Wow, we should use these,” and will want to show the analysts their pretty analytics candidates.
And IT will say, “Wait, you are putting a very sharp object into a toddler’s nimble fingers.” So IT will develop semantic tiers to present the many candidates and track which are the highest-ranking contenders for optimisation.
The analysts will marvel at all the shiny new models and then start picking the primary contenders that best represent likely or interesting scenarios.
The analysts will use different candidates until they develop those contenders, and eventually casual users will wander into the contender world and ask, “Can I simply have that ONE model there? I like that model; it is the best compromise for all of us.” And IT will put that into the data warehouse and data marts for all to see.
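For the technically minded, the candidate-to-contender-to-compromise flow can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the `ModelCatalog` class, the usage threshold of three and the model names are all hypothetical, and raw usage counts stand in for whatever ranking a real semantic tier would apply.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ModelCatalog:
    """Hypothetical semantic tier that tracks how often each model is used."""

    contender_threshold: int = 3  # assumed cut-off, not taken from the article
    usage: Counter = field(default_factory=Counter)

    def register(self, model: str) -> None:
        # A miner or scientist publishes a new candidate with zero uses.
        self.usage.setdefault(model, 0)

    def record_use(self, model: str) -> None:
        # An analyst runs the model; usage is the only ranking signal here.
        self.usage[model] += 1

    def contenders(self) -> List[str]:
        # Candidates used at least `contender_threshold` times, most used first
        # (ties broken alphabetically so the order is deterministic).
        ranked = sorted(self.usage.items(), key=lambda kv: (-kv[1], kv[0]))
        return [name for name, count in ranked if count >= self.contender_threshold]

    def compromise(self) -> Optional[str]:
        # The single model casual users would ask for: the top contender, if any.
        top = self.contenders()
        return top[0] if top else None


# Illustrative usage with made-up model names and usage counts.
catalog = ModelCatalog()
for name, uses in [("churn_by_region", 5), ("basket_affinity", 3), ("route_delay", 1)]:
    catalog.register(name)
    for _ in range(uses):
        catalog.record_use(name)

print(catalog.contenders())
print(catalog.compromise())
```

All three tiers stay alive in this sketch: every registered model remains a candidate, the contenders list is recomputed from usage, and the compromise is just the current front-runner that IT would promote to the warehouse.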
Think, 20 years or bust (and we are in year ten). Think candidates, contenders and compromises (and supporting all three from now on).
This is the new world after big data changes it.