How data warehouses have become the new data lakes for business
Article by Snowflake APAC vice president of sales engineering, Alan Eldridge.
With the volume of data generated by businesses of all types growing exponentially, many are looking for more effective ways to manage and make use of it.
Until recently, the typical approach was to make use of data warehouse platforms to store and analyse data. This worked well when data volumes were contained, but in many cases this situation has changed.
Now, machine-generated data is appearing in ever-increasing amounts. Automated transactions, machine-to-machine interactions, and sensor networks have created a tidal wave of data that must be captured and managed. The challenge here is that much of this data is unstructured or semi-structured and so doesn’t readily fit within a traditional data warehouse structure.
The rise of the data lake
To cope with this trend, growing numbers of organisations have embraced the concept of a data lake. These vast pools contain data drawn from multiple locations held in multiple formats. Often, the organisation may not be certain how or where the data will be used. All that’s known is that it has some business value and so must be stored for future analysis.
While data lakes are great when it comes to storage, they don’t perform well when it comes to analysis and reporting. The vast volumes and multiple formats mean that traditional data warehouse tools are unsuitable and another approach needs to be found.
Many organisations looked to technologies such as Hadoop to solve the problem. They believed Hadoop could be used to gain insights from data lakes that, in turn, could support business decision making.
The reality, unfortunately, was somewhat different. The performance of these platforms could not keep up with business demands, leading to frustrated users. Clearly, another approach was required.
Thankfully, database technologies have improved significantly in recent years. New platforms, such as Snowflake, allow data to be imported directly from a data lake into a relational database for rapid manipulation and analysis.
Having this capability means IT teams no longer have to take a different approach when making use of a data lake. All data in the lake is immediately available rather than having to be cascaded through a series of pipelines before it can be put to work. Essentially, the data warehouse has become the new data lake.
For organisations that have already made an investment in an existing data lake strategy, this new approach is particularly attractive. Their investment in cloud services such as Amazon S3 or Microsoft Azure does not need to be ripped out and replaced; instead, it can be augmented.
Essentially, a next-generation data warehouse platform can be deployed in front of the data lake, with data mapped through for easy access and use. In this way, all data appears to be in the data warehouse but, in reality, it remains in the data lake infrastructure.
The future of the data warehouse
Being able to take this approach will make establishing a data lake more attractive for organisations. Many may have been put off by the perceived complexity, or may not have fully understood how they would be able to make use of their data in a meaningful way.
By linking a next-generation data warehouse platform to your data lake, you won’t need a new set of skills or have to construct complex pipelines to shift data from one to the other. The infrastructure will be streamlined and able to quickly add significant business value.
Data lakes are going to continue to grow in size and complexity as volumes of data increase. Having a data warehouse platform that can cope with this volume will become a real business asset.
Your data warehouse is your new data lake.