The introduction of data lakes is one of the most significant changes to enterprise data warehouse technology, according to CenturyLink’s head of business development Martin Hooper.
Hooper says classic enterprise data warehouse architecture is evolving under the influence of new technologies, new requirements, and changing economics.
He says data lakes, which combine large storage repositories with processing engines, are transforming the way enterprises handle data.
“Data lakes let enterprise data warehouses store massive amounts of data, offer enormous processing power, and let organisations handle a virtually limitless number of tasks at the same time,” Hooper explains.
In a classic enterprise data warehouse, source systems feed a staging area, and the processed data is then consumed by analytic applications, he says.
“In this model, the access layer of the data warehouse, known as the data mart, is often part of the data warehouse fabric, and applications are responsible for knowing which databases to query.”
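That application-side routing can be sketched in miniature. The mart names and connection strings below are purely illustrative assumptions; the point is that in the classic model the knowledge of which database answers which question lives in each application, not in the warehouse.

```python
# Hypothetical data-mart routing table an analytic application
# might carry in the classic model (names and URLs are invented).
DATA_MARTS = {
    "sales": "postgresql://warehouse/sales_mart",
    "finance": "postgresql://warehouse/finance_mart",
    "marketing": "postgresql://warehouse/marketing_mart",
}

def mart_for(subject_area):
    """Return the connection string the application must use.

    The routing knowledge sits in the application layer: if a mart is
    reorganised, every application holding a table like this must change.
    """
    try:
        return DATA_MARTS[subject_area]
    except KeyError:
        raise LookupError(f"no data mart covers {subject_area!r}")
```

A query for sales figures would call `mart_for("sales")` before connecting, which is exactly the coupling a centralised lake-based model aims to remove.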
According to Hooper, in modern enterprise data warehouses, data lake facilities based on the Apache Hadoop open source software framework replace the staging area that sits at the centre of traditional data warehouse models. While data lakes provide all of the capabilities offered by the staging area, they also have several other important benefits, he says.
“A data lake can hold raw data forever, rather than being restricted to storing it temporarily, as the classic staging area is,” Hooper explains.
“Data lakes also have compute power and other tools, so they can be used to analyse raw data to identify trends and anomalies.
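The kind of raw-data anomaly analysis Hooper describes can be illustrated with a small sketch. The sample records, field names, and the simple standard-deviation threshold are assumptions for illustration, not CenturyLink's method; real lakes would run this at scale on a distributed engine.

```python
import statistics

# Hypothetical raw event records as they might sit in a data lake,
# retained in full rather than discarded after a staging pass.
raw_events = [
    {"host": "app-01", "response_ms": 120},
    {"host": "app-01", "response_ms": 135},
    {"host": "app-01", "response_ms": 118},
    {"host": "app-01", "response_ms": 127},
    {"host": "app-01", "response_ms": 990},  # an outlier worth flagging
]

def find_anomalies(events, field, sigma=1.5):
    """Flag records whose value lies more than `sigma` standard
    deviations from the mean -- a crude stand-in for the trend and
    anomaly analysis run over raw data in a lake."""
    values = [e[field] for e in events]
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []
    return [e for e in events if abs(e[field] - mean) / stdev > sigma]
```

Because the raw records are still present, the threshold or the field being analysed can be changed later and the history re-scanned, something a transient staging area cannot offer.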
“Furthermore, data lakes can store semi-structured and unstructured data, along with big data.”
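The semi-structured case rests on schema-on-read: records land as raw documents with no upfront schema, and structure is imposed only when a query runs. The JSON lines and field names below are hypothetical, a minimal sketch of the idea rather than any particular product's behaviour.

```python
import json

# Hypothetical raw feed: each line is a JSON document. Records need not
# share a schema, which is what lets a lake hold semi-structured data.
raw_lines = [
    '{"user": "alice", "action": "login"}',
    '{"user": "bob", "action": "purchase", "amount": 42.50}',
    '{"sensor": "t-7", "reading": 21.3}',  # a different shape entirely
]

def query(lines, field):
    """Schema-on-read: parse at query time and keep only the records
    that happen to carry the requested field."""
    results = []
    for line in lines:
        record = json.loads(line)
        if field in record:
            results.append(record[field])
    return results

# query(raw_lines, "user") returns ["alice", "bob"]; the sensor record
# is simply skipped rather than rejected at load time.
```

A traditional warehouse would force all three records into one table schema at load time; the lake defers that decision to each query.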
Using Hadoop as an enterprise data warehouse staging area is not a new concept, says Hooper.
“A data lake based on Hadoop not only provides far more flexible storage and compute power, but it is also an economically different model that can save businesses money,” he says.
In addition, a data lake provides a cost-effective, extensible platform for building more sandboxes, which are testing environments designed to isolate and execute untested code, Hooper explains.
“A Hadoop staging approach begins to solve a number of the problems with traditional enterprise data warehouse architecture, while full-blown data lakes have created an entirely new data warehouse model that is more agile, more cost-effective, and provides companies with a greater ability to leverage successful experiments across the enterprise, resulting in a greater return on data investment,” he says.