Cloud data warehouse trends and best practices
According to a new survey conducted by TDWI on behalf of Talend, new cloud data warehouses (CDWs) offer broader data capabilities, stronger performance, and greater flexibility than traditional on-premises databases.
However, the survey found that while CDWs are often an important first step in digital transformation, enterprises need to follow some best practices to overcome implementation challenges and increase investment return.
TDWI senior research director Philip Russom says, “TDWI sees a wide range of data-driven IT systems moving to the cloud aggressively, and this includes the data warehouse.
“Cloud gives the data warehouse the elastic scale, agnostic storage, multi-tenant access, and controlled cost it needs for modern requirements.
“However, cloud data warehouses should be complemented with substantial data integration infrastructure to unify the many pieces of the warehouse with all the data sources and targets available.”
Decision Resources Group (DRG) is one example of this approach. As a company that manages comprehensive data repositories covering 90% of the U.S. healthcare system, DRG was struggling to combine and organise its disparate data sources.
Healthcare data is stored in a structured way, creating the need for DRG to clean and normalise millions of records and group data to assess patient needs and market conditions.
DRG successfully switched to a cloud-first strategy by implementing Talend and the Snowflake cloud data warehouse as the foundation of its new Real World Data Platform. As a result, DRG became 150% more productive without increasing costs, and has since onboarded 100 terabytes of data in just three months.
The organisation can now supply more meaningful data, enabling physicians to understand different patient populations and provider markets and interact with them more effectively.
Survey respondents noted that adopting CDWs was critical to achieving faster performance, lowering costs, and taking advantage of cloud features, but they also reported a number of challenges.
Over 50% of respondents cited data governance as a top challenge, closely followed by integrating data across multiple sources at over 40% and getting data into the warehouse at about 38%.
Organisations' data analytics needs in a CDW are becoming increasingly complex. Over 35% of respondents expressed the need for in-memory processing, support for structured and unstructured data, and integration with third-party analytics tools.
As a result, CDWs need to accommodate a wide variety of data and serve a broad range of technical use cases. Interestingly, 62% of respondents in the process of implementing CDWs want them to complement a data lake for analytics.
All survey respondents were interested in features such as data quality, metadata management, and processing and transforming data both before and after it is loaded into a CDW.
As these requirements cannot be met solely by CDW technologies, the response suggests a need for integration solutions to complement the infrastructure.
CDWs must accommodate a range of use cases, from business to technical, and support increases in speed and scale while handling both current and future needs.