Databricks launches LakeFlow to streamline data engineering

Fri, 14th Jun 2024

Databricks has announced the launch of Databricks LakeFlow, a new solution designed to unify and simplify data engineering processes from data ingestion to transformation and orchestration.

LakeFlow aims to streamline data ingestion for teams by providing scalable connectors for databases such as MySQL, Postgres, and Oracle, as well as enterprise applications like Salesforce, Dynamics, SharePoint, Workday, NetSuite, and Google Analytics. The solution also introduces Real Time Mode for Apache Spark, enabling ultra-low latency stream processing.
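
As a rough illustration of the kind of stream processing LakeFlow targets, the PySpark sketch below reads a continuously updated table as a stream using standard Spark Structured Streaming. The announcement does not detail Real Time Mode's interface, so a conventional micro-batch trigger stands in here, and the table and checkpoint names are hypothetical.

```python
# A minimal Structured Streaming sketch in PySpark. Real Time Mode's API is
# not detailed in the announcement, so this uses the standard micro-batch
# trigger; table and checkpoint names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("ingest-demo").getOrCreate()

# Read a continuously updated Delta table as a stream (e.g. one populated
# by a CDC connector from an operational database).
events = spark.readStream.table("main.ops.orders_raw")

# A light transformation before landing the data in a bronze table.
cleaned = events.filter(F.col("order_id").isNotNull())

(cleaned.writeStream
    .option("checkpointLocation", "/tmp/checkpoints/orders_bronze")
    .trigger(processingTime="1 second")  # Real Time Mode targets lower latency than this
    .toTable("main.ops.orders_bronze"))
```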

LakeFlow automates the deployment, operation, and monitoring of data pipelines at scale with built-in support for CI/CD and advanced workflow capabilities that include triggering, branching, and conditional execution. Additionally, it integrates data quality checks and health monitoring with alerting systems such as PagerDuty, thereby simplifying the task of building and operating production-grade data pipelines for busy data teams.
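
Databricks' existing Delta Live Tables expectations give a sense of what declarative, in-pipeline quality checks look like in practice. A minimal sketch, assuming a pipeline context where the dlt module is available; the table name and rules are invented for illustration.

```python
# A minimal sketch of in-pipeline data quality checks using Delta Live Tables
# expectations. Runs inside a DLT pipeline, where the `dlt` module is
# provided; the table name and rules are hypothetical.
import dlt
from pyspark.sql import functions as F

@dlt.table(comment="Orders that passed basic quality gates")
@dlt.expect_or_drop("valid_order_id", "order_id IS NOT NULL")  # drop failing rows
@dlt.expect("positive_amount", "amount > 0")                   # record violations, keep rows
def orders_clean():
    return dlt.read_stream("orders_bronze").withColumn(
        "ingested_at", F.current_timestamp()
    )
```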

In a statement, Databricks described the purpose of LakeFlow: "Data engineering is essential for democratising data and AI within businesses, yet it remains a challenging and complex field. Data teams must ingest data from siloed and often proprietary systems, including databases and enterprise applications, often requiring the creation of complex and fragile connectors." The company added that LakeFlow aims to address these challenges by providing a unified experience built on the Databricks Data Intelligence Platform, featuring deep integrations with Unity Catalog for governance and serverless compute for efficient, scalable execution.

LakeFlow is composed of three main features: LakeFlow Connect, LakeFlow Pipelines, and LakeFlow Jobs.

LakeFlow Connect simplifies and scales data ingestion from diverse data sources. The feature offers a wide range of native, scalable connectors for databases and enterprise applications, all fully integrated with Unity Catalog for robust data governance. LakeFlow Connect builds on the highly efficient change data capture technology of Arcion, which Databricks acquired in November 2023, making data available for batch and real-time analysis irrespective of its size, format, or location.
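
Once a connector has landed data in a Unity Catalog table, it can be queried like any other table. A minimal sketch, assuming a Databricks notebook where `spark` is predefined and using a hypothetical three-level table name:

```python
# Query a connector-ingested, Unity Catalog-governed table like any other.
# The catalog.schema.table name below is hypothetical.
accounts = spark.read.table("main.crm.salesforce_accounts")
accounts.groupBy("industry").count().orderBy("count", ascending=False).show(10)
```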

LakeFlow Pipelines focuses on simplifying and automating real-time data pipelines. Built on Databricks' Delta Live Tables technology, it lets data teams perform data transformation and ETL in SQL or Python, eliminating the need for manual orchestration and unifying batch and stream processing. Real Time Mode can be enabled without any code changes, and support for incremental data processing optimises price/performance.
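
The declarative style is what removes manual orchestration: tables declare what they read, and the pipeline engine works out execution order and incremental updates. A minimal Delta Live Tables sketch in Python, with hypothetical table names:

```python
# A minimal Delta Live Tables sketch, assuming a pipeline context where the
# `dlt` module is available. Table names are hypothetical.
import dlt

@dlt.table(comment="Slowly changing reference data, read as a batch")
def customers_dim():
    return dlt.read("customers_raw")

@dlt.table(comment="High-volume facts, read incrementally as a stream")
def orders_enriched():
    orders = dlt.read_stream("orders_clean")
    return orders.join(dlt.read("customers_dim"), "customer_id", "left")
```

Because customers_dim is read as a batch and orders_enriched as a stream, the same pipeline mixes both modes without a separate scheduler for each.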

Finally, LakeFlow Jobs orchestrates workflows across the Data Intelligence Platform, automating the orchestration, health monitoring, and delivery of data, from scheduling notebooks and SQL queries to ML training and automatic dashboard updates. It offers enhanced control flow capabilities and full observability to help detect, diagnose, and mitigate data issues, increasing pipeline reliability. This unified approach makes it easier for data teams to meet their data delivery commitments.
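
The announcement does not expose a LakeFlow Jobs API, but the existing Databricks Python SDK shows the shape of such a dependent workflow. A minimal sketch, with hypothetical notebook paths and job name:

```python
# A minimal sketch of a dependent workflow using the existing Databricks
# Python SDK (databricks-sdk); LakeFlow Jobs' own interface is not detailed
# in the announcement. Notebook paths and the job name are hypothetical.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.jobs import Task, NotebookTask, TaskDependency

w = WorkspaceClient()  # picks up credentials from the environment

job = w.jobs.create(
    name="nightly-orders",
    tasks=[
        Task(
            task_key="ingest",
            notebook_task=NotebookTask(notebook_path="/Pipelines/ingest_orders"),
        ),
        Task(
            task_key="refresh_dashboard",
            depends_on=[TaskDependency(task_key="ingest")],  # runs only after ingest
            notebook_task=NotebookTask(notebook_path="/Pipelines/refresh_dashboard"),
        ),
    ],
)
print(f"Created job {job.job_id}")
```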

LakeFlow is positioned to address common issues in data engineering, such as the creation and maintenance of intricate logic for data preparation and the need for multiple disparate tools for deploying pipelines and monitoring data quality. The fragmentation and incompleteness of existing solutions have often led to low data quality, higher costs, and an increasing backlog of work for data teams.
