AWS enhances S3 with Apache Iceberg & metadata support
Amazon Web Services (AWS) has announced enhancements to its Amazon Simple Storage Service (Amazon S3), which now supports managed Apache Iceberg tables and automatic metadata generation.
With S3 Tables, AWS says Amazon S3 becomes the first cloud object store with built-in Apache Iceberg table support. The new bucket type is designed to optimise the storage and querying of tabular data, and AWS claims up to three times faster query performance and up to ten times higher transactions per second compared with self-managed tables. Andy Warfield, Vice President, Storage and Distinguished Engineer at AWS, stated, "As the leading object store in the world with more than 450 trillion objects, S3 is used by millions of customers, and we continue to innovate to remove the complexity of working with data at an unprecedented scale."
These features are geared towards enhancing customers' ability to work with large datasets efficiently, especially concerning tabular data stored in formats such as Apache Parquet. Warfield commented, "We have seen the rapid rise of tabular data and, increasingly, customers want to query across tables, improve query performance, and understand and organize troves of data so they can easily find exactly what they need. S3 Tables and S3 Metadata remove the overhead of organizing and operating table and metadata stores on top of objects, so customers can shift their focus back to building with their data."
The managed Iceberg tables in Amazon S3 support various third-party analytics tools, allowing users to perform comprehensive analyses without the need for extensive infrastructure. These tables also provide row-level transactions and advanced data management features, such as automatic compaction and snapshot management. This integration is expected to simplify tasks traditionally requiring dedicated systems, thus reducing costs and resource demands for customers.
Amazon S3 Metadata enhances data discovery by delivering near real-time, queryable metadata, which removes the need for businesses to build and maintain their own metadata systems. Organisations such as Roche plan to use it to simplify their metadata management and so speed up their generative AI initiatives. CMT, the world's largest telematics service provider, also stands to benefit from S3 Metadata, as it allows the company to query vast amounts of data effectively.
Additionally, AWS outlined how S3 Tables integrate with AWS analytics services, such as Amazon Athena and Amazon QuickSight, and with open source tools, such as Apache Spark, highlighting the flexibility and accessibility of the new offerings. Genesys, a global AI-powered experience orchestration leader, is among the companies using Amazon S3 for data lake operations. With S3 Tables, Genesys looks forward to providing a more streamlined materialised view for its data analysis processes.
S3 Metadata automatically generates object metadata, such as size and source, and makes it queryable through S3 Tables. This helps users in various sectors organise and swiftly identify relevant datasets. Customers can also annotate objects with their own organisation-specific metadata to suit particular business needs, which supports advanced AI and machine learning applications. AWS expects the feature to enhance business analytics and real-time inference applications.
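As a rough illustration of the kind of lookup that queryable object metadata makes possible, the Python sketch below filters a handful of in-memory records by size and by a custom annotation. It does not call any AWS API: the `ObjectMetadata` class, its field names, the `find_objects` helper, and the sample records are all hypothetical and are not taken from the S3 Metadata schema.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectMetadata:
    """One metadata record for a stored object (field names are illustrative only)."""
    key: str
    size_bytes: int
    source: str
    custom: dict = field(default_factory=dict)  # organisation-specific annotations


def find_objects(records, min_size_bytes=0, **custom_filters):
    """Return the keys of records that are at least min_size_bytes large and
    whose custom annotations match every key/value pair in custom_filters."""
    return [
        r.key
        for r in records
        if r.size_bytes >= min_size_bytes
        and all(r.custom.get(k) == v for k, v in custom_filters.items())
    ]


# Hypothetical sample records standing in for rows of a metadata table.
records = [
    ObjectMetadata("trips/2024-11.parquet", 52_000_000, "upload", {"team": "analytics"}),
    ObjectMetadata("trips/2024-12.parquet", 48_000_000, "upload", {"team": "ml"}),
    ObjectMetadata("notes/readme.txt", 2_000, "copy", {"team": "ml"}),
]

# Find large objects annotated for the (hypothetical) "ml" team.
print(find_objects(records, min_size_bytes=1_000_000, team="ml"))
# prints ['trips/2024-12.parquet']
```

In a real deployment the same question would presumably be posed as a query against the metadata table through an analytics engine rather than in application code.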
S3 Tables are generally available now, while S3 Metadata is in preview. Once the integrations are complete, AWS customers will be able to query and visualise data across both features using AWS services such as Amazon Athena, Amazon Redshift, Amazon EMR, and Amazon QuickSight.