The Apache Software Foundation has released version 4.0 of Apache Cassandra, the open source distributed big data management platform.
The NoSQL database handles massive amounts of data across load-intensive applications. Cassandra's largest production deployments include Apple, with over 160,000 instances and 100 petabytes of data across 1,000+ clusters; Huawei, with over 30,000 instances across 300+ clusters; and Netflix, with over 10,000 instances and 6 petabytes across 100+ clusters, handling over 1 trillion requests per day.
Cassandra originated at Facebook in 2008, entered the Apache Incubator in January 2009, and graduated as an Apache Top-Level Project in February 2010.
“A long time coming, Cassandra 4.0 is the most thoroughly tested Cassandra yet,” says Apache Cassandra vice president, Nate McCall.
“The latest version is faster, more scalable, and bolstered with enterprise security features ready-for-production with unprecedented scale in the Cloud.”
Three years in the making, version 4.0 comprises more than 1,000 bug fixes, improvements, and new features, including:
- Increased speed and scalability: data streams up to five times faster during scaling operations.
- Improved consistency: data replicas are kept in sync to optimise incremental repair.
- Enhanced security and observability: audit logging tracks user access and activity.
- New configuration settings: system metrics and configuration settings are now exposed.
- Minimised latency: garbage collector pause times are reduced to a few milliseconds, with no latency degradation as heap sizes increase.
- Better compression: improved compression efficiency eases unnecessary strain on disk space.
“In our experience, nothing beats Apache Cassandra for write scaling, and we're looking forward to the performance and management improvements in the 4.0 release,” says Backblaze senior systems administrator, Elliott Sims.
“We rely on Cassandra to manage over one exabyte of customer data and serve over 50 billion files for our customers across 175 countries, so optimising Cassandra's capabilities and performance means a lot to us.”
Netflix engineering manager and Cassandra committer, Vinay Chella, says Netflix uses Apache Cassandra heavily to satisfy its ever-growing persistence needs. He says Netflix has been experimenting with and partially using the 4.0 beta in its environments, testing features such as audit logging and backpressure.
“Apache Cassandra's contributors have worked hard to deliver Cassandra 4.0 as the project's most stable release yet, ready for deployment to production-critical Cloud services,” says Apache Cassandra contributor, Scott Andreas.
“Cassandra 4.0 also brings new features, such as faster host replacements, active data integrity assertions, incremental repair, and better compression. The project's investment in advanced validation tooling means that Cassandra users can expect a smooth upgrade.”
He says once released, Cassandra 4.0 will also provide a stable foundation for the development of future features and the database's long-term evolution.
Apache Cassandra is used by Activision, Apple, Backblaze, BazaarVoice, Best Buy, Bloomberg Engineering, CERN, Constant Contact, Comcast, DoorDash, eBay, Fidelity, GitHub, Hulu, ING, Instagram, Intuit, Macy's, Macquarie Bank, Microsoft, McDonald's, Netflix, New York Times, Monzo, Outbrain, Pearson Education, Sky, Spotify, Target, Uber, Walmart, Yelp, and thousands of other companies that have large, active data sets. Cassandra is in use by 40% of the Fortune 100.