NVIDIA expands AWS AI infrastructure with new GPU instances

Fri, 26th Jun 2026

NVIDIA has expanded its AI infrastructure work with Amazon Web Services across Amazon EC2 and Amazon OpenSearch. The update adds new GPU-backed cloud instances, makes GPU-accelerated vector indexing the default for vector search in OpenSearch Serverless and marks AWS as meeting NVIDIA's GB300 training benchmark.

The changes focus on three parts of the AWS stack: compute for AI and graphics workloads, retrieval infrastructure for vector databases, and benchmarked cloud infrastructure for large training jobs. They are intended to help customers run AI systems at larger scale with less operational complexity.

New instances

At the compute layer, Amazon EC2 G7 instances now use NVIDIA RTX PRO 4500 Blackwell Server Edition GPUs. The instances are aimed at AI inference, graphics, spatial computing and GPU-accelerated data analytics workloads.

According to the companies, G7 instances deliver up to 4.6 times the AI inference performance of G6 instances and up to 2.1 times the graphics performance. They also offer faster GPU-accelerated data analytics on Amazon EMR through the cuDF library for Apache Spark workloads.

Hardware options support up to eight GPUs, 256GB of total GPU memory, 700 Gbps of EFA-enabled networking and up to 7.6TB of local NVMe SSD storage. AWS is offering one-, two-, four- and eight-GPU configurations, with bare metal due later.
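As a rough illustration of how those configurations might be matched to a workload, the sketch below picks the smallest listed GPU count whose combined memory covers a given footprint. It assumes the 256GB total is split evenly across the eight GPUs, which the announcement does not state, and the helper name smallest_config is hypothetical.

```python
# Figures quoted in the announcement for the largest G7 configuration.
MAX_GPUS = 8
MAX_TOTAL_GPU_MEMORY_GB = 256
GPU_COUNTS = (1, 2, 4, 8)  # the one-, two-, four- and eight-GPU options

# Assumption (not stated in the announcement): memory is split evenly per GPU.
MEMORY_PER_GPU_GB = MAX_TOTAL_GPU_MEMORY_GB / MAX_GPUS


def smallest_config(required_gb):
    """Return the smallest listed GPU count whose combined GPU memory
    covers required_gb, or None if even eight GPUs fall short."""
    for count in GPU_COUNTS:
        if count * MEMORY_PER_GPU_GB >= required_gb:
            return count
    return None


if __name__ == "__main__":
    for need in (20, 100, 300):
        print(need, "GB ->", smallest_config(need))
```

This considers GPU memory only. Real sizing would also weigh networking, local storage and throughput targets.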

The range is intended to help customers choose systems more closely matched to the size of their workloads. The same instance family is positioned for lower-latency inference, graphics-heavy applications such as rendering and video workflows, and data-intensive tasks including analytics pipelines and vector database jobs.

Access to the new instances is available through AWS Deep Learning Amazon Machine Images, Amazon Deep Learning Containers, Amazon EMR, Amazon EKS, Amazon ECS and graphics AMIs. Support in Amazon SageMaker AI is due later.

Vector search

A second part of the announcement affects retrieval systems used in generative AI applications. Amazon OpenSearch Serverless now uses GPU-accelerated vector indexing with NVIDIA cuVS as the default compute choice for all vector collections.

The change targets workloads such as retrieval-augmented generation, semantic search, recommendation systems and agentic AI applications. By making GPU-based vector indexing the default rather than a specialist configuration, AWS is moving a key part of AI retrieval infrastructure into its standard managed service.

According to NVIDIA, the result is vector indexing that is up to 10 times faster and costs a quarter as much as CPU-only index builds. The company also said the setup makes it practical to build billion-scale vector databases in under an hour.
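To put those ratios in concrete terms, the sketch below applies them to a hypothetical CPU-only baseline. The baseline numbers are made up for illustration, the function name gpu_build_estimate is hypothetical, and because the speedup is quoted as "up to" 10 times, the output is a best case rather than an expected result.

```python
def gpu_build_estimate(cpu_hours, cpu_cost, speedup=10.0, cost_ratio=0.25):
    """Best-case GPU index build time and cost from a CPU-only baseline.

    speedup and cost_ratio default to the figures NVIDIA quoted
    (up to 10 times faster, a quarter of the cost).
    """
    return cpu_hours / speedup, cpu_cost * cost_ratio


if __name__ == "__main__":
    # Hypothetical baseline: a 5-hour CPU-only build costing 200 units.
    hours, cost = gpu_build_estimate(5.0, 200.0)
    print(f"best case: {hours} hours, {cost} cost units")
```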

For OpenSearch Serverless users, the change means the retrieval layer can scale without direct infrastructure management. That matters for teams trying to reduce the engineering work tied to AI systems whose demand is variable or that sit idle for long periods.

Training benchmark

The third element is AWS achieving NVIDIA Exemplar Cloud status for NVIDIA GB300 training workloads. NVIDIA uses the program to assess whether a cloud provider meets performance thresholds against its reference architecture.

The designation is meant to give developers and corporate buyers a benchmark when comparing infrastructure for large AI training jobs. In practice, it signals that AWS has met a set of performance standards defined by NVIDIA for GB300-based training environments.

The status followed co-engineering work between AWS and NVIDIA teams. NVIDIA said the benchmark should help customers assess consistency and cost when selecting cloud infrastructure for large-scale model training.

Taken together, the changes show how competition in cloud AI is moving beyond raw access to GPUs and toward the way cloud providers package compute, storage, networking and retrieval tools into managed services. For businesses building AI systems, the challenge is no longer just securing processors, but turning them into deployable systems for inference, search and training without adding layers of operational work.

On AWS, the latest NVIDIA additions span that full stack. They cover the hardware used to run inference and analytics, the indexing systems used to retrieve data for AI applications, and the benchmarked infrastructure used to train large models.

The cuVS move is particularly notable because vector search has become a central component in many production AI systems. Retrieval-augmented generation and recommendation engines depend on indexing and searching large stores of embeddings quickly, which has often required careful tuning and infrastructure choices that many organizations would prefer to avoid.

By shifting GPU-accelerated indexing into the default configuration for OpenSearch Serverless, AWS is reducing one of those choices for customers. At the same time, the G7 rollout gives AWS another GPU instance family aimed at customers that need inference and graphics performance without managing on-premises GPU systems.

AWS now holds NVIDIA Exemplar Cloud status on GB300 for training workloads, while OpenSearch Serverless uses NVIDIA cuVS as its default for vector collections and EC2 G7 introduces RTX PRO 4500 Blackwell Server Edition GPUs into AWS's public cloud estate.
