AWS Trainium chips will be the preferred processors for training Mosaic AI models on the Databricks platform, the company announced today. The deal represents a blow to Nvidia’s continued AI dominance with its high-end GPUs.
Processing capacity has emerged as one of the main bottlenecks in scaling AI. Large language models (LLMs) like GPT-4 require enormous compute capacity, and so far Nvidia has owned the lion’s share of that market with its high-end A100 and H100 GPUs.
The hyperscalers have sought to grab a piece of this rapidly growing market. Google Cloud offers its Tensor Processing Unit (TPU) chips for customer AI workloads, while AWS offers its Trainium and Inferentia chips for training and inference workloads, respectively.
AWS has been building its own custom processors since it acquired Annapurna Labs back in 2015 for about $350 million. Its first chip, Graviton, was an Arm-based design that easily slid into its x86-based EC2 infrastructure thanks to AWS’s innovative Nitro framework, and it followed that up with the Inferentia ASIC in 2019 and Trainium in late 2020.
Since the generative AI revolution began in late 2022, all eyes have been on the capability to train and run LLMs. And that is the focus of today’s announcement between Databricks and AWS, which centers on getting Databricks customers to train their Mosaic AI models on Trainium.
AWS will provide Trainium chips to Databricks Mosaic AI customers for a variety of AI workloads, including pretraining, fine-tuning, augmenting, and serving LLMs on their private data, the companies announced.
Trainium2, which AWS unveiled in November 2023, is purpose-built for high-performance training of foundation models and LLMs that are composed of trillions of parameters. The chip was designed to deliver up to 4x faster training performance and 3x more memory capacity compared to first-generation Trainium chips, AWS says, while improving energy efficiency (performance/watt) up to 2x.
“By using AWS Trainium to power Mosaic AI, Databricks will make it cost-effective for customers to build and deploy generative AI applications on top of their analytics workflows, regardless of their industry or use case,” Matt Garman, the new CEO of AWS, said in a press release.
Ali Ghodsi, the co-founder and CEO at Databricks, said the expanded partnership will help customers use their data to create a competitive advantage.
“Strengthening our collaboration with AWS allows us to provide customers with unmatched scale and price-performance so they can bring their own generative AI applications to market more rapidly,” he said in a press release.
Databricks has more than 10,000 customers on its data platform, which runs on AWS, Google Cloud, and Microsoft Azure. In addition to providing data management and analytics tools, Databricks provides access to pre-trained AI models through Mosaic, the “AI factory” that it acquired in 2023 for $1.3 billion.
While there is nothing exclusive about Databricks’ and AWS’s relationship, the two companies are getting closer with today’s announcement. In addition to the Trainium hookup, the two companies are expanding their partnership in other ways, including:
Work together to optimize and improve the security of AI workloads running on custom models on Trainium;
Migrate and modernize on-prem data lakes into Databricks and AWS;
Develop joint solutions in specific industries, such as financial services and media and entertainment;
Create new integrations for Databricks on AWS to improve onboarding and utilize AWS’s serverless offerings;
Develop go-to-market programs for GenAI solutions with system integrators;
Expand co-marketing programs.
The post Databricks and AWS in AI Chip Hookup appeared first on BigDATAwire.