OpenAI has started leasing Google's tensor processing units (TPUs) to power products such as ChatGPT, its first significant use of non-NVIDIA chips. The move signals a strategic push to reduce its dependence on NVIDIA hardware and diversify its computing resources.
OpenAI's Previous Reliance on NVIDIA GPUs
As a leader in artificial intelligence, OpenAI has seen its demand for computing power surge. Training its 175-billion-parameter GPT-3 model, for example, required computational resources roughly equivalent to 300,000 CPUs running continuously for a year. In 2024, training multimodal models such as Google's Gemini and Baidu's Ernie 4.0 demanded 5 to 8 times the computing power of their predecessors, with single-model training costs exceeding $10 million.
Until recently, NVIDIA GPUs were OpenAI's primary computing resource for model training and inference, accessed through partnerships with Microsoft and Oracle. NVIDIA's GPUs dominate AI development thanks to their strong performance. However, data center-grade GPUs running sustained heavy workloads face real constraints: operational lifespans of roughly one to three years, accelerated aging, high power consumption, and failure rates that rise over time. As AI models grow more complex, OpenAI is exploring more cost-effective and sustainable computing options.
Google's AI Chips Match NVIDIA B200 Performance
OpenAI's adoption of Google's TPUs is a significant step in its computing strategy. Google's seventh-generation TPU, Ironwood, unveiled at its annual Cloud Next conference, is purpose-built for AI inference. Compared with Google's first Cloud TPU from 2018, Ironwood delivers 3,600 times the inference performance and 29 times the efficiency, rivaling NVIDIA's B200 chip and surpassing it in some areas.
Ironwood excels in several key metrics. Its power efficiency is twice that of the sixth-generation TPU, Trillium, and nearly 30 times that of the first Cloud TPU. Google's liquid cooling and optimized chip design enable Ironwood to sustain up to twice the performance of standard air-cooled systems under continuous, heavy AI workloads.
Ironwood features 192 GB of high-bandwidth memory (HBM) per chip, six times that of Trillium, allowing it to handle larger models and datasets with less data shuttling. Its HBM bandwidth of 7.2 TB/s, 4.5 times Trillium's, provides the rapid data access critical for memory-intensive AI tasks. Inter-chip interconnect (ICI) bandwidth has also improved to 1.2 TB/s, 1.5 times Trillium's, enabling faster communication for distributed training and inference.
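To see how these generation-over-generation ratios fit together, a quick back-of-envelope sketch in Python can back out Trillium's implied per-chip figures. It uses only the Ironwood numbers and multipliers quoted above; the derived Trillium values are implications of those figures, not official specifications.

```python
# Back-of-envelope check: derive Trillium's implied per-chip specs from the
# Ironwood figures and the generation-over-generation ratios quoted above.
# All inputs are as reported in this article, not independent benchmarks.

ironwood = {
    "hbm_capacity_gb": 192.0,   # 6x Trillium
    "hbm_bandwidth_tbs": 7.2,   # TB/s, 4.5x Trillium
    "ici_bandwidth_tbs": 1.2,   # TB/s, 1.5x Trillium
}

ratio_vs_trillium = {
    "hbm_capacity_gb": 6.0,
    "hbm_bandwidth_tbs": 4.5,
    "ici_bandwidth_tbs": 1.5,
}

# Dividing each Ironwood spec by its ratio yields the implied Trillium value.
trillium_implied = {
    spec: value / ratio_vs_trillium[spec] for spec, value in ironwood.items()
}
print(trillium_implied)
# {'hbm_capacity_gb': 32.0, 'hbm_bandwidth_tbs': 1.6, 'ici_bandwidth_tbs': 0.8}
```

The implied baseline (32 GB of HBM at 1.6 TB/s per Trillium chip) is internally consistent, which is a useful sanity check on the quoted multipliers.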
For Google Cloud customers, Ironwood is available in configurations of 256 or 9,216 chips. A single chip delivers a peak of 4,614 TFLOPS, while a full 9,216-chip pod reaches 42.5 exaflops, more than 24 times the computing power of the world's largest supercomputer, El Capitan.
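The pod-scale arithmetic is easy to verify. The sketch below assumes El Capitan's Top500 result of roughly 1.742 exaflops (an FP64 figure not given in this article), and the 4,614 TFLOPS per-chip peak is an FP8 number per Google's announcement, so the comparison mixes number formats and should be read as indicative rather than apples-to-apples.

```python
# Sanity-check the pod-scale figures quoted above.
# Note: Ironwood's per-chip peak is an FP8 figure; El Capitan's ~1.742
# exaflops is its FP64 Top500 Rmax, so the >24x comparison is indicative only.

PER_CHIP_TFLOPS = 4_614      # reported Ironwood per-chip peak (FP8)
POD_CHIPS = 9_216            # largest pod configuration
EL_CAPITAN_EFLOPS = 1.742    # El Capitan Top500 Rmax (FP64), assumed here

pod_eflops = PER_CHIP_TFLOPS * POD_CHIPS / 1_000_000  # TFLOPS -> exaflops
print(f"Pod peak: {pod_eflops:.1f} exaflops")                    # ~42.5
print(f"vs El Capitan: {pod_eflops / EL_CAPITAN_EFLOPS:.1f}x")   # ~24.4x
```

The product comes out to about 42.5 exaflops, matching the quoted figure, and the ratio of roughly 24.4x lines up with the "more than 24 times" claim.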
Implications for OpenAI and the AI Chip Market
OpenAI's use of Google TPUs gives it greater flexibility and autonomy over its computing resources, reducing reliance on any single chip supplier or data-center partner. The shift helps lower computing costs and supports business expansion. For the AI chip market, Google's TPUs, with their strong performance and cost advantages, challenge NVIDIA's dominance, fostering competition and driving innovation in AI chip technology.