Dr. Mao Yang, Deputy Managing Director at Microsoft Research Asia, examines how computer systems are evolving to support the larger, more distributed, and more intelligent workloads driven by large AI models. In the short term, large models are transforming industries and programming paradigms; in the long term, they are driving an evolution of computing architectures.
“The rapid emergence of large models and growing demand for next-generation AI are accelerating the transformation of traditional computer systems. At the same time, modern AI technologies built on large-scale high-performance systems create new research opportunities for computer systems. Innovating supercomputers, reshaping cloud platforms, and redesigning distributed systems are three essential directions for systems to evolve.”
Computer systems research blends classical and modern elements. Its origins predate contemporary hardware and software trends, yet modern technologies such as big data and cloud computing continue to drive its evolution. Traditional systems research areas—distributed systems theory and practice, compiler optimization, and heterogeneous computing—remain highly relevant in the era of large models, and large GPU clusters, as high-performance computing systems, have enabled major advances in AI.
However, the rapid pace of AI development exposes new challenges for traditional computer systems: current GPU clusters face limitations in scale and efficiency for training and serving next-generation models, and existing cloud and mobile platforms must evolve from serving conventional compute tasks toward serving intelligent applications.
Modern AI technologies built on large-scale high-performance systems therefore present substantial research opportunities for computer systems. System innovation should focus on three directions:
- Innovating ultra-large-scale computing systems to support future AI development;
- Reshaping cloud computing as a core IT platform;
- Designing advanced distributed systems to meet broader distributed intelligence needs.
1. Large-scale, Efficient Systems Are the Foundation for Next-Generation AI
Rich Sutton, one of the founders of reinforcement learning, argued in "The Bitter Lesson" that general methods which leverage computation are ultimately the most effective in AI research. Supercomputer systems remain a primary source of that compute and a critical foundation for modern AI. Yet when building large GPU clusters on top of supercomputers, reliability, communication efficiency, and overall performance optimization become the main constraints on training large models. We therefore need higher-performance and higher-efficiency infrastructures and systems to drive the next generation of AI.
Over the past five years, research spanning architecture, network communication, compiler optimization, and system software has advanced computer systems and supported the evolution of AI infrastructure. Examples include MSCCL, the Microsoft collective communication library, which executes collective communication algorithms across multiple accelerators, and Tutel, a high-performance MoE (Mixture of Experts) library that facilitates the development of large-scale deep neural networks. These contributions support efficient training and inference for workloads including large language models.
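To make the routing pattern concrete, below is a minimal NumPy sketch of top-2 MoE dispatch and combine: a gate scores each token against every expert, the two best experts process the token, and their outputs are summed with renormalized gate weights. This is only an illustration of the pattern such libraries optimize across accelerators; all shapes, names, and the per-token loop are hypothetical, not Tutel's or MSCCL's actual APIs.

```python
# Minimal, illustrative sketch of top-2 Mixture-of-Experts routing.
# Names and shapes are hypothetical; this is not Tutel's API.
import numpy as np

rng = np.random.default_rng(0)
num_tokens, model_dim, num_experts, top_k = 8, 16, 4, 2

# Expert parameters: one small feed-forward weight matrix per expert.
experts = [rng.standard_normal((model_dim, model_dim)) * 0.1
           for _ in range(num_experts)]
gate_w = rng.standard_normal((model_dim, num_experts)) * 0.1
x = rng.standard_normal((num_tokens, model_dim))

# 1) Gating: score every token against every expert, take a softmax.
logits = x @ gate_w
probs = np.exp(logits - logits.max(axis=1, keepdims=True))
probs /= probs.sum(axis=1, keepdims=True)

# 2) Top-k selection: each token is routed to its k best experts.
top_idx = np.argsort(-probs, axis=1)[:, :top_k]  # (tokens, k)

# 3) Dispatch + combine: each selected expert processes the token;
#    outputs are summed, weighted by renormalized gate scores.
y = np.zeros_like(x)
for t in range(num_tokens):
    weights = probs[t, top_idx[t]]
    weights /= weights.sum()
    for w, e in zip(weights, top_idx[t]):
        y[t] += w * (x[t] @ experts[e])

print(y.shape)  # (8, 16): same shape as the input, but expert-mixed
```

In a production library the per-token loop becomes a batched all-to-all exchange so that each expert's tokens land on the accelerator hosting that expert, which is exactly where collective communication performance becomes the bottleneck.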
Supercomputer innovation cannot rely solely on traditional systems approaches; AI itself can be used to innovate and evolve systems. This research direction explores how AI capabilities provide new perspectives on classical system problems, enabling smarter and more efficient performance tuning of complex systems, faster and more accurate diagnostics, and easier deployment and management. AI-assisted systems research can introduce new paradigms across chip design, architectural innovation, compiler optimization, and distributed system design, with AI acting as an intelligent assistant that can take over many routine tasks.
With AI assistance, systems researchers can focus on large-scale system design, abstraction of key modules and interfaces, and overall system evolution. For example, in AI compiler design, tools such as Welder and Grinder emphasize abstractions among model structure, compiler systems, and underlying hardware, while specific optimization search algorithms and implementations can be aided by AI. These new research paradigms will be foundational for building larger, more efficient AI infrastructure.
(Figure: Four core AI compilation techniques based on a unified tiling abstraction)
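As an illustration of the tiling abstraction these compilers build on, here is a minimal pure-Python sketch of a tiled matrix multiplication. The tile size and loop order are exactly the kinds of parameters a compiler such as Welder would search per device; the code itself is an assumption-laden toy, not taken from these systems.

```python
# Minimal sketch of the tiling idea behind tile-based AI compilers:
# a matmul is decomposed into fixed-size blocks so each block's
# operands can be staged into fast on-chip memory. The tile size here
# is illustrative; real compilers search it per hardware target.
import numpy as np

def tiled_matmul(A, B, tile=32):
    M, K = A.shape
    K2, N = B.shape
    assert K == K2
    C = np.zeros((M, N), dtype=A.dtype)
    # Three-level tile loop: each (i, j, k) block is a unit of work
    # whose operands would map to shared memory / registers on a GPU.
    for i in range(0, M, tile):
        for j in range(0, N, tile):
            for k in range(0, K, tile):
                C[i:i+tile, j:j+tile] += (
                    A[i:i+tile, k:k+tile] @ B[k:k+tile, j:j+tile]
                )
    return C

A = np.random.rand(128, 96).astype(np.float32)
B = np.random.rand(96, 64).astype(np.float32)
assert np.allclose(tiled_matmul(A, B), A @ B, atol=1e-3)
```

The point of a unified tiling abstraction is that the same block decomposition can describe computation, memory movement, and inter-operator fusion, so the compiler (or an AI assistant) only has to search over tile configurations rather than hand-written kernels.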
2. Reshaping Cloud Platforms Around Intelligence
Traditionally, the operating system manages resources and processes so that users can interact with computers without knowing low-level details. Today, three shifts are underway at once: disaggregated server architectures centered on GPUs, HBM (high-bandwidth memory), and high-speed interconnects are replacing CPU-centric servers; AI agents and large models are becoming mainstream cloud services; and deep learning algorithms are displacing many traditional core algorithms. Cloud computing therefore needs to be redefined to serve intelligent applications.
Virtualization must be redesigned for disaggregated architectures. Microservices and related cloud modules need to provide efficient and reliable platforms for AI agents and large language models. Data privacy and security must be central design elements in future cloud innovations.
These changes all point toward smarter cloud systems (Cloud + AI). On one hand, large-scale heterogeneous systems in the cloud provide new computing platforms for traditional large systems; on the other hand, deep learning and large models offer novel ideas for algorithmic design and implementation in large-scale systems.
In search systems, for example, innovations based on heterogeneous computing and deep learning—from the Web-scale vector search system SPANN to the Neural Index system MEVI—have substantially improved performance in search and advertising systems and established new paradigms for information retrieval. Similar innovations are occurring in database systems and scientific computing. Cloud platforms not only support AI development but will themselves evolve by incorporating AI techniques, becoming a key component of next-generation AI infrastructure.
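The memory-disk hybrid design behind SPANN can be sketched in a few lines: cluster centroids stay in memory, the much larger posting lists of vectors live on disk, and a query probes only the few closest clusters. The sketch below is a deliberately simplified illustration under those assumptions; real SPANN adds balanced partitioning and boundary-vector replication, and its posting lists live on SSD rather than in a Python dict.

```python
# Toy sketch of a SPANN-style hybrid vector search index:
# centroids in memory, posting lists standing in for disk.
import numpy as np

rng = np.random.default_rng(0)
dim, n_vectors, n_clusters, n_probe = 16, 2000, 32, 4

data = rng.standard_normal((n_vectors, dim)).astype(np.float32)

# Crude partitioning: pick random centroids, assign each vector once.
centroids = data[rng.choice(n_vectors, n_clusters, replace=False)]
assign = np.argmin(
    ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(-1), axis=1)
# Posting lists: in SPANN these live on SSD; a dict stands in here.
posting_lists = {c: np.where(assign == c)[0] for c in range(n_clusters)}

def search(query, k=5):
    # 1) In-memory step: find the n_probe nearest centroids.
    d_cent = ((centroids - query) ** 2).sum(-1)
    probe = np.argsort(d_cent)[:n_probe]
    # 2) "Disk" step: exhaustively scan only those posting lists.
    cand = np.concatenate([posting_lists[c] for c in probe])
    d = ((data[cand] - query) ** 2).sum(-1)
    return cand[np.argsort(d)[:k]]

print(search(rng.standard_normal(dim).astype(np.float32)))
```

The design choice this illustrates is the division of labor: a small in-memory index bounds how much slow storage a query touches, which is what lets a billion-scale index run on a single machine.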
3. Distributed Systems as the Foundation for Distributed Intelligence
Roy Pea introduced the concept of distributed intelligence to describe intelligence as distributed across physical, social, and symbolic systems. This perspective helps us understand the interactions between AI systems, society, and the environment.
Today, large models rely on centralized cloud data centers for training and inference. However, intelligence is distributed across environments, and future intelligent computing will span arbitrary distributed settings. Human interaction with the physical world and exchanges based on symbolic systems are manifestations of cognitive activity. Large models should be able to perceive and learn from these activities across various endpoints so that users can access AI capabilities in near real time from any device.
Supporting intelligence in distributed scenarios requires considering cloud, edge, and device-level platforms for AI computation. Beyond traditional model sparsification and compression, it is critical to overcome basic challenges when running large models at the edge, such as latency and reliability. To address these, techniques like PIT and MoFQ for mobile model quantization, sparsification, and runtime optimization have been developed.
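As a concrete example of the quantization side of this work, below is a minimal sketch of symmetric per-tensor int8 post-training quantization. It shows only the basic size/accuracy trade-off; MoFQ itself goes further by mixing number formats per layer, and nothing here reflects its actual implementation.

```python
# Minimal sketch of symmetric per-tensor int8 quantization: weights are
# stored 4x smaller than float32 at the cost of a bounded rounding error.
import numpy as np

def quantize_int8(w):
    scale = np.abs(w).max() / 127.0  # map the largest weight to 127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(256, 256).astype(np.float32) * 0.05
q, s = quantize_int8(w)

# Check the worst-case rounding error introduced by int8 storage.
err = np.abs(dequantize(q, s) - w).max()
print(f"max abs error: {err:.6f} (scale={s:.6f})")
```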
Hardware and inference algorithm innovation for edge platforms and devices is also essential. New computation paradigms, such as lookup-table-based approaches, can fundamentally change on-device inference, for example with LUT-NN techniques that improve edge inference efficiency. Collaborative work with machine learning teams seeks learning algorithms that capture intelligence from arbitrary signals. Beyond multimodal models, research is exploring simpler and internally consistent model structures and learning algorithms that can learn from diverse signals. The goal is sparser, more efficient, and scalable models that support self-learning and real-time updates. Innovative distributed systems will be the critical infrastructure for distributed intelligence, enabling more real-time and reliable AI interaction in society.
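To make the lookup-table idea above concrete, here is a toy sketch in the spirit of LUT-NN: inputs are snapped to centroids, and precomputed centroid-weight dot products are read from a table, replacing multiply-accumulate with lookup-and-add. The centroid "training" below is reduced to a random pick, so this illustrates only the computation pattern, not LUT-NN's algorithm, which learns the centroids end to end.

```python
# Toy lookup-table inference sketch: offline, precompute dot products
# between per-subspace centroids and weight slices; online, replace
# multiplies with nearest-centroid lookups plus additions.
import numpy as np

rng = np.random.default_rng(0)
dim, n_sub, n_centroids, out_dim = 32, 8, 16, 4
sub = dim // n_sub  # input split into n_sub subvectors of length 4

W = rng.standard_normal((dim, out_dim)).astype(np.float32)
# "Learned" centroids per subspace (random here; trained in LUT-NN).
cent = rng.standard_normal((n_sub, n_centroids, sub)).astype(np.float32)

# Offline: table[s, c, o] = cent[s, c] . W[s-th slice, o]
table = np.einsum('scd,sdo->sco', cent, W.reshape(n_sub, sub, out_dim))

def lut_forward(x):
    y = np.zeros(out_dim, dtype=np.float32)
    for s in range(n_sub):
        xs = x[s * sub:(s + 1) * sub]
        c = np.argmin(((cent[s] - xs) ** 2).sum(-1))  # nearest centroid
        y += table[s, c]                              # lookup, no multiply
    return y

x = rng.standard_normal(dim).astype(np.float32)
print(lut_forward(x), x @ W)  # approximate vs. exact output
```

Because the table is small and the lookups are cache-friendly integer indexing, this kind of scheme maps well onto CPU-only edge devices where multiply-accumulate throughput is the bottleneck.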
4. Toward Self-evolving Computer Systems
Future computer systems research will be a continuous process of self-innovation. Systems must evolve to meet AI demands and become more intelligent and capable of self-evolution.
Recent innovations hint at what lies ahead. From infrastructure and cloud platforms to distributed intelligence, many new possibilities remain to be explored in systems research during the AI era. Smarter and more capable tools and assistants are expected to emerge and to provide further opportunities for discovery.
Author
Dr. Mao Yang is Deputy Managing Director at Microsoft Research Asia, leading research in computer systems and networking. He joined Microsoft Research Asia in 2006 and works on distributed systems, search engine systems, and deep learning systems design and implementation. He leads teams conducting research in computer systems, security, networking, heterogeneous computing, edge computing, and system algorithms, with publications at top systems and networking conferences such as OSDI, SOSP, NSDI, SIGCOMM, and ATC. His teams collaborate with cloud platforms and production systems including Azure, Bing, Windows, and SQL Server, as well as several open-source communities. Dr. Yang is also a doctoral advisor at the University of Science and Technology of China and holds a PhD in computer architecture from Peking University and master's and bachelor's degrees from Harbin Institute of Technology.