Breaking Compute Bottlenecks for Large Language Models
Analysis of large-model scaling: how parameter count and training tokens drive compute requirements, showing compute grows roughly quadratically with model size when training tokens are scaled in proportion to parameters.
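The near-quadratic growth described above follows from the common rule-of-thumb estimate C ≈ 6·N·D training FLOPs (N parameters, D tokens) when tokens are scaled linearly with parameters, Chinchilla-style. A minimal sketch, assuming that standard approximation rather than anything stated in the article:

```python
# Hedged sketch: C ≈ 6*N*D FLOPs is a widely used rule of thumb for
# dense-transformer training compute, not a formula from this document.

def train_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training FLOPs: ~6 FLOPs per parameter per token."""
    return 6.0 * n_params * n_tokens

# If tokens scale linearly with parameters (D = 20*N, Chinchilla-style),
# doubling N roughly quadruples compute, i.e. ~quadratic growth in N.
small = train_flops(7e9, 20 * 7e9)
large = train_flops(14e9, 20 * 14e9)   # 2x the parameters
ratio = large / small                   # ~4.0
```

With tokens held fixed, the same formula gives only linear growth in N; the quadratic behavior is specifically a consequence of co-scaling data with model size.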
Microsoft’s government CTO details public sector AI adoption, cloud migration, ethical AI governance, and cybersecurity practices to enable secure, data-driven government services.
Detailed review of a 30k-line NumPy machine learning repository implementing 30+ models with explicit gradient computations, utilities, and test examples.
Technical overview of RNNs and LSTM architectures, how they model sequential data, application areas like signal and text processing, and MATLAB-based implementation.
Review of neural network quantization and numeric formats, covering floating vs integer, block floating point, logarithmic systems, and inference vs training trade-offs.
Survey of deep learning approaches for radar target detection, comparing two-stage and single-stage detectors (Faster R-CNN, YOLOv5), preprocessing, and deployment results.
Overview of artificial intelligence and its relationship to machine learning and deep learning, covering AI categories, ML workflow, and common deep architectures.
Overview of ASR (speech-to-text): pipeline, acoustic and language models, CTC training, decoding strategies, and GPU acceleration using NVIDIA NeMo and toolkits.
Technical overview of AI cybersecurity risks: automated attacks, deepfakes, adversarial examples, privacy and data security, ethics, legal challenges, and system fragility.
Overview of Mixture-of-Experts (MoE) transformers: sparse routing with gating networks and experts, training and inference trade-offs, and the recent Mixtral 8x7B MoE model.
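The sparse routing mentioned in this summary can be sketched in a few lines: a gating network scores experts per token, only the top-k experts run, and their outputs are combined with renormalized gate weights. A toy NumPy sketch; all names (`W_gate`, `experts`, `top_k`) are illustrative, not from any specific library:

```python
import numpy as np

# Hedged sketch of sparse top-k MoE routing for a single token.
rng = np.random.default_rng(0)
d_model, n_experts, top_k = 8, 4, 2

W_gate = rng.normal(size=(d_model, n_experts))              # gating network
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]

def moe_forward(x: np.ndarray) -> np.ndarray:
    """Route token x to its top-k experts, weighted by softmax gate scores."""
    logits = x @ W_gate
    top = np.argsort(logits)[-top_k:]                       # top-k expert indices
    w = np.exp(logits[top] - logits[top].max())
    w /= w.sum()                                            # renormalized gate weights
    return sum(wi * (x @ experts[i]) for wi, i in zip(w, top))

y = moe_forward(rng.normal(size=d_model))                   # shape (8,)
```

Only k of the n experts execute per token, which is why MoE models can grow total parameters far faster than per-token compute.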
Technical overview of OpenAI's Sora and its video generation capabilities, core machine learning foundations, and potential impacts on production workflows and society.
Overview of graph neural networks, graph basics and NetworkX graph creation, GNN types and challenges, plus a PyTorch spectral GNN example for node classification.
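The message-passing idea behind the GNN layers this summary covers can be shown with one mean-aggregation graph-convolution step. A hedged NumPy sketch (a simplified variant, not the article's PyTorch spectral example):

```python
import numpy as np

# Hedged sketch of one graph-convolution layer with mean aggregation:
# each node averages its neighbors' (and its own) features, then applies
# a linear map and ReLU.

# Tiny 4-node undirected graph given by its adjacency matrix.
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)
A_hat = A + np.eye(4)                        # add self-loops
D_inv = np.diag(1.0 / A_hat.sum(axis=1))     # inverse degree matrix

X = np.eye(4)                                # one-hot node features
W = np.full((4, 2), 0.5)                     # toy weight matrix

H = np.maximum(0, D_inv @ A_hat @ X @ W)     # one propagation step + ReLU
```

Stacking such layers lets information flow over multi-hop neighborhoods, which is the mechanism node-classification GNNs exploit.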
Overview of Sora, OpenAI's video diffusion model using a spatiotemporal autoencoder and DiT Transformer to generate high-resolution, minute-long videos.
Teardown analysis of NVIDIA DGX A100 AI server PCBs: PCB types, area and per-system value breakdown for GPU board assembly, CPU motherboard, substrates and accessories.
Practical deep learning tuning guide covering learning rate selection, batch size effects, weight initialization, optimizers, regularization, data augmentation and training tips.
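Learning-rate selection, one of the topics this guide covers, is often handled with linear warmup followed by cosine decay. A minimal sketch of that common schedule; the parameter names and values are illustrative:

```python
import math

# Hedged sketch: linear warmup then cosine decay, a widely used
# learning-rate schedule (not necessarily the guide's exact recipe).

def lr_at(step, max_steps=1000, warmup=100, peak=3e-4, floor=3e-5):
    if step < warmup:
        return peak * step / warmup                      # linear warmup from 0
    t = (step - warmup) / (max_steps - warmup)           # decay progress in [0, 1]
    return floor + 0.5 * (peak - floor) * (1 + math.cos(math.pi * t))

# lr rises to `peak` over the warmup steps, then decays smoothly to `floor`.
schedule = [lr_at(s) for s in range(0, 1001, 100)]
```

Warmup avoids instability from large early updates, while the cosine tail gives a gentle landing that tends to improve final accuracy over a hard cutoff.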
Overview of the Transformer architecture: self-attention, multi-head attention, positional encoding, encoder-decoder stacks, and implications for distributed model training.
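The self-attention operation at the heart of the Transformer overview is scaled dot-product attention. A single-head, unmasked NumPy sketch of the standard formula softmax(QKᵀ/√d_k)·V:

```python
import numpy as np

# Hedged sketch of scaled dot-product attention (single head, no mask).
def attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)              # query-key similarities
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)           # softmax over keys
    return w @ V                                 # weighted sum of values

rng = np.random.default_rng(0)
Q = rng.normal(size=(5, 16))                     # 5 tokens, head dim 16
K = rng.normal(size=(5, 16))
V = rng.normal(size=(5, 16))
out = attention(Q, K, V)                         # shape (5, 16)
```

Multi-head attention runs several such maps in parallel on learned projections and concatenates the results; the all-to-all token mixing here is also what makes attention the communication-heavy step in distributed training.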
Dr. Mao Yang examines how computer systems must evolve, through ultra-large-scale computing, cloud redesign, and distributed systems, to support large models and next-generation AI.
Technical overview of AI 2.0: how generative AI drives demand for large-scale compute, data pipelines, and Model-as-a-Service (MaaS) to enable industry deployments.
Survey of LLM inference stacks covering throughput, latency and cost; explains hardware constraints, KV cache, quantization, paged/grouped attention, and practical optimizations.
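The KV cache that this survey discusses stores the keys and values of already-decoded tokens so each new step attends over them without recomputation. A toy single-head sketch; the class and method names are illustrative, not from any inference framework:

```python
import numpy as np

# Hedged sketch of a KV cache for autoregressive decoding: past keys and
# values are appended once, and each new query attends over the full cache.
class KVCache:
    def __init__(self):
        self.K, self.V = [], []

    def step(self, q, k, v):
        """Cache this step's key/value, attend q over all cached pairs."""
        self.K.append(k)
        self.V.append(v)
        K = np.stack(self.K)                         # (t, d)
        V = np.stack(self.V)
        scores = K @ q / np.sqrt(q.shape[-1])
        w = np.exp(scores - scores.max())
        w /= w.sum()                                 # softmax over cached keys
        return w @ V

rng = np.random.default_rng(0)
cache = KVCache()
for _ in range(4):                                   # decode four tokens
    q = k = v = rng.normal(size=8)
    out = cache.step(q, k, v)
```

The cache turns per-step attention cost from quadratic to linear in sequence length, but its memory footprint grows with context, which is what paged attention and KV quantization target.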
Technical overview of methods to improve reward model robustness for RLHF: quantify preference strength, flip or soften noisy labels, apply adaptive margins, and use contrastive learning and MetaRM.
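The adaptive margins mentioned above can be illustrated with a margin-augmented pairwise reward-model loss, −log σ(r_chosen − r_rejected − m), where the margin m can be scaled by measured preference strength. A hedged sketch with illustrative names; this is the generic form, not necessarily the article's exact loss:

```python
import math

# Hedged sketch of a margin-augmented pairwise reward-model loss:
# loss = -log sigmoid(r_chosen - r_rejected - margin).
def rm_loss(r_chosen: float, r_rejected: float, margin: float = 0.0) -> float:
    z = r_chosen - r_rejected - margin
    return math.log(1.0 + math.exp(-z))          # numerically, -log sigmoid(z)

# A larger margin demands a bigger reward gap for the same loss, so
# strongly preferred pairs (large margin) are pushed further apart.
loss_plain = rm_loss(1.0, 0.0, margin=0.0)
loss_margin = rm_loss(1.0, 0.0, margin=0.5)      # strictly larger
```

Setting the margin per pair from annotator-derived preference strength makes the reward model penalize confident preferences more than ambiguous ones.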