Why GPUs Are Essential for AI Training
Technical overview of why GPUs outperform CPUs for deep learning training: neural networks' matrix operations, parallelism, GPU architecture and GPGPU benefits.
Review of lightweight deep learning for resource-constrained devices: TinyML, quantization, architectures and deployment strategies for efficient inference.
Overview of deep learning fundamentals and a TensorFlow 2 handwritten digit recognition demo, covering neurons, activation/loss functions, CNNs, training, and prediction.
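As a minimal illustration of the neuron-and-activation idea the digit-recognition overview covers, here is a hypothetical numpy sketch of a single sigmoid neuron (the weights, bias, and input are invented for the example and are not from the linked article):

```python
import numpy as np

def sigmoid(z):
    """Squash a pre-activation value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Toy weights, bias, and input chosen for illustration only.
w = np.array([0.5, -0.25, 0.1])
b = 0.05
x = np.array([1.0, 2.0, 3.0])

# A neuron computes an activation of a weighted sum: sigmoid(w . x + b)
y = sigmoid(np.dot(w, x) + b)
```

A full network stacks many such units into layers and learns `w` and `b` by gradient descent on a loss function, which is what the TensorFlow 2 demo automates.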
Retrieval-augmented generation robustness analysis: semantically related but answer-irrelevant retrieved fragments and higher fragment counts degrade LLM accuracy and confidence.
Survey of machine learning models grouped into six categories: neural networks, symbolic, decision trees, probabilistic, nearest neighbor, and ensemble methods.
Scaling on Scales (S2): running pretrained, frozen vision models at multiple image resolutions to produce multi-scale representations that match larger models on visual tasks.
System-level overview of LLM inference optimization, detailing techniques and tradeoffs to improve throughput for Transformer-based large language models.
Technical overview of AI servers, GPU/CPU architectures, training vs inference, compute demand and market estimates, including H100/A100 performance and the China server market.
System-level evaluation of compute-in-memory (CiM) for accelerating GEMM in ML inference: compares analog vs digital CiM, cache-level integration, and optimal dataflows.
Analysis of semiconductor advances enabling AI scale: 3D integration, CoWoS/HBM packaging, silicon photonics and energy-efficient trends toward trillion-transistor GPUs.
Network requirements for large-model GPU training: RDMA-based bandwidth, ultra-low latency, stability, and automated deployment for scalable multi-GPU clusters.
Overview of deep learning-based polarimetric imaging methods for de-scattering and denoising in complex environments, with model embedding and future directions.
Concise technical overview of GPU concepts, architecture and GPU vs CPU differences, parallel processing and performance factors for graphics and AI inference.
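To make the GPU-vs-CPU parallelism point concrete, a small numpy sketch of why dense matrix multiplication parallelizes so well: every output element is an independent dot product, so a GPU can assign them to thousands of threads at once (the loop below is an illustration of the decomposition, not how GPUs are actually programmed):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
B = rng.standard_normal((3, 2))

# Each C[i, j] depends only on row i of A and column j of B,
# so all 4 * 2 dot products could run concurrently.
C = np.empty((4, 2))
for i in range(A.shape[0]):
    for j in range(B.shape[1]):
        C[i, j] = A[i] @ B[:, j]

# The element-wise decomposition agrees with the fused matmul.
assert np.allclose(C, A @ B)
```

Neural-network training is dominated by exactly this pattern at much larger sizes, which is why wide SIMD hardware wins over a handful of fast CPU cores.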
MegaScale system design and deployment for efficient, stable LLM training on 10,000+ GPUs: algorithm and communication co-design, network tuning, fault tolerance, and MFU gains.
Analysis of deep learning in computer vision: strengths, limits, dataset biases, comparison with classical vision methods, interpretability and risks in safety-critical applications.
Explains how generative adversarial networks (GANs) work and shows step-by-step training of a PyTorch GAN on MNIST, including network design and training loop.
Study summary: Azure Kinect computer vision and machine learning for gait recognition, classifying normal gait, pelvic obliquity, and knee hyperextension; SVM and KNN achieve the top accuracy.
Dynamic Memory Compression (DMC) compresses the Transformer KV cache online during autoregressive inference, improving throughput and enabling longer context windows.
RZ/V2L MPU with DRP-AI overview and pretrained plant leaf disease classification model; runtime modes, hardware/software requirements, and inference performance.
Explains Fourier transform fundamentals, FFT use in signal processing and machine learning, and Python time-series examples for frequency-domain feature extraction.
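A minimal numpy sketch of the frequency-domain feature extraction the Fourier-transform article describes: build a synthetic signal, take the real FFT, and read off the dominant frequency (the signal, sample rate, and frequencies are invented for illustration):

```python
import numpy as np

fs = 100.0                      # sample rate in Hz
t = np.arange(0, 1.0, 1 / fs)  # one second of samples

# Synthetic signal: a strong 5 Hz tone plus a weaker 20 Hz tone.
x = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 20 * t)

spectrum = np.abs(np.fft.rfft(x))            # magnitude spectrum
freqs = np.fft.rfftfreq(len(x), d=1 / fs)    # matching frequency bins

dominant = freqs[np.argmax(spectrum)]
print(dominant)  # → 5.0
```

The peak-frequency value (and more generally the magnitudes of selected bins) is the kind of frequency-domain feature commonly fed to machine learning models for time-series tasks.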