Understanding the Transformer Neural Network Model
Overview of the Transformer architecture: self-attention, multi-head attention, positional encoding, encoder-decoder stacks, and implications for distributed model training.
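The self-attention mechanism mentioned above can be sketched minimally. This is an illustrative single-head version using NumPy (the projection matrices `w_q`, `w_k`, `w_v` and the random inputs are assumptions for the demo, not from the article):

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention for one head.

    x: (seq_len, d_model) token representations.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v             # project to queries/keys/values
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)                  # pairwise similarity, scaled
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the key axis
    return weights @ v                               # attention-weighted sum of values

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))                          # 5 tokens, model dim 8
w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)
print(out.shape)  # (5, 8)
```

Multi-head attention simply runs several such heads with independent projections and concatenates their outputs.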
Dr. Mao Yang examines how computer systems must evolve to support large models and next-generation AI: ultra-large-scale computing, cloud redesign, and distributed systems.
Technical overview of AI 2.0: how generative AI drives demand for large-scale compute, data pipelines, and Model-as-a-Service (MaaS) to enable industry deployments.
Survey of LLM inference stacks covering throughput, latency and cost; explains hardware constraints, KV cache, quantization, paged/grouped attention, and practical optimizations.
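The KV cache mentioned above is the core memory/compute trade-off in LLM serving: keys and values for past tokens are stored so each decoding step attends over them without recomputation. A toy single-head sketch (the `KVCache` class and shapes are illustrative assumptions; production stacks such as vLLM manage these tensors in fixed-size pages):

```python
import numpy as np

class KVCache:
    """Append-only store of per-token keys and values (toy sketch)."""
    def __init__(self, d_k):
        self.keys = np.empty((0, d_k))
        self.values = np.empty((0, d_k))

    def append(self, k, v):
        self.keys = np.vstack([self.keys, k])
        self.values = np.vstack([self.values, v])

def decode_step(q, k, v, cache):
    """Cache the new token's K/V, then attend the query over all cached tokens."""
    cache.append(k, v)
    scores = cache.keys @ q / np.sqrt(q.shape[-1])   # scores vs. every past token
    w = np.exp(scores - scores.max())
    w /= w.sum()                                     # softmax
    return w @ cache.values

rng = np.random.default_rng(1)
cache = KVCache(d_k=4)
for _ in range(3):                                   # three decode steps reuse earlier K/V
    q, k, v = rng.normal(size=(3, 4))
    out = decode_step(q, k, v, cache)
print(cache.keys.shape)  # (3, 4): one cached K/V row per generated token
```

The cache grows linearly with context length, which is why paged attention and quantization of the cached tensors matter for cost.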
Technical overview of methods to improve reward model robustness for RLHF: quantify preference strength, flip or soften noisy labels, apply adaptive margins, and use contrastive learning and MetaRM.
Guide to machine learning visualization techniques covering model structure, performance plots (ROC, confusion matrix), feature importance, and practical analysis.
Survey of deep metric learning: formulations, sample selection and metric loss functions (contrastive, triplet, N-pair), architectures and applications in vision, audio, and text.
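Of the loss functions listed above, the triplet loss is the most commonly cited: it pulls a positive example toward its anchor while pushing a negative at least a margin farther away. A minimal sketch, assuming Euclidean distance and omitting batch-level mining:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge on the gap between anchor-positive and anchor-negative distances."""
    d_pos = np.linalg.norm(anchor - positive)   # distance to same-class sample
    d_neg = np.linalg.norm(anchor - negative)   # distance to different-class sample
    return max(0.0, d_pos - d_neg + margin)     # zero once the margin is satisfied

a = np.array([0.0, 0.0])   # anchor embedding
p = np.array([0.1, 0.0])   # positive: same class, close by
n = np.array([3.0, 0.0])   # negative: different class, far away
print(triplet_loss(a, p, n))  # 0.0 — the negative already satisfies the margin
```

Contrastive and N-pair losses follow the same idea but operate on pairs and on one-positive-many-negatives batches, respectively.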
Overview of 12 deep learning interview questions highlighting core concepts and training considerations, including batch normalization and practical model evaluation.
Analysis of recent research evaluating whether LLMs can plan or reason, showing limited autonomous planning and that apparent emergent capabilities stem from in-context learning.
F-Learning: a parameter-based fine-tuning paradigm that subtracts knowledge parameter deltas to forget outdated facts, then fine-tunes (LoRA or full-model) to update LLM knowledge.
Synopsys.ai and Microsoft extend Copilot into EDA with Azure OpenAI, adding GenAI features for RTL generation, formal verification assertions, and validated design workflows.
Tsinghua study analyzes implicit harmful content in LLMs, showing how SFT and RLHF can induce subtle abusive outputs that evade harmful-content detectors.
PrefixRL uses deep reinforcement learning to optimize parallel prefix circuits, producing smaller, lower-latency adders and mapping Pareto trade-offs between area and latency.
DAICL: a retrieval-augmented in-context learning framework for unsupervised domain adaptation that retrieves target examples and jointly optimizes task and LM.
Simplifying Transformer blocks by removing skip connections, projections and normalization; introduces Simplified Attention to reduce parameters and raise training throughput.
Mamba: a selective SSM state-space model that generalizes S4 to enable linear long-context scaling, million-token sequences, and improved language modeling.
OMGEval presents an open-source multilingual open-ended QA benchmark (804 Chinese prompts) localized from AlpacaEval, using Text-Davinci-003 baseline and GPT-4 evaluation.
Technical overview of AI server interconnects and components: DGX H100 architecture, PCIe switches and Retimers, and DDR5 memory interface chip trends.
Overview of latent-space representations in AI and their practical applications in manufacturing, including semiconductor fabrication, predictive maintenance, and quality assurance.
Technical overview of neural networks and GPT: how images and text are vectorized, forward/backpropagation, gradient descent training, activations, and prediction.
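The gradient-descent training loop described above can be reduced to its essentials. An illustrative one-parameter least-squares example (the data and learning rate are assumptions for the demo); real networks apply the same update rule to every weight, with backpropagation supplying the gradients:

```python
def train(xs, ys, lr=0.1, steps=100):
    """Fit y = w * x by gradient descent on mean squared error."""
    w = 0.0
    for _ in range(steps):
        # gradient of mean((w*x - y)**2) with respect to w
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad          # gradient-descent update step
    return w

xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]   # data generated by y = 2x
w = train(xs, ys)
print(round(w, 3))  # converges to 2.0, the true slope
```

Each step computes predictions (the forward pass), measures the error, and nudges the parameter against the gradient; stacking many such parameters with nonlinear activations gives the networks the article describes.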