Understanding IEEE P802.3dj Ethernet Physical Layer
Analysis of IEEE 802.3dj and 212 Gb/s per-lane PHY: electrical validation, jitter measurement methodology, and implications for hyperscale data center interconnects.
Technical guide to deploying PP-OCRv5 with Intel OpenVINO on a modular mini-PC: export Paddle models to ONNX, run CPU inference, and enable hardware-accelerated OCR.
Analysis of semiconductor advances enabling AI scale: 3D integration, CoWoS/HBM packaging, silicon photonics, and energy-efficiency trends toward trillion-transistor GPUs.
Overview of AIGC and ChatGPT: technologies, industry chain, applications in text/image/video, e-commerce impact, and prompt engineering best practices.
Technical overview of large-model fine-tuning and PEFT approaches, covering prompt/prefix tuning, P-tuning v2, AdaLoRA, adapter/LoRA methods, and the standard training workflow.
Technical overview of AI server interconnects and components: DGX H100 architecture, PCIe switches and Retimers, and DDR5 memory interface chip trends.
A concise technical review of AI history covering 10 pivotal milestones, from the Dartmouth workshop and the perceptron to deep learning breakthroughs and the rise of large models.
Explains Fourier transform fundamentals, FFT use in signal processing and machine learning, and Python time-series examples for frequency-domain feature extraction.
Overview of Mixture-of-Experts (MoE) transformers: sparse routing with gating networks and experts, training and inference trade-offs, and the recent Mixtral 8x7B model.
Overview of Synopsys VSO.ai integration into VCS and its AI-driven verification methods to accelerate coverage convergence, infer coverage, and reduce redundant regressions.
GEAR: hybrid KV cache compression combining 4-bit quantization, low-rank residual approximation, and sparse corrections to cut peak memory and boost inference throughput.
Dynamic Memory Compression (DMC) compresses the Transformer KV cache online during autoregressive inference, improving throughput and enabling longer context windows.
Network requirements for large-model GPU training: RDMA-based bandwidth, ultra-low latency, stability, and automated deployment for scalable multi-GPU clusters.
MegaScale system for large-scale LLM training beyond 10,000 GPUs, detailing algorithm-system co-design, communication and network tuning, MFU improvements, and fault-tolerant recovery.
Overview of the AI-RAN Alliance formed at MWC 2024, its goals to integrate AI into radio access networks for 5G/6G, edge AI deployment, and contrast with OpenRAN.
Guide to machine learning visualization techniques covering model structure, performance plots (ROC, confusion matrix), feature importance, and practical analysis.
Guide to converting and deploying the DeepSeek LLM on Rockchip RK3588 using RKLLM-Toolkit: environment setup, cross-compilation, model conversion and board deployment.
AI super-resolution and upscaling: GPU and transfer-learning advances, training-data limits, and applications in satellite, medical, gaming, and video-conferencing.
Analysis of ML hardware trends across GPUs and accelerators, quantifying compute performance, interconnects, cost-performance, and energy efficiency.