Optimizing Deep Learning Models
A practical deep learning tuning guide covering learning-rate selection, batch-size effects, weight initialization, optimizers, regularization, data augmentation, and training tips.
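As a taste of the learning-rate material, here is a minimal sketch of one common heuristic: linear warmup followed by cosine decay. The function name, defaults, and step counts are illustrative assumptions, not taken from the guide itself.

```python
import math

def lr_schedule(step, total_steps, base_lr=0.1, warmup_steps=100):
    """Cosine-decay learning-rate schedule with linear warmup (illustrative).

    Ramp the learning rate up linearly for the first `warmup_steps`,
    then decay it to zero along a cosine curve over the remaining steps.
    """
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps  # linear warmup
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))  # cosine decay
```

Warmup avoids large, destabilizing updates while weights are still near their initialization; the cosine tail lets training settle into a minimum without an abrupt cutoff.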
Explore thermal and EMI challenges in AI chip design, focusing on heat dissipation and noise suppression for high-performance computing.
Overview of AI memory demands and new technologies: capacity, bandwidth, latency, power, reliability, and adoption challenges for future AI systems.
NVIDIA AI Workbench simplifies AI development with tools for RAG apps, GPU setups, and model customization across systems.
Explains why ML models can't reach zero error, detailing irreducible error, bias-variance tradeoff, model complexity, overfitting, and MSE for prediction accuracy.
PrefixRL uses deep reinforcement learning to optimize parallel prefix circuits, producing smaller, lower-latency adders and mapping Pareto trade-offs between area and latency.
Overview of convolutional neural networks: principles like padding, stride, pooling and filters, edge detection fundamentals, architecture patterns and a Keras MNIST implementation.
OpenAI's study unveils an instruction hierarchy to boost LLM security against attacks like prompt injections, enhancing model safety.
Explore deep learning for defect detection in industries, offering accurate solutions for quality control with advanced frameworks.
Technical overview and setup of the Raspberry Pi AI kit with Hailo 8L NPU, covering M.2 HAT+ installation, thermal management, and software setup for Pi 5.
Summary of terahertz sub-THz testing for 6G: spectrum use, RF front-end modules, signal generation, and channel measurement tools for terahertz communications research.
Overview of the MediaTek MT8391 (Genio 720) edge AI platform: 6 nm octa-core CPU, 10 TOPS NPU, dual ISPs, LPDDR5 support and multi-interface connectivity for AIoT devices.
Technical overview of why GPUs outperform CPUs for deep learning training: neural networks' matrix operations, parallelism, GPU architecture and GPGPU benefits.
Overview of FPGA applications in machine learning: accelerating neural network inference, hardware quantization, algorithm optimization, and efficiency for edge AI deployments.
Analysis of deep learning in computer vision: strengths, limits, dataset biases, comparison with classical vision methods, interpretability and risks in safety-critical applications.
Technical guide to installing RKLLM-Toolkit and converting/deploying the DeepSeek-R1 LLM on EASY-EAI-Orin-Nano (RK3576), covering env setup, conversion, and on-device inference.
Survey of hyperparameter optimization methods: grid/random search, Bayesian optimization, simulated annealing, genetic algorithms, and successive halving for ML tuning.
Analysis of an IoT smart classroom solution: hardware connectivity, data interoperability, and scenario intelligence for unified, energy-efficient device management.
Summary of TensorNODE, a TensorWave bare-metal AI cloud using AMD MI300X GPUs and a PCIe Gen5 memory fabric to enable petabyte-scale GPU memory pools.
Technical guide to scaling LLM training: analyzes memory usage, gradient accumulation, ZeRO, and tensor/data parallelism to improve throughput and GPU utilization.