Optimizing Deep Learning Models
Practical deep learning tuning guide covering learning rate selection, batch size effects, weight initialization, optimizers, regularization, data augmentation and training tips.
Explores thermal and EMI challenges in AI chip design, focusing on heat dissipation and noise suppression for high-performance computing.
PrefixRL uses deep reinforcement learning to optimize parallel prefix circuits, producing smaller, lower-latency adders and mapping Pareto trade-offs between area and latency.
Explains why ML models can't reach zero error, detailing irreducible error, bias-variance tradeoff, model complexity, overfitting, and MSE for prediction accuracy.
NVIDIA AI Workbench simplifies AI development with tools for RAG apps, GPU setups, and model customization across systems.
Overview of AI memory demands and new technologies: capacity, bandwidth, latency, power, reliability, and adoption challenges for future AI systems.
Technical overview and setup of the Raspberry Pi AI kit with Hailo 8L NPU, covering M.2 HAT+ installation, thermal management, and software setup for Pi 5.
Overview of convolutional neural networks: principles like padding, stride, pooling and filters, edge detection fundamentals, architecture patterns and a Keras MNIST implementation.
OpenAI's study unveils an instruction hierarchy to boost LLM security against attacks like prompt injections, enhancing model safety.
Overview of the MediaTek MT8391 (Genio 720) edge AI platform: 6 nm octa-core CPU, 10 TOPS NPU, dual ISPs, LPDDR5 support and multi-interface connectivity for AIoT devices.
Overview of FPGA applications in machine learning: accelerating neural network inference, hardware quantization, algorithm optimization, and efficiency for edge AI deployments.
Summary of terahertz sub-THz testing for 6G: spectrum use, RF front-end modules, signal generation, and channel measurement tools for terahertz communications research.
Explores deep learning for industrial defect detection, covering frameworks that deliver accurate, automated quality-control inspection.
Technical overview of why GPUs outperform CPUs for deep learning training: neural networks' matrix operations, parallelism, GPU architecture and GPGPU benefits.
Analysis of deep learning in computer vision: strengths, limits, dataset biases, comparison with classical vision methods, interpretability and risks in safety-critical applications.
Analysis of an IoT smart classroom solution - hardware connectivity, data interoperability and scenario intelligence for unified, energy-efficient device management.
Survey of hyperparameter optimization methods - grid/random search, Bayesian optimization, simulated annealing, genetic algorithms and successive halving for ML tuning.
Technical guide to installing RKLLM-Toolkit and converting/deploying the DeepSeek-R1 LLM on EASY-EAI-Orin-Nano (RK3576), covering env setup, conversion, and on-device inference.
Summary of TensorNODE, a TensorWave bare-metal AI cloud using AMD MI300X GPUs and a PCIe Gen5 memory fabric to enable petabyte-scale GPU memory pools.
Technical guide to scaling LLM training: analyzes memory usage, gradient accumulation, ZeRO, and tensor/data parallelism to improve throughput and GPU utilization.