
DeepPointMap: LiDAR SLAM Framework Using Deep Learning

Author: Adrian, September 26, 2025

Introduction

Simultaneous localization and mapping (SLAM) is a fundamental problem in robotics and autonomous driving: it aims to reconstruct a map of the environment while estimating the vehicle's or robot's pose within it. LiDAR point clouds are widely used to capture complex 3D scene structure. Existing SLAM approaches either rely on dense point clouds for high-precision localization or use compact, general-purpose descriptors to reduce map size, and these two goals can conflict. This work proposes a unified architecture, DeepPointMap (DPM), that addresses both objectives.

Problem Statement

Traditional LiDAR SLAM methods trade off between dense point-based localization, which is precise but memory-hungry, and compact descriptor-based maps, which save memory at the cost of accuracy. DeepPointMap, a deep learning framework for LiDAR SLAM, aims to provide both: a memory-efficient map representation that still supports accurate multi-scale localization tasks such as odometry and loop closure detection.
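
The memory trade-off can be made concrete with a back-of-envelope calculation. The numbers below (points per scan, descriptor count, feature dimension) are illustrative assumptions, not figures from the paper:

```python
# Back-of-envelope comparison: a raw LiDAR scan stored as float32 XYZ
# points vs. a sparse set of descriptors (sampled points + feature
# vectors). All sizes below are hypothetical.

def map_memory_bytes(n_points: int, floats_per_point: int) -> int:
    """Memory of a map fragment stored as float32 values."""
    return n_points * floats_per_point * 4  # 4 bytes per float32

# Dense representation: 120k points per scan, XYZ only.
dense = map_memory_bytes(120_000, 3)

# Descriptor representation: 256 sampled points, each with XYZ plus a
# 128-dim feature vector (assumed sizes).
sparse = map_memory_bytes(256, 3 + 128)

print(f"dense:  {dense / 1e6:.2f} MB per scan")
print(f"sparse: {sparse / 1e6:.3f} MB per scan")
print(f"compression ratio ~ {dense / sparse:.0f}x")
```

Even with a fairly wide feature vector attached to each sampled point, the sparse representation is an order of magnitude smaller per scan under these assumptions, which is the motivation for descriptor-based maps.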

Contributions

The main contributions are:

  • Unified neural descriptors: The DPM framework includes a DPM encoder and a DPM decoder. The DPM encoder extracts highly representative and sparse neural descriptors from point clouds, enabling efficient environment encoding. Compared to handcrafted features, these neural descriptors reduce map memory while retaining localization and reconstruction accuracy.
  • Multi-scale matching and registration: The DPM decoder performs multi-scale matching and registration based on the neural descriptors, covering tasks such as odometry and loop closure detection. Unlike other descriptor-based methods, the DPM decoder handles multiple SLAM subtasks within a single framework, balancing localization accuracy, memory efficiency, map fidelity, and real-time processing.
  • Multi-agent collaborative SLAM: The DPM framework is extended to multi-agent collaborative SLAM. Each agent maintains a local SLAM system and performs local odometry and loop closure detection. By merging and optimizing observations, the system produces globally consistent trajectories and map reconstructions, which is important for multi-agent systems with limited communication bandwidth.
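
The map-merging step in the multi-agent setting can be sketched in a few lines. This is a toy stand-in, not the paper's implementation: once an inter-agent loop closure yields the relative rigid transform between two agents' frames, one agent's local map can be expressed in the other's frame and the two maps unioned:

```python
import numpy as np

# Toy sketch of map merging in collaborative SLAM (assumed interfaces):
# agent B's local map is re-expressed in agent A's frame via the 4x4
# homogeneous transform T_ab found by an inter-agent loop closure.

def make_se3(R: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Build a 4x4 homogeneous transform from rotation R and translation t."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def transform_points(T: np.ndarray, pts: np.ndarray) -> np.ndarray:
    """Apply a 4x4 rigid transform to an (N, 3) array of points."""
    return pts @ T[:3, :3].T + T[:3, 3]

# Assumed relative pose between the agents: a 90-degree yaw plus a
# 10 m offset along x.
yaw = np.pi / 2
R = np.array([[np.cos(yaw), -np.sin(yaw), 0.0],
              [np.sin(yaw),  np.cos(yaw), 0.0],
              [0.0,          0.0,         1.0]])
T_ab = make_se3(R, np.array([10.0, 0.0, 0.0]))

map_b = np.array([[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]])  # toy local map
map_b_in_a = transform_points(T_ab, map_b)

# The merged global map is the union of A's map and B's transformed map.
print(np.round(map_b_in_a, 3))
```

In practice the relative transform would itself come from descriptor matching between the agents' observations, and a global optimization would then refine all poses jointly.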

Method

The framework comprises two neural networks. The DPM encoder extracts sparse yet highly representative neural descriptors from point clouds, enabling a memory-efficient map representation. The DPM decoder performs multi-scale matching and registration on these descriptors, covering odometry as well as loop closure detection. Unlike other neural-descriptor methods, the same descriptors are applied uniformly across multiple SLAM subtasks, achieving strong localization accuracy, memory efficiency, map fidelity, and real-time capability.
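
The two roles can be illustrated with classical stand-ins (my own toy versions, not the DPM networks): sparsifying a cloud to a few anchor locations via farthest point sampling, as a proxy for the encoder's sparse descriptors, and recovering a rigid transform between two matched descriptor sets with the SVD-based Kabsch solver, as a proxy for the decoder's registration:

```python
import numpy as np

def farthest_point_sampling(pts: np.ndarray, k: int) -> np.ndarray:
    """Pick k well-spread points; stands in for the encoder's sparsification."""
    chosen = [0]
    dist = np.linalg.norm(pts - pts[0], axis=1)
    for _ in range(k - 1):
        idx = int(np.argmax(dist))
        chosen.append(idx)
        dist = np.minimum(dist, np.linalg.norm(pts - pts[idx], axis=1))
    return pts[chosen]

def kabsch(src: np.ndarray, dst: np.ndarray):
    """Least-squares rigid transform (R, t) aligning src to dst."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    H = (src - mu_s).T @ (dst - mu_d)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:  # guard against a reflection solution
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R, mu_d - R @ mu_s

rng = np.random.default_rng(0)
cloud = rng.uniform(-10, 10, size=(500, 3))
anchors = farthest_point_sampling(cloud, 8)   # sparse "descriptor" positions

# Simulate a second scan: rotate and translate the anchors, then register.
yaw = 0.3
R_true = np.array([[np.cos(yaw), -np.sin(yaw), 0.0],
                   [np.sin(yaw),  np.cos(yaw), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([1.0, -2.0, 0.5])
anchors2 = anchors @ R_true.T + t_true

R_est, t_est = kabsch(anchors, anchors2)
print(np.allclose(R_est, R_true, atol=1e-6), np.allclose(t_est, t_true, atol=1e-6))
```

The learned components replace both steps: the encoder decides which points to keep and what features to attach, and the decoder establishes the correspondences that a closed-form solver like Kabsch then only has to refine.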

Experiments

Experimental setup: Four driving-related datasets were used for evaluation: SemanticKITTI, KITTI-360, MulRan, and KITTI-Carla. Training used the AdamW optimizer with weight decay and a cosine learning rate schedule. The network was trained on six RTX 3090 GPUs for 12 epochs.
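
The cosine schedule mentioned above anneals the learning rate smoothly from its base value down to a minimum. A minimal sketch follows; the base learning rate and step counts are placeholders, since the text does not state the actual values:

```python
import math

# Cosine learning-rate annealing, as used with AdamW-style optimizers.
# base_lr, min_lr, and total step counts below are hypothetical.

def cosine_lr(step: int, total_steps: int, base_lr: float, min_lr: float = 0.0) -> float:
    """Anneal from base_lr down to min_lr over total_steps following a half cosine."""
    progress = step / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))

total = 12 * 1000                          # e.g. 12 epochs x 1000 steps (assumed)
print(cosine_lr(0, total, 1e-3))           # start: full base LR
print(cosine_lr(total // 2, total, 1e-3))  # midpoint: half the base LR
print(cosine_lr(total, total, 1e-3))       # end: min_lr
```

Frameworks such as PyTorch ship this as a built-in scheduler (`CosineAnnealingLR`), so in practice one would use that rather than hand-rolling the formula.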

Potential Future Directions

Possible follow-up research directions include:

  • Network architecture improvements
  • Multi-modal fusion
  • Robustness enhancements
  • Real-time performance optimization
  • Multi-agent collaborative SLAM

Conclusion

This work presents DeepPointMap (DPM), a deep learning LiDAR SLAM framework consisting of a DPM encoder and a DPM decoder. DPM descriptors are used uniformly across multiple SLAM subtasks, delivering competitive localization accuracy, map fidelity, and real-time performance. Compared with prior methods, DeepPointMap improves localization accuracy and map reconstruction quality while reducing memory consumption, and it demonstrates flexibility and potential in multi-agent collaborative SLAM.

Recommended Reading