Deep Learning-Based 3D Multi-Object Tracking Using Multimodal Fusion in Smart Cities

Research output: Contribution to journal › Article › peer-review


Abstract

The intelligent processing of visual perception information is one of the core technologies of smart cities. Deep learning-based 3D multi-object tracking is important for improving the intelligence and safety of robots in smart cities. However, 3D multi-object tracking still faces many challenges due to the complexity of the environment and the uncertainty of objects. In this paper, we exploit the complementary multimodal information from images and point clouds and propose a multimodal adaptive feature gating fusion module to improve feature fusion. In the object association stage, we design an orientation-position-aware affinity matrix (EO-IoU) that combines Euclidean distance, orientation similarity, and intersection over union; it is better suited to association and addresses the association failures that occur when there is little or no overlap between the detection box and the prediction box. We also adopt a more robust two-stage data association method to mitigate the trajectory fragmentation and identity switching caused by discarding low-scoring detection boxes. Extensive experiments on the KITTI and NuScenes benchmark datasets demonstrate that our method outperforms existing state-of-the-art methods with better robustness and accuracy.
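To make the association cues concrete, the following is a minimal Python sketch of an affinity in the spirit of EO-IoU and of a two-stage matching step as described above. The function names, dictionary fields, equal weighting of the three cues, and score/affinity thresholds are illustrative assumptions; the abstract does not give the paper's exact formulation.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def iou_2d(box_a, box_b):
    """Axis-aligned IoU between boxes given as (x1, y1, x2, y2).
    Simplification: the paper tracks 3D objects, but the BEV/2D case
    is enough to illustrate the overlap term."""
    xa1, ya1, xa2, ya2 = box_a
    xb1, yb1, xb2, yb2 = box_b
    inter_w = max(0.0, min(xa2, xb2) - max(xa1, xb1))
    inter_h = max(0.0, min(ya2, yb2) - max(ya1, yb1))
    inter = inter_w * inter_h
    union = (xa2 - xa1) * (ya2 - ya1) + (xb2 - xb1) * (yb2 - yb1) - inter
    return inter / union if union > 0 else 0.0


def eo_iou_affinity(det, trk, max_dist=10.0):
    """Hypothetical affinity combining the three cues named in the abstract:
    Euclidean distance between centers, orientation (yaw) similarity, and IoU.
    The normalization and equal weighting are assumptions, not the paper's
    exact definition of EO-IoU."""
    # Distance term: 1 when centers coincide, 0 beyond max_dist.
    center_d = np.linalg.norm(np.asarray(det["center"]) - np.asarray(trk["center"]))
    dist_term = max(0.0, 1.0 - center_d / max_dist)

    # Orientation term: cosine of the yaw difference, mapped to [0, 1].
    orient_term = 0.5 * (1.0 + np.cos(det["yaw"] - trk["yaw"]))

    # Overlap term: plain IoU of the bird's-eye-view boxes.
    overlap_term = iou_2d(det["bev_box"], trk["bev_box"])

    return (dist_term + orient_term + overlap_term) / 3.0


def two_stage_associate(dets, tracks, score_thresh=0.5, affinity_thresh=0.3):
    """Hypothetical two-stage matching in the spirit of the abstract:
    high-score detections are matched to tracks first, then the remaining
    tracks are matched against low-score detections instead of discarding
    them. Thresholds are illustrative."""
    high = [d for d in dets if d["score"] >= score_thresh]
    low = [d for d in dets if d["score"] < score_thresh]

    def match(cands, trks):
        if not cands or not trks:
            return [], list(range(len(trks)))
        # Hungarian assignment on (1 - affinity) as the cost.
        cost = np.array([[1.0 - eo_iou_affinity(d, t) for t in trks] for d in cands])
        rows, cols = linear_sum_assignment(cost)
        pairs, used = [], set()
        for r, c in zip(rows, cols):
            if 1.0 - cost[r, c] >= affinity_thresh:
                pairs.append((cands[r], trks[c]))
                used.add(c)
        unmatched = [i for i in range(len(trks)) if i not in used]
        return pairs, unmatched

    # Stage 1: high-confidence detections vs. all tracks.
    pairs_1, leftover_idx = match(high, tracks)
    leftover_tracks = [tracks[i] for i in leftover_idx]
    # Stage 2: low-confidence detections vs. still-unmatched tracks.
    pairs_2, _ = match(low, leftover_tracks)
    return pairs_1 + pairs_2
```

Keeping the second stage is what prevents a temporarily occluded or poorly detected object from losing its track identity, which is the failure mode the abstract attributes to discarding low-scoring boxes.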

Original language: English
Article number: 47
Journal: Human-centric Computing and Information Sciences
Volume: 14
State: Published - 2024

Keywords

  • 3D Multi-Object Tracking
  • Data Association
  • Multimodal Feature Fusion
  • Position Affinity Matrix
  • Smart Cities
  • Visual Perception
