TY - JOUR
T1 - Deep Learning-Based 3D Multi-Object Tracking Using Multimodal Fusion in Smart Cities
AU - Li, Hui
AU - Liu, Xiang
AU - Jia, Hong
AU - Ahanger, Tariq Ahamed
AU - Xu, Lingwei
AU - Alzamil, Zamil
AU - Li, Xingwang
N1 - Publisher Copyright:
© 2024 Korea Information Processing Society. All rights reserved.
PY - 2024
Y1 - 2024
AB - The intelligent processing of visual perception information is one of the core technologies of smart cities. Deep learning-based 3D multi-object tracking is important for improving the intelligence and safety of robots in smart cities. However, 3D multi-object tracking still faces many challenges due to the complexity of the environment and the uncertainty of objects. In this paper, we make full use of the multimodal information from images and point clouds and propose a multimodal adaptive feature gating fusion module to improve feature fusion. In the object association stage, we design an orientation-position-aware affinity matrix (EO-IoU) that combines Euclidean distance, orientation similarity, and intersection over union; it is better suited to association and resolves the association failures that occur when there is little or no overlap between the detection box and the prediction box. We also adopt a more robust two-stage data association method to reduce the trajectory fragmentation and identity switching caused by discarding low-scoring detection boxes. Extensive experiments on the KITTI and nuScenes benchmark datasets demonstrate that our method outperforms existing state-of-the-art methods in both robustness and accuracy.
KW - 3D Multi-Object Tracking
KW - Data Association
KW - Multimodal Feature Fusion
KW - Position Affinity Matrix
KW - Smart Cities
KW - Visual Perception
UR - https://www.scopus.com/pages/publications/85202302591
U2 - 10.22967/HCIS.2024.14.047
DO - 10.22967/HCIS.2024.14.047
M3 - Article
AN - SCOPUS:85202302591
SN - 2192-1962
VL - 14
JO - Human-centric Computing and Information Sciences
JF - Human-centric Computing and Information Sciences
M1 - 47
ER -