Multimodal scene recognition using semantic segmentation and deep learning integration

Aysha Naseer, Mohammed Alnusayri, Haifa F. Alhasson, Mohammed Alatiyyah, Dina Abdulaziz AlHammadi, Ahmad Jalal, Jeongmin Park

Research output: Contribution to journal › Article › peer-review

Abstract

Semantic modeling and recognition of indoor scenes present a significant challenge: generic scenes have a complex composition, containing a variety of features including themes and objects, and the gap between high-level scene interpretation and low-level visual features further increases the difficulty of scene recognition. To overcome these obstacles, this study presents a novel multimodal deep learning technique that enhances scene recognition accuracy and robustness by combining depth information with conventional red-green-blue (RGB) image data. A depth-aware segmentation methodology first identifies the individual objects in an image; convolutional neural networks (CNNs) and spatial pyramid pooling (SPP) then analyze the segmented regions, allowing for more precise image classification. Experimental findings demonstrate the effectiveness of this method, showing 91.73% accuracy on the RGB-D scene dataset and 90.53% accuracy on the NYU Depth v2 dataset. These results demonstrate how the multimodal approach can improve scene detection and classification, with potential applications in fields including robotics, sports analysis, and security systems.
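To make the described pipeline concrete, below is a minimal PyTorch sketch of the two core ideas the abstract names: a two-stream CNN that fuses RGB and depth features, followed by spatial pyramid pooling so the classifier head receives a fixed-length vector regardless of input resolution. The layer sizes, pyramid levels, class count, and the `RGBDSceneNet` name are illustrative assumptions, not the authors' actual configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialPyramidPooling(nn.Module):
    """Pool feature maps at several grid sizes and concatenate the
    results, giving a fixed-length vector for any input resolution."""
    def __init__(self, levels=(1, 2, 4)):
        super().__init__()
        self.levels = levels

    def forward(self, x):
        # x: (batch, channels, H, W)
        pooled = []
        for n in self.levels:
            # n x n adaptive max pooling, flattened per sample
            pooled.append(F.adaptive_max_pool2d(x, n).flatten(1))
        return torch.cat(pooled, dim=1)

class RGBDSceneNet(nn.Module):
    """Illustrative two-stream CNN: one branch encodes RGB, one encodes
    depth; feature maps are concatenated, passed through SPP, and
    classified. Hypothetical architecture, not the paper's exact model."""
    def __init__(self, num_classes=10):
        super().__init__()
        def branch(in_ch):
            return nn.Sequential(
                nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),
                nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),
            )
        self.rgb_branch = branch(3)    # 3-channel RGB input
        self.depth_branch = branch(1)  # 1-channel depth map input
        self.spp = SpatialPyramidPooling(levels=(1, 2, 4))
        # 128 fused channels * (1 + 4 + 16) pyramid bins = 2688 features
        self.classifier = nn.Linear(128 * 21, num_classes)

    def forward(self, rgb, depth):
        f = torch.cat([self.rgb_branch(rgb), self.depth_branch(depth)], dim=1)
        return self.classifier(self.spp(f))

# Usage: thanks to SPP, the head size is independent of image resolution.
model = RGBDSceneNet(num_classes=10)
rgb = torch.randn(2, 3, 224, 224)
depth = torch.randn(2, 1, 224, 224)
logits = model(rgb, depth)  # shape: (2, 10)
```

The SPP stage is what decouples the classifier from input size: each pyramid level contributes a fixed number of bins per channel, so images segmented into regions of different dimensions still yield vectors of the same length.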

Original language: English
Article number: e2858
Journal: PeerJ Computer Science
Volume: 11
State: Published - 2025

Keywords

  • Artificial intelligence
  • Features optimization
  • Image analysis
  • Machine learning
  • Scene modeling
  • Spatial pyramid pooling
  • Voxel grid representation
