Deep learning-based few-shot person re-identification from top-view RGB and depth images

Almustafa Abed; Belhassen Akrout; Ikram Amous

doi:10.1007/s00521-024-10239-6

Computer Sciences

University of Sfax

Research output: Contribution to journal › Article › peer-review

7 Scopus citations

Abstract

Person re-identification (re-id) attempts to match a person from the images of different time steps. Existing deep learning approaches either use appearance or geometry features for re-id which does not provide the required robustness because of higher intra-class similarity. Existing supervised re-id approaches utilize Convolutional Neural Networks (CNNs) and identity-labeled images to train, where the person images are taken by the sensors from a horizontal view. The horizontal view exposes the privacy of the people because of their facial appearance in the image. Moreover, person re-id includes new unseen people; however, CNN does not have the ability to identify the new unseen people because of a lack of continual learning. Privacy-preserved computer vision-assisted person re-id systems can benefit from visual appearance and geometry features extracted from top-view RGB and depth input. This paper presents the privacy-preserved person top-view re-id few-shot network which uses the appearance and geometry features. The EfficientNet is used for appearance-based features from RGB input, while PointNet is used to extract the geometry features from the point cloud which is made from the RGB-D image registration. Concatenated features from EfficientNet and PointNet are fed to the two-layer Bi-LSTM network for person identification. Finally, the whole network is converted into a few-shot network to achieve continual learning by removing the output layer and joining the similarity measurement unit. This approach is based on CNN and fine-tunes a TVPR/2 dataset acquired by using a top-view arrangement that is publicly available. The experimental results on TVPR/2 and GODPR datasets show that the proposed re-id network outperforms other state-of-the-art networks.
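The last step the abstract describes, replacing the output layer with a similarity measurement unit so that unseen people can be enrolled without retraining, can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the abstract does not name the similarity measure, so cosine similarity and per-identity averaging of the few enrolled shots are assumptions, and the function names (`concat_features`, `identify`) and the toy three-dimensional embeddings are hypothetical stand-ins for the fused EfficientNet and PointNet features after the Bi-LSTM.

```python
import math


def concat_features(appearance, geometry):
    """Fuse an appearance vector and a geometry vector by concatenation."""
    return list(appearance) + list(geometry)


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def identify(query, support):
    """Return the identity whose averaged support embedding is most similar to the query.

    `support` maps an identity label to the few embeddings enrolled for it.
    Averaging the shots and using cosine similarity are assumptions of this sketch.
    """
    prototypes = {pid: mean_vector(shots) for pid, shots in support.items()}
    return max(prototypes, key=lambda pid: cosine_similarity(query, prototypes[pid]))


# Toy embeddings standing in for the network's output features.
support = {
    "person_a": [[1.0, 0.0, 0.2], [0.9, 0.1, 0.1]],
    "person_b": [[0.0, 1.0, 0.8]],
}
match_known = identify([0.8, 0.1, 0.2], support)

# Enrolling a new person only adds entries to the support set; nothing is retrained.
support["person_c"] = [[0.0, 0.0, 1.0]]
match_new = identify([0.1, 0.0, 0.9], support)
```

Because matching depends only on the stored embeddings, adding an identity is a dictionary update, which is the property the abstract refers to as continual learning.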

Original language	English
Pages (from-to)	19365-19382
Number of pages	18
Journal	Neural Computing and Applications
Volume	36
Issue number	31
DOIs	https://doi.org/10.1007/s00521-024-10239-6
State	Published - Nov 2024

Keywords

Convolutional neural networks
Intelligent retail stores
Person re-identification (Re-ID)
Top-view configuration

Cite this

@article{537f1027dc1044fb9b8007bc8d1fe1a1,
  title = "Deep learning-based few-shot person re-identification from top-view RGB and depth images",
  abstract = "Person re-identification (re-id) attempts to match a person from the images of different time steps. Existing deep learning approaches either use appearance or geometry features for re-id which does not provide the required robustness because of higher intra-class similarity. Existing supervised re-id approaches utilize Convolutional Neural Networks (CNNs) and identity-labeled images to train, where the person images are taken by the sensors from a horizontal view. The horizontal view exposes the privacy of the people because of their facial appearance in the image. Moreover, person re-id includes new unseen people; however, CNN does not have the ability to identify the new unseen people because of a lack of continual learning. Privacy-preserved computer vision-assisted person re-id systems can benefit from visual appearance and geometry features extracted from top-view RGB and depth input. This paper presents the privacy-preserved person top-view re-id few-shot network which uses the appearance and geometry features. The EfficientNet is used for appearance-based features from RGB input, while PointNet is used to extract the geometry features from the point cloud which is made from the RGB-D image registration. Concatenated features from EfficientNet and PointNet are fed to the two-layer Bi-LSTM network for person identification. Finally, the whole network is converted into a few-shot network to achieve continual learning by removing the output layer and joining the similarity measurement unit. This approach is based on CNN and fine-tunes a TVPR/2 dataset acquired by using a top-view arrangement that is publicly available. The experimental results on TVPR/2 and GODPR datasets show that the proposed re-id network outperforms other state-of-the-art networks.",
  keywords = "Convolutional neural networks, Intelligent retail stores, Person re-identification (Re-ID), Top-view configuration",
  author = "Almustafa Abed and Belhassen Akrout and Ikram Amous",
  note = "Publisher Copyright: {\textcopyright} The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2024.",
  year = "2024",
  month = nov,
  doi = "10.1007/s00521-024-10239-6",
  language = "English",
  volume = "36",
  pages = "19365--19382",
  journal = "Neural Computing and Applications",
  issn = "0941-0643",
  publisher = "Springer London",
  number = "31",
}

TY - JOUR
T1 - Deep learning-based few-shot person re-identification from top-view RGB and depth images
AU - Abed, Almustafa
AU - Akrout, Belhassen
AU - Amous, Ikram
N1 - Publisher Copyright: © The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2024.
PY - 2024/11
Y1 - 2024/11
N2 - Person re-identification (re-id) attempts to match a person from the images of different time steps. Existing deep learning approaches either use appearance or geometry features for re-id which does not provide the required robustness because of higher intra-class similarity. Existing supervised re-id approaches utilize Convolutional Neural Networks (CNNs) and identity-labeled images to train, where the person images are taken by the sensors from a horizontal view. The horizontal view exposes the privacy of the people because of their facial appearance in the image. Moreover, person re-id includes new unseen people; however, CNN does not have the ability to identify the new unseen people because of a lack of continual learning. Privacy-preserved computer vision-assisted person re-id systems can benefit from visual appearance and geometry features extracted from top-view RGB and depth input. This paper presents the privacy-preserved person top-view re-id few-shot network which uses the appearance and geometry features. The EfficientNet is used for appearance-based features from RGB input, while PointNet is used to extract the geometry features from the point cloud which is made from the RGB-D image registration. Concatenated features from EfficientNet and PointNet are fed to the two-layer Bi-LSTM network for person identification. Finally, the whole network is converted into a few-shot network to achieve continual learning by removing the output layer and joining the similarity measurement unit. This approach is based on CNN and fine-tunes a TVPR/2 dataset acquired by using a top-view arrangement that is publicly available. The experimental results on TVPR/2 and GODPR datasets show that the proposed re-id network outperforms other state-of-the-art networks.

KW - Convolutional neural networks
KW - Intelligent retail stores
KW - Person re-identification (Re-ID)
KW - Top-view configuration
UR - https://www.scopus.com/pages/publications/85200343620
U2 - 10.1007/s00521-024-10239-6
DO - 10.1007/s00521-024-10239-6
M3 - Article
AN - SCOPUS:85200343620
SN - 0941-0643
VL - 36
SP - 19365
EP - 19382
JO - Neural Computing and Applications
JF - Neural Computing and Applications
IS - 31
ER -
