Convolutional Neural Network for Head Segmentation and Counting in Crowded Retail Environment Using Top-view Depth Images

Almustafa Abed; Belhassen Akrout; Ikram Amous

doi:10.1007/s13369-023-08159-z

Computer Sciences

University of Sfax

Research output: Contribution to journal › Article › peer-review

9 Scopus citations

Abstract

Since the emergence of big data, deep learning models have grown in popularity and are now used in a wide range of applications, including people detection and counting in congested environments. Detecting and counting people for human behavior analysis in retail stores is a challenging research problem because the environment is crowded. This paper proposes a deep learning approach for detecting and counting people in the presence of occlusions and illuminance variation in a crowded retail environment, using deep CNNs (DCNNs) for semantic segmentation of top-view depth data. In recent years, semantic segmentation has commonly been implemented with DCNNs, which are a powerful approach to the task. The objective of this paper is to design a novel encoder–decoder architecture. We use transfer learning to address the problem of insufficient training data. We used ResNet50 for the encoder and built the decoder as a novel contribution. Our model was trained and evaluated on the TVHeads dataset and the People Counting Dataset (PCDS), both of which are available for research purposes. The datasets consist of depth data of people captured by a top-view RGB-D sensor. The segmentation results indicate that the proposed model is robust and accurate.
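
The abstract does not say how a head count is derived from the segmentation output. One common option, assumed here purely for illustration and not taken from the paper, is to count connected foreground regions in the binary head mask. The sketch below does this in plain Python; `count_heads`, `mask`, and `min_area` are hypothetical names introduced for the example.

```python
from collections import deque


def count_heads(mask, min_area=1):
    """Count 4-connected regions of 1s in a binary mask (list of rows).

    Assumption: each connected region of head pixels is one person.
    Regions smaller than min_area pixels are ignored as noise.
    """
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    count = 0

    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill over the region containing (r, c).
            area = 0
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                area += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if area >= min_area:
                count += 1
    return count
```

Heads that touch in the mask would be merged into one region under this assumption, so a real pipeline would likely need an extra separation step; `min_area` only filters out small spurious regions.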

Original language	English
Pages (from-to)	3735-3749
Number of pages	15
Journal	Arabian Journal for Science and Engineering
Volume	49
Issue number	3
DOIs	https://doi.org/10.1007/s13369-023-08159-z
State	Published - Mar 2024

Keywords

CNNs
Intelligent retail stores
People counting
Top-view configuration

Cite this

@article{1d239bdce9d9466e8792f10d4a37e9cd,

title = "Convolutional Neural Network for Head Segmentation and Counting in Crowded Retail Environment Using Top-view Depth Images",

abstract = "Since the emergence of big data, deep learning models have grown in popularity and are now used in a wide range of applications, including people detection and counting in congested environments. Detecting and counting people for human behavior analysis in retail stores is a challenging research problem because the environment is crowded. This paper proposes a deep learning approach for detecting and counting people in the presence of occlusions and illuminance variation in a crowded retail environment, using deep CNNs (DCNNs) for semantic segmentation of top-view depth data. In recent years, semantic segmentation has commonly been implemented with DCNNs, which are a powerful approach to the task. The objective of this paper is to design a novel encoder–decoder architecture. We use transfer learning to address the problem of insufficient training data. We used ResNet50 for the encoder and built the decoder as a novel contribution. Our model was trained and evaluated on the TVHeads dataset and the People Counting Dataset (PCDS), both of which are available for research purposes. The datasets consist of depth data of people captured by a top-view RGB-D sensor. The segmentation results indicate that the proposed model is robust and accurate.",

keywords = "CNNs, Intelligent retail stores, People counting, Top-view configuration",

author = "Almustafa Abed and Belhassen Akrout and Ikram Amous",

note = "Publisher Copyright: {\textcopyright} King Fahd University of Petroleum \& Minerals 2023.",

year = "2024",

month = mar,

doi = "10.1007/s13369-023-08159-z",

language = "English",

volume = "49",

pages = "3735--3749",

journal = "Arabian Journal for Science and Engineering",

issn = "2193-567X",

publisher = "Springer Nature",

number = "3",

}

TY - JOUR

T1 - Convolutional Neural Network for Head Segmentation and Counting in Crowded Retail Environment Using Top-view Depth Images

AU - Abed, Almustafa

AU - Akrout, Belhassen

AU - Amous, Ikram

N1 - Publisher Copyright: © King Fahd University of Petroleum & Minerals 2023.

PY - 2024/3

Y1 - 2024/3

N2 - Since the emergence of big data, deep learning models have grown in popularity and are now used in a wide range of applications, including people detection and counting in congested environments. Detecting and counting people for human behavior analysis in retail stores is a challenging research problem because the environment is crowded. This paper proposes a deep learning approach for detecting and counting people in the presence of occlusions and illuminance variation in a crowded retail environment, using deep CNNs (DCNNs) for semantic segmentation of top-view depth data. In recent years, semantic segmentation has commonly been implemented with DCNNs, which are a powerful approach to the task. The objective of this paper is to design a novel encoder–decoder architecture. We use transfer learning to address the problem of insufficient training data. We used ResNet50 for the encoder and built the decoder as a novel contribution. Our model was trained and evaluated on the TVHeads dataset and the People Counting Dataset (PCDS), both of which are available for research purposes. The datasets consist of depth data of people captured by a top-view RGB-D sensor. The segmentation results indicate that the proposed model is robust and accurate.

KW - CNNs

KW - Intelligent retail stores

KW - People counting

KW - Top-view configuration

UR - https://www.scopus.com/pages/publications/85167910641

U2 - 10.1007/s13369-023-08159-z

DO - 10.1007/s13369-023-08159-z

M3 - Article

AN - SCOPUS:85167910641

SN - 2193-567X

VL - 49

SP - 3735

EP - 3749

JO - Arabian Journal for Science and Engineering

JF - Arabian Journal for Science and Engineering

IS - 3

ER -
