TY - JOUR
T1 - Semantic Heads Segmentation and Counting in Crowded Retail Environment with Convolutional Neural Networks Using Top View Depth Images
AU - Abed, Almustafa
AU - Akrout, Belhassen
AU - Amous, Ikram
N1 - Publisher Copyright: © 2022, The Author(s), under exclusive licence to Springer Nature Singapore Pte Ltd.
PY - 2023/1
Y1 - 2023/1
N2 - In recent years, researchers have developed several techniques to accurately count the number of people in a crowded retail environment for human behavior analysis, ranging from vision- and feature-based approaches to machine learning and deep learning approaches. The availability and affordability of recent technologies such as depth sensors and high computing power, together with the availability of big data, have created a need for accurate models that detect and count people in crowded environments. Deep learning approaches have proven to be highly accurate, especially when enough training data and computing power are available. People detection and counting is a challenging problem due to issues such as occlusion, lighting variations, and complex backgrounds. To cope with these issues, we use depth data captured in a top-view configuration to train our model. We propose a convolutional encoder-decoder architecture consisting of a ResNet50 encoder pretrained on the ImageNet dataset as a transfer learning technique and a decoder that we built as a novel contribution. The model segments and counts customers’ heads in crowded retail stores where there are more than 6 individuals per square meter, without compromising people’s privacy because the camera does not record people’s faces. The objective of our method is to segment and count people using the publicly available TV-Head Dataset and People Counting DataSet (PCDS). The results demonstrate that our model is robust and can be used for accurate real-time people counting.
AB - In recent years, researchers have developed several techniques to accurately count the number of people in a crowded retail environment for human behavior analysis, ranging from vision- and feature-based approaches to machine learning and deep learning approaches. The availability and affordability of recent technologies such as depth sensors and high computing power, together with the availability of big data, have created a need for accurate models that detect and count people in crowded environments. Deep learning approaches have proven to be highly accurate, especially when enough training data and computing power are available. People detection and counting is a challenging problem due to issues such as occlusion, lighting variations, and complex backgrounds. To cope with these issues, we use depth data captured in a top-view configuration to train our model. We propose a convolutional encoder-decoder architecture consisting of a ResNet50 encoder pretrained on the ImageNet dataset as a transfer learning technique and a decoder that we built as a novel contribution. The model segments and counts customers’ heads in crowded retail stores where there are more than 6 individuals per square meter, without compromising people’s privacy because the camera does not record people’s faces. The objective of our method is to segment and count people using the publicly available TV-Head Dataset and People Counting DataSet (PCDS). The results demonstrate that our model is robust and can be used for accurate real-time people counting.
KW - Computer vision
KW - Convolutional neural networks
KW - Intelligent retail environment
KW - People counting
KW - Top-view configuration
UR - http://www.scopus.com/inward/record.url?scp=85142168320&partnerID=8YFLogxK
U2 - 10.1007/s42979-022-01467-5
DO - 10.1007/s42979-022-01467-5
M3 - Article
AN - SCOPUS:85142168320
SN - 2662-995X
VL - 4
JO - SN Computer Science
JF - SN Computer Science
IS - 1
M1 - 61
ER -