Semantic Heads Segmentation and Counting in Crowded Retail Environment with Convolutional Neural Networks Using Top View Depth Images

Almustafa Abed; Belhassen Akrout; Ikram Amous

doi:10.1007/s42979-022-01467-5


Computer Sciences

University of Sfax

Research output: Contribution to journal › Article › peer-review

5 Scopus citations

Abstract

In recent years, researchers have developed several techniques to accurately count the number of people in crowded retail environments for human behavior analysis, ranging from vision- and feature-based approaches to machine learning and deep learning approaches. The availability and affordability of recent technologies such as depth sensors and high computing power, together with the availability of big data, have created a need for accurate models that detect and count people in crowded environments. Deep learning approaches have proven to be highly accurate, especially when there is enough data to train the model and sufficient computing power. People detection and counting is a challenging problem due to issues such as occlusion, lighting variations, and complex backgrounds. To cope with these issues, we train our model on depth data captured in a top-view configuration. We propose a convolutional encoder-decoder architecture for segmenting and counting customers’ heads in crowded retail stores with more than 6 individuals per square meter. The encoder is a ResNet50 pretrained on the ImageNet dataset (transfer learning), and the decoder is our novel contribution. The approach does not compromise people’s privacy because the camera does not record people’s faces. The objective of our method is to segment and count people using the publicly available TV-Head Dataset and People Counting DataSet (PCDS). The results demonstrate that our model is robust and accurate and can be used for real-time people counting.
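The abstract does not say how a head count is derived from the segmentation output. The following is a minimal sketch, not the authors' method, assuming the network emits a per-pixel head probability map for each top-view depth frame, that the map is binarized at a threshold of 0.5, and that each 4-connected foreground blob counts as one head. The functions `threshold` and `count_heads`, the threshold value, and the `min_area` noise filter are hypothetical.

```python
from collections import deque
from typing import List, Sequence

# Hypothetical post-processing for a head-segmentation network; this is an
# assumed pipeline, not the method published by the authors.


def threshold(probs: Sequence[Sequence[float]], t: float = 0.5) -> List[List[int]]:
    """Binarize a per-pixel head-probability map: 1 where p >= t, else 0."""
    return [[1 if p >= t else 0 for p in row] for row in probs]


def count_heads(mask: Sequence[Sequence[int]], min_area: int = 1) -> int:
    """Count 4-connected foreground blobs in a rectangular binary mask.

    Blobs with fewer than `min_area` pixels are treated as noise and skipped.
    """
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from (r, c), measuring the blob's area.
            area = 0
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                area += 1
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if area >= min_area:
                count += 1
    return count


if __name__ == "__main__":
    # Toy 4x5 probability map with two head-like blobs (areas 4 and 3).
    probs = [
        [0.9, 0.8, 0.1, 0.0, 0.0],
        [0.7, 0.9, 0.0, 0.0, 0.6],
        [0.0, 0.0, 0.0, 0.8, 0.9],
        [0.0, 0.2, 0.0, 0.0, 0.0],
    ]
    mask = threshold(probs)
    print(count_heads(mask))              # 2
    print(count_heads(mask, min_area=4))  # 1 (the 3-pixel blob is dropped)
```

Using 4-connectivity keeps diagonally touching heads apart, which may help in dense scenes where heads nearly touch, and the `min_area` filter suppresses speckle from noisy depth pixels. Both choices are illustrative and would need tuning on data such as TV-Head or PCDS. Dividing the count by the floor area visible to the camera would give a per-square-meter density; that area depends on mounting height and lens, which the abstract does not state.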

Original language	English
Article number	61
Journal	SN Computer Science
Volume	4
Issue number	1
DOIs	https://doi.org/10.1007/s42979-022-01467-5
State	Published - Jan 2023

Keywords

Computer vision
Convolutional neural networks
Intelligent retail environment
People counting
Top-view configuration


Cite this

@article{3e4d136c3d3e4f74a5f536cc094f1f81,

title = "Semantic Heads Segmentation and Counting in Crowded Retail Environment with Convolutional Neural Networks Using Top View Depth Images",

abstract = "In recent years, researchers have developed several techniques to accurately count the number of people in crowded retail environments for human behavior analysis, ranging from vision- and feature-based approaches to machine learning and deep learning approaches. The availability and affordability of recent technologies such as depth sensors and high computing power, together with the availability of big data, have created a need for accurate models that detect and count people in crowded environments. Deep learning approaches have proven to be highly accurate, especially when there is enough data to train the model and sufficient computing power. People detection and counting is a challenging problem due to issues such as occlusion, lighting variations, and complex backgrounds. To cope with these issues, we train our model on depth data captured in a top-view configuration. We propose a convolutional encoder-decoder architecture for segmenting and counting customers{\textquoteright} heads in crowded retail stores with more than 6 individuals per square meter. The encoder is a ResNet50 pretrained on the ImageNet dataset (transfer learning), and the decoder is our novel contribution. The approach does not compromise people{\textquoteright}s privacy because the camera does not record people{\textquoteright}s faces. The objective of our method is to segment and count people using the publicly available TV-Head Dataset and People Counting DataSet (PCDS). The results demonstrate that our model is robust and accurate and can be used for real-time people counting.",

keywords = "Computer vision, Convolutional neural networks, Intelligent retail environment, People counting, Top-view configuration",

author = "Almustafa Abed and Belhassen Akrout and Ikram Amous",

note = "Publisher Copyright: {\textcopyright} 2022, The Author(s), under exclusive licence to Springer Nature Singapore Pte Ltd.",

year = "2023",

month = jan,

doi = "10.1007/s42979-022-01467-5",

language = "English",

volume = "4",

journal = "SN Computer Science",

issn = "2662-995X",

publisher = "Springer",

number = "1",

}

TY  - JOUR

T1  - Semantic Heads Segmentation and Counting in Crowded Retail Environment with Convolutional Neural Networks Using Top View Depth Images

AU  - Abed, Almustafa

AU  - Akrout, Belhassen

AU  - Amous, Ikram

PY  - 2023/1

Y1  - 2023/1

N2  - In recent years, researchers have developed several techniques to accurately count the number of people in crowded retail environments for human behavior analysis, ranging from vision- and feature-based approaches to machine learning and deep learning approaches. The availability and affordability of recent technologies such as depth sensors and high computing power, together with the availability of big data, have created a need for accurate models that detect and count people in crowded environments. Deep learning approaches have proven to be highly accurate, especially when there is enough data to train the model and sufficient computing power. People detection and counting is a challenging problem due to issues such as occlusion, lighting variations, and complex backgrounds. To cope with these issues, we train our model on depth data captured in a top-view configuration. We propose a convolutional encoder-decoder architecture for segmenting and counting customers’ heads in crowded retail stores with more than 6 individuals per square meter. The encoder is a ResNet50 pretrained on the ImageNet dataset (transfer learning), and the decoder is our novel contribution. The approach does not compromise people’s privacy because the camera does not record people’s faces. The objective of our method is to segment and count people using the publicly available TV-Head Dataset and People Counting DataSet (PCDS). The results demonstrate that our model is robust and accurate and can be used for real-time people counting.

AB  - In recent years, researchers have developed several techniques to accurately count the number of people in crowded retail environments for human behavior analysis, ranging from vision- and feature-based approaches to machine learning and deep learning approaches. The availability and affordability of recent technologies such as depth sensors and high computing power, together with the availability of big data, have created a need for accurate models that detect and count people in crowded environments. Deep learning approaches have proven to be highly accurate, especially when there is enough data to train the model and sufficient computing power. People detection and counting is a challenging problem due to issues such as occlusion, lighting variations, and complex backgrounds. To cope with these issues, we train our model on depth data captured in a top-view configuration. We propose a convolutional encoder-decoder architecture for segmenting and counting customers’ heads in crowded retail stores with more than 6 individuals per square meter. The encoder is a ResNet50 pretrained on the ImageNet dataset (transfer learning), and the decoder is our novel contribution. The approach does not compromise people’s privacy because the camera does not record people’s faces. The objective of our method is to segment and count people using the publicly available TV-Head Dataset and People Counting DataSet (PCDS). The results demonstrate that our model is robust and accurate and can be used for real-time people counting.

KW  - Computer vision

KW  - Convolutional neural networks

KW  - Intelligent retail environment

KW  - People counting

KW  - Top-view configuration

UR  - http://www.scopus.com/inward/record.url?scp=85142168320&partnerID=8YFLogxK

U2  - 10.1007/s42979-022-01467-5

DO  - 10.1007/s42979-022-01467-5

M3  - Article

AN  - SCOPUS:85142168320

SN  - 2662-995X

VL  - 4

JO  - SN Computer Science

JF  - SN Computer Science

IS  - 1

M1  - 61

ER  - 
