Spatial-Temporal Dynamic Graph Attention Network for Skeleton-Based Action Recognition

Mrugendrasinh Rahevar; Amit Ganatra; Tanzila Saba; Amjad Rehman; Saeed Ali Bahaj

doi:10.1109/ACCESS.2023.3247820

Management Information Systems

Research output: Contribution to journal › Article › peer-review

35 Scopus citations

Abstract

The human body skeleton, modeled as a spatiotemporal graph, has attracted increasing attention from researchers, who adopt graph convolutional networks (GCNs) to mine discriminative features from skeleton joints. However, one of the GCN's flaws is its inability to handle long-distance dependencies between joints. To address this, the graph attention network (GAT) was recently proposed; it combines graph convolutions with a self-attention mechanism to extract the most informative joints of a human skeleton and increase model accuracy. However, GAT can compute only static attention: the ranking of attention scores is the same for every query node, which severely limits the expressivity of the attention mechanism. In this work, we present a spatial-temporal dynamic graph attention network (ST-DGAT) to learn the spatial-temporal patterns of skeleton sequences. For dynamic graph attention, we change the order of the weighted vector operations in GAT; our approach achieves a global approximate attention function, making it strictly superior to GAT. Experiments show that, by changing the order of GAT's internal operations, the proposed model achieves better action classification results while maintaining the same computing cost as GAT. The proposed framework has been evaluated on the well-known, publicly available large-scale datasets NTU60, NTU120, and Kinetics-400, where it notably outperforms state-of-the-art (SOTA) results with accuracies of 96.4%, 88.2%, and 61.0%, respectively.
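
The contrast between static and dynamic attention described in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not give the exact formulation, so this assumes the reordering follows the commonly described pattern in which the nonlinearity is applied before the attention vector rather than after it. The names `static_scores`, `dynamic_scores`, `W`, `a`, and the LeakyReLU slope of 0.2 are illustrative assumptions, not the authors' code.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    # Strictly increasing nonlinearity (the slope value is an assumption).
    return np.where(x > 0, x, slope * x)

def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def static_scores(h, W, a):
    """GAT-style ordering: e_ij = LeakyReLU(a^T [W h_i || W h_j]).

    h: (N, F) joint features, W: (F', F), a: (2F',).
    The score splits into (query term) + (key term), so every query
    ranks the keys in the same order.
    """
    z = h @ W.T                      # (N, F')
    d = z.shape[1]
    src = z @ a[:d]                  # query contribution, (N,)
    dst = z @ a[d:]                  # key contribution, (N,)
    return leaky_relu(src[:, None] + dst[None, :])

def dynamic_scores(h, W, a):
    """Reordered (assumed) variant: e_ij = a^T LeakyReLU(W [h_i || h_j]).

    h: (N, F), W: (F', 2F), a: (F',).
    The nonlinearity sits between W and a, so the key ranking can
    depend on the query.
    """
    n = h.shape[0]
    queries = np.repeat(h[:, None, :], n, axis=1)   # [i, j] = h_i
    keys = np.repeat(h[None, :, :], n, axis=0)      # [i, j] = h_j
    pairs = np.concatenate([queries, keys], axis=-1)  # (N, N, 2F)
    return leaky_relu(pairs @ W.T) @ a              # (N, N)

if __name__ == "__main__":
    # Toy one-dimensional "joints"; all values are made up for illustration.
    h = np.array([[1.0], [-1.0], [0.5]])
    s = static_scores(h, np.array([[1.0]]), np.array([0.3, 1.0]))
    print("static key ranking per query:\n", np.argsort(s, axis=1))

    h2 = np.array([[1.0], [-1.0]])
    d = dynamic_scores(h2, np.array([[1.0, 1.0], [-1.0, -1.0]]),
                       np.array([1.0, 1.0]))
    print("dynamic top key per query:", d.argmax(axis=1))
    print("dynamic attention weights:\n", softmax(d, axis=1))
```

In the static ordering the score is a monotone function of a per-query constant plus a per-key term, which is why the ranking cannot change across queries. Both variants use one linear map and one attention vector, in line with the abstract's claim that the reordering keeps the computing cost the same as GAT.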

Original language	English
Pages (from-to)	21546-21553
Number of pages	8
Journal	IEEE Access
Volume	11
DOIs	https://doi.org/10.1109/ACCESS.2023.3247820
State	Published - 2023

Keywords

action recognition
graph attention network
multihead attention
Skeleton

Access to Document

10.1109/ACCESS.2023.3247820

Cite this

@article{27f8f68f6e1f42e3b83727154922bdac,

title = "Spatial-Temporal Dynamic Graph Attention Network for Skeleton-Based Action Recognition",

abstract = "The human body skeleton, modeled as a spatiotemporal graph, has attracted increasing attention from researchers, who adopt graph convolutional networks (GCNs) to mine discriminative features from skeleton joints. However, one of the GCN's flaws is its inability to handle long-distance dependencies between joints. To address this, the graph attention network (GAT) was recently proposed; it combines graph convolutions with a self-attention mechanism to extract the most informative joints of a human skeleton and increase model accuracy. However, GAT can compute only static attention: the ranking of attention scores is the same for every query node, which severely limits the expressivity of the attention mechanism. In this work, we present a spatial-temporal dynamic graph attention network (ST-DGAT) to learn the spatial-temporal patterns of skeleton sequences. For dynamic graph attention, we change the order of the weighted vector operations in GAT; our approach achieves a global approximate attention function, making it strictly superior to GAT. Experiments show that, by changing the order of GAT's internal operations, the proposed model achieves better action classification results while maintaining the same computing cost as GAT. The proposed framework has been evaluated on the well-known, publicly available large-scale datasets NTU60, NTU120, and Kinetics-400, where it notably outperforms state-of-the-art (SOTA) results with accuracies of 96.4\%, 88.2\%, and 61.0\%, respectively.",

keywords = "action recognition, graph attention network, multihead attention, Skeleton",

author = "Mrugendrasinh Rahevar and Amit Ganatra and Tanzila Saba and Amjad Rehman and Bahaj, {Saeed Ali}",

note = "Publisher Copyright: {\textcopyright} 2013 IEEE.",

year = "2023",

doi = "10.1109/ACCESS.2023.3247820",

language = "English",

volume = "11",

pages = "21546--21553",

journal = "IEEE Access",

issn = "2169-3536",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - JOUR

T1 - Spatial-Temporal Dynamic Graph Attention Network for Skeleton-Based Action Recognition

AU - Rahevar, Mrugendrasinh

AU - Ganatra, Amit

AU - Saba, Tanzila

AU - Rehman, Amjad

AU - Bahaj, Saeed Ali

PY - 2023

Y1 - 2023

N2 - The human body skeleton, modeled as a spatiotemporal graph, has attracted increasing attention from researchers, who adopt graph convolutional networks (GCNs) to mine discriminative features from skeleton joints. However, one of the GCN's flaws is its inability to handle long-distance dependencies between joints. To address this, the graph attention network (GAT) was recently proposed; it combines graph convolutions with a self-attention mechanism to extract the most informative joints of a human skeleton and increase model accuracy. However, GAT can compute only static attention: the ranking of attention scores is the same for every query node, which severely limits the expressivity of the attention mechanism. In this work, we present a spatial-temporal dynamic graph attention network (ST-DGAT) to learn the spatial-temporal patterns of skeleton sequences. For dynamic graph attention, we change the order of the weighted vector operations in GAT; our approach achieves a global approximate attention function, making it strictly superior to GAT. Experiments show that, by changing the order of GAT's internal operations, the proposed model achieves better action classification results while maintaining the same computing cost as GAT. The proposed framework has been evaluated on the well-known, publicly available large-scale datasets NTU60, NTU120, and Kinetics-400, where it notably outperforms state-of-the-art (SOTA) results with accuracies of 96.4%, 88.2%, and 61.0%, respectively.

AB - The human body skeleton, modeled as a spatiotemporal graph, has attracted increasing attention from researchers, who adopt graph convolutional networks (GCNs) to mine discriminative features from skeleton joints. However, one of the GCN's flaws is its inability to handle long-distance dependencies between joints. To address this, the graph attention network (GAT) was recently proposed; it combines graph convolutions with a self-attention mechanism to extract the most informative joints of a human skeleton and increase model accuracy. However, GAT can compute only static attention: the ranking of attention scores is the same for every query node, which severely limits the expressivity of the attention mechanism. In this work, we present a spatial-temporal dynamic graph attention network (ST-DGAT) to learn the spatial-temporal patterns of skeleton sequences. For dynamic graph attention, we change the order of the weighted vector operations in GAT; our approach achieves a global approximate attention function, making it strictly superior to GAT. Experiments show that, by changing the order of GAT's internal operations, the proposed model achieves better action classification results while maintaining the same computing cost as GAT. The proposed framework has been evaluated on the well-known, publicly available large-scale datasets NTU60, NTU120, and Kinetics-400, where it notably outperforms state-of-the-art (SOTA) results with accuracies of 96.4%, 88.2%, and 61.0%, respectively.

KW - action recognition

KW - graph attention network

KW - multihead attention

KW - Skeleton

UR - https://www.scopus.com/pages/publications/85149391146

U2 - 10.1109/ACCESS.2023.3247820

DO - 10.1109/ACCESS.2023.3247820

M3 - Article

AN - SCOPUS:85149391146

SN - 2169-3536

VL - 11

SP - 21546

EP - 21553

JO - IEEE Access

JF - IEEE Access

ER -
