TY - JOUR
T1 - Spatial-Temporal Dynamic Graph Attention Network for Skeleton-Based Action Recognition
AU - Rahevar, Mrugendrasinh
AU - Ganatra, Amit
AU - Saba, Tanzila
AU - Rehman, Amjad
AU - Bahaj, Saeed Ali
N1 - Publisher Copyright: © 2013 IEEE.
PY - 2023
Y1 - 2023
N2 - The human body skeleton, viewed as a spatiotemporal graph, has drawn growing attention from researchers, who adopt graph convolutional networks (GCNs) to mine discriminative features from skeleton joints. However, one flaw of GCNs is their inability to handle long-distance dependencies between joints. To address this, the graph attention network (GAT) was recently proposed; it combines graph convolutions with a self-attention mechanism to extract the most informative joints of a human skeleton and increase model accuracy. However, GAT can compute only static attention: the ranking of attention scores is the same for every query node, which severely limits the expressivity of the attention mechanism. In this work, we present a spatial-temporal dynamic graph attention network (ST-DGAT) to learn the spatial-temporal patterns of skeleton sequences. For dynamic graph attention, we change the order of the weighted vector operations in GAT; the resulting approach achieves a global approximate attention function, making it strictly superior to GAT. Experiments show that, by fixing the order of the internal operations of GAT, the proposed model achieves better action classification results while maintaining the same computational cost as GAT. The proposed framework has been evaluated on the well-known, publicly available large-scale datasets NTU60, NTU120, and Kinetics-400, where it notably outperforms state-of-the-art (SOTA) results with accuracies of 96.4%, 88.2%, and 61.0%, respectively.
KW - action recognition
KW - graph attention network
KW - multihead attention
KW - skeleton
UR - http://www.scopus.com/inward/record.url?scp=85149391146&partnerID=8YFLogxK
U2 - 10.1109/ACCESS.2023.3247820
DO - 10.1109/ACCESS.2023.3247820
M3 - Article
AN - SCOPUS:85149391146
SN - 2169-3536
VL - 11
SP - 21546
EP - 21553
JO - IEEE Access
JF - IEEE Access
ER -