Pyramidal attention with progressive multi-stage iterative feature refinement for salient object segmentation

Rahim Khan; Nada Alzaben; Yousef Ibrahim Daradkeh; Xianxun Zhu; Inam Ullah

doi:10.1016/j.imavis.2025.105670

Computer Engineering

Research output: Contribution to journal › Article › peer-review

Abstract

Accurate detection of salient objects in complex visual scenes remains a fundamental yet challenging task in visual intelligence, often impeded by significant scale variation, background clutter, and indistinct object boundaries. While recent approaches attempt to exploit multi-level features, they frequently encounter limitations such as semantic misalignment across feature hierarchies, spatial detail degradation, and weak cross-dataset generalization. To overcome these challenges, we propose a novel Pyramidal Attention Mechanism (PAM) with Progressive Multi-stage Iterative Feature Refinement Network (PIFRNet) designed for robust and precise Salient Object Detection (SOD). Specifically, our method begins by hierarchically aggregating features from four representative stages of a powerful backbone, ensuring rich multi-scale context and semantic diversity. To bridge semantic gaps and recover fine structures, we introduce a Progressive Bilateral Feature Refinement (PBFR) module, which enhances early-stage features through cascaded convolutions and spatial attention. Furthermore, the novel PAM, equipped with dilated convolutions, is introduced to refine high-level semantics and reinforce object completeness. The network integrates these components through a multi-stage iterative refinement process, enabling gradual enhancement of spatial precision and structural fidelity. Extensive experiments conducted on five public SOD benchmarks demonstrate that our approach achieves superior performance compared to state-of-the-art methods, both quantitatively and qualitatively. Cross-dataset evaluations further validate its strong generalization capability, making it highly applicable to real-world visual intelligence scenarios.

Original language	English
Article number	105670
Journal	Image and Vision Computing
Volume	162
DOIs	https://doi.org/10.1016/j.imavis.2025.105670
State	Published - Oct 2025

Keywords

Bilateral merging
Hierarchical aggregation
Multi-scale representation
Pyramidal attention
Saliency detection
Visual intelligence

Access to Document

10.1016/j.imavis.2025.105670

Cite this

@article{b69542110138490c899388b9c20c1951,

title = "Pyramidal attention with progressive multi-stage iterative feature refinement for salient object segmentation",

abstract = "Accurate detection of salient objects in complex visual scenes remains a fundamental yet challenging task in visual intelligence, often impeded by significant scale variation, background clutter, and indistinct object boundaries. While recent approaches attempt to exploit multi-level features, they frequently encounter limitations such as semantic misalignment across feature hierarchies, spatial detail degradation, and weak cross-dataset generalization. To overcome these challenges, we propose a novel Pyramidal Attention Mechanism (PAM) with Progressive Multi-stage Iterative Feature Refinement Network (PIFRNet) designed for robust and precise Salient Object Detection (SOD). Specifically, our method begins by hierarchically aggregating features from four representative stages of a powerful backbone, ensuring rich multi-scale context and semantic diversity. To bridge semantic gaps and recover fine structures, we introduce a Progressive Bilateral Feature Refinement (PBFR) module, which enhances early-stage features through cascaded convolutions and spatial attention. Furthermore, the novel PAM, equipped with dilated convolutions, is introduced to refine high-level semantics and reinforce object completeness. The network integrates these components through a multi-stage iterative refinement process, enabling gradual enhancement of spatial precision and structural fidelity. Extensive experiments conducted on five public SOD benchmarks demonstrate that our approach achieves superior performance compared to state-of-the-art methods, both quantitatively and qualitatively. Cross-dataset evaluations further validate its strong generalization capability, making it highly applicable to real-world visual intelligence scenarios.",

keywords = "Bilateral merging, Hierarchical aggregation, Multi-scale representation, Pyramidal attention, Saliency detection, Visual intelligence",

author = "Rahim Khan and Nada Alzaben and Daradkeh, {Yousef Ibrahim} and Xianxun Zhu and Inam Ullah",

note = "Publisher Copyright: {\textcopyright} 2025 Elsevier B.V.",

year = "2025",

month = oct,

doi = "10.1016/j.imavis.2025.105670",

language = "English",

volume = "162",

journal = "Image and Vision Computing",

issn = "0262-8856",

publisher = "Elsevier Ltd",

}

TY - JOUR

T1 - Pyramidal attention with progressive multi-stage iterative feature refinement for salient object segmentation

AU - Khan, Rahim

AU - Alzaben, Nada

AU - Daradkeh, Yousef Ibrahim

AU - Zhu, Xianxun

AU - Ullah, Inam

PY - 2025/10

Y1 - 2025/10

AB - Accurate detection of salient objects in complex visual scenes remains a fundamental yet challenging task in visual intelligence, often impeded by significant scale variation, background clutter, and indistinct object boundaries. While recent approaches attempt to exploit multi-level features, they frequently encounter limitations such as semantic misalignment across feature hierarchies, spatial detail degradation, and weak cross-dataset generalization. To overcome these challenges, we propose a novel Pyramidal Attention Mechanism (PAM) with Progressive Multi-stage Iterative Feature Refinement Network (PIFRNet) designed for robust and precise Salient Object Detection (SOD). Specifically, our method begins by hierarchically aggregating features from four representative stages of a powerful backbone, ensuring rich multi-scale context and semantic diversity. To bridge semantic gaps and recover fine structures, we introduce a Progressive Bilateral Feature Refinement (PBFR) module, which enhances early-stage features through cascaded convolutions and spatial attention. Furthermore, the novel PAM, equipped with dilated convolutions, is introduced to refine high-level semantics and reinforce object completeness. The network integrates these components through a multi-stage iterative refinement process, enabling gradual enhancement of spatial precision and structural fidelity. Extensive experiments conducted on five public SOD benchmarks demonstrate that our approach achieves superior performance compared to state-of-the-art methods, both quantitatively and qualitatively. Cross-dataset evaluations further validate its strong generalization capability, making it highly applicable to real-world visual intelligence scenarios.

KW - Bilateral merging

KW - Hierarchical aggregation

KW - Multi-scale representation

KW - Pyramidal attention

KW - Saliency detection

KW - Visual intelligence

UR - http://www.scopus.com/inward/record.url?scp=105012090330&partnerID=8YFLogxK

U2 - 10.1016/j.imavis.2025.105670

DO - 10.1016/j.imavis.2025.105670

M3 - Article

AN - SCOPUS:105012090330

SN - 0262-8856

VL - 162

JO - Image and Vision Computing

JF - Image and Vision Computing

M1 - 105670

ER -
