TY - JOUR
T1 - Benchmarking pathology foundation models for predicting microsatellite instability in colorectal cancer histopathology
AU - Bilal, Mohsin
AU - Gulzar, Muhammad Aamir
AU - Jaffar, Nazish
AU - Alabduljabbar, Abdulrahman
AU - Altherwy, Youssef
AU - Alsuhaibani, Anas
AU - Almarshad, Fahdah
N1 - Publisher Copyright: © 2025 The Authors
PY - 2026/1
Y1 - 2026/1
N2 - The rapid evolution of pathology foundation models necessitates rigorous benchmarking for clinical tasks. We evaluated three leading foundation models, UNI, Virchow2, and CONCH, for predicting microsatellite instability status from colorectal cancer whole-slide images, an essential routine clinical test. Our comprehensive framework assessed stain, tissue, and resolution invariance using datasets from The Cancer Genome Atlas (TCGA, USA; n = 409) and Pathology Artificial Intelligence Platform (PAIP, South Korea; training n = 47, testing n = 21 and n = 78). We developed an efficient pipeline with minimal preprocessing, omitting stain normalization, color augmentation, and tumor segmentation. To improve contextual encoding, we applied a five-crop strategy per patch, averaging embeddings from the center and four peripheral crops. We compared three slide-level aggregation and four efficient adaptation strategies. CONCH, using 2-cluster aggregation and ProtoNet adaptation, achieved top balanced accuracies (0.775 and 0.778) in external validation on PAIP. Conversely, UNI, with mean-averaging aggregation and ANN adaptation, excelled in TCGA cross-validation (0.778) but not in external validation (0.764), suggesting potential overfitting. The proposed 5-Crop augmentation enhances robustness to scale in UNI and CONCH and reflects intrinsic invariance achieved by Virchow2 through large-scale pretraining. For prescreening, CONCH demonstrated specificity of 0.65 and 0.45 at sensitivities of 0.90 and 0.94, respectively, highlighting its effectiveness in identifying stable cases and minimizing the number of rapid molecular tests needed. Interestingly, a fine-tuned ResNet34 adaptation achieved superior performance (0.836) in the smaller internal validation cohort, suggesting that current training recipes for pathology foundation models may not sufficiently generalize without task-specific fine-tuning.
Interpretability analyses using CONCH's multimodal embeddings identified plasma cells as key morphological features differentiating microsatellite instability from stability, validated by pathologists (accuracy up to 92.4%). This study underscores the feasibility and clinical significance of adapting foundation models to enhance diagnostic efficiency and patient outcomes.
KW - Adaptation models
KW - Aggregation methods
KW - Foundation models
KW - Generalization
KW - Microsatellite instability
UR - https://www.scopus.com/pages/publications/105023826613
U2 - 10.1016/j.compmedimag.2025.102680
DO - 10.1016/j.compmedimag.2025.102680
M3 - Article
C2 - 41352179
AN - SCOPUS:105023826613
SN - 0895-6111
VL - 127
JO - Computerized Medical Imaging and Graphics
JF - Computerized Medical Imaging and Graphics
M1 - 102680
ER -