A Multimodal Large Language Model Framework for Clinical Subtyping and Malignant Transformation Risk Prediction in Oral Lichen Planus: A Paired Comparison With Expert Clinicians

Research output: Contribution to journal, Article, peer-reviewed

Abstract

Background: Oral lichen planus (OLP), oral lichenoid lesions (OLL), and squamous cell carcinoma on a lichenoid background (SCC-over-LP/LLP) overlap clinically, which can delay recognition of malignant transformation.

Objective: To compare a multimodal large language model (ChatGPT-5) with oral medicine (OM) specialists for tripartite classification (OLP/OLL/SCC-over-LP/LLP) and malignant-risk flagging.

Methods: Cross-sectional, paired diagnostic accuracy study reported according to STARD/STARD-AI. Retrospective, anonymized cases (n = 262; OLP = 100, OLL = 100, SCC-over-LP/LLP = 62) were evaluated independently by ChatGPT-5 and by a comparator panel of board-certified OM specialists. Both received identical clinical histories and intraoral photographs; neither was given histopathology. A separate reference-standard panel of three OM experts established the diagnosis from full clinical data and histopathology before index testing. Primary outcome: paired accuracy (McNemar test). Secondary outcomes: certainty (1-5 scale), management agreement (Gwet's AC1), and recognition of malignant red-flag features.

Results: Overall accuracy was comparable (84.7% for ChatGPT-5 vs 85.5% for OM specialists; McNemar P = .856, Cohen's h = 0.03). Sensitivity was high for OLP (0.99) and SCC-over-LP/LLP (0.85); for OLL, sensitivity was 0.70 with a specificity of 1.00. Agreement on biopsy/referral was near-perfect (AC1 = 0.91). Malignant-risk features were correctly identified in 88% of SCC-over-LP/LLP cases by ChatGPT-5 vs 92% by OM specialists (P = .41).

Conclusions: A multimodal large language model can reach expert-level accuracy in distinguishing OLP, OLL, and SCC-over-LP/LLP and can reliably flag malignant transformation risk, supporting its role as an adjunctive decision-support tool in OM.
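The paired statistics named in the abstract (McNemar's test on discordant cases, Cohen's h, one-vs-rest sensitivity and specificity, and Gwet's AC1) can be illustrated with a short Python sketch. It is a minimal illustration, not the authors' analysis code: the function names are hypothetical, the abstract does not say whether the exact or chi-square form of McNemar's test was used (the exact binomial form is assumed here), and the toy labels in the demo are invented, not study data.

```python
import math
from collections import Counter
from typing import Sequence, Tuple


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar P value from the discordant pairs.

    b: cases rater A classified correctly and rater B did not.
    c: cases rater B classified correctly and rater A did not.
    Assumption: the exact binomial variant; the source does not specify.
    """
    n = b + c
    if n == 0:
        return 1.0
    # Under H0 each discordant case is equally likely to favor either rater,
    # so min(b, c) follows Binomial(n, 0.5); double the lower tail.
    tail = sum(math.comb(n, i) for i in range(min(b, c) + 1)) / 2**n
    return min(1.0, 2 * tail)


def cohens_h(p1: float, p2: float) -> float:
    """Effect size for two proportions: |2*asin(sqrt(p1)) - 2*asin(sqrt(p2))|."""
    return abs(2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2)))


def one_vs_rest(truth: Sequence[str], pred: Sequence[str], label: str) -> Tuple[float, float]:
    """Sensitivity and specificity for one class of a multi-class problem."""
    pairs = list(zip(truth, pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    tn = sum(t != label and p != label for t, p in pairs)
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


def gwet_ac1(r1: Sequence[str], r2: Sequence[str]) -> float:
    """Gwet's AC1 for two raters assigning nominal categories."""
    n = len(r1)
    cats = set(r1) | set(r2)
    q = len(cats)
    pa = sum(a == b for a, b in zip(r1, r2)) / n  # observed agreement
    if q < 2:
        return 1.0  # only one category was ever used, so agreement is total
    counts = Counter(r1) + Counter(r2)
    # pi_k: share of all ratings (both raters pooled) falling in category k
    pe = sum((counts[k] / (2 * n)) * (1 - counts[k] / (2 * n)) for k in cats) / (q - 1)
    return (pa - pe) / (1 - pe)


if __name__ == "__main__":
    # Hypothetical toy labels for illustration only.
    truth = ["OLP", "OLL", "SCC", "OLP"]
    pred = ["OLP", "OLP", "SCC", "OLP"]
    print(one_vs_rest(truth, pred, "OLP"))  # (1.0, 0.5)
    print(mcnemar_exact(3, 3))  # 1.0
    print(round(gwet_ac1(["biopsy", "biopsy", "refer", "biopsy"],
                         ["biopsy", "biopsy", "refer", "refer"]), 3))  # 0.529
```

Per-class sensitivity and specificity are computed one-vs-rest here because the reference standard has three classes; under that convention, an OLL specificity of 1.00 means no OLP or SCC-over-LP/LLP case was labelled OLL. AC1 is often preferred over Cohen's kappa when category prevalence is skewed, because it is less affected by the so-called kappa paradox.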

Original language: English
Article number: 109357
Journal: International Dental Journal
Volume: 76
Issue number: 1
State: Published, February 2026

Keywords

  • Artificial intelligence
  • Diagnostic accuracy
  • Large language model
  • Oral lichen planus
  • Oral lichenoid lesion
  • Squamous cell carcinoma
