TY - JOUR
T1 - Speech-based respiratory diagnostics
T2 - A study on COVID-19 detection with machine learning
AU - Datkhile, Gaurav
AU - Kachare, Pramod H.
AU - Sangle, Sandeep B.
AU - Al-Shourbaji, Ibrahim
AU - Jabbari, Abdoh
AU - Kirner, Raimund
AU - Alameen, Abdalla
N1 - Publisher Copyright: © 2025 Datkhile et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
PY - 2025/11
Y1 - 2025/11
N2 - Respiratory sound analysis has emerged as a promising approach for detecting and diagnosing respiratory diseases, including COVID-19. This study investigates OpenSMILE features for COVID-19 detection from the vowel speech sounds /a/, /e/, and /o/ in the COSWARA dataset. OpenSMILE extracts audio and functional features, which are then classified using four machine learning classifiers: Random Forest (RF), Support Vector Machine, Decision Tree, and Artificial Neural Network. To enhance classification performance, five feature selection techniques were applied: ANOVA, chi-square, Information Gain, ReliefF, and Gini index. Among these, ANOVA-based selection yielded the most consistent results across classifiers and vowel sounds. Among the models evaluated, the RF classifier achieved the highest accuracies of 76.47% for vowel /a/ and 75.54% for vowels /a/ and /o/, respectively, when combined with ANOVA-selected features (155, 163, and 161 features). To statistically assess classifier and feature selection performance, the Friedman test was conducted across classifiers and feature selection techniques. The results confirmed the significance of Random Forest with ANOVA as a robust combination. This research contributes to the development of accessible, scalable, and non-invasive diagnostic tools, enhancing the potential of telemedicine and remote healthcare systems for the early detection of respiratory diseases.
UR - https://www.scopus.com/pages/publications/105022614377
U2 - 10.1371/journal.pone.0332146
DO - 10.1371/journal.pone.0332146
M3 - Article
C2 - 41270108
AN - SCOPUS:105022614377
SN - 1932-6203
VL - 20
JO - PLoS ONE
JF - PLoS ONE
IS - 11
M1 - e0332146
ER -