TY - JOUR
T1 - An interpretable and balanced machine learning framework for Parkinson’s disease prediction using feature engineering and explainable AI
AU - Nayan, Nasim Mahmud
AU - Rana, Al Mamun
AU - Islam, Md. Monirul
AU - Uddin, Jia
AU - Yasmin, Tahmina
AU - Uddin, Jasim
A2 - Bukhari, Syed Nisar Hussain
N1 - Copyright: © 2025 Nayan et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
PY - 2025/10/31
Y1 - 2025/10/31
AB - Parkinson’s disease (PD) is a progressive neurological disorder that affects millions globally, posing significant challenges in early and accurate diagnosis. Recent advancements in machine learning (ML) offer promising approaches for addressing these challenges by enabling more precise and efficient PD predictions. This paper proposes an enhanced ML framework for PD prediction, integrating data balancing, feature selection, and explainable AI techniques. We evaluate nine different ML algorithms using a dataset of clinical and voice features. To address class imbalance, we employ the Synthetic Minority Oversampling Technique (SMOTE) and NearMiss, comparing results to an imbalanced baseline. Feature engineering approaches, including Featurewiz, tree-based feature importance, and the chi-square test, are used to identify key predictive features such as Pitch Period Entropy (PPE), Noise-to-Harmonic Ratio (NHR), and other voice biomarkers. Explainable AI (XAI) techniques (SHAP and LIME) interpret model decision-making and highlight influential features. The best-performing model, KNN with SMOTE, achieved 92% accuracy, an F1-score of 0.94, and a G-Mean of 0.95, demonstrating balanced, reliable PD detection. While some models achieved higher accuracy on imbalanced data (up to 97%), their performance lacked sensitivity and balance. Our findings suggest that combining SMOTE with feature engineering and XAI substantially enhances model fairness, performance, and interpretability. This research advances PD prediction by providing an accurate and interpretable ML-based diagnostic tool to support early diagnosis and better patient management.
KW - Algorithms
KW - Humans
KW - Machine Learning
KW - Male
KW - Parkinson Disease/diagnosis
UR - https://www.scopus.com/pages/publications/105020651497
U2 - 10.1371/journal.pone.0333418
DO - 10.1371/journal.pone.0333418
M3 - Article
C2 - 41171887
SN - 1932-6203
VL - 20
JO - PLoS ONE
JF - PLoS ONE
IS - 10
M1 - e0333418
ER -