TY - JOUR
T1 - Spectro-Image Analysis with Vision Graph Neural Networks and Contrastive Learning for Parkinson’s Disease Detection
AU - Madusanka, Nuwan
AU - Malekroodi, Hadi Sedigh
AU - Herath, H. M. K. K. M. B.
AU - Hewage, Chaminda
AU - Yi, Myunggi
AU - Lee, Byeong-Il
N1 - Publisher Copyright:
© 2025 by the authors.
PY - 2025/7/2
Y1 - 2025/7/2
N2 - This study presents a novel framework that integrates Vision Graph Neural Networks (ViGs) with supervised contrastive learning for enhanced spectro-temporal image analysis of speech signals in Parkinson’s disease (PD) detection. The approach introduces a frequency band decomposition strategy that transforms raw audio into three complementary spectral representations, capturing distinct PD-specific characteristics across low-frequency (0–2 kHz), mid-frequency (2–6 kHz), and high-frequency (6 kHz+) bands. The framework processes mel multi-band spectro-temporal representations through a ViG architecture that models complex graph-based relationships between spectral and temporal components, trained using a supervised contrastive objective that learns discriminative representations distinguishing PD-affected from healthy speech patterns. Comprehensive experimental validation on multi-institutional datasets from Italy, Colombia, and Spain demonstrates that the proposed ViG-contrastive framework achieves superior classification performance, with the ViG-M-GELU architecture achieving 91.78% test accuracy. The integration of graph neural networks with contrastive learning enables effective learning from limited labeled data while capturing complex spectro-temporal relationships that traditional Convolution Neural Network (CNN) approaches miss, representing a promising direction for developing more accurate and clinically viable speech-based diagnostic tools for PD.
AB - This study presents a novel framework that integrates Vision Graph Neural Networks (ViGs) with supervised contrastive learning for enhanced spectro-temporal image analysis of speech signals in Parkinson’s disease (PD) detection. The approach introduces a frequency band decomposition strategy that transforms raw audio into three complementary spectral representations, capturing distinct PD-specific characteristics across low-frequency (0–2 kHz), mid-frequency (2–6 kHz), and high-frequency (6 kHz+) bands. The framework processes mel multi-band spectro-temporal representations through a ViG architecture that models complex graph-based relationships between spectral and temporal components, trained using a supervised contrastive objective that learns discriminative representations distinguishing PD-affected from healthy speech patterns. Comprehensive experimental validation on multi-institutional datasets from Italy, Colombia, and Spain demonstrates that the proposed ViG-contrastive framework achieves superior classification performance, with the ViG-M-GELU architecture achieving 91.78% test accuracy. The integration of graph neural networks with contrastive learning enables effective learning from limited labeled data while capturing complex spectro-temporal relationships that traditional Convolution Neural Network (CNN) approaches miss, representing a promising direction for developing more accurate and clinically viable speech-based diagnostic tools for PD.
KW - frequency band decomposition
KW - Parkinson’s disease
KW - spectro-temporal analysis
KW - speech analysis
KW - supervised contrastive learning
KW - Vision Graph Neural Networks
UR - https://www.scopus.com/pages/publications/105011417030
U2 - 10.3390/jimaging11070220
DO - 10.3390/jimaging11070220
M3 - Article
C2 - 40710607
SN - 2313-433X
VL - 11
JO - Journal of Imaging
JF - Journal of Imaging
IS - 7
M1 - 220
ER -