Abstract
This study presents a novel framework that integrates Vision Graph Neural Networks (ViGs) with supervised contrastive learning for enhanced spectro-temporal image analysis of speech signals in Parkinson’s disease (PD) detection. The approach introduces a frequency band decomposition strategy that transforms raw audio into three complementary spectral representations, capturing distinct PD-specific characteristics across low-frequency (0–2 kHz), mid-frequency (2–6 kHz), and high-frequency (6 kHz+) bands. The framework processes these multi-band mel spectro-temporal representations through a ViG architecture that models complex graph-based relationships between spectral and temporal components. The network is trained with a supervised contrastive objective that learns discriminative representations distinguishing PD-affected from healthy speech patterns. Comprehensive experimental validation on multi-institutional datasets from Italy, Colombia, and Spain demonstrates that the proposed ViG-contrastive framework achieves superior classification performance, with the ViG-M-GELU architecture achieving 91.78% test accuracy. The integration of graph neural networks with contrastive learning enables effective learning from limited labeled data while capturing complex spectro-temporal relationships that traditional Convolutional Neural Network (CNN) approaches miss, representing a promising direction for developing more accurate and clinically viable speech-based diagnostic tools for PD.
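As a rough illustration of the frequency band decomposition described above, the following minimal sketch groups the rows of a spectrogram into the three bands named in the abstract. It is not the authors' implementation: it works on linear-frequency FFT bins rather than mel bins, and the function names (`bin_frequencies`, `split_bands`), the half-open band boundaries, and the example FFT size and sample rate are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Band edges in Hz, taken from the abstract. The high band has no upper
# edge (None) because the abstract only specifies "6 kHz+".
BANDS: Dict[str, Tuple[float, Optional[float]]] = {
    "low": (0.0, 2000.0),
    "mid": (2000.0, 6000.0),
    "high": (6000.0, None),
}


def bin_frequencies(n_fft: int, sample_rate: int) -> List[float]:
    """Return the frequency in Hz of each bin of a one-sided FFT."""
    return [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]


def split_bands(
    spectrogram: Sequence[Sequence[float]],
    freqs_hz: Sequence[float],
) -> Dict[str, List[Sequence[float]]]:
    """Group spectrogram rows (one row per frequency bin) by band.

    Assumption: bands are half-open intervals [lower, upper), so a bin
    exactly at 2 kHz goes to the mid band and one at 6 kHz to the high band.
    """
    if len(spectrogram) != len(freqs_hz):
        raise ValueError("spectrogram needs one row per frequency bin")
    bands: Dict[str, List[Sequence[float]]] = {name: [] for name in BANDS}
    for row, freq in zip(spectrogram, freqs_hz):
        for name, (lower, upper) in BANDS.items():
            if freq >= lower and (upper is None or freq < upper):
                bands[name].append(row)
                break
    return bands
```

Given a spectrogram computed with, for example, a 512-point FFT at a 16 kHz sample rate (both hypothetical values), `split_bands(spec, bin_frequencies(512, 16000))` returns three sub-spectrograms that could each be rendered as a separate image for a downstream model.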
| Original language | English |
|---|---|
| Article number | 220 |
| Journal | Journal of Imaging |
| Volume | 11 |
| Issue number | 7 |
| Early online date | 2 Jul 2025 |
| Publication status | Published - 2 Jul 2025 |
Keywords
- frequency band decomposition
- Parkinson’s disease
- spectro-temporal analysis
- speech analysis
- supervised contrastive learning
- Vision Graph Neural Networks