Abstract
Accurate assessment of fruit and leaf health is essential for early disease detection, quality grading, and automated management in commercial dragon fruit production. Variability in illumination, symptom intensity, and morphological features often limits the reliability of conventional machine learning models trained on raw datasets. This study evaluates the effect of dataset quality on deep learning performance using a publicly available dragon fruit and leaf dataset containing 4,518 images across four classes: Healthy Fruit, Healthy Leaves, Infected Fruits, and Infected Leaves. Three dataset versions were constructed: (i) the original dataset; (ii) an augmented dataset in which each image was expanded threefold; and (iii) a cleaned augmented dataset created by removing mislabeled, ambiguous, or low-quality samples. Four deep learning architectures (MobileNetV3, InceptionV3, ResNet101, and VGG16) were trained under identical settings and compared on classification performance. Across all models, the cleaned augmented dataset produced the most stable training behavior and the highest accuracy. InceptionV3 achieved the strongest overall performance, with an F1-score above 0.95 and validation accuracy approaching 0.97, while MobileNetV3 delivered competitive results (accuracy 0.9613) at minimal computational cost. Confusion matrices confirmed substantial reductions in fruit–fruit and leaf–leaf misclassifications after dataset cleaning. The findings indicate that targeted data refinement, combined with augmentation, is critical for building reliable deep learning models for real-world agricultural applications.
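The abstract reports accuracy, F1-score, and confusion-matrix results without spelling out how they are derived. The sketch below shows one standard way to compute them from a 4×4 confusion matrix. It assumes that rows are true classes and columns are predicted classes, and that F1 is macro-averaged over the four classes. The abstract does not state the averaging scheme, so weighted F1 is also possible. The helper `pair_confusions` is hypothetical. It counts the errors between two classes, such as Healthy Fruit versus Infected Fruits, which correspond to the "fruit–fruit" misclassifications mentioned above. The matrix in the usage example is made up for illustration and is not data from the study.

```python
from typing import List, Sequence

# Class order is an assumption for illustration; rows = true, columns = predicted.
CLASSES = ["Healthy Fruit", "Healthy Leaves", "Infected Fruits", "Infected Leaves"]


def accuracy(cm: Sequence[Sequence[int]]) -> float:
    """Fraction of samples on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total if total else 0.0


def per_class_f1(cm: Sequence[Sequence[int]]) -> List[float]:
    """F1 = 2TP / (2TP + FP + FN) for each class."""
    n = len(cm)
    scores = []
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp  # predicted k, true class differs
        fn = sum(cm[k]) - tp                       # true k, predicted as another class
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


def macro_f1(cm: Sequence[Sequence[int]]) -> float:
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    scores = per_class_f1(cm)
    return sum(scores) / len(scores)


def pair_confusions(cm: Sequence[Sequence[int]], a: int, b: int) -> int:
    """Hypothetical helper: errors in both directions between classes a and b."""
    return cm[a][b] + cm[b][a]


if __name__ == "__main__":
    # Illustrative counts only, not results from the paper.
    cm = [
        [45, 0, 5, 0],
        [0, 48, 0, 2],
        [3, 0, 47, 0],
        [0, 4, 0, 46],
    ]
    print("accuracy:", accuracy(cm))
    print("macro F1:", macro_f1(cm))
    print("fruit-fruit errors:", pair_confusions(cm, 0, 2))
    print("leaf-leaf errors:", pair_confusions(cm, 1, 3))
```

Comparing `pair_confusions` across the original, augmented, and cleaned datasets is one way to quantify the kind of reduction that the confusion matrices are described as showing.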
| Original language | English |
|---|---|
| Pages (from-to) | 1-16 |
| Number of pages | 16 |
| Journal | Impact in Agriculture |
| Volume | 1 |
| Early online date | 13 Oct 2025 |
| Publication status | Published - 13 Oct 2025 |