Impact of Dataset Quality on Deep Learning Models for Dragon Fruit and Leaf Health Classification

  • Shahnawaz Ayoub
  • Imran Baig
  • Mudasir Ashraf
  • Mahmoud Okasha

Research output: Contribution to journal › Article › peer-review

Abstract

Accurate assessment of fruit and leaf health is essential for early disease detection, quality grading, and automated management in commercial dragon fruit production. Variability in illumination, symptom intensity, and morphological features often limits the reliability of conventional machine learning models trained on raw datasets. This study evaluates the effect of dataset quality on deep learning performance using a publicly available dragon fruit and leaf dataset containing 4,518 images across four classes: Healthy Fruit, Healthy Leaves, Infected Fruits, and Infected Leaves. Three dataset versions were constructed: (i) the original dataset, (ii) an augmented dataset expanding each image threefold, and (iii) a cleaned augmented dataset created by removing mislabeled, ambiguous, or low-quality samples. Four deep learning architectures (MobileNetV3, InceptionV3, ResNet101, and VGG16) were trained under identical settings to assess classification performance. Across all models, the cleaned augmented dataset produced the most stable training behavior and the highest accuracy. InceptionV3 achieved the strongest overall performance, with an F1-score above 0.95 and validation accuracy approaching 0.97, while MobileNetV3 delivered competitive results (accuracy 0.9613) at minimal computational cost. Confusion matrices confirmed major reductions in fruit–fruit and leaf–leaf misclassification after dataset cleaning. The findings highlight that targeted data refinement, combined with augmentation, is critical for building reliable deep learning models for real-world agricultural applications.
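
The abstract reports accuracy, F1-score, and fruit–fruit and leaf–leaf confusion without giving formulas. The following is a minimal sketch of how such figures could be derived from a four-class confusion matrix. It is not the authors' evaluation code: the function names, the choice of macro averaging for F1, the class ordering, and the counts in `example_cm` are all assumptions made for illustration, and the counts are toy values, not results from the study.

```python
from typing import Dict, List

# Class order is an assumption; the abstract only names the four classes.
CLASSES = ["Healthy Fruit", "Healthy Leaves", "Infected Fruits", "Infected Leaves"]


def accuracy(cm: List[List[int]]) -> float:
    """Fraction of samples on the diagonal; cm[i][j] = true class i predicted as j."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total if total else 0.0


def per_class_f1(cm: List[List[int]]) -> List[float]:
    """One-vs-rest F1 for each class, using F1 = 2TP / (2TP + FP + FN)."""
    n = len(cm)
    scores = []
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp  # predicted k, true class differs
        fn = sum(cm[k]) - tp                       # true k, predicted otherwise
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


def macro_f1(cm: List[List[int]]) -> float:
    """Unweighted mean of per-class F1 (macro averaging is an assumption here)."""
    scores = per_class_f1(cm)
    return sum(scores) / len(scores)


def within_group_errors(cm: List[List[int]]) -> Dict[str, int]:
    """Healthy-vs-infected confusions inside the fruit group and inside the leaf group."""
    fruit = cm[0][2] + cm[2][0]   # Healthy Fruit <-> Infected Fruits
    leaf = cm[1][3] + cm[3][1]    # Healthy Leaves <-> Infected Leaves
    return {"fruit-fruit": fruit, "leaf-leaf": leaf}


# Toy counts for illustration only; NOT taken from the paper.
example_cm = [
    [8, 0, 2, 0],
    [0, 9, 0, 1],
    [1, 0, 9, 0],
    [0, 2, 0, 8],
]

if __name__ == "__main__":
    print("accuracy:", accuracy(example_cm))
    print("macro F1:", macro_f1(example_cm))
    print("within-group errors:", within_group_errors(example_cm))
```

Computing `within_group_errors` on the confusion matrix of the same model trained on each dataset version would be one way to quantify the reduction in fruit–fruit and leaf–leaf misclassification that the abstract describes.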
Original language: English
Pages (from-to): 1-16
Number of pages: 16
Journal: Impact in Agriculture
Volume: 1
Early online date: 13 Oct 2025
Publication status: Published - 13 Oct 2025
