Adaptive fault tolerance mechanisms for ensuring high availability of digital twins in distributed edge computing systems

  • Dinesh Sahu
  • Nidhi
  • Shiv Prakash*
  • Tiansheng Yang
  • Rajkumar Singh Rathore*
  • Lu Wang
  • Usha Sharma
  • Idrees Alsolbi*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

The increasing adoption of Digital Twins (DTs) in distributed edge computing systems necessitates robust fault tolerance mechanisms to ensure high availability and reliability. This paper presents an adaptive fault tolerance framework designed to maintain the continuous operation of DTs in dynamic and resource-constrained edge environments. The primary objective is to mitigate failures at edge nodes, minimize downtime, and ensure seamless migration of DT instances without disrupting system performance. The proposed framework integrates a novel Hybrid Genetic-PSO for Adaptive Fault Tolerance (HGPAFT) algorithm, combining the strengths of genetic algorithms and particle swarm optimization. The algorithm dynamically reallocates resources and migrates DT instances in response to node failures, utilizing real-time monitoring and predictive failure detection to enhance system resilience. A key innovation lies in the adaptive nature of the fault tolerance mechanisms, which adjust resource reallocation and task migration strategies based on the evolving conditions of the edge network, such as node load, energy constraints, and communication delays. The results, validated through extensive simulations, demonstrate significant improvements in system availability, with recovery probabilities exceeding 98% and up to 20% reductions in reallocation and migration costs compared to traditional fault tolerance mechanisms. Additionally, the proposed framework optimizes energy consumption and resource utilization, critical for sustainable edge computing. This research contributes to the state of the art by offering a scalable and energy-efficient fault tolerance solution tailored for the decentralized and heterogeneous nature of distributed edge computing, ensuring the continuous and reliable operation of Digital Twins.
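The abstract names HGPAFT but does not give its update rules, so the following is only a minimal sketch of one generic way to hybridize particle swarm optimization with genetic operators for re-placing DT instances after a node failure. It is not the authors' algorithm. The cost function, its weights (10, 100, 1), the swarm parameters, and the names `cost`, `decode`, and `hybrid_ga_pso` are all hypothetical choices made for illustration.

```python
import random

def cost(assign, load, capacity, current, failed):
    """Hypothetical cost: overload and failed-node penalties plus 1 per migrated twin."""
    used = [0.0] * len(capacity)
    for twin, node in enumerate(assign):
        used[node] += load[twin]
    overload = sum(max(0.0, u - c) for u, c in zip(used, capacity))
    on_failed = sum(1 for node in assign if node in failed)
    moved = sum(1 for a, c in zip(assign, current) if a != c)
    return 10.0 * overload + 100.0 * on_failed + moved

def decode(position, n_nodes):
    """Map a continuous particle position to one node index per twin."""
    return [min(n_nodes - 1, max(0, int(x))) for x in position]

def hybrid_ga_pso(load, capacity, current, failed, swarm=20, iters=60, seed=0):
    rng = random.Random(seed)
    n_twins, n_nodes = len(load), len(capacity)
    fit = lambda p: cost(decode(p, n_nodes), load, capacity, current, failed)

    # Random swarm, with particle 0 seeded at the current placement so the
    # result is never worse than leaving every twin where it is.
    pos = [[rng.uniform(0, n_nodes) for _ in range(n_twins)] for _ in range(swarm)]
    pos[0] = [c + 0.5 for c in current]
    vel = [[0.0] * n_twins for _ in range(swarm)]
    pbest = [p[:] for p in pos]
    gbest = min(pbest, key=fit)[:]

    for _ in range(iters):
        # PSO step: pull each particle toward its personal and global best.
        for i in range(swarm):
            for d in range(n_twins):
                vel[i][d] = (0.7 * vel[i][d]
                             + 1.5 * rng.random() * (pbest[i][d] - pos[i][d])
                             + 1.5 * rng.random() * (gbest[d] - pos[i][d]))
                pos[i][d] = min(n_nodes - 1e-9, max(0.0, pos[i][d] + vel[i][d]))
            if fit(pos[i]) < fit(pbest[i]):
                pbest[i] = pos[i][:]

        # GA step: replace the worst half with mutated children of the best half.
        order = sorted(range(swarm), key=lambda i: fit(pos[i]))
        elite, worst = order[:swarm // 2], order[swarm // 2:]
        for i in worst:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n_twins) if n_twins > 1 else 0
            child = pos[a][:cut] + pos[b][cut:]            # one-point crossover
            child[rng.randrange(n_twins)] = rng.uniform(0, n_nodes)  # mutation
            pos[i], vel[i] = child, [0.0] * n_twins
            if fit(pos[i]) < fit(pbest[i]):
                pbest[i] = pos[i][:]

        gbest = min(pbest + [gbest], key=fit)[:]

    return decode(gbest, n_nodes), fit(gbest)
```

For example, `hybrid_ga_pso([1, 1, 1, 1], [2, 2, 2], [0, 0, 1, 2], {0})` asks for a new placement of four equal-load twins over three nodes after node 0 has failed. A real deployment would replace the toy cost with measured node load, energy, and communication delay, the conditions the abstract says the framework adapts to.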

Original language: English
Article number: 41676
Journal: Scientific Reports
Volume: 15
Issue number: 1
Publication status: Published - 24 Nov 2025

Keywords

  • Adaptive fault tolerance
  • Digital twins
  • Distributed edge computing
  • Energy-efficient computing
  • High availability
  • Hybrid genetic-PSO algorithm
  • Node failure recovery
  • Resource reallocation
  • System resilience
  • Task migration
