TY - JOUR
T1 - Adaptive fault tolerance mechanisms for ensuring high availability of digital twins in distributed edge computing systems
AU - Sahu, Dinesh
AU - Nidhi
AU - Prakash, Shiv
AU - Yang, Tiansheng
AU - Rathore, Rajkumar Singh
AU - Wang, Lu
AU - Sharma, Usha
AU - Alsolbi, Idrees
N1 - © 2025. The Author(s).
PY - 2025/11/24
Y1 - 2025/11/24
AB - The increasing adoption of Digital Twins (DTs) in distributed edge computing systems necessitates robust fault tolerance mechanisms to ensure high availability and reliability. This paper presents an adaptive fault tolerance framework designed to maintain the continuous operation of DTs in dynamic and resource-constrained edge environments. The primary objective is to mitigate failures at edge nodes, minimize downtime, and ensure seamless migration of DT instances without disrupting system performance. The proposed framework integrates a novel Hybrid Genetic-PSO for Adaptive Fault Tolerance (HGPAFT) algorithm, combining the strengths of genetic algorithms and particle swarm optimization. The algorithm dynamically reallocates resources and migrates DT instances in response to node failures, utilizing real-time monitoring and predictive failure detection to enhance system resilience. A key innovation lies in the adaptive nature of the fault tolerance mechanisms, which adjust resource reallocation and task migration strategies based on the evolving conditions of the edge network, such as node load, energy constraints, and communication delays. The results, validated through extensive simulations, demonstrate significant improvements in system availability, with recovery probabilities exceeding 98% and up to 20% reductions in reallocation and migration costs compared to traditional fault tolerance mechanisms. Additionally, the proposed framework optimizes energy consumption and resource utilization, critical for sustainable edge computing. This research contributes to the state of the art by offering a scalable and energy-efficient fault tolerance solution tailored for the decentralized and heterogeneous nature of distributed edge computing, ensuring the continuous and reliable operation of Digital Twins.
KW - Adaptive fault tolerance
KW - Digital twins
KW - Distributed edge computing
KW - Energy-efficient computing
KW - High availability
KW - Hybrid genetic-PSO algorithm
KW - Node failure recovery
KW - Resource reallocation
KW - System resilience
KW - Task migration
UR - https://www.scopus.com/pages/publications/105022761964
U2 - 10.1038/s41598-025-25590-4
DO - 10.1038/s41598-025-25590-4
M3 - Article
C2 - 41286220
AN - SCOPUS:105022761964
SN - 2045-2322
VL - 15
JO - Scientific Reports
JF - Scientific Reports
IS - 1
M1 - 41676
ER -