TY - JOUR
T1 - Detecting Termination by Weight-Throwing in a Faulty Distributed System
AU - Tseng, Yu-Chee
PY - 1995/1/1
Y1 - 1995/1/1
N2 - This paper presents a fault-tolerant termination detection algorithm for a distributed system in which processes tend to fail. Allowing an arbitrary number of processes to have fail-stop behavior, the algorithm can detect termination efficiently with O(M + kn + n) control messages and O(k + 1) detection delays, where M is the number of basic messages issued, n is the number of processes, and k is the actual number of processes that fail. This algorithm has fewer detection delays than existing algorithms in the literature and comparable performance in terms of message complexity. In particular, when no fault occurs, the algorithm has constant detection delay and it uses, in the worst case, an optimal number of messages.
AB - This paper presents a fault-tolerant termination detection algorithm for a distributed system in which processes tend to fail. Allowing an arbitrary number of processes to have fail-stop behavior, the algorithm can detect termination efficiently with O(M + kn + n) control messages and O(k + 1) detection delays, where M is the number of basic messages issued, n is the number of processes, and k is the actual number of processes that fail. This algorithm has fewer detection delays than existing algorithms in the literature and comparable performance in terms of message complexity. In particular, when no fault occurs, the algorithm has constant detection delay and it uses, in the worst case, an optimal number of messages.
UR - http://www.scopus.com/inward/record.url?scp=0004722897&partnerID=8YFLogxK
U2 - 10.1006/jpdc.1995.1025
DO - 10.1006/jpdc.1995.1025
M3 - Article
AN - SCOPUS:0004722897
VL - 25
SP - 7
EP - 15
JO - Journal of Parallel and Distributed Computing
JF - Journal of Parallel and Distributed Computing
SN - 0743-7315
IS - 1
ER -