TY - JOUR
T1 - Coordinated cooperative task computing using crash-prone processors with unreliable multicast
AU - Davtyan, Seda
AU - De Prisco, Roberto
AU - Georgiou, Chryssis
AU - Hadjistasi, Theophanis
AU - Schwarzmann, Alexander Allister
N1 - Publisher Copyright:
© 2017 Elsevier Inc.
Copyright:
Copyright 2020 Elsevier B.V., All rights reserved.
PY - 2017
Y1 - 2017
N2 - This paper presents a new message-passing algorithm, called Do-UM, for distributed cooperative task computing in synchronous settings where processors may crash, and where any multicasts (or broadcasts) performed by crashing processors are unreliable. We specify the algorithm, prove its correctness and analyse its complexity. We show that its worst case available processor steps is S=Θt+n [Formula presented] +f(n−f) and that the number of messages sent is less than n2t+ [Formula presented], where n is the number of processors, t is the number of tasks to be executed and f is the number of failures. To assess the performance of the algorithm in practical scenarios, we perform an experimental evaluation on a planetary-scale distributed platform. This also allows us to compare our algorithm with the currently best algorithm that is, however, explicitly designed to use reliable multicast; the results suggest that our algorithm does not lose much efficiency in order to cope with unreliable multicast.
AB - This paper presents a new message-passing algorithm, called Do-UM, for distributed cooperative task computing in synchronous settings where processors may crash, and where any multicasts (or broadcasts) performed by crashing processors are unreliable. We specify the algorithm, prove its correctness and analyse its complexity. We show that its worst case available processor steps is S=Θt+n [Formula presented] +f(n−f) and that the number of messages sent is less than n2t+ [Formula presented], where n is the number of processors, t is the number of tasks to be executed and f is the number of failures. To assess the performance of the algorithm in practical scenarios, we perform an experimental evaluation on a planetary-scale distributed platform. This also allows us to compare our algorithm with the currently best algorithm that is, however, explicitly designed to use reliable multicast; the results suggest that our algorithm does not lose much efficiency in order to cope with unreliable multicast.
KW - Crash faults
KW - Fault-tolerant distributed algorithms
KW - Task computing
KW - Unreliable multicast
UR - http://www.scopus.com/inward/record.url?scp=85024405331&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85024405331&partnerID=8YFLogxK
U2 - 10.1016/j.jpdc.2017.06.013
DO - 10.1016/j.jpdc.2017.06.013
M3 - Article
AN - SCOPUS:85024405331
SN - 0743-7315
VL - 109
SP - 272
EP - 285
JO - Journal of Parallel and Distributed Computing
JF - Journal of Parallel and Distributed Computing
ER -