Coordinated cooperative task computing using crash-prone processors with unreliable multicast

Seda Davtyan; Roberto De Prisco; Chryssis Georgiou; Theophanis Hadjistasi; Alexander Allister Schwarzmann

doi:10.1016/j.jpdc.2017.06.013

School of Computer and Cyber Sciences

Research output: Contribution to journal (peer-reviewed article)

Abstract

This paper presents a new message-passing algorithm, called Do-UM, for distributed cooperative task computing in synchronous settings where processors may crash and where any multicasts (or broadcasts) performed by crashing processors are unreliable. We specify the algorithm, prove its correctness, and analyze its complexity. We show that its worst-case number of available processor steps is S = Θ(t + n [Formula presented] + f(n − f)), and that the number of messages sent is less than n²t + [Formula presented], where n is the number of processors, t is the number of tasks to be executed, and f is the number of failures. To assess the performance of the algorithm in practical scenarios, we perform an experimental evaluation on a planetary-scale distributed platform. This also allows us to compare our algorithm with the best currently known algorithm, which, however, is explicitly designed to use reliable multicast; the results suggest that our algorithm sacrifices little efficiency in order to cope with unreliable multicast.
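As a rough illustration of the failure model described above (not of Do-UM itself), the following Python sketch simulates synchronous rounds in which every live processor multicasts to all others. It assumes that a processor crashing during a round may deliver its multicast to an arbitrary subset of the recipients (a seeded coin flip stands in for that arbitrary choice) and that crashes take effect at the end of the round. It also tallies work under the usual reading of "available processor steps": one step per round is charged to every processor that has not crashed, whether or not it does useful work; halting after task completion is not modeled. The names `multicast_round` and `run` are hypothetical.

```python
import random
from typing import Dict, List, Set, Tuple

Message = Tuple[int, str]  # (sender pid, payload)


def multicast_round(
    alive: Set[int],
    outgoing: Dict[int, str],
    crashing: Set[int],
    rng: random.Random,
) -> Tuple[Dict[int, List[Message]], Set[int]]:
    """Simulate one synchronous round of all-to-all multicast.

    Each live sender multicasts its payload to every other processor that is
    live at the start of the round. Senders in `crashing` crash during the
    send step, so their multicast is unreliable. Returns the per-processor
    inboxes and the set of processors still alive after the round.
    """
    inboxes: Dict[int, List[Message]] = {pid: [] for pid in alive}
    for sender, payload in sorted(outgoing.items()):
        if sender not in alive:
            continue  # a processor that crashed earlier sends nothing
        targets = [pid for pid in sorted(alive) if pid != sender]
        if sender in crashing:
            # Unreliable multicast: only some subset of the targets (possibly
            # none or all) receives the message; a seeded coin flip stands in
            # for the adversary's arbitrary choice.
            targets = [pid for pid in targets if rng.random() < 0.5]
        for pid in targets:
            inboxes[pid].append((sender, payload))
    # Assumption of this sketch: crashes take effect at the end of the round.
    return inboxes, alive - crashing


def run(
    n: int, crash_schedule: Dict[int, Set[int]], rounds: int, seed: int = 0
) -> Tuple[int, Set[int]]:
    """Run `rounds` rounds and tally available processor steps.

    Every processor that has not crashed is charged one step per round
    (halting is not modeled in this sketch).
    """
    rng = random.Random(seed)
    alive: Set[int] = set(range(n))
    steps = 0
    for r in range(rounds):
        steps += len(alive)
        outgoing = {pid: f"round-{r}" for pid in alive}
        _, alive = multicast_round(alive, outgoing, crash_schedule.get(r, set()), rng)
    return steps, alive


if __name__ == "__main__":
    steps, survivors = run(n=4, crash_schedule={0: {3}}, rounds=3)
    print(steps, sorted(survivors))  # 10 [0, 1, 2]
```

With four processors, one crash in the first round, and three rounds, the tally is 4 + 3 + 3 = 10 available processor steps. In the actual model the adversary chooses which recipients a crashing sender reaches, so the random subset here is just one admissible behavior.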

Original language	English (US)
Pages (from-to)	272-285
Number of pages	14
Journal	Journal of Parallel and Distributed Computing
Volume	109
DOI	https://doi.org/10.1016/j.jpdc.2017.06.013
Status	Published (2017)

Keywords

Crash faults
Fault-tolerant distributed algorithms
Task computing
Unreliable multicast

ASJC Scopus subject areas

Software
Theoretical Computer Science
Hardware and Architecture
Computer Networks and Communications
Artificial Intelligence

Access to Document

10.1016/j.jpdc.2017.06.013

https://dblp.org/db/journals/jpdc/jpdc109.html#DavtyanPGHS17

Cite this

@article{455021300ed2455a8c7ec5d1c472039b,

title = "Coordinated cooperative task computing using crash-prone processors with unreliable multicast",

abstract = "This paper presents a new message-passing algorithm, called Do-UM, for distributed cooperative task computing in synchronous settings where processors may crash and where any multicasts (or broadcasts) performed by crashing processors are unreliable. We specify the algorithm, prove its correctness, and analyze its complexity. We show that its worst-case number of available processor steps is $S = \Theta(t + n$ [Formula presented] $+ f(n - f))$, and that the number of messages sent is less than $n^2 t +$ [Formula presented], where $n$ is the number of processors, $t$ is the number of tasks to be executed, and $f$ is the number of failures. To assess the performance of the algorithm in practical scenarios, we perform an experimental evaluation on a planetary-scale distributed platform. This also allows us to compare our algorithm with the best currently known algorithm, which, however, is explicitly designed to use reliable multicast; the results suggest that our algorithm sacrifices little efficiency in order to cope with unreliable multicast.",

keywords = "Crash faults, Fault-tolerant distributed algorithms, Task computing, Unreliable multicast",

author = "Seda Davtyan and {De Prisco}, Roberto and Chryssis Georgiou and Theophanis Hadjistasi and Schwarzmann, {Alexander Allister}",

year = "2017",

doi = "10.1016/j.jpdc.2017.06.013",

language = "English (US)",

volume = "109",

pages = "272--285",

journal = "Journal of Parallel and Distributed Computing",

issn = "0743-7315",

publisher = "Academic Press Inc.",

}

TY  - JOUR
T1  - Coordinated cooperative task computing using crash-prone processors with unreliable multicast
AU  - Davtyan, Seda
AU  - De Prisco, Roberto
AU  - Georgiou, Chryssis
AU  - Hadjistasi, Theophanis
AU  - Schwarzmann, Alexander Allister
PY  - 2017
Y1  - 2017
N2  - This paper presents a new message-passing algorithm, called Do-UM, for distributed cooperative task computing in synchronous settings where processors may crash and where any multicasts (or broadcasts) performed by crashing processors are unreliable. We specify the algorithm, prove its correctness, and analyze its complexity. We show that its worst-case number of available processor steps is S = Θ(t + n [Formula presented] + f(n − f)), and that the number of messages sent is less than n²t + [Formula presented], where n is the number of processors, t is the number of tasks to be executed, and f is the number of failures. To assess the performance of the algorithm in practical scenarios, we perform an experimental evaluation on a planetary-scale distributed platform. This also allows us to compare our algorithm with the best currently known algorithm, which, however, is explicitly designed to use reliable multicast; the results suggest that our algorithm sacrifices little efficiency in order to cope with unreliable multicast.
AB  - This paper presents a new message-passing algorithm, called Do-UM, for distributed cooperative task computing in synchronous settings where processors may crash and where any multicasts (or broadcasts) performed by crashing processors are unreliable. We specify the algorithm, prove its correctness, and analyze its complexity. We show that its worst-case number of available processor steps is S = Θ(t + n [Formula presented] + f(n − f)), and that the number of messages sent is less than n²t + [Formula presented], where n is the number of processors, t is the number of tasks to be executed, and f is the number of failures. To assess the performance of the algorithm in practical scenarios, we perform an experimental evaluation on a planetary-scale distributed platform. This also allows us to compare our algorithm with the best currently known algorithm, which, however, is explicitly designed to use reliable multicast; the results suggest that our algorithm sacrifices little efficiency in order to cope with unreliable multicast.
KW  - Crash faults
KW  - Fault-tolerant distributed algorithms
KW  - Task computing
KW  - Unreliable multicast
UR  - http://www.scopus.com/inward/record.url?scp=85024405331&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85024405331&partnerID=8YFLogxK
U2  - 10.1016/j.jpdc.2017.06.013
DO  - 10.1016/j.jpdc.2017.06.013
M3  - Article
AN  - SCOPUS:85024405331
SN  - 0743-7315
VL  - 109
SP  - 272
EP  - 285
JO  - Journal of Parallel and Distributed Computing
JF  - Journal of Parallel and Distributed Computing
ER  - 
