Randomization helps to perform tasks on processors prone to failures

Bogdan S. Chlebus; Dariusz R. Kowalski

Randomization helps to perform tasks on processors prone to failures

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The problem of performing t tasks in a distributed system of p processors is studied. The tasks are assumed to be independent, similar (each takes one stepto be completed), and idempotent (can be performed many times and concurrently). The processors communicate by passing messages and each of them may fail. This problem is usually called do-all, it was introduced by Dwork, Halpern and Waarts. The distributed setting considered in this paper is as follows: The system is synchronous, the processors fail by stopping, reliable multicast is available. The occurrence of faults is modeled by an adversary who has to choose at least c · p processors prior to the start of the computation, for a fixed constant 0 < c < 1, must not fail the selected processors but may fail any of the remaining processors at any time. The main result is showing that there is a sharpdi fference between the expected performance of randomized algorithms versus the worst-case deterministic performance of algorithms solving the do-all problem in such a setting. Performance is measured in terms of work and communication of algorithms. Work is the total number of steps performed by all the processors while they are operational, including idling. Communication is the total number of point-to-point messages exchanged. Let effort be the sum of work and communication. A randomized algorithm is developed which has the expected effort O(t + p (1 + log* p − log*(p/t))), where log* is the number of iterations of the log function required to go with the value of function down to 1. For deterministic algorithms and their worst-case behavior, a lower bound Ω(t+p log t/ log log t) on work holds, and it is matched by the work performed by a simple algorithm.

Original language	English (US)
Title of host publication	Distributed Computing - 13th International Symposium, DISC 1999, Proceedings
Editors	Prasad Jayanti
Publisher	Springer Verlag
Pages	284-296
Number of pages	13
ISBN (Print)	3540665315, 9783540665311
State	Published - 1999
Externally published	Yes
Event	13th International Symposium on Distributed Computing, DISC 1999 - Bratislava, Slovakia Duration: Sep 27 1999 → Sep 29 1999

Publication series

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	1693
ISSN (Print)	0302-9743
ISSN (Electronic)	1611-3349

Conference

Conference	13th International Symposium on Distributed Computing, DISC 1999
Country/Territory	Slovakia
City	Bratislava
Period	9/27/99 → 9/29/99

ASJC Scopus subject areas

Theoretical Computer Science
General Computer Science

Cite this

Chlebus, B. S., & Kowalski, D. R. (1999). Randomization helps to perform tasks on processors prone to failures. In P. Jayanti (Ed.), Distributed Computing - 13th International Symposium, DISC 1999, Proceedings (pp. 284-296). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 1693). Springer Verlag.

Randomization helps to perform tasks on processors prone to failures. / Chlebus, Bogdan S.; Kowalski, Dariusz R.
Distributed Computing - 13th International Symposium, DISC 1999, Proceedings. ed. / Prasad Jayanti. Springer Verlag, 1999. p. 284-296 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 1693).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Chlebus, BS & Kowalski, DR 1999, Randomization helps to perform tasks on processors prone to failures. in P Jayanti (ed.), Distributed Computing - 13th International Symposium, DISC 1999, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 1693, Springer Verlag, pp. 284-296, 13th International Symposium on Distributed Computing, DISC 1999, Bratislava, Slovakia, 9/27/99.

Chlebus, Bogdan S. ; Kowalski, Dariusz R. / Randomization helps to perform tasks on processors prone to failures. Distributed Computing - 13th International Symposium, DISC 1999, Proceedings. editor / Prasad Jayanti. Springer Verlag, 1999. pp. 284-296 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).

@inproceedings{92721b906ab347478b04d0d98edce9ae,

title = "Randomization helps to perform tasks on processors prone to failures",

abstract = "The problem of performing t tasks in a distributed system of p processors is studied. The tasks are assumed to be independent, similar (each takes one stepto be completed), and idempotent (can be performed many times and concurrently). The processors communicate by passing messages and each of them may fail. This problem is usually called do-all, it was introduced by Dwork, Halpern and Waarts. The distributed setting considered in this paper is as follows: The system is synchronous, the processors fail by stopping, reliable multicast is available. The occurrence of faults is modeled by an adversary who has to choose at least c · p processors prior to the start of the computation, for a fixed constant 0 < c < 1, must not fail the selected processors but may fail any of the remaining processors at any time. The main result is showing that there is a sharpdi fference between the expected performance of randomized algorithms versus the worst-case deterministic performance of algorithms solving the do-all problem in such a setting. Performance is measured in terms of work and communication of algorithms. Work is the total number of steps performed by all the processors while they are operational, including idling. Communication is the total number of point-to-point messages exchanged. Let effort be the sum of work and communication. A randomized algorithm is developed which has the expected effort O(t + p (1 + log* p − log*(p/t))), where log* is the number of iterations of the log function required to go with the value of function down to 1. For deterministic algorithms and their worst-case behavior, a lower bound Ω(t+p log t/ log log t) on work holds, and it is matched by the work performed by a simple algorithm.",

author = "Chlebus, {Bogdan S.} and Kowalski, {Dariusz R.}",

note = "Publisher Copyright: {\textcopyright} Springer-Verlag Berlin Heidelberg 1999.; 13th International Symposium on Distributed Computing, DISC 1999 ; Conference date: 27-09-1999 Through 29-09-1999",

year = "1999",

language = "English (US)",

isbn = "3540665315",

series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

publisher = "Springer Verlag",

pages = "284--296",

editor = "Prasad Jayanti",

booktitle = "Distributed Computing - 13th International Symposium, DISC 1999, Proceedings",

}

TY - GEN

T1 - Randomization helps to perform tasks on processors prone to failures

AU - Chlebus, Bogdan S.

AU - Kowalski, Dariusz R.

N1 - Publisher Copyright: © Springer-Verlag Berlin Heidelberg 1999.

PY - 1999

Y1 - 1999

N2 - The problem of performing t tasks in a distributed system of p processors is studied. The tasks are assumed to be independent, similar (each takes one stepto be completed), and idempotent (can be performed many times and concurrently). The processors communicate by passing messages and each of them may fail. This problem is usually called do-all, it was introduced by Dwork, Halpern and Waarts. The distributed setting considered in this paper is as follows: The system is synchronous, the processors fail by stopping, reliable multicast is available. The occurrence of faults is modeled by an adversary who has to choose at least c · p processors prior to the start of the computation, for a fixed constant 0 < c < 1, must not fail the selected processors but may fail any of the remaining processors at any time. The main result is showing that there is a sharpdi fference between the expected performance of randomized algorithms versus the worst-case deterministic performance of algorithms solving the do-all problem in such a setting. Performance is measured in terms of work and communication of algorithms. Work is the total number of steps performed by all the processors while they are operational, including idling. Communication is the total number of point-to-point messages exchanged. Let effort be the sum of work and communication. A randomized algorithm is developed which has the expected effort O(t + p (1 + log* p − log*(p/t))), where log* is the number of iterations of the log function required to go with the value of function down to 1. For deterministic algorithms and their worst-case behavior, a lower bound Ω(t+p log t/ log log t) on work holds, and it is matched by the work performed by a simple algorithm.

AB - The problem of performing t tasks in a distributed system of p processors is studied. The tasks are assumed to be independent, similar (each takes one stepto be completed), and idempotent (can be performed many times and concurrently). The processors communicate by passing messages and each of them may fail. This problem is usually called do-all, it was introduced by Dwork, Halpern and Waarts. The distributed setting considered in this paper is as follows: The system is synchronous, the processors fail by stopping, reliable multicast is available. The occurrence of faults is modeled by an adversary who has to choose at least c · p processors prior to the start of the computation, for a fixed constant 0 < c < 1, must not fail the selected processors but may fail any of the remaining processors at any time. The main result is showing that there is a sharpdi fference between the expected performance of randomized algorithms versus the worst-case deterministic performance of algorithms solving the do-all problem in such a setting. Performance is measured in terms of work and communication of algorithms. Work is the total number of steps performed by all the processors while they are operational, including idling. Communication is the total number of point-to-point messages exchanged. Let effort be the sum of work and communication. A randomized algorithm is developed which has the expected effort O(t + p (1 + log* p − log*(p/t))), where log* is the number of iterations of the log function required to go with the value of function down to 1. For deterministic algorithms and their worst-case behavior, a lower bound Ω(t+p log t/ log log t) on work holds, and it is matched by the work performed by a simple algorithm.

UR - http://www.scopus.com/inward/record.url?scp=84947932797&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84947932797&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:84947932797

SN - 3540665315

SN - 9783540665311

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 284

EP - 296

BT - Distributed Computing - 13th International Symposium, DISC 1999, Proceedings

A2 - Jayanti, Prasad

PB - Springer Verlag

T2 - 13th International Symposium on Distributed Computing, DISC 1999

Y2 - 27 September 1999 through 29 September 1999

ER -

Randomization helps to perform tasks on processors prone to failures

Abstract

Publication series

Conference

ASJC Scopus subject areas

Other files and links

Fingerprint

Cite this