Robust network supercomputing without centralized control

Seda Davtyan; Kishori M. Konwar; Alexander A. Shvartsman

doi:10.1007/978-3-642-25873-2_30

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

6 Scopus citations

Abstract

Internet supercomputing provides a means of harnessing the power of a vast number of interconnected computers. With this come the challenges of marshaling distributed resources and dealing with failures. Traditional centralized approaches employ a master processor and many worker processors that execute a collection of tasks on behalf of the master. Despite the simplicity and advantages of centralized schemes, the master processor is a performance bottleneck and a single point of failure. Additionally, a phenomenon of increasing concern is that workers may return incorrect results, e.g., due to unintended failures, over-clocked processors, or workers claiming to have performed work in order to obtain a high rank in the system. This paper develops an original approach that eliminates the master and instead uses a decentralized algorithm in which workers cooperate in performing tasks. The failure model assumes that the average probability of a worker returning a wrong result is less than 1/2. We present a randomized synchronous algorithm for n processors and t tasks (t ≥ n) achieving time complexity Θ((t/n) log n) and work Θ(t log n). It is shown that upon termination the workers know the results of all tasks with high probability, and that these results are correct with high probability. The message complexity of the algorithm is Θ(n log n), and the bit complexity is O(tn log³ n). Simulations illustrate the behavior of the algorithm under realistic assumptions.
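The correctness claim in the abstract rests on two ingredients: results are spread among workers by randomized communication, and each worker then settles every task by majority vote, which works because a task result produced by a uniformly random worker is wrong with probability equal to the average error probability, assumed below 1/2. The Python sketch below is a toy model under stated assumptions, not the authors' algorithm. The names `simulate`, `default_rounds`, and `WRONG`, the constant `c = 4`, the push-style gossip (each worker sends everything it knows to one random peer per round), and the worst-case convention that all wrong answers coincide are illustrative choices; the paper's own algorithm, termination rule, and message format are not reproduced here.

```python
import math
import random
from collections import Counter

WRONG = -1  # assumption: every incorrect result is this same value (worst case for voting)


def default_rounds(n, t, c=4):
    """Hypothetical round budget shaped like the Theta((t/n) log n) time bound; c is arbitrary."""
    return math.ceil(c * (t / n) * math.log2(n))


def simulate(n, t, error_probs, rounds, seed=0):
    """Toy synchronous gossip-and-vote model (illustrative, not the paper's algorithm).

    Requires n >= 2. error_probs[w] is the chance that worker w returns a wrong
    result. The true result of task j is taken to be j itself. Returns the
    fraction of (worker, task) pairs whose strict majority vote is correct.
    """
    rng = random.Random(seed)
    # Each vote is (worker, round, task, result); sets deduplicate repeated gossip.
    known = [set() for _ in range(n)]
    for r in range(rounds):
        # Compute phase: every worker performs one task chosen uniformly at random.
        for w in range(n):
            task = rng.randrange(t)
            wrong = rng.random() < error_probs[w]
            known[w].add((w, r, task, WRONG if wrong else task))
        # Communication phase: snapshot all outgoing messages before delivering
        # any of them, so the round behaves synchronously.
        outgoing = []
        for w in range(n):
            dest = rng.randrange(n - 1)
            if dest >= w:
                dest += 1  # skip self: map onto the other n - 1 workers
            outgoing.append((dest, frozenset(known[w])))
        for dest, votes in outgoing:
            known[dest] |= votes
    # Decision phase: each worker takes a strict majority per task;
    # ties and tasks with no votes count as failures.
    correct = 0
    for votes in known:
        tallies = [Counter() for _ in range(t)]
        for _, _, task, result in votes:
            tallies[task][result] += 1
        for task, tally in enumerate(tallies):
            if tally[task] > tally[WRONG]:
                correct += 1
    return correct / (n * t)
```

For example, `simulate(32, 64, [0.2] * 32, default_rounds(32, 64))` runs 40 rounds with 32 workers and 64 tasks. Raising every error probability above 1/2 (say to 0.7) makes the majority settle on the wrong value instead, which is why the failure model bounds the average error probability below 1/2. Recording votes as (worker, round, task, result) tuples and merging by set union keeps one vote from being counted several times when it arrives over different gossip paths.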

Original language	English (US)
Title of host publication	Principles of Distributed Systems - 15th International Conference, OPODIS 2011, Proceedings
Pages	435-450
Number of pages	16
DOIs	https://doi.org/10.1007/978-3-642-25873-2_30
State	Published - 2011
Externally published	Yes
Event	15th International Conference on Principles of Distributed Systems, OPODIS 2011 - Toulouse, France
Duration	Dec 13 2011 → Dec 16 2011

Publication series

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	7109 LNCS
ISSN (Print)	0302-9743
ISSN (Electronic)	1611-3349

Conference

Conference	15th International Conference on Principles of Distributed Systems, OPODIS 2011
Country/Territory	France
City	Toulouse
Period	12/13/11 → 12/16/11

Keywords

Distributed Algorithms
Fault-Tolerance
Internet Supercomputing

ASJC Scopus subject areas

Theoretical Computer Science
General Computer Science

Cite this

Davtyan, S., Konwar, K. M., & Shvartsman, A. A. (2011). Robust network supercomputing without centralized control. In Principles of Distributed Systems - 15th International Conference, OPODIS 2011, Proceedings (pp. 435-450). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 7109 LNCS). https://doi.org/10.1007/978-3-642-25873-2_30

Robust network supercomputing without centralized control. / Davtyan, Seda; Konwar, Kishori M.; Shvartsman, Alexander A.
Principles of Distributed Systems - 15th International Conference, OPODIS 2011, Proceedings. 2011. p. 435-450 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 7109 LNCS).

Davtyan, S, Konwar, KM & Shvartsman, AA 2011, Robust network supercomputing without centralized control. in Principles of Distributed Systems - 15th International Conference, OPODIS 2011, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 7109 LNCS, pp. 435-450, 15th International Conference on Principles of Distributed Systems, OPODIS 2011, Toulouse, France, 12/13/11. https://doi.org/10.1007/978-3-642-25873-2_30

Davtyan S, Konwar KM, Shvartsman AA. Robust network supercomputing without centralized control. In Principles of Distributed Systems - 15th International Conference, OPODIS 2011, Proceedings. 2011. p. 435-450. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). doi: 10.1007/978-3-642-25873-2_30

Davtyan, Seda ; Konwar, Kishori M. ; Shvartsman, Alexander A. / Robust network supercomputing without centralized control. Principles of Distributed Systems - 15th International Conference, OPODIS 2011, Proceedings. 2011. pp. 435-450 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).

@inproceedings{f4ca9a5bbcaf4f3bb5fac25e96c0c72b,

title = "Robust network supercomputing without centralized control",

abstract = "Internet supercomputing provides means for harnessing the power of a vast number of interconnected computers. With this come the challenges of marshaling distributed resources and dealing with failures. Traditional centralized approaches employ a master processor and many worker processors that execute a collection of tasks on behalf of the master. Despite the simplicity and advantages of centralized schemes, the master processor is a performance bottleneck and a single point of failure. Additionally, a phenomenon of increasing concern is that workers may return incorrect results, e.g., due to unintended failures, over-clocked processors, or due to workers claiming to have performed work to obtain a high rank in the system. This paper develops an original approach that eliminates the master and instead uses a decentralized algorithm, where workers cooperate in performing tasks. The failure model assumes that the average probability of a worker returning a wrong result is inferior to 1/2. We present a randomized synchronous algorithm for n processors and t tasks (t ≥ n) achieving time complexity Θ(t/n log n) and work Θ(t log n). It is shown that upon termination the workers know the results of all tasks with high probability, and that these results are correct with high probability. The message complexity of the algorithm is Θ(n log n), and the bit complexity is O(tn log³ n). Simulations illustrate the behavior of the algorithm under realistic assumptions.",

keywords = "Distributed Algorithms, Fault-Tolerance, Internet Supercomputing",

author = "Seda Davtyan and Konwar, {Kishori M.} and Shvartsman, {Alexander A.}",

note = "Funding Information: This work is supported in part by the NSF award 1017232.; 15th International Conference on Principles of Distributed Systems, OPODIS 2011 ; Conference date: 13-12-2011 Through 16-12-2011",

year = "2011",

doi = "10.1007/978-3-642-25873-2_30",

language = "English (US)",

isbn = "9783642258725",

series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

pages = "435--450",

booktitle = "Principles of Distributed Systems - 15th International Conference, OPODIS 2011, Proceedings",

}

TY - GEN

T1 - Robust network supercomputing without centralized control

AU - Davtyan, Seda

AU - Konwar, Kishori M.

AU - Shvartsman, Alexander A.

N1 - Funding Information: This work is supported in part by the NSF award 1017232.

PY - 2011

Y1 - 2011

N2 - Internet supercomputing provides means for harnessing the power of a vast number of interconnected computers. With this come the challenges of marshaling distributed resources and dealing with failures. Traditional centralized approaches employ a master processor and many worker processors that execute a collection of tasks on behalf of the master. Despite the simplicity and advantages of centralized schemes, the master processor is a performance bottleneck and a single point of failure. Additionally, a phenomenon of increasing concern is that workers may return incorrect results, e.g., due to unintended failures, over-clocked processors, or due to workers claiming to have performed work to obtain a high rank in the system. This paper develops an original approach that eliminates the master and instead uses a decentralized algorithm, where workers cooperate in performing tasks. The failure model assumes that the average probability of a worker returning a wrong result is inferior to 1/2. We present a randomized synchronous algorithm for n processors and t tasks (t ≥ n) achieving time complexity Θ(t/n log n) and work Θ(t log n). It is shown that upon termination the workers know the results of all tasks with high probability, and that these results are correct with high probability. The message complexity of the algorithm is Θ(n log n), and the bit complexity is O(tn log³ n). Simulations illustrate the behavior of the algorithm under realistic assumptions.

AB - Internet supercomputing provides means for harnessing the power of a vast number of interconnected computers. With this come the challenges of marshaling distributed resources and dealing with failures. Traditional centralized approaches employ a master processor and many worker processors that execute a collection of tasks on behalf of the master. Despite the simplicity and advantages of centralized schemes, the master processor is a performance bottleneck and a single point of failure. Additionally, a phenomenon of increasing concern is that workers may return incorrect results, e.g., due to unintended failures, over-clocked processors, or due to workers claiming to have performed work to obtain a high rank in the system. This paper develops an original approach that eliminates the master and instead uses a decentralized algorithm, where workers cooperate in performing tasks. The failure model assumes that the average probability of a worker returning a wrong result is inferior to 1/2. We present a randomized synchronous algorithm for n processors and t tasks (t ≥ n) achieving time complexity Θ(t/n log n) and work Θ(t log n). It is shown that upon termination the workers know the results of all tasks with high probability, and that these results are correct with high probability. The message complexity of the algorithm is Θ(n log n), and the bit complexity is O(tn log³ n). Simulations illustrate the behavior of the algorithm under realistic assumptions.

KW - Distributed Algorithms

KW - Fault-Tolerance

KW - Internet Supercomputing

UR - http://www.scopus.com/inward/record.url?scp=84055218390&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84055218390&partnerID=8YFLogxK

U2 - 10.1007/978-3-642-25873-2_30

DO - 10.1007/978-3-642-25873-2_30

M3 - Conference contribution

AN - SCOPUS:84055218390

SN - 9783642258725

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 435

EP - 450

BT - Principles of Distributed Systems - 15th International Conference, OPODIS 2011, Proceedings

T2 - 15th International Conference on Principles of Distributed Systems, OPODIS 2011

Y2 - 13 December 2011 through 16 December 2011

ER -
