Robust network supercomputing with unreliable workers

Kishori M. Konwar, Sanguthevar Rajasekaran, Alexander A. Shvartsman

Research output: Contribution to journalArticlepeer-review

4 Scopus citations

Abstract

Internet supercomputing is becoming a powerful tool for harnessing massive amounts of computational resources. However in typical master-worker settings the correctness of the results of the computation crucially relies on the ability of the master to depend on the computation performed by the workers. We consider a distributed system consisting of a master process and a collection of synchronous worker processes that can execute tasks on behalf of the master and that may act nefariously by deliberately returning fallacious results. The master decides on the correctness of the results by assigning the same task to several workers. For such a setting with n processes and t tasks we study the problem of collectively performing the tasks under two different failure models: model Fa, where some fraction f of workers that are prone to returning arbitrary (e.g., incorrect) results and the probability p of such faulty behavior are not known a priori to the master, and model Fb when these quantities are known to the master. Previous works assume that the number of faulty processes or the probability of a process acting maliciously is known to the master, e.g., as in model Fb. In this paper this assumption is removed in model Fa. First, for model Fa we provide an efficient algorithm-based on the Stopping Rule Algorithm by Dagum et al. (1995) - that can estimate f and p with (ε,δ)-approximation, for any 0<δ<1 and ε>0. We also provide a randomized algorithm for detecting the faulty processes for model Fa. Finally, we provide algorithms to perform t tasks, with n workers, in models Fa and Fb.

Original languageEnglish (US)
Pages (from-to)81-92
Number of pages12
JournalJournal of Parallel and Distributed Computing
Volume75
DOIs
StatePublished - 2015
Externally publishedYes

Keywords

  • Distributed algorithms
  • Fault-tolerance
  • Internet supercomputing
  • Randomized algorithms
  • Reliability

ASJC Scopus subject areas

  • Software
  • Theoretical Computer Science
  • Hardware and Architecture
  • Computer Networks and Communications
  • Artificial Intelligence

Fingerprint

Dive into the research topics of 'Robust network supercomputing with unreliable workers'. Together they form a unique fingerprint.

Cite this