## Abstract

Internet supercomputing is becoming a powerful tool for harnessing massive amounts of computational resources. However, in typical master-worker settings the reliability of computation crucially depends on the ability of the master to rely on the computation performed by the workers. Fernandez, Georgiou, Lopez, and Santos [12,13] considered a system consisting of a master process and a collection of worker processes that can execute tasks on behalf of the master and that may act maliciously by deliberately returning fallacious results. The master decides on the correctness of the results by assigning the same task to several workers. The master is charged one work unit for each task performed by a worker. The goal is to design an algorithm that enables the master to determine the correct result with high probability, and at the least possible cost. Fernandez et al. assume that the number of faulty processes or the probability of a process acting maliciously is known to the master. In this paper this assumption is removed. In the setting with n processes and n tasks we consider two different failure models, viz., model F_{a}, where an f-fraction, 0 < f < 1/2, of the workers provide faulty results with probability 0 < p < 1/2, given that the master has no a priori knowledge of the values of p and f; and model F_{b}, where at most an f-fraction, 0 < f < 1/2, of the workers can reply with arbitrary results and the rest reply with incorrect results with probability p, 0 < p < 1/2, but the master knows the values of f and p. For model F_{a}, we provide an algorithm, based on the Stopping Rule Algorithm by Dagum, Karp, Luby, and Ross [10], that can estimate f and p with (ε, δ)-approximation, for any 0 < δ < 1 and ε > 0. This algorithm runs in O(log n) time, O(log^{2} n) message complexity, and O(log^{2} n) task-oriented work and O(n log n) total-work complexities.
We also provide a randomized algorithm for detecting the faulty processes, i.e., identifying the processes that have a non-zero probability of failure in model F_{a}, with task-oriented work O(n) and time O(log n). A lower bound on the total-work complexity of performing n tasks correctly with high probability is shown. Finally, two randomized algorithms to perform n tasks with high probability are given for both failure models with closely matching upper bounds on total-work and task-oriented work complexities, and time O(log n).
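To make the estimation step more concrete, the following is a minimal sketch of the generic Stopping Rule Algorithm of Dagum, Karp, Luby, and Ross for estimating the mean of a [0, 1]-valued random variable, assuming 0 < ε < 1. It is not the algorithm of this paper: the helper `make_probe` is a hypothetical stand-in that assumes the master can check a reply against a task whose answer it already knows, and it only yields an estimate of the product f·p (the chance that a uniformly chosen worker returns a wrong result), not of f and p separately.

```python
import math
import random


def stopping_rule_estimate(sample, eps, delta):
    """Estimate the mean of a [0, 1]-valued random variable.

    `sample` is a zero-argument callable returning one independent draw.
    Draws are taken until their running sum reaches a threshold that
    depends only on eps and delta. Returns (estimate, number_of_draws).
    """
    if not (0 < eps < 1 and 0 < delta < 1):
        raise ValueError("requires 0 < eps < 1 and 0 < delta < 1")
    upsilon = 4 * (math.e - 2) * math.log(2 / delta) / eps ** 2
    threshold = 1 + (1 + eps) * upsilon
    total = 0.0
    draws = 0
    while total < threshold:
        total += sample()
        draws += 1
    return threshold / draws, draws


def make_probe(n, f, p, rng):
    """Hypothetical probe (an assumption, not from the paper).

    Workers 0 .. int(f * n) - 1 are faulty and answer wrongly with
    probability p. Each call picks a worker uniformly at random and
    returns 1.0 if its reply is wrong, else 0.0.
    """
    faulty = int(f * n)

    def probe():
        worker = rng.randrange(n)
        return 1.0 if worker < faulty and rng.random() < p else 0.0

    return probe


if __name__ == "__main__":
    rng = random.Random(2006)
    probe = make_probe(n=1000, f=0.3, p=0.4, rng=rng)
    estimate, draws = stopping_rule_estimate(probe, eps=0.1, delta=0.05)
    print(f"estimated f*p = {estimate:.4f} after {draws} probes")
```

The number of draws is not fixed in advance: it grows roughly as 1/μ for a true mean μ, which is why the rule is useful when the master has no prior knowledge of the failure parameters.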

Original language | English (US) |
---|---|
Title of host publication | Distributed Computing - 20th International Symposium, DISC 2006, Proceedings |
Publisher | Springer Verlag |
Pages | 474-488 |
Number of pages | 15 |
ISBN (Print) | 3540446249, 9783540446248 |
State | Published - 2006 |
Externally published | Yes |
Event | 20th International Symposium on Distributed Computing, DISC 2006 - Stockholm, Sweden. Duration: Sep 18 2006 → Sep 20 2006 |

### Publication series

Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
---|---|
Volume | 4167 LNCS |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |

### Conference

Conference | 20th International Symposium on Distributed Computing, DISC 2006 |
---|---|
Country/Territory | Sweden |
City | Stockholm |
Period | 9/18/06 → 9/20/06 |

## Keywords

- Distributed algorithms
- Fault-tolerance
- Internet supercomputing
- Randomized algorithms
- Reliability

## ASJC Scopus subject areas

- Theoretical Computer Science
- General Computer Science