Abstract
With the end of Moore's Law in sight, parallelism has become the main means of speeding up computationally intensive applications, especially where large collections of tasks need to be performed. Network supercomputing, which takes advantage of very large numbers of computers in a distributed environment, is an effective approach to massive parallelism that harnesses the processing power inherent in large networked settings. In such settings, processor failures are no longer the exception but the norm, so any algorithm designed for realistic settings must be able to deal with failures. This paper presents a new message-passing algorithm for distributed cooperative work in synchronous settings where processors may crash, and where any broadcasts performed by crashing processors are unreliable. We specify the algorithm, prove that it is correct, and perform extensive simulations showing that its performance is close to that of similar algorithms that use reliable broadcast, and that its work compares favorably to the relevant lower bounds.
Original language | English (US) |
---|---|
Pages | 17-26 |
Number of pages | 10 |
State | Published - 2014 |
Externally published | Yes |
Event | 2014 22nd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing, PDP 2014 - Turin, Italy; Duration: Feb 12 2014 → Feb 14 2014 |
Conference
Conference | 2014 22nd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing, PDP 2014 |
---|---|
Country/Territory | Italy |
City | Turin |
Period | 2/12/14 → 2/14/14 |
Keywords
- distributed algorithms
- fault-tolerance
- processor crashes
- task computing
- unreliable broadcast
ASJC Scopus subject areas
- Computer Networks and Communications
- Software