The impossibility of boosting distributed service resilience

Paul Attie; Rachid Guerraoui; Petr Kuznetsov; Nancy Lynch; Sergio Rajsbaum

doi:10.1016/j.ic.2010.07.005

Research output: Contribution to journal › Article › peer-review

Abstract

We study f-resilient services, which are guaranteed to operate as long as no more than f of the associated processes fail. We prove three theorems asserting the impossibility of boosting the resilience of such services. Our first theorem allows any connection pattern between processes and services but assumes these services to be atomic (linearizable) objects. This theorem says that no distributed system in which processes coordinate using f-resilient atomic objects and reliable registers can solve the consensus problem in the presence of f+1 undetectable process stopping failures. In contrast, we show that it is possible to boost the resilience of some systems solving problems easier than consensus: for example, the 2-set-consensus problem is solvable for 2n processes and 2n-1 failures (i.e., wait-free) using n-process consensus services resilient to n-1 failures (wait-free). Our proof is short and self-contained. We then introduce the larger class of failure-oblivious services. These are services that cannot use information about failures, although they may behave more flexibly than atomic objects. An example of such a service is totally ordered broadcast. Our second theorem generalizes the first theorem and its proof to failure-oblivious services. Our third theorem allows the system to contain failure-aware services, such as failure detectors, in addition to failure-oblivious services. This theorem requires that each failure-aware service be connected to all processes; thus, f+1 process failures overall can disable all the failure-aware services. In contrast, it is possible to boost the resilience of a system solving consensus using failure-aware services if arbitrary connection patterns between processes and services are allowed: consensus is solvable for any number of failures using only 1-resilient 2-process perfect failure detectors. 
As far as we know, this is the first time a unified framework has been used to describe both atomic and non-atomic objects, and the first time boosting analysis has been performed for services more general than atomic objects.
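
The 2-set-consensus example in the abstract can be illustrated with a small simulation. This is only a sketch of one natural construction, not necessarily the one used in the paper: the 2n processes are split into two groups of n, each group shares one n-process consensus service, and every process decides whatever its group's service returns, so at most two distinct values are decided. The names ConsensusObject and two_set_consensus are hypothetical, a lock stands in for the atomic service, and crashes are not modeled.

```python
import threading


class ConsensusObject:
    """Hypothetical stand-in for an n-process consensus service.

    The first proposal to arrive wins; every caller gets that value back.
    A lock is used only to make propose() atomic in this simulation.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._decided = False
        self._value = None

    def propose(self, value):
        with self._lock:
            if not self._decided:
                self._decided = True
                self._value = value
            return self._value


def two_set_consensus(proposals):
    """Run 2n simulated processes, one per entry of `proposals`.

    Processes 0..n-1 use the first service, processes n..2n-1 the second.
    Returns the list of decisions, indexed by process id.
    """
    if len(proposals) == 0 or len(proposals) % 2 != 0:
        raise ValueError("expected a non-empty, even number of proposals (2n)")
    n = len(proposals) // 2
    services = [ConsensusObject(), ConsensusObject()]
    decisions = [None] * len(proposals)

    def run(pid):
        group = 0 if pid < n else 1
        # A process performs a single call on its own group's service and
        # never waits for a process of the other group.
        decisions[pid] = services[group].propose(proposals[pid])

    threads = [threading.Thread(target=run, args=(pid,)) for pid in range(2 * n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return decisions


if __name__ == "__main__":
    print(two_set_consensus(["a", "b", "c", "d", "e", "f"]))
```

Each decision is some process's input (validity), and because only two services exist, the set of decisions has size at most two, which is exactly the 2-set-consensus agreement condition.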

Original language	English (US)
Pages (from-to)	927-950
Number of pages	24
Journal	Information and Computation
Volume	209
Issue number	6
DOIs	https://doi.org/10.1016/j.ic.2010.07.005
State	Published - Jun 2011
Externally published	Yes

Keywords

Atomic objects
Boosting
Consensus
Distributed services
Failure detectors
I/O automata
Resilience

ASJC Scopus subject areas

Theoretical Computer Science
Information Systems
Computer Science Applications
Computational Theory and Mathematics

Cite this

@article{8855213840fa47c4b1086763680eaf41,

title = "The impossibility of boosting distributed service resilience",

abstract = "We study f-resilient services, which are guaranteed to operate as long as no more than f of the associated processes fail. We prove three theorems asserting the impossibility of boosting the resilience of such services. Our first theorem allows any connection pattern between processes and services but assumes these services to be atomic (linearizable) objects. This theorem says that no distributed system in which processes coordinate using f-resilient atomic objects and reliable registers can solve the consensus problem in the presence of f+1 undetectable process stopping failures. In contrast, we show that it is possible to boost the resilience of some systems solving problems easier than consensus: for example, the 2-set-consensus problem is solvable for 2n processes and 2n-1 failures (i.e., wait-free) using n-process consensus services resilient to n-1 failures (wait-free). Our proof is short and self-contained. We then introduce the larger class of failure-oblivious services. These are services that cannot use information about failures, although they may behave more flexibly than atomic objects. An example of such a service is totally ordered broadcast. Our second theorem generalizes the first theorem and its proof to failure-oblivious services. Our third theorem allows the system to contain failure-aware services, such as failure detectors, in addition to failure-oblivious services. This theorem requires that each failure-aware service be connected to all processes; thus, f+1 process failures overall can disable all the failure-aware services. In contrast, it is possible to boost the resilience of a system solving consensus using failure-aware services if arbitrary connection patterns between processes and services are allowed: consensus is solvable for any number of failures using only 1-resilient 2-process perfect failure detectors. 
As far as we know, this is the first time a unified framework has been used to describe both atomic and non-atomic objects, and the first time boosting analysis has been performed for services more general than atomic objects.",

keywords = "Atomic objects, Boosting, Consensus, Distributed services, Failure detectors, I/O automata, Resilience",

author = "Paul Attie and Rachid Guerraoui and Petr Kuznetsov and Nancy Lynch and Sergio Rajsbaum",

note = "Funding Information: The first author was supported by the National Science Foundation under Grant NSF CCF-0438971. The fourth author was supported by the National Science Foundation under Grants NSF CNS-0121277 and NSF CCF-0726514 and by the Air Force Office of Scientific Research under Contracts FA9550-04-1-0121 and FA9550-08-1-0159. Part of this work was done while the third author was at the School of Computer and Communication Sciences, EPFL, and Max Planck Institute for Software Systems. The fifth author was supported by a UNAM-PAPIIT research grant. The basic results appeared initially in a 2002 technical report [2]. An extended abstract [1] containing some of the results of this paper was presented at the 25th International Conference on Distributed Computing Systems, June 2005, Columbus, Ohio.",

year = "2011",

month = jun,

doi = "10.1016/j.ic.2010.07.005",

language = "English (US)",

volume = "209",

pages = "927--950",

journal = "Information and Computation",

issn = "0890-5401",

publisher = "Elsevier Inc.",

number = "6",

}

TY - JOUR

T1 - The impossibility of boosting distributed service resilience

AU - Attie, Paul

AU - Guerraoui, Rachid

AU - Kuznetsov, Petr

AU - Lynch, Nancy

AU - Rajsbaum, Sergio

N1 - Funding Information: The first author was supported by the National Science Foundation under Grant NSF CCF-0438971. The fourth author was supported by the National Science Foundation under Grants NSF CNS-0121277 and NSF CCF-0726514 and by the Air Force Office of Scientific Research under Contracts FA9550-04-1-0121 and FA9550-08-1-0159. Part of this work was done while the third author was at the School of Computer and Communication Sciences, EPFL, and Max Planck Institute for Software Systems. The fifth author was supported by a UNAM-PAPIIT research grant. The basic results appeared initially in a 2002 technical report [2]. An extended abstract [1] containing some of the results of this paper was presented at the 25th International Conference on Distributed Computing Systems, June 2005, Columbus, Ohio.

PY - 2011/6

Y1 - 2011/6

N2 - We study f-resilient services, which are guaranteed to operate as long as no more than f of the associated processes fail. We prove three theorems asserting the impossibility of boosting the resilience of such services. Our first theorem allows any connection pattern between processes and services but assumes these services to be atomic (linearizable) objects. This theorem says that no distributed system in which processes coordinate using f-resilient atomic objects and reliable registers can solve the consensus problem in the presence of f+1 undetectable process stopping failures. In contrast, we show that it is possible to boost the resilience of some systems solving problems easier than consensus: for example, the 2-set-consensus problem is solvable for 2n processes and 2n-1 failures (i.e., wait-free) using n-process consensus services resilient to n-1 failures (wait-free). Our proof is short and self-contained. We then introduce the larger class of failure-oblivious services. These are services that cannot use information about failures, although they may behave more flexibly than atomic objects. An example of such a service is totally ordered broadcast. Our second theorem generalizes the first theorem and its proof to failure-oblivious services. Our third theorem allows the system to contain failure-aware services, such as failure detectors, in addition to failure-oblivious services. This theorem requires that each failure-aware service be connected to all processes; thus, f+1 process failures overall can disable all the failure-aware services. In contrast, it is possible to boost the resilience of a system solving consensus using failure-aware services if arbitrary connection patterns between processes and services are allowed: consensus is solvable for any number of failures using only 1-resilient 2-process perfect failure detectors. 
As far as we know, this is the first time a unified framework has been used to describe both atomic and non-atomic objects, and the first time boosting analysis has been performed for services more general than atomic objects.

KW - Atomic objects

KW - Boosting

KW - Consensus

KW - Distributed services

KW - Failure detectors

KW - I/O automata

KW - Resilience

UR - http://www.scopus.com/inward/record.url?scp=79952603732&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=79952603732&partnerID=8YFLogxK

U2 - 10.1016/j.ic.2010.07.005

DO - 10.1016/j.ic.2010.07.005

M3 - Article

AN - SCOPUS:79952603732

SN - 0890-5401

VL - 209

SP - 927

EP - 950

JO - Information and Computation

JF - Information and Computation

IS - 6

ER -
