TY - JOUR
T1 - Rambo: a robust, reconfigurable atomic memory service for dynamic networks.
T2 - A robust, reconfigurable atomic memory service for dynamic networks
AU - Gilbert, Seth
AU - Lynch, Nancy A.
AU - Shvartsman, Alexander A.
N1 - Funding Information:
Preliminary versions of this work appeared as the following extended abstracts: (a) Nancy A. Lynch, Alexander A. Shvartsman: RAMBO: A Reconfigurable Atomic Memory Service for Dynamic Networks. DISC 2002:173–190, and (b) Seth Gilbert, Nancy A. Lynch, Alexander A. Shvartsman: RAMBO II: Rapidly Reconfigurable Atomic Memory for Dynamic Networks. DSN 2003:259–268. This work was supported in part by the NSF ITR Grant CCR-0121277. The work of the second author was additionally supported by the NSF Grant 9804665, and the work of the third author was additionally supported in part by the NSF Grants 9984778, 9988304, and 0311368.
PY - 2010
Y1 - 2010
N2 - In this paper, we present Rambo, an algorithm for emulating a read/write distributed shared memory in a dynamic, rapidly changing environment. Rambo provides a highly reliable, highly available service, even as participants join, leave, and fail. In fact, the entire set of participants may change during an execution, as the initial devices depart and are replaced by a new set of devices. Even so, Rambo ensures that data stored in the distributed shared memory remains available and consistent. There are two basic techniques used by Rambo to tolerate dynamic changes. Over short intervals of time, replication suffices to provide fault-tolerance. While some devices may fail and leave, the data remains available at other replicas. Over longer intervals of time, Rambo copes with changing participants via reconfiguration, which incorporates newly joined devices while excluding devices that have departed or failed. The main novelty of Rambo lies in the combination of an efficient reconfiguration mechanism with a quorum-based replication strategy for read/write shared memory. The Rambo algorithm can tolerate a wide variety of aberrant behavior, including lost and delayed messages, participants with unsynchronized clocks, and, more generally, arbitrary asynchrony. Despite such behavior, Rambo guarantees that its data is stored consistency. We analyze the performance of Rambo during periods when the system is relatively well-behaved: messages are delivered in a timely fashion, reconfiguration is not too frequent, etc. We show that in these circumstances, read and write operations are efficient, completing in at most eight message delays.
AB - In this paper, we present Rambo, an algorithm for emulating a read/write distributed shared memory in a dynamic, rapidly changing environment. Rambo provides a highly reliable, highly available service, even as participants join, leave, and fail. In fact, the entire set of participants may change during an execution, as the initial devices depart and are replaced by a new set of devices. Even so, Rambo ensures that data stored in the distributed shared memory remains available and consistent. There are two basic techniques used by Rambo to tolerate dynamic changes. Over short intervals of time, replication suffices to provide fault-tolerance. While some devices may fail and leave, the data remains available at other replicas. Over longer intervals of time, Rambo copes with changing participants via reconfiguration, which incorporates newly joined devices while excluding devices that have departed or failed. The main novelty of Rambo lies in the combination of an efficient reconfiguration mechanism with a quorum-based replication strategy for read/write shared memory. The Rambo algorithm can tolerate a wide variety of aberrant behavior, including lost and delayed messages, participants with unsynchronized clocks, and, more generally, arbitrary asynchrony. Despite such behavior, Rambo guarantees that its data is stored consistency. We analyze the performance of Rambo during periods when the system is relatively well-behaved: messages are delivered in a timely fashion, reconfiguration is not too frequent, etc. We show that in these circumstances, read and write operations are efficient, completing in at most eight message delays.
KW - Atomic register
KW - Distributed shared memory
KW - Dynamic distributed systems
KW - Eventual synchrony
KW - Fault-tolerance
KW - Reconfigurable
UR - http://www.scopus.com/inward/record.url?scp=78650629847&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=78650629847&partnerID=8YFLogxK
U2 - 10.1007/s00446-010-0117-1
DO - 10.1007/s00446-010-0117-1
M3 - Article
AN - SCOPUS:78650629847
SN - 0178-2770
VL - 23
SP - 225
EP - 272
JO - Distributed Computing
JF - Distributed Computing
IS - 4
M1 - 4
ER -