Improved communication complexity of fault-tolerant consensus

Mohammad T. Hajiaghayi; Dariusz R. Kowalski; Jan Olkowski

doi:10.1145/3519935.3520078

Computer Science

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Scopus citations

Abstract

Consensus is one of the most thoroughly studied problems in distributed computing, yet there are still complexity gaps that have not been bridged for decades. In particular, in the classical message-passing setting with process crashes, since the seminal works of Bar-Joseph and Ben-Or [PODC 1998] and Aspnes and Waarts [SICOMP 1996, JACM 1998] in the previous century, there is still a fundamental unresolved question about the communication complexity of fast randomized Consensus against a (strong) adaptive adversary crashing processes arbitrarily online. The best known upper bound on the number of communication bits is $\Theta(n^{3/2}/\sqrt{\log n})$ per process, while the best lower bound is $\Omega(1)$. This is in contrast to randomized Consensus against a (weak) oblivious adversary, for which time-almost-optimal algorithms guarantee amortized $O(1)$ communication bits per process. We design an algorithm against an adaptive adversary that reduces the communication gap by a nearly linear factor to $O(\sqrt{n}\cdot\mathrm{polylog}\, n)$ bits per process, while keeping almost-optimal (up to a factor of $O(\log^3 n)$) time complexity $O(\sqrt{n}\cdot\log^{5/2} n)$. More surprisingly, we show that this complexity can indeed be lowered further, but at the expense of increasing time complexity, i.e., there is a trade-off between communication complexity and time complexity. More specifically, our main Consensus algorithm allows reducing communication complexity per process to any value from $\mathrm{polylog}\, n$ to $O(\sqrt{n}\cdot\mathrm{polylog}\, n)$, as long as Time × Communication $= O(n\cdot\mathrm{polylog}\, n)$. Similarly, reducing time complexity requires more random bits per process, i.e., Time × Randomness $= O(n\cdot\mathrm{polylog}\, n)$. Our parameterized consensus solutions are based on a few newly developed paradigms and algorithms for crash-resilient computing, interesting on their own. The first one, called Fuzzy Counting, provides each process with a number that lies between the numbers of alive processes at the end and at the beginning of the counting. Our deterministic Fuzzy Counting algorithm works in $O(\log^3 n)$ rounds and uses only $O(\mathrm{polylog}\, n)$ amortized communication bits per process, unlike previous solutions to counting that required $\Omega(n)$ bits. This improvement is possible due to a new Fault-tolerant Gossip solution with $O(\log^3 n)$ rounds using only $O(|\mathcal{R}|\cdot\mathrm{polylog}\, n)$ communication bits per process, where $|\mathcal{R}|$ is the length of the rumor binary representation. It exploits a distributed fault-tolerant divide-and-conquer idea, in which processes run a Bipartite Gossip algorithm for a considered partition of processes. To avoid passing many long messages, processes use a family of small-degree compact expanders for local signaling to their overlay neighbors if they are in a compact (large and well-connected) party, and switch to a denser overlay graph whenever local signaling in the current one fails.
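
The Fuzzy Counting guarantee described in the abstract (each process obtains a number between the count of processes alive at the end and the count alive at the beginning) can be illustrated with a toy simulation. The sketch below is not the paper's algorithm: it is the naive one-round all-to-all baseline, in which every process sends a message to every other process, the kind of counting that costs a linear number of bits per process. The function name `naive_fuzzy_count`, the random (non-adaptive) crash pattern, and the 50% partial-delivery rule are all assumptions made for illustration.

```python
import random


def naive_fuzzy_count(n, crash_prob, seed=0):
    """Toy one-round all-to-all counting under crash failures.

    Hypothetical baseline, not the paper's algorithm: every process sends
    an "I am alive" message to every other process, so each process sends
    n - 1 messages. A process that crashes during the round delivers its
    message only to an arbitrary subset of receivers.
    """
    rng = random.Random(seed)
    alive_at_start = set(range(n))
    # Assumption: crashes are chosen at random here, whereas the paper
    # considers an adaptive adversary that picks them online.
    crashing = {p for p in alive_at_start if rng.random() < crash_prob}
    alive_at_end = alive_at_start - crashing

    counts = {}
    for receiver in alive_at_end:
        heard = 1  # a process always counts itself
        for sender in alive_at_start:
            if sender == receiver:
                continue
            if sender in crashing:
                # A crashing sender reaches each receiver with probability 1/2.
                if rng.random() < 0.5:
                    heard += 1
            else:
                heard += 1
        counts[receiver] = heard
    return len(alive_at_start), len(alive_at_end), counts


if __name__ == "__main__":
    start, end, counts = naive_fuzzy_count(n=50, crash_prob=0.3, seed=1)
    print("alive at start:", start, "alive at end:", end)
    print("smallest and largest count:", min(counts.values()), max(counts.values()))
```

Every surviving process hears from all other survivors and from some subset of the crashed processes, so its count always falls in the interval the abstract describes, although different processes may end up with different counts.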

Original language	English (US)
Title of host publication	STOC 2022 - Proceedings of the 54th Annual ACM SIGACT Symposium on Theory of Computing
Editors	Stefano Leonardi, Anupam Gupta
Publisher	Association for Computing Machinery
Pages	488-501
Number of pages	14
ISBN (Electronic)	9781450392648
DOIs	https://doi.org/10.1145/3519935.3520078
State	Published - Sep 6 2022
Event	54th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2022 - Rome, Italy Duration: Jun 20 2022 → Jun 24 2022

Publication series

Name	Proceedings of the Annual ACM Symposium on Theory of Computing
ISSN (Print)	0737-8017

Conference

Conference	54th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2022
Country/Territory	Italy
City	Rome
Period	6/20/22 → 6/24/22

Keywords

adaptive adversary
crash failures
distributed consensus

ASJC Scopus subject areas

Software

Access to Document

10.1145/3519935.3520078

Cite this

Hajiaghayi, M. T., Kowalski, D. R., & Olkowski, J. (2022). Improved communication complexity of fault-tolerant consensus. In S. Leonardi, & A. Gupta (Eds.), STOC 2022 - Proceedings of the 54th Annual ACM SIGACT Symposium on Theory of Computing (pp. 488-501). (Proceedings of the Annual ACM Symposium on Theory of Computing). Association for Computing Machinery. https://doi.org/10.1145/3519935.3520078

Improved communication complexity of fault-tolerant consensus. / Hajiaghayi, Mohammad T.; Kowalski, Dariusz R.; Olkowski, Jan.
STOC 2022 - Proceedings of the 54th Annual ACM SIGACT Symposium on Theory of Computing. ed. / Stefano Leonardi; Anupam Gupta. Association for Computing Machinery, 2022. p. 488-501 (Proceedings of the Annual ACM Symposium on Theory of Computing).

Hajiaghayi, MT, Kowalski, DR & Olkowski, J 2022, Improved communication complexity of fault-tolerant consensus. in S Leonardi & A Gupta (eds), STOC 2022 - Proceedings of the 54th Annual ACM SIGACT Symposium on Theory of Computing. Proceedings of the Annual ACM Symposium on Theory of Computing, Association for Computing Machinery, pp. 488-501, 54th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2022, Rome, Italy, 6/20/22. https://doi.org/10.1145/3519935.3520078

Hajiaghayi MT, Kowalski DR, Olkowski J. Improved communication complexity of fault-tolerant consensus. In Leonardi S, Gupta A, editors, STOC 2022 - Proceedings of the 54th Annual ACM SIGACT Symposium on Theory of Computing. Association for Computing Machinery. 2022. p. 488-501. (Proceedings of the Annual ACM Symposium on Theory of Computing). doi: 10.1145/3519935.3520078

Hajiaghayi, Mohammad T. ; Kowalski, Dariusz R. ; Olkowski, Jan. / Improved communication complexity of fault-tolerant consensus. STOC 2022 - Proceedings of the 54th Annual ACM SIGACT Symposium on Theory of Computing. editor / Stefano Leonardi ; Anupam Gupta. Association for Computing Machinery, 2022. pp. 488-501 (Proceedings of the Annual ACM Symposium on Theory of Computing).

@inproceedings{f53a8534fcce4a5483fa89203f55d87d,

title = "Improved communication complexity of fault-tolerant consensus",

abstract = "Consensus is one of the most thoroughly studied problems in distributed computing, yet there are still complexity gaps that have not been bridged for decades. In particular, in the classical message-passing setting with process crashes, since the seminal works of Bar-Joseph and Ben-Or [PODC 1998] and Aspnes and Waarts [SICOMP 1996, JACM 1998] in the previous century, there is still a fundamental unresolved question about the communication complexity of fast randomized Consensus against a (strong) adaptive adversary crashing processes arbitrarily online. The best known upper bound on the number of communication bits is $\Theta(n^{3/2}/\sqrt{\log n})$ per process, while the best lower bound is $\Omega(1)$. This is in contrast to randomized Consensus against a (weak) oblivious adversary, for which time-almost-optimal algorithms guarantee amortized $O(1)$ communication bits per process. We design an algorithm against an adaptive adversary that reduces the communication gap by a nearly linear factor to $O(\sqrt{n}\cdot\mathrm{polylog}\, n)$ bits per process, while keeping almost-optimal (up to a factor of $O(\log^3 n)$) time complexity $O(\sqrt{n}\cdot\log^{5/2} n)$. More surprisingly, we show that this complexity can indeed be lowered further, but at the expense of increasing time complexity, i.e., there is a trade-off between communication complexity and time complexity. More specifically, our main Consensus algorithm allows reducing communication complexity per process to any value from $\mathrm{polylog}\, n$ to $O(\sqrt{n}\cdot\mathrm{polylog}\, n)$, as long as Time × Communication $= O(n\cdot\mathrm{polylog}\, n)$. Similarly, reducing time complexity requires more random bits per process, i.e., Time × Randomness $= O(n\cdot\mathrm{polylog}\, n)$. Our parameterized consensus solutions are based on a few newly developed paradigms and algorithms for crash-resilient computing, interesting on their own. The first one, called Fuzzy Counting, provides each process with a number that lies between the numbers of alive processes at the end and at the beginning of the counting. Our deterministic Fuzzy Counting algorithm works in $O(\log^3 n)$ rounds and uses only $O(\mathrm{polylog}\, n)$ amortized communication bits per process, unlike previous solutions to counting that required $\Omega(n)$ bits. This improvement is possible due to a new Fault-tolerant Gossip solution with $O(\log^3 n)$ rounds using only $O(|\mathcal{R}|\cdot\mathrm{polylog}\, n)$ communication bits per process, where $|\mathcal{R}|$ is the length of the rumor binary representation. It exploits a distributed fault-tolerant divide-and-conquer idea, in which processes run a Bipartite Gossip algorithm for a considered partition of processes. To avoid passing many long messages, processes use a family of small-degree compact expanders for local signaling to their overlay neighbors if they are in a compact (large and well-connected) party, and switch to a denser overlay graph whenever local signaling in the current one fails.",

keywords = "adaptive adversary, crash failures, distributed consensus",

author = "Hajiaghayi, {Mohammad T.} and Kowalski, {Dariusz R.} and Olkowski, Jan",

note = "Funding Information: M.T. HajiAghayi and J. Olkowski were partially supported by NSF CCF grant-2114269 and an Amazon AWS award. D.R. Kowalski was partially supported by the NSF grant 2131538. Publisher Copyright: {\textcopyright} 2022 ACM.; 54th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2022 ; Conference date: 20-06-2022 Through 24-06-2022",

year = "2022",

month = sep,

day = "6",

doi = "10.1145/3519935.3520078",

language = "English (US)",

series = "Proceedings of the Annual ACM Symposium on Theory of Computing",

publisher = "Association for Computing Machinery",

pages = "488--501",

editor = "Stefano Leonardi and Anupam Gupta",

booktitle = "STOC 2022 - Proceedings of the 54th Annual ACM SIGACT Symposium on Theory of Computing",

}

TY - GEN

T1 - Improved communication complexity of fault-tolerant consensus

AU - Hajiaghayi, Mohammad T.

AU - Kowalski, Dariusz R.

AU - Olkowski, Jan

N1 - Funding Information: M.T. HajiAghayi and J. Olkowski were partially supported by NSF CCF grant-2114269 and an Amazon AWS award. D.R. Kowalski was partially supported by the NSF grant 2131538. Publisher Copyright: © 2022 ACM.

PY - 2022/9/6

Y1 - 2022/9/6

AB - Consensus is one of the most thoroughly studied problems in distributed computing, yet there are still complexity gaps that have not been bridged for decades. In particular, in the classical message-passing setting with process crashes, since the seminal works of Bar-Joseph and Ben-Or [PODC 1998] and Aspnes and Waarts [SICOMP 1996, JACM 1998] in the previous century, there is still a fundamental unresolved question about the communication complexity of fast randomized Consensus against a (strong) adaptive adversary crashing processes arbitrarily online. The best known upper bound on the number of communication bits is $\Theta(n^{3/2}/\sqrt{\log n})$ per process, while the best lower bound is $\Omega(1)$. This is in contrast to randomized Consensus against a (weak) oblivious adversary, for which time-almost-optimal algorithms guarantee amortized $O(1)$ communication bits per process. We design an algorithm against an adaptive adversary that reduces the communication gap by a nearly linear factor to $O(\sqrt{n}\cdot\mathrm{polylog}\, n)$ bits per process, while keeping almost-optimal (up to a factor of $O(\log^3 n)$) time complexity $O(\sqrt{n}\cdot\log^{5/2} n)$. More surprisingly, we show that this complexity can indeed be lowered further, but at the expense of increasing time complexity, i.e., there is a trade-off between communication complexity and time complexity. More specifically, our main Consensus algorithm allows reducing communication complexity per process to any value from $\mathrm{polylog}\, n$ to $O(\sqrt{n}\cdot\mathrm{polylog}\, n)$, as long as Time × Communication $= O(n\cdot\mathrm{polylog}\, n)$. Similarly, reducing time complexity requires more random bits per process, i.e., Time × Randomness $= O(n\cdot\mathrm{polylog}\, n)$. Our parameterized consensus solutions are based on a few newly developed paradigms and algorithms for crash-resilient computing, interesting on their own. The first one, called Fuzzy Counting, provides each process with a number that lies between the numbers of alive processes at the end and at the beginning of the counting. Our deterministic Fuzzy Counting algorithm works in $O(\log^3 n)$ rounds and uses only $O(\mathrm{polylog}\, n)$ amortized communication bits per process, unlike previous solutions to counting that required $\Omega(n)$ bits. This improvement is possible due to a new Fault-tolerant Gossip solution with $O(\log^3 n)$ rounds using only $O(|\mathcal{R}|\cdot\mathrm{polylog}\, n)$ communication bits per process, where $|\mathcal{R}|$ is the length of the rumor binary representation. It exploits a distributed fault-tolerant divide-and-conquer idea, in which processes run a Bipartite Gossip algorithm for a considered partition of processes. To avoid passing many long messages, processes use a family of small-degree compact expanders for local signaling to their overlay neighbors if they are in a compact (large and well-connected) party, and switch to a denser overlay graph whenever local signaling in the current one fails.

KW - adaptive adversary

KW - crash failures

KW - distributed consensus

UR - http://www.scopus.com/inward/record.url?scp=85132785777&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85132785777&partnerID=8YFLogxK

U2 - 10.1145/3519935.3520078

DO - 10.1145/3519935.3520078

M3 - Conference contribution

AN - SCOPUS:85132785777

T3 - Proceedings of the Annual ACM Symposium on Theory of Computing

SP - 488

EP - 501

BT - STOC 2022 - Proceedings of the 54th Annual ACM SIGACT Symposium on Theory of Computing

A2 - Leonardi, Stefano

A2 - Gupta, Anupam

PB - Association for Computing Machinery

T2 - 54th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2022

Y2 - 20 June 2022 through 24 June 2022

ER -
