TY - JOUR
T1 - On the distributed complexity of large-scale graph computations
AU - Pandurangan, Gopal
AU - Robinson, Peter
AU - Scquizzato, Michele
N1 - Funding Information:
A preliminary version of this work [56] appeared in the Proceedings of the 30th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA’18). This work was supported, in part, by NSF grants CCF-1527867, CCF-1540512, IIS-1633720, CCF-1717075, by BSF grants 2008348 and 2016419, by University of Padova grant BIRD197859/19, by a grant from the City University of Hong Kong [Project No. 7200639/CS], and by a grant from the Research Grants Council of the Hong Kong Special Administrative Region, China [Project No. CityU11213620]. Authors’ addresses: G. Pandurangan, Department of Computer Science, University of Houston, 3551 Cullen Blvd, Houston, TX 77204, USA; email: gopalpandurangan@gmail.com; P. Robinson, Department of Computer Science, City University of Hong Kong, 83 Tat Chee Avenue, Kowloon, Hong Kong; email: peter.robinson@cityu.edu.hk; M. Scquizzato, Department of Mathematics, University of Padova, Via Trieste 63, 35121 Padova, Italy; email: scquizza@math.unipd.it.
Publisher Copyright:
© 2021 Association for Computing Machinery.
PY - 2021/6
Y1 - 2021/6
N2 - Motivated by the increasing need to understand the distributed algorithmic foundations of large-scale graph computations, we study some fundamental graph problems in a message-passing model for distributed computing where k ≥ 2 machines jointly perform computations on graphs with n nodes (typically, n ≫ k). The input graph is assumed to be initially randomly partitioned among the k machines, a common implementation in many real-world systems. Communication is point-to-point, and the goal is to minimize the number of communication rounds of the computation. Our main contribution is the General Lower Bound Theorem, a theorem that can be used to show non-trivial lower bounds on the round complexity of distributed large-scale data computations. This result is established via an information-theoretic approach that relates the round complexity to the minimal amount of information required by machines to solve the problem. Our approach is generic, and this theorem can be used in a “cookbook” fashion to show distributed lower bounds for several problems, including non-graph problems. We present two applications by showing (almost) tight lower bounds on the round complexity of two fundamental graph problems, namely, PageRank computation and triangle enumeration. These applications show that our approach can yield lower bounds for problems where the application of communication complexity techniques seems not obvious or gives weak bounds, including and especially under a stochastic partition of the input. We then present distributed algorithms for PageRank and triangle enumeration with a round complexity that (almost) matches the respective lower bounds; these algorithms exhibit a round complexity that scales superlinearly in k, improving significantly over previous results [Klauck et al., SODA 2015]. 
Specifically, we show the following results: • PageRank: We show a lower bound of Ω(n/k^2) rounds and present a distributed algorithm that computes an approximation of the PageRank of all the nodes of a graph in Õ(n/k^2) rounds. • Triangle enumeration: We show that there exist graphs with m edges where any distributed algorithm requires Ω(m/k^(5/3)) rounds. This result also implies the first non-trivial lower bound of Ω(n^(1/3)) rounds for the congested clique model, which is tight up to logarithmic factors. We then present a distributed algorithm that enumerates all the triangles of a graph in Õ(m/k^(5/3) + n/k^(4/3)) rounds.
KW - Distributed graph algorithms
KW - Lower bounds
KW - PageRank
KW - Triangle enumeration
UR - http://www.scopus.com/inward/record.url?scp=85113500275&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85113500275&partnerID=8YFLogxK
U2 - 10.1145/3460900
DO - 10.1145/3460900
M3 - Article
AN - SCOPUS:85113500275
SN - 2329-4949
VL - 8
JO - ACM Transactions on Parallel Computing
JF - ACM Transactions on Parallel Computing
IS - 2
M1 - 7
ER -