TY - GEN
T1 - Parallelizing pruned landmark labeling
T2 - 34th ACM International Conference on Supercomputing, ICS 2020
AU - Jin, Ruoming
AU - Peng, Zhen
AU - Wu, Wendell
AU - Dragan, Feodor
AU - Agrawal, Gagan
AU - Ren, Bin
N1 - Funding Information:
and/or technical support that have contributed to the results reported within this paper. The work was also partially supported by the Ohio Supercomputer Center under Grant no. PGS0218.
Publisher Copyright:
© 2020 ACM.
Copyright:
Copyright 2020 Elsevier B.V., All rights reserved.
PY - 2020/6/29
Y1 - 2020/6/29
N2 - To help compute shortest path distances over large graphs efficiently, 2-hop labeling has emerged as a major tool, with Pruned Landmark Labeling (PPL) as a popular algorithm. This paper demonstrates the first scalable parallel implementation of the PPL algorithm that produces the same results as the sequential algorithm. Based on theoretical analysis, we show how computations on each vertex can be performed in parallel while maintaining correctness, resulting in the Vertex-Centrix PLL (VC-PLL) algorithm. We also show a formulation of this algorithm based on linear algebra and argue why the use of a library based on linear algebra operations will not produce an efficient implementation. Next, we introduce a batched VC-PLL (BVC-PLL) algorithm to reduce the computational inefficiency in VC-PLL. We have carried out a parallel implementation of this method for modern clusters, combining shared memory and distributed memory parallelism, that can efficiently execute on graphs with more than a billion edges. We also demonstrate how BVC-PLL algorithm can be extended to handle directed graphs and weighted graphs and how the version for weighted graphs can benefit from SIMD parallelization.
AB - To help compute shortest path distances over large graphs efficiently, 2-hop labeling has emerged as a major tool, with Pruned Landmark Labeling (PPL) as a popular algorithm. This paper demonstrates the first scalable parallel implementation of the PPL algorithm that produces the same results as the sequential algorithm. Based on theoretical analysis, we show how computations on each vertex can be performed in parallel while maintaining correctness, resulting in the Vertex-Centrix PLL (VC-PLL) algorithm. We also show a formulation of this algorithm based on linear algebra and argue why the use of a library based on linear algebra operations will not produce an efficient implementation. Next, we introduce a batched VC-PLL (BVC-PLL) algorithm to reduce the computational inefficiency in VC-PLL. We have carried out a parallel implementation of this method for modern clusters, combining shared memory and distributed memory parallelism, that can efficiently execute on graphs with more than a billion edges. We also demonstrate how BVC-PLL algorithm can be extended to handle directed graphs and weighted graphs and how the version for weighted graphs can benefit from SIMD parallelization.
KW - dependency resolving
KW - multi-level parallelization
KW - parallel graph algorithms
UR - http://www.scopus.com/inward/record.url?scp=85088499114&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85088499114&partnerID=8YFLogxK
U2 - 10.1145/3392717.3392745
DO - 10.1145/3392717.3392745
M3 - Conference contribution
AN - SCOPUS:85088499114
T3 - Proceedings of the International Conference on Supercomputing
BT - Proceedings of the 34th ACM International Conference on Supercomputing, ICS 2020
PB - Association for Computing Machinery
Y2 - 29 June 2020 through 2 July 2020
ER -