End-to-End LU Factorization of Large Matrices on GPUs

Yang Xia, Peng Jiang, Gagan Agrawal, Rajiv Ramnath

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Scopus citations

Abstract

LU factorization for sparse matrices is an important computing step in many engineering and scientific problems, such as circuit simulation. There have been many efforts toward parallelizing and scaling this algorithm, including recent efforts targeting GPUs. However, it is still challenging to deploy a complete sparse LU factorization workflow on a GPU due to high memory requirements and data dependencies. In this paper, we propose the first complete GPU solution for sparse LU factorization. To achieve this goal, we propose an out-of-core implementation of the symbolic execution phase, thus removing the bottleneck due to large intermediate data structures. Next, we propose a dynamic parallelism implementation of Kahn's algorithm for topological sort on GPUs. Finally, for the numeric factorization phase, we increase the degree of parallelism by removing the memory limits that existing implementation approaches impose on large matrices. Experimental results show that, compared with an implementation modified from GLU 3.0, our out-of-core version achieves speedups of 1.13x to 32.65x. Further, our out-of-core implementation achieves a speedup of 1.2x to 2.2x over an optimized unified memory implementation on the GPU. Finally, we show that the optimizations we introduce for numeric factorization are effective.
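
To illustrate the topological sort step named in the abstract, the following is a minimal sequential sketch of Kahn's algorithm that groups the nodes of a dependency graph into levels. It is not the authors' GPU dynamic parallelism implementation. The function name `kahn_levels`, the edge-list input format, and the interpretation of nodes as matrix columns are assumptions made only for this example.

```python
def kahn_levels(num_nodes, edges):
    """Group the nodes of a DAG into levels using Kahn's algorithm.

    edges is a list of (u, v) pairs meaning node v depends on node u
    (hypothetical input format chosen for this sketch).
    Nodes in the same level do not depend on one another, so they
    could in principle be processed in parallel.
    """
    successors = [[] for _ in range(num_nodes)]
    in_degree = [0] * num_nodes
    for u, v in edges:
        successors[u].append(v)
        in_degree[v] += 1

    # Start with every node that has no unresolved dependency.
    frontier = [n for n in range(num_nodes) if in_degree[n] == 0]
    levels = []
    visited = 0
    while frontier:
        levels.append(frontier)
        visited += len(frontier)
        next_frontier = []
        for u in frontier:
            for v in successors[u]:
                # Resolving u removes one dependency of v.
                in_degree[v] -= 1
                if in_degree[v] == 0:
                    next_frontier.append(v)
        frontier = next_frontier

    if visited != num_nodes:
        raise ValueError("dependency graph contains a cycle")
    return levels


if __name__ == "__main__":
    # Node 2 depends on 0 and 1; node 3 depends on 2; node 4 depends on 1.
    print(kahn_levels(5, [(0, 2), (1, 2), (2, 3), (1, 4)]))
```

Each inner list holds nodes whose dependencies were all resolved in earlier levels. A GPU version would typically process each frontier with many threads, which is the part this sequential sketch leaves out.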

Original language: English (US)
Title of host publication: PPoPP 2023 - Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming
Publisher: Association for Computing Machinery
Pages: 288-300
Number of pages: 13
ISBN (Electronic): 9798400700156
DOIs
State: Published - Feb 25 2023
Event: 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, PPoPP 2023 - Montreal, Canada
Duration: Feb 25 2023 → Mar 1 2023

Publication series

Name: Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP

Conference

Conference: 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, PPoPP 2023
Country/Territory: Canada
City: Montreal
Period: 2/25/23 → 3/1/23

Keywords

  • GPU acceleration
  • LU factorization
  • memory limits

ASJC Scopus subject areas

  • Software
