TY - GEN
T1 - Sampling-based Sparse Format Selection on GPUs
AU - Zhu, Gangyi
AU - Agrawal, Gagan
N1 - Funding Information:
Acknowledgements: This work was partially supported by the following NSF grants: 1629392, 2007793, 2034850, 2131509, and 2018627.
Publisher Copyright:
© 2021 IEEE.
PY - 2021
Y1 - 2021
N2 - Sparse Matrix-Vector Multiplication (SpMV) is an important kernel in numerous computational disciplines. The overall performance of SpMV is highly dependent on the storage format of the sparse matrix. This has led to much interest in recent years in automatically choosing the appropriate format, typically by applying machine learning techniques and training a model on a large number of matrices. However, these methods have limitations in practice: besides the dependency on obtaining a large number of sparse matrices for training and the expensive overhead of training, they usually have limited prediction ability across architectures. In this paper, we take a very different approach to the same problem. This approach involves obtaining samples from the original matrix, executing the kernel using these samples, and selecting the best format. However, our approach requires obtaining representative samples that can help understand the performance associated with using a specific format on the full matrix, which turns out to be challenging. Based on the storage properties and processing granularity associated with different formats, we develop three novel sampling schemes: Row Cropping sampling, Random Warp sampling, and Diagonal Aligning sampling. These sampling methods are designed by observing that certain factors tend to be critical for the performance associated with a particular format, and thus preserving each such factor through sampling. Experimental results using nearly 2000 matrices demonstrate that our approach delivers high efficiency without the expensive training process, and it is easy to migrate across architectures. At the same time, our approach achieves prediction accuracy comparable to state-of-the-art methodologies, and even outperforms them in certain cases (especially when predicting on some of the largest matrices we use). Through our work, we also offer new insights into the performance achieved using different formats on GPUs.
AB - Sparse Matrix-Vector Multiplication (SpMV) is an important kernel in numerous computational disciplines. The overall performance of SpMV is highly dependent on the storage format of the sparse matrix. This has led to much interest in recent years in automatically choosing the appropriate format, typically by applying machine learning techniques and training a model on a large number of matrices. However, these methods have limitations in practice: besides the dependency on obtaining a large number of sparse matrices for training and the expensive overhead of training, they usually have limited prediction ability across architectures. In this paper, we take a very different approach to the same problem. This approach involves obtaining samples from the original matrix, executing the kernel using these samples, and selecting the best format. However, our approach requires obtaining representative samples that can help understand the performance associated with using a specific format on the full matrix, which turns out to be challenging. Based on the storage properties and processing granularity associated with different formats, we develop three novel sampling schemes: Row Cropping sampling, Random Warp sampling, and Diagonal Aligning sampling. These sampling methods are designed by observing that certain factors tend to be critical for the performance associated with a particular format, and thus preserving each such factor through sampling. Experimental results using nearly 2000 matrices demonstrate that our approach delivers high efficiency without the expensive training process, and it is easy to migrate across architectures. At the same time, our approach achieves prediction accuracy comparable to state-of-the-art methodologies, and even outperforms them in certain cases (especially when predicting on some of the largest matrices we use). Through our work, we also offer new insights into the performance achieved using different formats on GPUs.
KW - GPUs
KW - Sampling
KW - Sparse Computations
UR - http://www.scopus.com/inward/record.url?scp=85124373821&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85124373821&partnerID=8YFLogxK
U2 - 10.1109/SBAC-PAD53543.2021.00031
DO - 10.1109/SBAC-PAD53543.2021.00031
M3 - Conference contribution
AN - SCOPUS:85124373821
T3 - Proceedings - Symposium on Computer Architecture and High Performance Computing
SP - 198
EP - 208
BT - Proceedings - 2021 IEEE 33rd International Symposium on Computer Architecture and High Performance Computing, SBAC-PAD 2021
PB - IEEE Computer Society
T2 - 33rd IEEE International Symposium on Computer Architecture and High Performance Computing, SBAC-PAD 2021
Y2 - 26 October 2021 through 29 October 2021
ER -