AUTO-GC: Automatic translation of data mining applications to GPU clusters

Wenjing Ma; Gagan Agrawal

doi:10.1109/IPDPSW.2010.5470883

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Scopus citations

Abstract

Because of the very favorable price-to-performance ratio of GPUs, a popular parallel programming configuration today is a cluster of GPUs. However, extracting performance on such a configuration typically requires programming in both MPI and CUDA, which demands a high degree of expertise and effort. It is clearly desirable to support higher-level programming of this emerging high-performance computing platform. This paper reports on a code generation system that can translate data mining applications to run on a GPU cluster. Our work is driven by the observation that a common processing structure, that of generalized reductions, fits a large number of popular data mining algorithms. In our solution, programmers simply need to specify the sequential reduction loop(s) with some additional information about the parameters. We use program analysis and code generation to automatically map the applications to the API of FREERIDE, a middleware for parallel data mining. We also automatically generate CUDA code for using the GPU on each node of the cluster. We have evaluated our system using two popular data mining applications, k-means clustering and Principal Component Analysis (PCA). We observed good scalability over the number of computing nodes, and the automatically generated version did not have any noticeable overhead compared to hand-written code. The speedup obtained by using the GPU over using only the CPU on each node of a cluster is between 3 and 21.
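The generalized reduction structure mentioned in the abstract can be illustrated with one k-means iteration. The following is a minimal Python sketch, not the paper's generated code and not the FREERIDE API: the function names (`new_reduction_object`, `local_reduction`, `combine`, `kmeans_step`) and the layout of the reduction object are assumptions made for illustration. The point is that each data chunk is reduced independently into a small reduction object through associative, commutative updates, and the per-chunk objects are then merged.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
ReductionObject = List[List[float]]


def new_reduction_object(k: int, dim: int) -> ReductionObject:
    """One row per cluster: dim coordinate sums followed by a point count."""
    return [[0.0] * (dim + 1) for _ in range(k)]


def local_reduction(chunk: Sequence[Point], centers: Sequence[Point]) -> ReductionObject:
    """Sequential reduction loop over one chunk of the data (hypothetical helper)."""
    dim = len(centers[0])
    robj = new_reduction_object(len(centers), dim)
    for p in chunk:
        # Pick the nearest center by squared Euclidean distance.
        nearest = min(
            range(len(centers)),
            key=lambda c: sum((p[d] - centers[c][d]) ** 2 for d in range(dim)),
        )
        # Associative, commutative update: order of points does not matter.
        for d in range(dim):
            robj[nearest][d] += p[d]
        robj[nearest][dim] += 1
    return robj


def combine(a: ReductionObject, b: ReductionObject) -> ReductionObject:
    """Merge two reduction objects by element-wise addition."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def kmeans_step(chunks: Sequence[Sequence[Point]], centers: Sequence[Point]) -> List[Point]:
    """One k-means iteration: reduce each chunk, combine, then recompute centers."""
    dim = len(centers[0])
    total = new_reduction_object(len(centers), dim)
    for chunk in chunks:
        total = combine(total, local_reduction(chunk, centers))
    new_centers = []
    for row, old in zip(total, centers):
        count = row[dim]
        # Keep the old center if no point was assigned to this cluster.
        new_centers.append(tuple(s / count for s in row[:dim]) if count else tuple(old))
    return new_centers
```

In this sketch each chunk stands in for the portion of data handled by one worker; treating a chunk as one cluster node or one GPU is an illustrative assumption. Because the updates commute, splitting the data differently gives the same combined reduction object, which is what makes the loop a candidate for parallel execution.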

Original language	English (US)
Title of host publication	Proceedings of the 2010 IEEE International Symposium on Parallel and Distributed Processing, Workshops and Phd Forum, IPDPSW 2010
DOIs	https://doi.org/10.1109/IPDPSW.2010.5470883
State	Published - 2010
Externally published	Yes
Event	2010 IEEE International Symposium on Parallel and Distributed Processing, Workshops and Phd Forum, IPDPSW 2010 - Atlanta, GA, United States
Duration	Apr 19 2010 → Apr 23 2010

Publication series

Name	Proceedings of the 2010 IEEE International Symposium on Parallel and Distributed Processing, Workshops and Phd Forum, IPDPSW 2010

Conference

Conference	2010 IEEE International Symposium on Parallel and Distributed Processing, Workshops and Phd Forum, IPDPSW 2010
Country/Territory	United States
City	Atlanta, GA
Period	4/19/10 → 4/23/10

Keywords

CUDA
Cluster
Data mining
GPGPU

ASJC Scopus subject areas

Computational Theory and Mathematics
Software
Theoretical Computer Science

Access to Document

10.1109/IPDPSW.2010.5470883

Cite this

Ma, W., & Agrawal, G. (2010). AUTO-GC: Automatic translation of data mining applications to GPU clusters. In Proceedings of the 2010 IEEE International Symposium on Parallel and Distributed Processing, Workshops and Phd Forum, IPDPSW 2010 Article 5470883 (Proceedings of the 2010 IEEE International Symposium on Parallel and Distributed Processing, Workshops and Phd Forum, IPDPSW 2010). https://doi.org/10.1109/IPDPSW.2010.5470883

@inproceedings{5104a48bcc6a49a59405715093d907ce,

title = "AUTO-GC: Automatic translation of data mining applications to GPU clusters",

keywords = "CUDA, Cluster, Data mining, GPGPU",

author = "Wenjing Ma and Gagan Agrawal",

year = "2010",

doi = "10.1109/IPDPSW.2010.5470883",

language = "English (US)",

isbn = "9781424465347",

series = "Proceedings of the 2010 IEEE International Symposium on Parallel and Distributed Processing, Workshops and Phd Forum, IPDPSW 2010",

booktitle = "Proceedings of the 2010 IEEE International Symposium on Parallel and Distributed Processing, Workshops and Phd Forum, IPDPSW 2010",

note = "2010 IEEE International Symposium on Parallel and Distributed Processing, Workshops and Phd Forum, IPDPSW 2010 ; Conference date: 19-04-2010 Through 23-04-2010",

}

TY - GEN

T1 - AUTO-GC

T2 - 2010 IEEE International Symposium on Parallel and Distributed Processing, Workshops and Phd Forum, IPDPSW 2010

AU - Ma, Wenjing

AU - Agrawal, Gagan

PY - 2010

Y1 - 2010

KW - CUDA

KW - Cluster

KW - Data mining

KW - GPGPU

UR - http://www.scopus.com/inward/record.url?scp=77954053740&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=77954053740&partnerID=8YFLogxK

U2 - 10.1109/IPDPSW.2010.5470883

DO - 10.1109/IPDPSW.2010.5470883

M3 - Conference contribution

AN - SCOPUS:77954053740

SN - 9781424465347

T3 - Proceedings of the 2010 IEEE International Symposium on Parallel and Distributed Processing, Workshops and Phd Forum, IPDPSW 2010

BT - Proceedings of the 2010 IEEE International Symposium on Parallel and Distributed Processing, Workshops and Phd Forum, IPDPSW 2010

Y2 - 19 April 2010 through 23 April 2010

ER -
