TY - GEN
T1 - CUDA-DTM
T2 - 7th International Conference on Networked Systems, NETYS 2019
AU - Irving, Samuel
AU - Chen, Sui
AU - Peng, Lu
AU - Busch, Costas
AU - Herlihy, Maurice
AU - Michael, Christopher J.
N1 - Publisher Copyright:
© Springer Nature Switzerland AG 2019.
PY - 2019
Y1 - 2019
N2 - We present CUDA-DTM, the first Distributed Transactional Memory framework written in CUDA for large-scale GPU clusters. Transactional Memory has become an attractive auto-coherence scheme for GPU applications with irregular memory access patterns due to its ability to avoid serializing threads while still maintaining programmability. We extend GPU Software Transactional Memory to allow threads across many GPUs to access a coherent distributed shared memory space and propose a scheme for GPU-to-GPU communication using CUDA-Aware MPI. The performance of CUDA-DTM is evaluated using a suite of seven irregular memory access benchmarks with varying degrees of compute intensity, contention, and node-to-node communication frequency. Using a cluster of 256 devices, our experiments show that GPU clusters using CUDA-DTM can be up to 115x faster than CPU clusters.
AB - We present CUDA-DTM, the first Distributed Transactional Memory framework written in CUDA for large-scale GPU clusters. Transactional Memory has become an attractive auto-coherence scheme for GPU applications with irregular memory access patterns due to its ability to avoid serializing threads while still maintaining programmability. We extend GPU Software Transactional Memory to allow threads across many GPUs to access a coherent distributed shared memory space and propose a scheme for GPU-to-GPU communication using CUDA-Aware MPI. The performance of CUDA-DTM is evaluated using a suite of seven irregular memory access benchmarks with varying degrees of compute intensity, contention, and node-to-node communication frequency. Using a cluster of 256 devices, our experiments show that GPU clusters using CUDA-DTM can be up to 115x faster than CPU clusters.
KW - CUDA
KW - Distributed Transactional Memory
KW - GPU cluster
UR - http://www.scopus.com/inward/record.url?scp=85075595144&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85075595144&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-31277-0_12
DO - 10.1007/978-3-030-31277-0_12
M3 - Conference contribution
AN - SCOPUS:85075595144
SN - 9783030312763
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 183
EP - 199
BT - Networked Systems - 7th International Conference, NETYS 2019, Revised Selected Papers
A2 - Atig, Mohamed Faouzi
A2 - Schwarzmann, Alexander A.
PB - Springer
Y2 - 19 June 2019 through 21 June 2019
ER -