CUDA-DTM: Distributed transactional memory for GPU clusters

Samuel Irving, Sui Chen, Lu Peng, Costas Busch, Maurice Herlihy, Christopher J. Michael

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

5 Scopus citations


We present CUDA-DTM, the first Distributed Transactional Memory framework written in CUDA for large-scale GPU clusters. Transactional Memory has become an attractive auto-coherence scheme for GPU applications with irregular memory access patterns because it avoids serializing threads while preserving programmability. We extend GPU Software Transactional Memory to allow threads across many GPUs to access a coherent distributed shared-memory space, and we propose a scheme for GPU-to-GPU communication using CUDA-Aware MPI. The performance of CUDA-DTM is evaluated using a suite of seven irregular-memory-access benchmarks with varying degrees of compute intensity, contention, and node-to-node communication frequency. Using a cluster of 256 devices, our experiments show that GPU clusters using CUDA-DTM can be up to 115x faster than CPU clusters.

Original language: English (US)
Title of host publication: Networked Systems - 7th International Conference, NETYS 2019, Revised Selected Papers
Editors: Mohamed Faouzi Atig, Alexander A. Schwarzmann
Number of pages: 17
ISBN (Print): 9783030312763
State: Published - 2019
Externally published: Yes
Event: 7th International Conference on Networked Systems, NETYS 2019 - Marrakech, Morocco
Duration: Jun 19 2019 - Jun 21 2019

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11704 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349


Conference: 7th International Conference on Networked Systems, NETYS 2019


Keywords

  • CUDA
  • Distributed Transactional Memory
  • GPU cluster

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)


