TY - GEN
T1 - Using tiling to scale parallel data cube construction
AU - Jin, Ruoming
AU - Vaidyanathan, Karthik
AU - Yang, Ge
AU - Agrawal, Gagan
PY - 2004
Y1 - 2004
N2 - Data cube construction is a commonly used operation in data warehouses. Because of the volume of data that is stored and analyzed in a data warehouse and the amount of computation involved in data cube construction, it is natural to consider parallel machines for this operation. Also, for both sequential and parallel data cube construction, effectively using the main memory is an important challenge. In our prior work, we have developed parallel algorithms for this problem. In this paper, we show how sequential and parallel data cube construction algorithms can be further scaled to handle larger problems, when the memory requirements could be a constraint. This is done by tiling the input and output arrays on each node. We address the challenges in using tiling while still maintaining the other desired properties of a data cube construction algorithm, which are, using minimal parents, and achieving maximal cache and memory reuse. We present a parallel algorithm that combines tiling with interprocessor communication. Our experimental results show the following. First, tiling helps in scaling data cube construction in both sequential and parallel environments. Second, choosing tiling parameters as per our theoretical results does result in better performance.
AB - Data cube construction is a commonly used operation in data warehouses. Because of the volume of data that is stored and analyzed in a data warehouse and the amount of computation involved in data cube construction, it is natural to consider parallel machines for this operation. Also, for both sequential and parallel data cube construction, effectively using the main memory is an important challenge. In our prior work, we have developed parallel algorithms for this problem. In this paper, we show how sequential and parallel data cube construction algorithms can be further scaled to handle larger problems, when the memory requirements could be a constraint. This is done by tiling the input and output arrays on each node. We address the challenges in using tiling while still maintaining the other desired properties of a data cube construction algorithm, which are, using minimal parents, and achieving maximal cache and memory reuse. We present a parallel algorithm that combines tiling with interprocessor communication. Our experimental results show the following. First, tiling helps in scaling data cube construction in both sequential and parallel environments. Second, choosing tiling parameters as per our theoretical results does result in better performance.
UR - http://www.scopus.com/inward/record.url?scp=10044240505&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=10044240505&partnerID=8YFLogxK
U2 - 10.1109/ICPP.2004.1327944
DO - 10.1109/ICPP.2004.1327944
M3 - Conference contribution
AN - SCOPUS:10044240505
SN - 0769521975
T3 - Proceedings of the International Conference on Parallel Processing
SP - 365
EP - 372
BT - Proceedings - 2004 International Conference on Parallel Processing, ICPP 2004
A2 - Eigenmann, R.
T2 - Proceedings - 2004 International Conference on Parallel Processing, ICPP 2004
Y2 - 15 August 2004 through 18 August 2004
ER -