Servicing range queries on multidimensional datasets with partial replicas

Li Weng, Umit Catalyurek, Tahsin Kurc, Gagan Agrawal, Joel Saltz

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

10 Scopus citations

Abstract

Partial replication is one type of optimization to speed up execution of queries submitted to large datasets. In partial replication, a portion of the dataset is extracted, re-organized, and re-distributed across the storage system. The objective is to reduce the volume of I/O and increase I/O parallelism for different types of queries and for the portions of the dataset that are likely to be accessed frequently. When multiple partial replicas of a dataset exist, a query execution plan should be generated so as to use the best combination of subsets of partial replicas (and possibly the original dataset) to minimize query execution time. In this paper, we present a compiler and runtime approach for range queries submitted against distributed scientific datasets. A heuristic algorithm is proposed to choose the set of replicas to reduce query execution time. We show the efficiency of the proposed method using datasets and queries in oil reservoir simulation studies on a cluster machine.
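The replica selection problem described in the abstract can be illustrated with a minimal sketch. This is not the paper's heuristic, which the abstract does not specify; it is a generic greedy weighted-cover strategy under assumed conditions: the dataset is a grid of integer cells, each replica is a half-open box with a hypothetical fixed access cost and per-cell read cost, and the original dataset is modeled as one more (expensive) replica. All names and cost figures are illustrative.

```python
from itertools import product


def cells_in_range(lo, hi):
    """Enumerate the integer grid cells of the half-open box [lo, hi)."""
    return set(product(*(range(a, b) for a, b in zip(lo, hi))))


def choose_replicas(query, replicas):
    """Greedy cover (an assumed stand-in, not the paper's algorithm).

    query:    (lo, hi) corners of the range query.
    replicas: dict name -> (lo, hi, fixed_cost, cost_per_cell), all hypothetical.
    Repeatedly picks the replica with the lowest cost per newly covered cell.
    Returns a list of (name, cells_to_read) pairs.
    """
    remaining = cells_in_range(*query)
    plan = []
    while remaining:
        best = None
        for name, (lo, hi, fixed, per_cell) in replicas.items():
            gain = remaining & cells_in_range(lo, hi)
            if not gain:
                continue  # this replica contributes nothing new
            cost = (fixed + per_cell * len(gain)) / len(gain)
            if best is None or cost < best[0]:
                best = (cost, name, gain)
        if best is None:
            raise ValueError("query is not covered by any replica")
        _, name, gain = best
        plan.append((name, gain))
        remaining -= gain
    return plan


# Hypothetical 2-D setup: the original dataset plus two partial replicas.
replicas = {
    "original": ((0, 0), (8, 8), 0.0, 4.0),
    "replica_a": ((0, 0), (4, 4), 1.0, 1.0),
    "replica_b": ((2, 2), (6, 6), 1.0, 1.0),
}
plan = choose_replicas(((1, 1), (5, 5)), replicas)
for name, cells in plan:
    print(name, len(cells))
```

With these assumed costs, the query box from (1, 1) to (5, 5) is served mostly from the two partial replicas, and only the two cells that neither replica holds are read from the original dataset.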

Original language: English (US)
Title of host publication: 2005 IEEE International Symposium on Cluster Computing and the Grid, CCGrid 2005
Pages: 726-733
Number of pages: 8
State: Published - 2005
Externally published: Yes
Event: 2005 IEEE International Symposium on Cluster Computing and the Grid, CCGrid 2005 - Cardiff, Wales, United Kingdom
Duration: May 9 2005 → May 12 2005

Publication series

Name: 2005 IEEE International Symposium on Cluster Computing and the Grid, CCGrid 2005
Volume: 2

Conference

Conference: 2005 IEEE International Symposium on Cluster Computing and the Grid, CCGrid 2005
Country/Territory: United Kingdom
City: Cardiff, Wales
Period: 5/9/05 → 5/12/05

ASJC Scopus subject areas

  • General Engineering
