TY - GEN
T1 - Servicing range queries on multidimensional datasets with partial replicas
AU - Weng, Li
AU - Catalyurek, Umit
AU - Kurc, Tahsin
AU - Agrawal, Gagan
AU - Saltz, Joel
PY - 2005
Y1 - 2005
N2 - Partial replication is one type of optimization to speed up execution of queries submitted to large datasets. In partial replication, a portion of the dataset is extracted, re-organized, and re-distributed across the storage system. The objective is to reduce the volume of I/O and increase I/O parallelism for different types of queries and for the portions of the dataset that are likely to be accessed frequently. When multiple partial replicas of a dataset exist, a query execution plan should be generated so as to use the best combination of subsets of partial replicas (and possibly the original dataset) to minimize query execution time. In this paper, we present a compiler and runtime approach for range queries submitted against distributed scientific datasets. A heuristic algorithm is proposed to choose the set of replicas to reduce query execution time. We show the efficiency of the proposed method using datasets and queries in oil reservoir simulation studies on a cluster machine.
AB - Partial replication is one type of optimization to speed up execution of queries submitted to large datasets. In partial replication, a portion of the dataset is extracted, re-organized, and re-distributed across the storage system. The objective is to reduce the volume of I/O and increase I/O parallelism for different types of queries and for the portions of the dataset that are likely to be accessed frequently. When multiple partial replicas of a dataset exist, a query execution plan should be generated so as to use the best combination of subsets of partial replicas (and possibly the original dataset) to minimize query execution time. In this paper, we present a compiler and runtime approach for range queries submitted against distributed scientific datasets. A heuristic algorithm is proposed to choose the set of replicas to reduce query execution time. We show the efficiency of the proposed method using datasets and queries in oil reservoir simulation studies on a cluster machine.
UR - http://www.scopus.com/inward/record.url?scp=33845346800&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33845346800&partnerID=8YFLogxK
U2 - 10.1109/CCGRID.2005.1558635
DO - 10.1109/CCGRID.2005.1558635
M3 - Conference contribution
AN - SCOPUS:33845346800
SN - 0780390741
SN - 9780780390744
T3 - 2005 IEEE International Symposium on Cluster Computing and the Grid, CCGrid 2005
SP - 726
EP - 733
BT - 2005 IEEE International Symposium on Cluster Computing and the Grid, CCGrid 2005
T2 - 2005 IEEE International Symposium on Cluster Computing and the Grid, CCGrid 2005
Y2 - 9 May 2005 through 12 May 2005
ER -