TY - GEN
T1 - Supporting load balancing for distributed data-intensive applications
AU - Glimcher, Leonid
AU - Ravi, Vignesh T.
AU - Agrawal, Gagan
PY - 2009
Y1 - 2009
AB - In data-intensive computing, an important problem that has received relatively little attention is the transparent processing of data stored in remote data repositories. Such scenarios raise interesting load balancing considerations. In particular, depending on where data is generated and how it is shared, a dataset of interest can be divided across multiple data repositories, which may be geographically distributed, and the data may be partitioned in a number of ways. This paper focuses on enabling the distributed processing of data held at such distributed resources. We have developed a load balancing algorithm that minimizes the total time spent on processing the data. We consider a weighted sum of two factors: a load balancing factor and a term that captures the amount of time processing nodes spend waiting for the data. Our solutions have been implemented and evaluated in the context of FREERIDE-G (FRamework for Rapid Implementation of Datamining Engines in Grid). We have extensively evaluated our techniques using two data-intensive applications.
UR - http://www.scopus.com/inward/record.url?scp=77952227539&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77952227539&partnerID=8YFLogxK
U2 - 10.1109/HIPC.2009.5433204
DO - 10.1109/HIPC.2009.5433204
M3 - Conference contribution
AN - SCOPUS:77952227539
SN - 9781424449224
T3 - 16th International Conference on High Performance Computing, HiPC 2009 - Proceedings
SP - 235
EP - 244
BT - 16th International Conference on High Performance Computing, HiPC 2009 - Proceedings
T2 - 16th International Conference on High Performance Computing, HiPC 2009
Y2 - 16 December 2009 through 19 December 2009
ER -