Supporting load balancing for distributed data-intensive applications

Leonid Glimcher, Vignesh T. Ravi, Gagan Agrawal

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

8 Scopus citations

Abstract

In data-intensive computing, an important problem that has received relatively little attention is the transparent processing of data stored in remote data repositories. Interesting load balancing considerations arise in these scenarios. In particular, depending on where data is generated and how it is shared, a dataset of interest can be divided across multiple data repositories, which may be geographically distributed, and the data may be partitioned in a number of ways. This paper focuses on enabling such distributed processing of data from distributed resources. We have developed a load balancing algorithm that minimizes the total time spent on processing the data. We consider a weighted sum of two factors: a load balancing factor and a term that captures the amount of time spent by processing nodes waiting for the data. Our solutions have been implemented and evaluated in the context of FREERIDE-G (FRamework for Rapid Implementation of Datamining Engines in Grid). We have extensively evaluated our techniques using two data-intensive applications.
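
The abstract does not state the cost function or the assignment procedure, so the following is only an illustrative sketch of the general idea of scoring an assignment by a weighted sum of a load-imbalance term and a data-wait term. Everything in it is an assumption made for illustration and is not taken from the paper or from FREERIDE-G: the names `Chunk`, `assignment_cost`, and `greedy_assign`, the weight `alpha`, the use of max-minus-min load as the imbalance measure, and the greedy largest-chunk-first strategy.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """A piece of the dataset held in one repository (hypothetical model)."""
    size_mb: float
    repo: str


def assignment_cost(loads, waits, alpha):
    """Weighted sum of a load-imbalance term and a data-wait term, in seconds.

    The imbalance measure (max load minus min load) and the weight `alpha`
    are assumptions; the paper's actual formulation may differ.
    """
    imbalance = max(loads) - min(loads)
    return alpha * imbalance + (1.0 - alpha) * sum(waits)


def greedy_assign(chunks, compute_rate, bandwidth, alpha=0.5):
    """Assign each chunk to the node that gives the lowest cost so far.

    compute_rate[n]      : MB/s that processing node n can process
    bandwidth[(repo, n)] : MB/s from repository `repo` to node n
    """
    n_nodes = len(compute_rate)
    loads = [0.0] * n_nodes   # accumulated processing time per node
    waits = [0.0] * n_nodes   # accumulated transfer (wait) time per node
    plan = []
    # Place the largest chunks first; a common greedy heuristic.
    for chunk in sorted(chunks, key=lambda c: c.size_mb, reverse=True):
        best_node, best_cost = None, float("inf")
        for n in range(n_nodes):
            trial_loads = loads.copy()
            trial_waits = waits.copy()
            trial_loads[n] += chunk.size_mb / compute_rate[n]
            trial_waits[n] += chunk.size_mb / bandwidth[(chunk.repo, n)]
            cost = assignment_cost(trial_loads, trial_waits, alpha)
            if cost < best_cost:
                best_node, best_cost = n, cost
        loads[best_node] += chunk.size_mb / compute_rate[best_node]
        waits[best_node] += chunk.size_mb / bandwidth[(chunk.repo, best_node)]
        plan.append((chunk, best_node))
    return plan, loads, waits
```

In this sketch, setting `alpha` close to 1 favors evenly loaded processing nodes, while setting it close to 0 favors assigning each chunk to the node with the fastest link to its repository.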

Original language: English (US)
Title of host publication: 16th International Conference on High Performance Computing, HiPC 2009 - Proceedings
Pages: 235-244
Number of pages: 10
State: Published - 2009
Externally published: Yes
Event: 16th International Conference on High Performance Computing, HiPC 2009 - Kochi, India
Duration: Dec 16 2009 to Dec 19 2009

Publication series

Name: 16th International Conference on High Performance Computing, HiPC 2009 - Proceedings

Conference

Conference: 16th International Conference on High Performance Computing, HiPC 2009
Country/Territory: India
City: Kochi
Period: 12/16/09 to 12/19/09

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Theoretical Computer Science
