Scaling and parallelizing a scientific feature mining application using a cluster middleware

Leonid Glimcher, Xuan Zhang, Gagan Agrawal

Research output: Contribution to conference › Paper › peer-review

11 Scopus citations

Abstract

As scientific simulations generate ever larger amounts of data, analyzing this data to gain insights into scientific phenomena is increasingly becoming a challenge. In this paper, we present a case study on the use of a cluster middleware for rapidly creating a scalable and parallel implementation of a scientific data analysis application. Using FREERIDE (Framework for Rapid Implementation of Datamining Engines), we parallelize a feature extraction algorithm and scale it to disk-resident datasets. We have developed a parallel algorithm for this problem which matches the communication and computation structure supported by the FREERIDE system. The main observations from our experimental results are as follows: 1) the overhead of using the middleware is quite small in most cases, 2) there is an overhead associated with breaking the datasets into more partitions or chunks, and 3) if the dataset is partitioned into the same number of chunks, the execution time stays proportional to the size of the dataset and inversely proportional to the number of nodes, i.e., the overhead of communication or reading disk-resident datasets is very small.
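The abstract does not describe FREERIDE's programming interface, so the following is only an illustrative sketch of the general pattern it alludes to: a dataset is broken into chunks, each node processes its chunks locally, and the partial results are then combined. Every name here (`split_into_chunks`, `local_reduction`, `global_combine`, `run`) is hypothetical, the "feature" is a toy stand-in (counting values above a threshold), and the nodes are simulated sequentially in one process.

```python
from functools import reduce
from typing import Iterable, List


def split_into_chunks(data: List[float], chunk_size: int) -> List[List[float]]:
    """Break a dataset into fixed-size chunks, as if it were read from disk piece by piece."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def local_reduction(chunks: Iterable[List[float]], threshold: float) -> int:
    """One simulated node's work: count values above the threshold, one chunk at a time."""
    count = 0
    for chunk in chunks:
        count += sum(1 for value in chunk if value > threshold)
    return count


def global_combine(partials: List[int]) -> int:
    """Merge the per-node partial counts into a single result."""
    return reduce(lambda a, b: a + b, partials, 0)


def run(data: List[float], num_nodes: int, chunk_size: int, threshold: float) -> int:
    """Assign chunks to simulated nodes round-robin, reduce locally, then combine."""
    chunks = split_into_chunks(data, chunk_size)
    per_node = [chunks[n::num_nodes] for n in range(num_nodes)]
    partials = [local_reduction(node_chunks, threshold) for node_chunks in per_node]
    return global_combine(partials)
```

Because the combination step is a plain sum, the result in this toy does not depend on how many nodes or chunks are used; only the amount of work per node changes, which is the kind of behavior the third observation in the abstract describes.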

Original language: English (US)
Pages: 1227-1236
Number of pages: 10
State: Published - 2004
Externally published: Yes
Event: Proceedings - 18th International Parallel and Distributed Processing Symposium, IPDPS 2004 (Abstracts and CD-ROM) - Santa Fe, NM, United States
Duration: Apr 26 2004 → Apr 30 2004

Conference

Conference: Proceedings - 18th International Parallel and Distributed Processing Symposium, IPDPS 2004 (Abstracts and CD-ROM)
Country/Territory: United States
City: Santa Fe, NM
Period: 4/26/04 → 4/30/04

ASJC Scopus subject areas

  • General Engineering
