Scaling and parallelizing a scientific feature mining application using a cluster middleware

Leonid Glimcher; Xuan Zhang; Gagan Agrawal

doi:10.1109/IPDPS.2004.1303029

Research output: Contribution to conference › Paper › peer-review

11 Scopus citations

Abstract

As scientific simulations generate large amounts of data, analyzing this data to gain insights into scientific phenomena is becoming increasingly challenging. In this paper, we present a case study on the use of a cluster middleware for rapidly creating a scalable and parallel implementation of a scientific data analysis application. Using FREERIDE (Framework for Rapid Implementation of Datamining Engines), we parallelize a feature extraction algorithm and scale it to disk-resident datasets. We have developed a parallel algorithm for this problem that matches the communication and computation structure supported by the FREERIDE system. The main observations from our experimental results are as follows: 1) the overhead of using the middleware is quite small in most cases, 2) there is an overhead associated with breaking the datasets into more partitions or chunks, and 3) if the dataset is partitioned into the same number of chunks, the execution time stays proportional to the size of the dataset and inversely proportional to the number of nodes, i.e., the overhead of communication or of reading disk-resident datasets is very small.
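The abstract says the parallel algorithm was shaped to fit FREERIDE's computation and communication structure. A minimal sketch of that general style follows, assuming a generalized-reduction model: each node folds the chunks it reads into a node-local reduction object, and the per-node objects are then combined into one result. All names here (`ReductionObject`, `local_reduction`, `global_combine`, `run`) are hypothetical and are not FREERIDE's actual interface; the threshold test is a stand-in for the paper's feature extraction algorithm, and the "nodes" are simulated sequentially in one process.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Sequence


# Hypothetical reduction object (not FREERIDE's API): accumulates the
# features found by one node across all of the chunks it processes.
@dataclass
class ReductionObject:
    count: int = 0
    feature_ids: List[int] = field(default_factory=list)


def local_reduction(chunk: Sequence[float], offset: int,
                    threshold: float, robj: ReductionObject) -> None:
    """Scan one chunk and fold every point above the threshold into robj.

    The threshold test is a placeholder for a real feature detector.
    """
    for i, value in enumerate(chunk):
        if value > threshold:
            robj.count += 1
            robj.feature_ids.append(offset + i)  # global index of the point


def global_combine(partials: Iterable[ReductionObject]) -> ReductionObject:
    """Merge the per-node reduction objects into a single result."""
    result = ReductionObject()
    for p in partials:
        result.count += p.count
        result.feature_ids.extend(p.feature_ids)
    result.feature_ids.sort()  # order independent of chunk-to-node mapping
    return result


def run(dataset: Sequence[float], num_nodes: int, num_chunks: int,
        threshold: float) -> ReductionObject:
    # Partition the dataset into num_chunks chunks (ceiling division).
    chunk_size = max(1, -(-len(dataset) // num_chunks))
    chunks = [(start, dataset[start:start + chunk_size])
              for start in range(0, len(dataset), chunk_size)]
    # Assign chunks round-robin to simulated nodes; each node owns one
    # reduction object and updates it locally (no communication yet).
    partials = [ReductionObject() for _ in range(num_nodes)]
    for k, (offset, chunk) in enumerate(chunks):
        local_reduction(chunk, offset, threshold, partials[k % num_nodes])
    # The only "communication" step: combine the per-node results.
    return global_combine(partials)


if __name__ == "__main__":
    data = [0.1, 0.9, 0.4, 0.95, 0.2, 0.8]
    print(run(data, num_nodes=2, num_chunks=3, threshold=0.5))
```

With `data = [0.1, 0.9, 0.4, 0.95, 0.2, 0.8]`, two nodes and three chunks, the combined result has `count == 3` and `feature_ids == [1, 3, 5]`, the same as a single-node, single-chunk run. Because communication happens only once, in the combine step, this structure is consistent with the abstract's observation that communication overhead stays small. A real feature extractor whose features can span chunk boundaries would need extra merging logic in the combine step, which this sketch omits.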

Original language	English (US)
Pages	1227-1236
Number of pages	10
DOIs	https://doi.org/10.1109/IPDPS.2004.1303029
State	Published - 2004
Externally published	Yes
Event	Proceedings - 18th International Parallel and Distributed Processing Symposium, IPDPS 2004 (Abstracts and CD-ROM) - Santa Fe, NM, United States Duration: Apr 26 2004 → Apr 30 2004

ASJC Scopus subject areas

General Engineering

Access to Document

10.1109/IPDPS.2004.1303029

https://dblp.org/rec/conf/ipps/GlimcherZA04

Cite this

Scaling and parallelizing a scientific feature mining application using a cluster middleware. / Glimcher, Leonid; Zhang, Xuan; Agrawal, Gagan.
2004. 1227-1236. Paper presented at Proceedings - 18th International Parallel and Distributed Processing Symposium, IPDPS 2004 (Abstracts and CD-ROM), Santa Fe, NM, United States.

Glimcher, L, Zhang, X & Agrawal, G 2004, 'Scaling and parallelizing a scientific feature mining application using a cluster middleware', Paper presented at Proceedings - 18th International Parallel and Distributed Processing Symposium, IPDPS 2004 (Abstracts and CD-ROM), Santa Fe, NM, United States, 4/26/04 - 4/30/04 pp. 1227-1236. https://doi.org/10.1109/IPDPS.2004.1303029

@conference{3084208bbe2e479bab72c6a47b66c4c6,

title = "Scaling and parallelizing a scientific feature mining application using a cluster middleware",

author = "Leonid Glimcher and Xuan Zhang and Gagan Agrawal",

note = "DBLP's bibliographic metadata records provided through http://dblp.org/search/publ/api are distributed under a Creative Commons CC0 1.0 Universal Public Domain Dedication. Although the bibliographic metadata records are provided consistent with CC0 1.0 Dedication, the content described by the metadata records is not. Content may be subject to copyright, rights of privacy, rights of publicity and other restrictions.; Proceedings - 18th International Parallel and Distributed Processing Symposium, IPDPS 2004 (Abstracts and CD-ROM) ; Conference date: 26-04-2004 Through 30-04-2004",

year = "2004",

doi = "10.1109/IPDPS.2004.1303029",

language = "English (US)",

pages = "1227--1236",

}
