Parallelizing an information theoretic Co-clustering algorithm using a cloud middleware

Venkatram Ramanathan, Wenjing Ma, Vignesh T. Ravi, Tantan Liu, Gagan Agrawal

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Scopus citations

Abstract

The emerging cloud environments are well suited for storage and analysis of large datasets, since they can allow on-demand access to resources. However, developing high-performance implementations of data analysis tasks is a challenging problem. In our prior work, we have developed a middleware called FREERIDE (FRamework for Rapid Implementation of Datamining Engines). FREERIDE is based upon the observation that the processing structure of a large number of data mining algorithms involves generalized reductions. FREERIDE offers a high-level interface and implements both distributed memory and shared memory parallelization. In this paper, we consider a challenging new data mining algorithm, information theoretic co-clustering, and parallelize it using the FREERIDE middleware. We show how the main processing loops of row clustering and column clustering of the Co-clustering algorithm can essentially be fit into a generalized reduction structure. We achieve good parallel efficiency, with a speedup of nearly 21 on 32 cores.
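The generalized reduction structure mentioned in the abstract can be illustrated with a minimal sketch. It assumes that one step of co-clustering needs the sums of matrix entries per (row cluster, column cluster) pair, and it simulates the parallel workers with a sequential loop over row chunks. The names `local_reduction`, `combine`, and `cocluster_sums` are hypothetical and are not FREERIDE's interface, which the abstract does not describe.

```python
from functools import reduce


def local_reduction(matrix, row_ids, row_assign, col_assign, k, l):
    """Accumulate one chunk of rows into a k-by-l reduction object."""
    acc = [[0.0] * l for _ in range(k)]
    for i in row_ids:
        r = row_assign[i]  # row cluster of row i
        for j, value in enumerate(matrix[i]):
            acc[r][col_assign[j]] += value  # commutative, associative update
    return acc


def combine(a, b):
    """Merge two reduction objects element-wise (global combination)."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def cocluster_sums(matrix, row_assign, col_assign, k, l, num_chunks):
    """Split rows into chunks, reduce each chunk locally, then combine.

    Each chunk stands in for the share of rows one worker would process.
    """
    n = len(matrix)
    chunk = max(1, (n + num_chunks - 1) // num_chunks)
    partials = [
        local_reduction(matrix, range(start, min(start + chunk, n)),
                        row_assign, col_assign, k, l)
        for start in range(0, n, chunk)
    ]
    identity = [[0.0] * l for _ in range(k)]
    return reduce(combine, partials, identity)


if __name__ == "__main__":
    matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
    row_assign = [0, 1, 0, 1]   # assumed row-cluster labels
    col_assign = [0, 0, 1]      # assumed column-cluster labels
    print(cocluster_sums(matrix, row_assign, col_assign, 2, 2, num_chunks=2))
```

Because the update is a commutative and associative sum, the result does not depend on how the rows are split across chunks, which is the property that lets a loop of this shape be parallelized as a reduction.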

Original language: English (US)
Title of host publication: Proceedings - 10th IEEE International Conference on Data Mining Workshops, ICDMW 2010
Pages: 186-193
Number of pages: 8
State: Published - 2010
Externally published: Yes
Event: 10th IEEE International Conference on Data Mining Workshops, ICDMW 2010 - Sydney, NSW, Australia
Duration: Dec 14 2010 to Dec 17 2010

Publication series

Name: Proceedings - IEEE International Conference on Data Mining, ICDM
ISSN (Print): 1550-4786

Conference

Conference: 10th IEEE International Conference on Data Mining Workshops, ICDMW 2010
Country/Territory: Australia
City: Sydney, NSW
Period: 12/14/10 to 12/17/10

ASJC Scopus subject areas

  • General Engineering
