Query planning for searching inter-dependent deep-web databases

Fan Wang, Gagan Agrawal, Ruoming Jin

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

10 Scopus citations

Abstract

Increasingly, many data sources appear as online databases hidden behind query forms, forming what is referred to as the deep web. It is desirable to have systems that provide a high-level, simple interface for users to query such data sources and that automate data retrieval from the deep web. However, such systems must address the following challenges. First, in most cases no single database can provide all the desired data, so multiple databases need to be queried for a given user query. Second, because of dependencies between deep-web databases, certain databases must be queried before others. Third, some databases may be unavailable at certain times because of network or hardware problems, so query planning must be able to handle unavailable databases and generate alternative plans when the optimal one is not feasible. This paper considers query planning in the context of a deep-web integration system. We have developed a dynamic query planner that generates an efficient query order based on the database dependencies and is able to select the top K query plans. We have also developed cost models suitable for query planning for deep-web mining. Our implementation and evaluation have been carried out in the context of a bioinformatics system, SNPMiner. We compared our algorithm with a naive algorithm and with the optimal algorithm. For the 30 queries we used, our algorithm outperformed the naive algorithm and obtained results very similar to those of the optimal algorithm. Our experiments also demonstrate the scalability of our system with respect to the number of data sources involved and the number of query terms.
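The dependency constraint mentioned above (some databases must be queried before others, and some may be unavailable) can be illustrated with a topological ordering. The following is a minimal sketch, not the authors' planner: it assumes dependencies are given as a mapping from each database to the set of databases whose output it needs, uses Kahn's algorithm to produce one valid order, and simply drops any database that is unavailable or that transitively depends on an unavailable one. The function name `plan_query_order` and the database names are hypothetical, and the sketch omits the paper's cost models and top-K plan selection.

```python
from collections import deque


def plan_query_order(deps, available):
    """Return one dependency-respecting query order (Kahn's algorithm).

    deps: dict mapping every database name to the set of databases whose
          output it needs as input (hypothetical input format).
    available: set of database names that can currently be queried.
    Databases that are unavailable, or that depend (directly or transitively)
    on an unavailable database, are left out of the order.
    """
    # Number of prerequisites each available database is still waiting on.
    waiting = {db: len(pres) for db, pres in deps.items() if db in available}

    # Reverse edges: prerequisite -> databases that consume its output.
    dependents = {db: [] for db in deps}
    for db, pres in deps.items():
        for pre in pres:
            dependents.setdefault(pre, []).append(db)

    # Start with databases that need no input from other databases.
    ready = deque(sorted(db for db, n in waiting.items() if n == 0))
    order = []
    while ready:
        db = ready.popleft()
        order.append(db)
        # Querying db satisfies one prerequisite of each of its dependents.
        for child in sorted(dependents.get(db, [])):
            if child in waiting:
                waiting[child] -= 1
                if waiting[child] == 0:
                    ready.append(child)
    return order


# Hypothetical dependency graph, for illustration only.
EXAMPLE_DEPS = {
    "gene_db": set(),
    "snp_db": {"gene_db"},
    "protein_db": {"gene_db"},
    "pathway_db": {"protein_db"},
}

if __name__ == "__main__":
    print(plan_query_order(EXAMPLE_DEPS, set(EXAMPLE_DEPS)))
    # ['gene_db', 'protein_db', 'snp_db', 'pathway_db']
    print(plan_query_order(EXAMPLE_DEPS, {"gene_db", "snp_db", "pathway_db"}))
    # ['gene_db', 'snp_db']  (pathway_db is blocked because protein_db is down)
```

Counting unsatisfied prerequisites makes blocked databases fall out naturally: a prerequisite that is never queried never decrements its dependents' counts, so they never become ready. A cost-aware planner would additionally choose among the many valid orders rather than returning the first one found.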

Original language: English (US)
Title of host publication: Scientific and Statistical Database Management - 20th International Conference, SSDBM 2008, Proceedings
Pages: 24-41
Number of pages: 18
State: Published - 2008
Externally published: Yes
Event: 20th International Conference on Scientific and Statistical Database Management, SSDBM 2008 - Hong Kong, China
Duration: Jul 9 2008 to Jul 11 2008

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 5069 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 20th International Conference on Scientific and Statistical Database Management, SSDBM 2008
Country/Territory: China
City: Hong Kong
Period: 7/9/08 to 7/11/08

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
