Differential analysis on deep web data sources

Tantan Liu; Fan Wang; Jiedan Zhu; Gagan Agrawal

doi:10.1109/ICDMW.2010.22

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Scopus citations

Abstract

The growing use of the Internet in everyday life has been creating new challenges and opportunities to use data mining techniques. A relatively new trend on the Internet is the deep web. As a large number of deep web data sources tend to provide similar data, an important problem is to perform offline analysis to understand the differences in data available from different sources. This paper introduces data mining methods to extract a high-level summary of the differences in data provided by different deep web data sources. We consider patterns of values with respect to the same entity and we formulate a new data mining problem, which we refer to as differential rule mining. We have developed an algorithm for mining such rules. Our method includes a pruning method to summarize the identified differential rules. For efficiency, a hash table is used to accelerate the pruning process. We show the effectiveness, efficiency, and utility of our methods by analyzing data across four travel-related websites.

Original language	English (US)
Title of host publication	Proceedings - 10th IEEE International Conference on Data Mining Workshops, ICDMW 2010
Pages	33-40
Number of pages	8
DOIs	https://doi.org/10.1109/ICDMW.2010.22
State	Published - 2010
Externally published	Yes
Event	10th IEEE International Conference on Data Mining Workshops, ICDMW 2010 - Sydney, NSW, Australia Duration: Dec 14 2010 → Dec 17 2010

Publication series

Name	Proceedings - IEEE International Conference on Data Mining, ICDM
ISSN (Print)	1550-4786

Conference

Conference	10th IEEE International Conference on Data Mining Workshops, ICDMW 2010
Country/Territory	Australia
City	Sydney, NSW
Period	12/14/10 → 12/17/10

ASJC Scopus subject areas

General Engineering

Access to Document

10.1109/ICDMW.2010.22

Cite this

Liu, T, Wang, F, Zhu, J & Agrawal, G 2010, Differential analysis on deep web data sources. in Proceedings - 10th IEEE International Conference on Data Mining Workshops, ICDMW 2010., 5693279, Proceedings - IEEE International Conference on Data Mining, ICDM, pp. 33-40, 10th IEEE International Conference on Data Mining Workshops, ICDMW 2010, Sydney, NSW, Australia, 12/14/10. https://doi.org/10.1109/ICDMW.2010.22

@inproceedings{3eae83e416ad450bb14e11798d686326,
title = "Differential analysis on deep web data sources",
abstract = "The growing use of the Internet in everyday life has been creating new challenges and opportunities to use data mining techniques. A relatively new trend on the Internet is the deep web. As a large number of deep web data sources tend to provide similar data, an important problem is to perform offline analysis to understand the differences in data available from different sources. This paper introduces data mining methods to extract a high-level summary of the differences in data provided by different deep web data sources. We consider patterns of values with respect to the same entity and we formulate a new data mining problem, which we refer to as differential rule mining. We have developed an algorithm for mining such rules. Our method includes a pruning method to summarize the identified differential rules. For efficiency, a hash table is used to accelerate the pruning process. We show the effectiveness, efficiency, and utility of our methods by analyzing data across four travel-related websites.",
author = "Tantan Liu and Fan Wang and Jiedan Zhu and Gagan Agrawal",
year = "2010",
doi = "10.1109/ICDMW.2010.22",
language = "English (US)",
isbn = "9780769542577",
series = "Proceedings - IEEE International Conference on Data Mining, ICDM",
pages = "33--40",
booktitle = "Proceedings - 10th IEEE International Conference on Data Mining Workshops, ICDMW 2010",
note = "10th IEEE International Conference on Data Mining Workshops, ICDMW 2010 ; Conference date: 14-12-2010 Through 17-12-2010",
}

TY  - GEN
T1  - Differential analysis on deep web data sources
AU  - Liu, Tantan
AU  - Wang, Fan
AU  - Zhu, Jiedan
AU  - Agrawal, Gagan
PY  - 2010
Y1  - 2010
N2  - The growing use of the Internet in everyday life has been creating new challenges and opportunities to use data mining techniques. A relatively new trend on the Internet is the deep web. As a large number of deep web data sources tend to provide similar data, an important problem is to perform offline analysis to understand the differences in data available from different sources. This paper introduces data mining methods to extract a high-level summary of the differences in data provided by different deep web data sources. We consider patterns of values with respect to the same entity and we formulate a new data mining problem, which we refer to as differential rule mining. We have developed an algorithm for mining such rules. Our method includes a pruning method to summarize the identified differential rules. For efficiency, a hash table is used to accelerate the pruning process. We show the effectiveness, efficiency, and utility of our methods by analyzing data across four travel-related websites.
AB  - The growing use of the Internet in everyday life has been creating new challenges and opportunities to use data mining techniques. A relatively new trend on the Internet is the deep web. As a large number of deep web data sources tend to provide similar data, an important problem is to perform offline analysis to understand the differences in data available from different sources. This paper introduces data mining methods to extract a high-level summary of the differences in data provided by different deep web data sources. We consider patterns of values with respect to the same entity and we formulate a new data mining problem, which we refer to as differential rule mining. We have developed an algorithm for mining such rules. Our method includes a pruning method to summarize the identified differential rules. For efficiency, a hash table is used to accelerate the pruning process. We show the effectiveness, efficiency, and utility of our methods by analyzing data across four travel-related websites.
UR  - http://www.scopus.com/inward/record.url?scp=79951760466&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=79951760466&partnerID=8YFLogxK
U2  - 10.1109/ICDMW.2010.22
DO  - 10.1109/ICDMW.2010.22
M3  - Conference contribution
AN  - SCOPUS:79951760466
SN  - 9780769542577
T3  - Proceedings - IEEE International Conference on Data Mining, ICDM
SP  - 33
EP  - 40
BT  - Proceedings - 10th IEEE International Conference on Data Mining Workshops, ICDMW 2010
T2  - 10th IEEE International Conference on Data Mining Workshops, ICDMW 2010
Y2  - 14 December 2010 through 17 December 2010
ER  -
