TY - GEN
T1 - A tool for supporting integration across multiple flat-file datasets
AU - Xuan, Zhang
AU - Agrawal, Gagan
N1 - DBLP's bibliographic metadata records provided through http://dblp.org/search/publ/api are distributed under a Creative Commons CC0 1.0 Universal Public Domain Dedication. Although the bibliographic metadata records are provided consistent with CC0 1.0 Dedication, the content described by the metadata records is not. Content may be subject to copyright, rights of privacy, rights of publicity and other restrictions.
PY - 2006
Y1 - 2006
N2 - Traditionally, biologists focused on a single research subject. New high-throughput experimental and analytical technologies, such as microarray and BLAST programs, have changed this. An important functionality required now is the ability to process queries about multiple data entries with little user intervention. This paper presents the design, implementation, and evaluation of a data integration tool that supports database-like query operations across flat-file biological datasets. Compared with the existing solutions, our system has several advantages, i.e., no database management system is required, users can still use declarative languages to communicate with the system, and no data parsing, loading, or indexing utility programs need to be written. We have used the system on three biological queries, each of which was inspired by an actual study from bioinformatics research literature. These case studies have demonstrated the functionality and scalability of our tool. Overall, our approach provides a light-weight and scalable solution for data integration over flat-file datasets.
UR - http://www.scopus.com/inward/record.url?scp=34547429912&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=34547429912&partnerID=8YFLogxK
U2 - 10.1109/BIBE.2006.253327
DO - 10.1109/BIBE.2006.253327
M3 - Conference contribution
SN - 0769527272
SN - 9780769527277
T3 - Proceedings - Sixth IEEE Symposium on BioInformatics and BioEngineering, BIBE 2006
SP - 141
EP - 148
BT - Proceedings - Sixth IEEE Symposium on BioInformatics and BioEngineering, BIBE 2006
T2 - 6th IEEE Symposium on BioInformatics and BioEngineering, BIBE 2006
Y2 - 16 October 2006 through 18 October 2006
ER -