TY - GEN
T1 - Enabling ad hoc queries over low-level scientific data sets
AU - Chiu, David
AU - Agrawal, Gagan
PY - 2009
Y1 - 2009
AB - Technological success has ushered in massive amounts of data for scientific analysis. To enable effective utilization of these data sets for all classes of users, supporting intuitive data access and manipulation interfaces is crucial. This paper describes an autonomous scientific workflow system that enables high-level, natural language based, queries over low-level data sets. Our technique involves a combination of natural language processing, metadata indexing, and a semantically-aware workflow composition engine which dynamically constructs workflows for answering queries based on service and data availability. A specific contribution of this work is a metadata registration scheme that allows for a unified index of heterogeneous metadata formats and service annotations. Our approach thus avoids a standardized format for storing all data sets or the implementation of a federated, mediator-based, querying framework. We have evaluated our system using a case study from the geospatial domain to show functional results. Our evaluation supports the potential benefits which our approach can offer to scientific workflow systems and other domain-specific, data intensive applications.
UR - http://www.scopus.com/inward/record.url?scp=69049112126&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=69049112126&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-02279-1_17
DO - 10.1007/978-3-642-02279-1_17
M3 - Conference contribution
AN - SCOPUS:69049112126
SN - 3642022782
SN - 9783642022784
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 218
EP - 236
BT - Scientific and Statistical Database Management - 21st International Conference, SSDBM 2009, Proceedings
T2 - 21st International Conference on Scientific and Statistical Database Management, SSDBM 2009
Y2 - 2 June 2009 through 4 June 2009
ER -