"Querying and Managing Provenance through User Views in Scientific Workflows"
Susan Davidson

Workflow systems have become increasingly popular for managing large-scale in-silico experiments where many bioinformatics tasks are chained together. Due to the large amount of data products generated by these experiments and the need for reproducible results, provenance has become of paramount importance. Several workflow systems are therefore starting to provide support for querying provenance. However, the amount of provenance information produced may be overwhelming, so there is a need for abstraction mechanisms to present the most relevant information.
The technique we pursue is that of "user views." Since bioinformatics tasks may themselves be complex sub-workflows, a user view determines the level of granularity at which the user sees the workflow. For example, biologists may simply want a view in which reformatting tasks are hidden and biologically relevant tasks are shown. The user view thus determines which data products and tasks can be seen and queried when answering provenance questions. This talk gives an example of a phylogenomic analysis workflow, discusses the notion of user views relative to this workflow, demonstrates how user views can be used in provenance queries, and discusses how a user view is generated based on which tasks the user perceives to be biologically relevant in the workflow specification.
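
As a rough illustration of the idea (not the method presented in the talk), a user view can be modeled as a mapping from workflow tasks to composite modules, with provenance reported only for data that crosses module boundaries. The task names, file names, and the functions `module_inputs` and `provenance` below are all hypothetical, invented for this sketch.

```python
# Hypothetical record of one workflow run:
# each data product maps to (task that produced it, list of its inputs).
PRODUCED_BY = {
    "aligned.fasta": ("align", ["seqs.fasta"]),
    "aligned.phy": ("reformat", ["aligned.fasta"]),
    "tree.nwk": ("build_tree", ["aligned.phy"]),
}

# Hypothetical user view: every task belongs to one composite module.
# The reformatting task is folded into the tree-inference module.
BIOLOGIST_VIEW = {
    "align": "Alignment",
    "reformat": "TreeInference",
    "build_tree": "TreeInference",
}

# The finest-grained view: every task is its own module.
FULL_VIEW = {task: task for task, _ in PRODUCED_BY.values()}


def module_inputs(item, module, view):
    """Return the data that entered `module` from outside and led to `item`."""
    _, inputs = PRODUCED_BY[item]
    result = []
    for source in inputs:
        producer = PRODUCED_BY.get(source)
        if producer is not None and view[producer[0]] == module:
            # Produced inside the same module: hidden, so look through it.
            result.extend(module_inputs(source, module, view))
        else:
            result.append(source)
    return result


def provenance(item, view):
    """Trace `item` back to its sources, listing only steps visible in `view`."""
    steps, frontier, seen = [], [item], set()
    while frontier:
        current = frontier.pop()
        if current in seen or current not in PRODUCED_BY:
            continue  # already visited, or an original input to the workflow
        seen.add(current)
        module = view[PRODUCED_BY[current][0]]
        inputs = module_inputs(current, module, view)
        steps.append((module, inputs, current))
        frontier.extend(inputs)
    return steps


for step in provenance("tree.nwk", BIOLOGIST_VIEW):
    print(step)
```

Under `BIOLOGIST_VIEW` the intermediate file `aligned.phy` and the `reformat` task do not appear in the answer, whereas the same query under `FULL_VIEW` reports all three tasks.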

Susan B. Davidson received the B.A. degree in Mathematics from Cornell University, Ithaca, NY, in 1978, and the M.A. and Ph.D. degrees in Electrical Engineering and Computer Science from Princeton University, Princeton, NJ, in 1980 and 1982, respectively. Dr. Davidson joined the University of Pennsylvania in 1982, and is now the Weiss Professor of Computer and Information Science and Deputy Dean of the School of Engineering and Applied Science. She is an ACM Fellow, a Fulbright scholar, and recently stepped down as founding co-Director of the Center for Bioinformatics at UPenn (PCBI).
Preceding the formation of the PCBI, Dr. Davidson was involved with planning and administering an NSF-funded research training program in computational biology, which has been run at the University of Pennsylvania since 1995. She also helped establish undergraduate degree programs in bioinformatics and computational biology run through the departments of Biology and Computer and Information Science, as well as tracks in this field in the Masters of Biotechnology degree program.
Dr. Davidson's research interests include database systems, database modeling, distributed systems, and bioinformatics. Within bioinformatics, she is best known for her work in data integration, XML query and update technologies, and, more recently, provenance in workflow systems.