Davidson, Susan B

Email Address

ORCID

Disciplines

relationships.isProjectOf

relationships.isOrgUnitOf

Position

Introduction

Research Interests

Search Results

Now showing 1 - 10 of 72
  • Publication
    Programming Constructs for Unstructured Data
    (1995-03-01) Buneman, Peter; Davidson, Susan B; Suciu, Dan
    We investigate languages for querying and transforming unstructured data, by which we mean languages than can be used without knowledge of the structure (schema) of the database. Such data can be represented using labeled trees, as suggested by ACeDB (A C. elegans Database), a database system popular with biologists, and more recently in Tsimmis, a system developed at Stanford for heterogeneous data integration. The approach we take is to extend structural recursion to labeled trees. This poses some interesting problems: first, it is no longer ``flat’’ structural recursion, so that the usual syntactic forms and optimizations for collection types such as lists bags and sets may not be relevant. Second, we shall want to examine the possibility that the values we are manipulating may be cyclic. It is common in ACeDB, and generally in object-oriented databases, for objects to refer to each other, allowing the possibility of arbitrarily ``deep’’ queries. Of course, such cyclic structures are usually constructed through the use of a reference/pointer type; however query languages are insensitive to these object identities and perform automatic dereferencing. We therefore want to understand what programs are well defined when we are allowed to make unbounded searches in the database.
  • Publication
    Hiding Data and Structure in Workflow Provenance
    (2011-12-01) Davidson, Susan B; Bao, Zhuowei; Roy, Sudeepa
    In this paper we discuss the use of views to address the problem of providing useful answers to provenance queries while ensuring that privacy concerns are met. In particular, we propose a hierarchical workflow model, based on context-free graph grammars, in which fine-grained dependencies between the inputs and outputs of a module are explicitly specified. Using this model, we examine how privacy concerns surrounding data, module function, and workflow structure can be addressed.
  • Publication
    Labeling Recursive Workflow Executions On-the-Fly
    (2011-06-01) Bao, Zhuowei; Davidson, Susan B.; Milo, Tova
    This paper presents a compact labeling scheme for answering reachability queries over workflow executions. In contrast to previous work, our scheme allows nodes (processes and data) in the execution graph to be labeled on-the-fly, i.e., in a dynamic fashion. In this way, reachability queries can be answered as soon as the relevant data is produced. We first show that, in general, for workflows that contain recursion, dynamic labeling of executions requires long (linear-size) labels. Fortunately, most real-life scientific workflows are linear recursive, and for this natural class we show that dynamic, yet compact (logarithmic-size) labeling is possible. Moreover, our scheme labels the executions in linear time, and answers any reachability query in constant time. We also show that linear recursive workflows are, in some sense, the largest class of workflows that allow compact, dynamic labeling schemes. Interestingly, the empirical evaluation, performed over both real and synthetic workflows, shows that our proposed dynamic scheme outperforms the state-of-the-art static scheme for large executions, and creates labels that are shorter by a factor of almost 3.
  • Publication
    Querying an Object-Oriented Database Using CPL
    (1997-05-01) Davidson, Susan B.; Hara, Carmem; Popa, Lucian
    The Collection Programming Language is based on a complex value model of data and has successfully been used for querying transforming and integrating data from a wide variety of structured data sources - relational, ACeDB, and ASN.1 among others. However, since there is no notion of objects and classes in CPL, it cannot adequately model recursive types or inheritance, and hence cannot be used to query object-oriented databases (OODBs). By adding a reference type and four operations to CPL - dereference, method invocation, identity test and class type cast - it is possible to express a large class of interesting "safe" queries against OODBs. As an example of how the extended CPL can be used to query an OODB, we will describe how the extended language has been used as a query interface to Shore databases.
  • Publication
    Designing and Evaluating an XPath Dialect for Linguistic Queries
    (2006-04-03) Bird, Steven; Davidson, Susan B.; Lee, Haejoong; Zheng, Yifeng; Chen, Yi
    Linguistic research and natural language processing employ large repositories of ordered trees. XML, a standard ordered tree model, and XPath, its associated language, are natural choices for linguistic data and queries. However, several important expressive features required for linguistic queries are missing or hard to express in XPath. In this paper, we motivate and illustrate these features with a variety of linguistic queries. Then we propose extensions to XPath to support linguistic queries, and design an efficient query engine based on a novel labeling scheme. Experiments demonstrate that our language is not only sufficiently expressive for linguistic trees but also efficient for practical usage.
  • Publication
    A Bi-Labeling Based XPath Processing System
    (2010-01-01) Chen, Yi; Davidson, Susan B; Zheng, Yifei
    We present BLAS, a Bi-LAbeling based XPath processing System. BLAS uses two labeling schemes to speed up query processing: P-labeling for processing consecutive child (or parent) axis traversals, and D-labeling for processing descendant (or ancestor) axis traversals. XML data are stored in labeled form and indexed. Algorithms are presented for translating XPath queries to SQL expressions. BLAS reduces the number of joins in the SQL query translated from a given XPath query and reduces the number of disk accesses required to execute the SQL query compared with the traditional XPath processing using D-labeling alone. We also propose an approximate P-labeling scheme and the corresponding query translation algorithm to handle XML data trees that contain a large number of distinct tag names, and/or are very deep. This extension captures a spectrum of XPath-to-SQL query translation schemes, ranging from existing schemes that do not use P-labels to the one that uses exact P-labels. Experimental results demonstrate the efficiency of the BLAS system.
  • Publication
    Reasoning about Keys for XML
    (2001-09-08) Buneman, Peter; Davidson, Susan B.; Fan, Wenfei; Hara, Carmem; Tan, Wang-Chiew
    We study absolute and relative keys for XML, and investigate their associated decision problems. We argue that these keys are important to many forms of hierarchically structured data including XML documents. In contrast to other proposals of keys for XML, these keys can be reasoned about efficiently. We show that the (finite) satisfiability problem for these keys is trivial, and their (finite) implication problem is finitely axiomatizable and decidable in PTIME in the size of keys.
  • Publication
    Enabling Privacy in Provenance-Aware Workflow Systems
    (2011-01-01) Davidson, Susan B; Khanna, Sanjeev; Stoyanovich, Julia; Tannen, Val; Roy, Sudeepa; Chen, Yi; Milo, Tova
  • Publication
    Search and Result Presentation in Scientific Workflow Repositories
    (2013-05-17) Davidson, Susan; Huang, Xiaocheng; Stoyanovich, Julia; Yuan, Xiaojie
    We study the problem of searching a repository of complex hierarchical workflows whose component modules, both composite and atomic, have been annotated with keywords. Since keyword search does not use the graph structure of a workflow, we develop a model of workflows using context-free bag grammars. We then give efficient polynomial-time algorithms that, given a workflow and a keyword query, determine whether some execution of the workflow matches the query. Based on these algorithms we develop a search and ranking solution that efficiently retrieves the top-k grammars from a repository. Finally, we propose a novel result presentation method for grammars matching a keyword query, based on representative parse-trees. The effectiveness of our
  • Publication
    K2/Kleisli and GUS: Experiments in Integrated Access to Genomic Data Sources
    (2001-03-01) Davidson, Susan B; Brunk, Brian P.; Crabtree, Jonathan; Tannen, Val; Schug, Jonathan; Overton, Chris; Stoeckert, Christian J.
    The integration of heterogeneous data sources and software systems is a major issue in the biomed ical community and several approaches have been explored: linking databases, "on-the- fly" integration through views, and integration through warehousing. In this paper we report on our experiences with two systems that were developed at the University of Pennsylvania: an integration system called K2, which has primarily been used to provide views over multiple external data sources and software systems; and a data warehouse called GUS which downloads, cleans, integrates and annotates data from multiple external data sources. Although the view and warehouse approaches each have their advantages, there is no clear "winner". Therefore, users must consider how the data is to be used, what the performance guarantees must be, and how much programmer time and expertise is available to choose the best strategy for a particular application.