Processing XML Streams with Deterministic Automata and Stream Indexes

Author

Gupta, Ashish
Miklau, Gerome
Onizuka, Makoto
Suciu, Dan

Abstract

We consider the problem of evaluating a large number of XPath expressions on a stream of XML packets. We contribute two novel techniques. The first is to evaluate all of the XPath expressions with a single Deterministic Finite Automaton (DFA) that is computed lazily, that is, whose states are constructed only when the input first requires them. The contribution here is to show that the DFA can be used effectively for this problem: in our experiments we achieve a constant throughput, independent of the number of XPath expressions. The major issue is the size of the DFA, which, in theory, can be exponential in the number of XPath expressions. We provide a series of theoretical results and experimental evaluations that show that the lazy DFA has a small number of states, for all practical purposes. These results are of general interest in XPath processing, beyond stream processing. The second technique is the Streaming IndeX (SIX), which adds a small amount of binary data to each XML packet so that the query processor can achieve significant speedups. As an application of these techniques, we describe the XML Toolkit (XMLTK), a collection of command-line tools providing highly scalable XML data processing.
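
The lazy DFA idea can be illustrated with a minimal Python sketch. This is not the XMLTK implementation. It assumes a small XPath fragment (linear paths built from `/`, `//`, element names and `*`, with no predicates), combines all queries into one NFA, and uses each set of simultaneously active NFA states as a DFA state. A transition is computed the first time the stream needs it and is then cached, so only the DFA states that the input actually reaches are ever built. The names `LazyDFA`, `xml_events`, `step`, `matched` and `num_states` are hypothetical.

```python
import io
import re
import xml.etree.ElementTree as ET
from typing import Dict, FrozenSet, Iterable, Iterator, List, Tuple

# An NFA state is (query index, number of location steps matched so far);
# a DFA state is the frozen set of NFA states that are active at the same time.
DState = FrozenSet[Tuple[int, int]]


class LazyDFA:
    """Hypothetical sketch: one lazily built DFA for many linear XPath queries."""

    def __init__(self, queries: List[str]) -> None:
        self.steps: List[List[Tuple[str, str]]] = []
        for q in queries:
            # e.g. "/a//c" -> [("/", "a"), ("//", "c")]
            parsed = re.findall(r"(//|/)([^/]+)", q)
            if not parsed or "".join(a + n for a, n in parsed) != q:
                raise ValueError(f"unsupported XPath: {q!r}")
            self.steps.append(parsed)
        self.start: DState = frozenset((i, 0) for i in range(len(queries)))
        self.delta: Dict[Tuple[DState, str], DState] = {}  # transitions built so far
        self.accepts: Dict[DState, Tuple[int, ...]] = {}  # matched queries per DFA state

    def _nfa_move(self, state: DState, tag: str) -> DState:
        nxt = set()
        for qi, pos in state:
            if pos == len(self.steps[qi]):
                continue  # query already fully matched: no outgoing edges
            axis, name = self.steps[qi][pos]
            if axis == "//":
                nxt.add((qi, pos))  # self-loop: a descendant step may pass any element
            if name in ("*", tag):
                nxt.add((qi, pos + 1))  # consume one location step
        return frozenset(nxt)

    def step(self, state: DState, tag: str) -> DState:
        key = (state, tag)
        if key not in self.delta:  # build this DFA transition only on first use
            self.delta[key] = self._nfa_move(state, tag)
        return self.delta[key]

    def matched(self, state: DState) -> Tuple[int, ...]:
        if state not in self.accepts:
            self.accepts[state] = tuple(sorted(
                qi for qi, pos in state if pos == len(self.steps[qi])))
        return self.accepts[state]

    def num_states(self) -> int:
        # DFA states materialized so far (the start state plus every cached target)
        return len({self.start, *self.delta.values()})

    def run(self, events: Iterable[Tuple[str, str]]) -> List[Tuple[int, str, int]]:
        """Return (query index, element tag, depth) for every match, in stream order."""
        stack = [self.start]  # one DFA state per open element
        out = []
        for kind, tag in events:
            if kind == "start":
                stack.append(self.step(stack[-1], tag))
                for qi in self.matched(stack[-1]):
                    out.append((qi, tag, len(stack) - 1))
            else:  # "end": return to the parent element's DFA state
                stack.pop()
        return out


def xml_events(text: str) -> Iterator[Tuple[str, str]]:
    """Turn an XML string into ("start"|"end", tag) events."""
    for event, elem in ET.iterparse(io.BytesIO(text.encode()), events=("start", "end")):
        yield event, elem.tag


if __name__ == "__main__":
    dfa = LazyDFA(["/a/b", "//c", "/a//c", "/a/*/d"])
    print(dfa.run(xml_events("<a><b><c/></b><c><d/></c></a>")))
    # [(0, 'b', 2), (1, 'c', 3), (2, 'c', 3), (1, 'c', 2), (2, 'c', 2), (3, 'd', 3)]
    print(dfa.num_states())  # 6
```

Running the same stream a second time reuses the cached transitions, so `num_states()` stays at 6; once the transitions for a stream are warm, each element costs a dictionary lookup rather than a pass over every query. The exponential worst case mentioned in the abstract would appear here as this count growing with the number of queries. When `step` returns the empty set, no query can match anywhere below the current element, so a processor could skip that subtree entirely; we assume this is the kind of situation in which an index such as SIX pays off, but the sketch does not model SIX itself.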

Publication date

2004-05-11

Comments

Postprint version. Copyright ACM 2004. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in Transactions on Database Systems (TODS), Volume 29, Issue 4, December 2004, pages 752-788. Publisher URL: http://doi.acm.org/10.1145/1042046.1042051
