Ives, Zachary
Disciplines
Computer Sciences
Position
Assistant Professor
Introduction
Zachary Ives is an Assistant Professor at the University of Pennsylvania and an Associated Faculty Member of the Penn Center for Bioinformatics. He received his B.S. from Sonoma State University and his PhD from the University of Washington. His research interests include data integration, peer-to-peer models of data sharing, processing and security of heterogeneous sensor streams, and data exchange between autonomous systems. He is a recipient of the NSF CAREER award and a member of the DARPA Computer Science Study Panel.
Research Interests
Databases, data integration, peer-to-peer computing, sensor networks
Collection
43 results
Search Results
Now showing 1 - 10 of 43
Publication
MOSAIC: Declarative Platform for Dynamic Overlay Composition
(2012-05-27) Loo, Boon Thau; Ives, Zachary G; Mao, Yun; Smith, Jonathan M
Overlay networks create new networking services using nodes that communicate using pre-existing networks. They are often optimized for specific applications and targeted at niche vertical domains, but lack interoperability with which their functionalities can be shared. MOSAIC is a declarative platform for constructing new overlay networks from multiple existing overlays, each possessing a subset of the desired new network’s characteristics. This paper focuses on the design and implementation of MOSAIC: composition and deployment of control and/or data plane functions of different overlay networks, dynamic compositions of overlay networks to meet changing application needs and network conditions, and seamless support for legacy applications. MOSAIC overlays are specified using Mozlog, a new declarative language for expressing overlay properties independently from their particular implementation or underlying network. MOSAIC is validated experimentally using compositions specified in Mozlog in order to create new overlay networks with compositions of their functions: the i3 indirection overlay that supports mobility, the resilient overlay network (RON) overlay for robust routing, and the Chord distributed hash table for scalable lookups. MOSAIC uses runtime composition to simultaneously deliver application-aware mobility, NAT traversal and reliability. We further demonstrate MOSAIC’s dynamic composition capabilities by Chord switching its underlay from IP to RON at runtime.
MOSAIC’s benefits are obtained at a low performance cost, as demonstrated by measurements on both a local cluster environment and the PlanetLab global testbed.

Publication
Piazza: Mediation and Integration Infrastructure for Semantic Web Data
(2004-02-01) Ives, Zachary G; Halevy, Alon Y; Mork, Peter; Tatarinov, Igor
The Semantic Web envisions a World Wide Web in which data is described with rich semantics and applications can pose complex queries. To this point, researchers have defined new languages for specifying meanings for concepts and developed techniques for reasoning about them, using RDF as the data model. To flourish, the Semantic Web needs to provide interoperability -- both between sites with different terminologies and with existing data and the applications operating on them. To achieve this, we are faced with two problems. First, most of the world's data is available not in RDF but in XML; XML and the applications consuming it rely not only on the domain structure of the data, but also on its document structure. Hence, to provide interoperability between such sources, we must map between both their domain structures and their document structures. Second, data management practitioners often prefer to exchange data through local point-to-point data translations, rather than mapping to common mediated schemas or ontologies. This paper describes the Piazza system, which addresses these challenges. Piazza offers a language for mediating between data sources on the Semantic Web, and it maps both the domain structure and document structure. Piazza also enables interoperation of XML data with RDF data that is accompanied by rich OWL ontologies. Mappings in Piazza are provided at a local scale between small sets of nodes, and our query answering algorithm is able to chain sets of mappings together to obtain relevant data from across the Piazza network.
We also describe an implemented scenario in Piazza and the lessons we learned from it.

Publication
A Substrate for In-Network Sensor Data Integration
(2008-08-24) Mihaylov, Svilen; Jacob, Marie; Ives, Zachary G; Guha, Sudipto
With the ultimate goal of extending the data integration paradigm and query processing capabilities to ad hoc wireless networks, sensors, and stream systems, we consider how to support communication between sets of nodes performing distributed joins in sensor networks. We develop a communication model that enables in-network join at a variety of locations, and which facilitates coordination among nodes in order to make optimization decisions. While we defer a discussion of the optimizer to future work, we experimentally compare a variety of strategies, including at-base and in-network joins. Results show significant performance gains versus prior work, as well as opportunities for optimization.

Publication
Reliable Storage and Querying for Collaborative Data Sharing Systems
(2010-03-01) Taylor, Nicolas; Ives, Zachary G
The sciences, business confederations, and medicine urgently need infrastructure for sharing data and updates among collaborators’ constantly changing, heterogeneous databases. The ORCHESTRA system addresses these needs by providing data transformation and exchange capabilities across DBMSs, combined with archived storage of all database versions. ORCHESTRA adopts a peer-to-peer architecture in which individual collaborators contribute data and compute resources, but where there may be no dedicated server or compute cluster. We study how to take the combined resources of ORCHESTRA’s autonomous nodes, as well as PCs from “cloud” services such as Amazon EC2, and provide reliable, cooperative storage and query processing capabilities. We guarantee reliability and correctness as in distributed or cloud DBMSs, while also supporting cross-domain deployments, replication, and transparent failover, as provided by peer-to-peer systems.
Our storage and query subsystem supports dozens to hundreds of nodes across different domains, possibly including nodes on cloud services. Our contributions include (1) a modified data partitioning substrate that combines cluster and peer-to-peer techniques, (2) an efficient implementation of replicated, reliable, versioned storage of relational data, (3) new query processing and indexing techniques over this storage layer, and (4) a mechanism for incrementally recomputing query results that ensures correct, complete, and duplicate-free results in the event of node failure during query execution. We experimentally validate query processing performance, failure detection methods, and the performance benefits of incremental recovery in a prototype implementation.

Publication
Orchestra: Facilitating Collaborative Data Sharing
(2007-06-11) Green, Todd J; Karvounarakis, Grigoris; Taylor, Nicholas E; Biton, Olivier; Ives, Zachary G; Tannen, Val
One of the most elusive goals of structured data management has been sharing among large, heterogeneous populations: while data integration [4, 10] and exchange [3] are gradually being adopted by corporations or small confederations, little progress has been made in integrating broader communities. Yet the need for large-scale sharing of heterogeneous data is increasing: most of the sciences, particularly biology and astronomy, have become data-driven as they have attempted to tackle larger questions. The field of bioinformatics, in particular, has seen a plethora of different databases emerge: each is focused on a related but subtly different collection of organisms (e.g., CryptoDB, TIGR, FlyNome), genes (GenBank, GeneDB), proteins (UniProt, RCSB Protein Databank), diseases (OMIM, GeneDis), and so on.
Such communities have a pressing need to interlink their heterogeneous databases in order to facilitate scientific discovery.

Publication
Sensor Network Security: More Interesting Than You Think
(2006-07-31) Anand, Madhukar; Cronin, Eric; Sherr, Micah; Blaze, Matthew A; Ives, Zachary G; Lee, Insup
With the advent of low-power wireless sensor networks, a wealth of new applications at the interface of the real and digital worlds is emerging. A distributed computing platform that can measure properties of the real world, formulate intelligent inferences, and instrument responses, requires strong foundations in distributed computing, artificial intelligence, databases, control theory, and security. Before these intelligent systems can be deployed in critical infrastructures such as emergency rooms and power plants, the security properties of sensors must be fully understood. Existing wisdom has been to apply the traditional security models and techniques to sensor networks. However, sensor networks are not traditional computing devices, and as a result, existing security models and methods are ill-suited. In this position paper, we take the first steps towards producing a comprehensive security model that is tailored for sensor networks. Incorporating work from Internet security, ubiquitous computing, and distributed systems, we outline security properties that must be considered when designing a secure sensor network. We propose challenges for sensor networks – security obstacles that, when overcome, will move us closer to decreasing the divide between computers and the physical world.

Publication
An Adaptive Query Execution System for Data Integration
(1999-06-01) Ives, Zachary G; Florescu, Daniela; Friedman, Marc; Levy, Alon; Weld, Daniel S
Query processing in data integration occurs over network-bound, autonomous data sources.
This requires extensions to traditional optimization and execution techniques for three reasons: there is an absence of quality statistics about the data, data transfer rates are unpredictable and bursty, and slow or unavailable data sources can often be replaced by overlapping or mirrored sources. This paper presents the Tukwila data integration system, designed to support adaptivity at its core using a two-pronged approach. Interleaved planning and execution with partial optimization allows Tukwila to quickly recover from decisions based on inaccurate estimates. During execution, Tukwila uses adaptive query operators such as the double pipelined hash join, which produces answers quickly, and the dynamic collector, which robustly and efficiently computes unions across overlapping data sources. We demonstrate that the Tukwila architecture extends previous innovations in adaptive execution (such as query scrambling, mid-execution re-optimization, and choose nodes), and we present experimental evidence that our techniques result in behavior desirable for a data integration system.

Publication
Maintaining Recursive Views of Regions and Connectivity in Networks
(2010-08-01) Liu, Mengmeng; Taylor, Nicholas E; Zhou, Wenchao; Ives, Zachary G; Loo, Boon Thau
The data management community has recently begun to consider declarative network routing and distributed acquisition: e.g., sensor networks that execute queries about contiguous regions, declarative networks that maintain shortest paths, and distributed and peer-to-peer stream systems that detect transitive relationships among data at the distributed sources. In each case, the fundamental operation is to maintain a view over dynamic network state. This view is typically distributed, recursive, and may contain aggregation, e.g., describing shortest paths or least costly paths. Surprisingly, solutions to computing such views are often domain-specific, expensive, and incomplete.
We recast the problem as incremental recursive view maintenance given distributed streams of updates to tuples: new stream data becomes insert operations and tuple expirations become deletions. We develop techniques to maintain compact information about tuple derivability or data provenance. We complement this with techniques to reduce communication: aggregate selections to prune irrelevant aggregation tuples, provenance-aware operators that determine when tuples are no longer derivable and remove them from the view, and shipping operators that reduce the information being propagated while still maintaining correct answers. We validate our work in a distributed setting with sensor and network router queries, showing significant gains in communication overhead without sacrificing performance.

Publication
The ORCHESTRA Collaborative Data Sharing System
(2008-01-01) Ives, Zachary G; Green, Todd J; Karvounarakis, Grigorios; Tannen, Val; Taylor, Nicholas E; Pratim Talukdar, Partha; Jacob, Marie; Pereira, Fernando
Sharing structured data today requires standardizing upon a single schema, then mapping and cleaning all of the data. This results in a single queriable mediated data instance. However, for settings in which structured data is being collaboratively authored by a large community, e.g., in the sciences, there is often a lack of consensus about how it should be represented, what is correct, and which sources are authoritative. Moreover, such data is seldom static: it is frequently updated, cleaned, and annotated. The ORCHESTRA collaborative data sharing system develops a new architecture and consistency model for such settings, based on the needs of data sharing in the life sciences.
In this paper we describe the basic architecture and implementation of the ORCHESTRA system, and summarize some of the open challenges that arise in this setting.

Publication
Reconcilable Differences
(2011-01-01) Ives, Zachary G; Green, Todd J.; Tannen, Val
In this paper we study a problem motivated by the management of changes in databases. It turns out that several such change scenarios, e.g., the separately studied problems of view maintenance (propagation of data changes) and view adaptation (propagation of view definition changes) can be unified as instances of query reformulation using views provided that support for the relational difference operator exists in the context of query reformulation. Exact query reformulation using views in positive relational languages is well understood, and has a variety of applications in query optimization and data sharing. Unfortunately, most questions about queries become undecidable in the presence of difference (or negation), whether we use the foundational set semantics or the more practical bag semantics. We present a new way of managing this difficulty by defining a novel semantics, Z-relations, where tuples are annotated with positive or negative integers. Z-relations conveniently represent data, insertions, and deletions in a uniform way, and can apply deletions with the union operator (deletions are tuples with negative counts). We show that under Z-semantics relational algebra (RA) queries have a normal form consisting of a single difference of positive queries, and this leads to the decidability of their equivalence. We provide a sound and complete algorithm for reformulating RA queries, including queries with difference, over Z-relations. Additionally, we show how to support standard view maintenance

