Differencing Provenance in Scientific Workflows

Loading...
Thumbnail Image

Related Collections

Degree type

Discipline

Subject

Funder

Grant number

License

Copyright date

Distributor

Related resources

Author

Cohen-Boulakia, Sarah
Eyal, Anat

Contributor

Abstract

Scientific workflow management systems are increaingly providing the ability to manage and query the provenance of data products. However, the problem of differencing the provenance of two data products produced by executions of the same specification has not been adequately addressed. Although this problem is NP-hard for general workflow specifications, an analysis of real scientific (and business) workflows shows that their specifications can be captured as series-parallel graphs overlaid with well-nested forking and looping. For this natural restriction, we present efficient, polynomial-time algorithms for differencing executions of the same specification and thereby understanding the difference in the provenance of their data products. We then describe a prototype called PDiffView built around our differencing algorithm. Experimental results demonstrate the scalability of our approach using collected, real workflows and increasingly complex runs.

Advisor

Date Range for Data Collection (Start Date)

Date Range for Data Collection (End Date)

Digital Object Identifier

Series name and number

Publication date

2008-01-01

Volume number

Issue number

Publisher

Publisher DOI

relationships.isJournalIssueOf

Comments

University of Pennsylvania Department of Computer and Information Science Technical Report No. MS-CIS-08-04.

Recommended citation

Collection