This is a list of publications about provenance. The list is most likely incomplete, so if you know of any missing publications, please send us an email. We hope this is a useful resource for those of you researching provenance.


  1. Jonathan Ledlie, Chaki Ng, David A. Holland, Kiran-Kumar Muniswamy-Reddy, Uri Braun, and Margo Seltzer. Provenance-Aware Sensor Data Storage. In NetDB 2005, April 2005.
  2. Paul Townend, Paul Groth, and Jie Xu. A Provenance-Aware Weighted Fault Tolerance Scheme for Service-Based Applications. In Proc. of the 8th IEEE International Symposium on Object-oriented Real-time distributed Computing (ISORC 2005), May 2005.
  3. J. Widom. Trio: a system for integrated management of data, accuracy, and lineage. In Second Biennial Conference on Innovative Data Systems Research (CIDR 2005), Asilomar, Calif., January 2005.

  1. V. H. K. Tan. Interaction tracing for mobile agent security. PhD thesis, University of Southampton, 2004.
  2. R. Bose and J. Frew. Composing lineage metadata with XML for custom satellite-derived data products. In 16th International Conference on Scientific and Statistical Database Management, pages 275-284, June 2004.
  3. Paul Groth, Michael Luck, and Luc Moreau. A protocol for recording provenance in service-oriented Grids. In Proceedings of the 8th International Conference on Principles of Distributed Systems (OPODIS'04), Grenoble, France, December 2004.
  4. P. Groth, M. Luck, and L. Moreau. Formalising a protocol for recording provenance in Grids. In Proc. of the UK OST e-Science second All Hands Meeting 2004 (AHM'04), Nottingham, UK, September 2004.
  5. P. Ruth, D. Xu, B. K. Bhargava, and F. Regnier. E-notebook Middleware for Accountability and Reputation Based Trust in Distributed Data Sharing Communities. In Proc. 2nd Int. Conf. on Trust Management, Oxford, UK, volume 2995 of LNCS, 2004. Springer.
  6. Yong Zhao, Michael Wilde, Ian Foster, Jens Voeckler, Thomas Jordan, Elizabeth Quigg, and James Dobson. Grid middleware services for virtual data discovery, composition, and integration. In Proceedings of the 2nd workshop on Middleware for grid computing, New York, NY, USA, pages 57--62, 2004. ACM Press.

  1. Data Provenance and Annotation, December 2003.
  2. Y. Cui and J. Widom. Lineage tracing for general data warehouse transformations. The VLDB Journal, 12(1):41--58, 2003.
  3. J.D. Myers, A.R. Chappell, M. Elder, A. Geist, and J. Schwidder. Re-integrating the research record. IEEE Computing in Science & Engineering, pp 44-50, 2003.
  4. P. P. da Silva, D. L. McGuinness, and R. McCool. Knowledge Provenance Infrastructure. Data Engineering Bulletin, 26(4):26-32, December 2003.
  5. I. Foster, J. Vockler, M. Wilde, and Y. Zhao. The virtual data grid: A new model and architecture for data-intensive collaboration. In Proc. of the CIDR 2003 First Biennial Conference on Innovative Data Systems Research, January 2003.
  6. M. Greenwood, C. Goble, R. Stevens, J. Zhao, M. Addis, D. Marvin, L. Moreau, and T. Oinn. Provenance of e-Science Experiments - experience from Bioinformatics. In Simon J Cox, editor, Proc. UK e-Science All Hands Meeting 2003, pages 223--226, September 2003.
  7. J. D. Myers, C. Pancerella, C. Lansing, K. L. Schuchardt, and B. Didier. Multi-scale science: supporting emerging practice with semantically derived provenance. In ISWC 2003 Workshop: Semantic Web Technologies for Searching and Retrieving Scientific Data, Sanibel Island, Florida, USA, October 2003.
  8. M. Szomszor and L. Moreau. Recording and Reasoning over Data Provenance in Web and Grid Services. In Int. Conf. on Ontologies, Databases and Applications of Semantics, volume 2888 of LNCS, 2003.
  9. J. Zhao, C. Goble, M. Greenwood, C. Wroe, and R. Stevens. Annotating, linking and browsing provenance logs for e-Science. In Proc. of the Workshop on Semantic Web Technologies for Searching and Retrieving Scientific Data, October 2003.

  1. Data Provenance/Derivation Workshop, October 2002.
  2. R. Bose. A Conceptual Framework for Composing and Managing Scientific Data Lineage. In Proceedings of the 14th International Conference on Scientific and Statistical Database Management, Edinburgh, Scotland, pages 15-19, July 2002.
  3. P. Buneman, S. Khanna, K. Tajima, and W.C. Tan. Archiving scientific data. In Proc. of the 2002 ACM SIGMOD International Conference on Management of Data, pages 1--12, 2002. ACM Press.
  4. J. Eder, G. E. Olivotto, and W. Gruber. A Data Warehouse for Workflow Logs. In Y. Han, S. Tai, and D. Wikarski, editors, Engineering and Deployment of Cooperative Information Systems: First Int. Conf., EDCIS 2002, September 2002. Springer.
  5. I. Foster, J. Voeckler, M. Wilde, and Y. Zhao. Chimera: A Virtual Data System for Representing, Querying and Automating Data Derivation. In Proc. of the 14th Conf. on Scientific and Statistical Database Management, July 2002.
  6. C. Goble. Position Statement: Musings on provenance, workflow and (semantic web) annotations for bioinformatics. In Workshop on Data Provenance and Derivation, October 2002.

  1. Y. Cui. Lineage Tracing in Data Warehouses. PhD thesis, Stanford University, December 2001.
  2. A. P. Marathe. Tracing Lineage of Array Data. J. Intell. Inf. Syst., 17(2-3):193--214, 2001.
  3. P. Buneman, S. Khanna, and W.C. Tan. Why and Where: A Characterization of Data Provenance. In Int. Conf. on Databases Theory (ICDT), 2001.
  4. I. Foster, E. Alpert, A. Chervenak, B. Drach, C. Kesselman, V. Nefedova, D. Middleton, A. Shoshani, A. Sim, and D. Williams. The Earth System Grid II: Turning Climate Datasets Into Community Resources. In Proc. of the American Meteorological Society Conference, 2001.
  5. I. Foster, C. Kesselman, and S. Tuecke. The Anatomy of the Grid: Enabling Scalable Virtual Organizations. In Int. J. Supercomputer Applications, pages 15-18, 2001.
  6. J. Frew and R. Bose. Earth System Science Workbench: A Data Management Infrastructure for Earth Science Products. In Proceedings of the 13th International Conference on Scientific and Statistical Database Management, Fairfax, VA, pages 180-189, July 2001.

  1. Y. Cui, J. Widom, and J. L. Wiener. Tracing the lineage of view data in a warehousing environment. ACM Trans. Database Syst., 25(2):179--227, 2000.
  2. P. Buneman, S. Khanna, and W.C. Tan. Data Provenance: Some Basic Issues. In Foundations of Software Technology and Theoretical Computer Science, 2000.
  3. Y. Cui and J. Widom. Practical Lineage Tracing in Data Warehouses. In Proceedings of the 16th International Conference on Data Engineering (ICDE'00), San Diego, California, February 2000. Keyword(s): Data Warehousing.

  1. Allison Gyle Woodruff. Data Lineage and Information Density in Database Visualization. PhD thesis, University of California at Berkeley, 1998.
  2. A. Vahdat and T. Anderson. Transparent Result Caching. In Proc. of the 1998 USENIX Technical Conference, New Orleans, Louisiana, June 1998.

  1. G. Alonso and C. Hagen. Geo-Opera: Workflow Concepts for Spatial Processes. In Proc. 5th Intl. Symposium on Spatial Databases (SSD '97), Berlin, Germany, June 1997.
  2. A. Woodruff and M. Stonebraker. Supporting Fine-grained Data Lineage in a Database Visualization Environment. In Proc. of the 13th International Conference on Data Engineering, Birmingham, England, pages 91-102, April 1997.

  1. G. Alonso and A. El Abbadi. GOOSE: Geographic Object Oriented Support Environment. In Proc. of the ACM workshop on Advances in Geographic Information Systems, Arlington, Virginia, pages 38-49, November 1993.

  1. D.P. Lanter. Design of a Lineage-Based Meta-Data Base for GIS. Cartography and Geographic Information Systems, 18(4):255-261, 1991.
  2. D.P. Lanter. Lineage in GIS: The Problem and a Solution. Technical report 90-6, National Center for Geographic Information and Analysis (NCGIA), UCSB, Santa Barbara, CA, 1991.
  3. D.P. Lanter and R. Essinger. User-Centered Graphical User Interface Design for GIS. Technical report 91-6, National Center for Geographic Information and Analysis (NCGIA), UCSB, 1991.

  1. R. A. Becker and J. M. Chambers. Auditing of data analyses. SIAM Journal on Scientific and Statistical Computing, 9(4):747-760, 1988.

