Open Provenance Model

WasDerivedFrom Cannot be Inferred

Authors

Luc Moreau, June 19, 2009

Subject

OPM v1.01.

Background

OPM v1.00 introduced an inference rule that allows us to infer a WasDerivedFrom edge. This inference rule was observed to be incorrect, and OPM v1.01 made this clear in rule (3) of Figure 11:

<a2,r2,p1,acc2> in WasGeneratedBy and <p1,r1,a1,acc1> in Used
----------------------------------------------------------------
  <a2,a1,acc1 union acc2> in MayHaveBeenDerivedFrom

Problem addressed

In this page, we make it clear why the inference rule is incorrect. We will put this page to the ballot, stating that this problem needs to be addressed. We are not proposing a solution here.

Why is the inference rule incorrect?

According to Definition 6, a2 was generated by p1, if p1 was required to initiate its execution for a2 to be generated. Hence, a2 was generated after p1 started.

According to Definition 5, p1 used a1, if the availability of a1 was required for p1 to complete its execution. Hence, a1 must have been generated before p1 completed.

Hence, writing a1 and a2 for the times at which the artifacts were generated, we have: a1 < p1End, p1Start < a2, and p1Start < p1End.

Therefore, there is no guarantee that a1 was generated before a2 was generated.

Hence, we cannot infer that a2 was derived from a1 (see Definition 8).

The origin of this problem is that wasGeneratedBy and Used are essentially temporal relations that set only weak constraints between the artifacts and the processes that used and generated them. In the above example, it is possible that artifact a2 was generated before artifact a1, whilst a2 was generated by p1 and p1 used a1. If so, then clearly a2 cannot be derived from a1.
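The counter-example can be checked mechanically. The following is only a minimal sketch: the Timeline class, the function names and the numeric timestamps are hypothetical and are not part of the OPM specification. It encodes the three constraints derived above and shows a timeline that satisfies all of them while a2 is generated before a1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Timeline:
    """Hypothetical timestamps, for illustration only (not part of OPM)."""
    a1: float        # time at which artifact a1 was generated
    a2: float        # time at which artifact a2 was generated
    p1_start: float  # time at which process p1 started
    p1_end: float    # time at which process p1 completed


def satisfies_opm_constraints(t: Timeline) -> bool:
    """Constraints implied by Definitions 5 and 6:
    a1 < p1End, p1Start < a2, p1Start < p1End."""
    return t.a1 < t.p1_end and t.p1_start < t.a2 and t.p1_start < t.p1_end


def derivation_is_possible(t: Timeline) -> bool:
    """a2 can only have been derived from a1 if a1 existed before a2."""
    return t.a1 < t.a2


# p1 runs from 1 to 4, generates a2 at time 2, and only uses a1,
# which becomes available at time 3.
counterexample = Timeline(a1=3.0, a2=2.0, p1_start=1.0, p1_end=4.0)

print(satisfies_opm_constraints(counterexample))  # all three constraints hold
print(derivation_is_possible(counterexample))     # yet a1 came after a2
```

Under these assumed timestamps, both the wasGeneratedBy and the used edges are legitimate, but a derivation of a2 from a1 is ruled out.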

Proposed solution

This page has demonstrated that the inference rule, based on the definitions in the specification, is incorrect. A number of possible solutions to address the problem can be considered. They should be expressed as separate proposals.

Rationale for the solution

It is not reasonable to allow incorrect inference rules. This problem is to be addressed.

One may wonder why the definitions for used and wasGeneratedBy introduce such weak temporal constraints between artifacts and processes. A reason is that OPM was designed to be compositional. Suppose that, at some level of abstraction, we know of two processes p1 and p2 that use and generate artifacts as follows:

a3 -> wasGeneratedBy  -> p1 -> used -> a1
a4 -> wasGeneratedBy  -> p2 -> used -> a2

If, at some other level of abstraction, we know that process p consists of the two parallel processes p1 and p2, then we want to be able to say that:

a3,a4 -> wasGeneratedBy  -> p -> used -> a1,a2

The definitions of edges wasGeneratedBy and used allow us to express such dependencies as we move up levels of abstraction. If we had considered stricter constraints for these definitions, such as requiring a1 and a2 to be available before p1 and p2 started respectively, it might not have been the case that both a1 and a2 were available before p was able to start.
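To make this argument concrete, here is a small sketch. It rests on assumptions that are ours and not the specification's: a composite process is taken to start when its first sub-process starts and to end when its last sub-process ends, and the names Process, compose_parallel, used_weak and used_strict, as well as all timestamps, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Process:
    start: float
    end: float


def compose_parallel(*parts: Process) -> Process:
    """Assumption: the composite spans from the earliest start to the latest end."""
    return Process(start=min(p.start for p in parts),
                   end=max(p.end for p in parts))


def used_weak(artifact_time: float, proc: Process) -> bool:
    """Reading of Definition 5: the artifact is available before the process completes."""
    return artifact_time < proc.end


def used_strict(artifact_time: float, proc: Process) -> bool:
    """Hypothetical stricter reading: the artifact is available before the process starts."""
    return artifact_time < proc.start


p1 = Process(start=1.0, end=5.0)
p2 = Process(start=3.0, end=6.0)
p = compose_parallel(p1, p2)   # Process(start=1.0, end=6.0)

a1, a2 = 0.0, 2.0              # times at which a1 and a2 become available

# The strict reading holds for each sub-process taken separately ...
print(used_strict(a1, p1), used_strict(a2, p2))
# ... but a2 is not available before the composite p starts.
print(used_strict(a2, p))
# The weak reading survives the move to the coarser level of abstraction.
print(used_weak(a1, p), used_weak(a2, p))
```

With these assumed values, the strict constraint is satisfied at the level of p1 and p2 but fails for p, whereas the weak constraint holds at both levels.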



Comments

The community is invited to provide comments on proposals.

Comment 1 by Simon Miles

My preference would be to remove the inference rule entirely. The special cases where the inference of derivation is possible, which I can see could be useful, can be documented in profiles about specific forms of application. This would also apply to inference of derivation with a well defined uncertainty.

And I can't see how we could get around the current weak semantics of used and wasGeneratedBy. Any stronger semantics requires a knowledge of the internals of processes which the recorder of OPM documentation simply may not have.

-- JimMyers - 17 Sep 2009

I agree the inference can be incorrect when p1 is a composite process. My concern would be that if people do not assert this relationship, OPM's ability to infer coarser representations becomes seriously compromised. I have not thought it through, but I would almost flip this around and make it an OPM requirement to remove the incorrect inference case: it is illegal to say a1-p1-a2 unless a1 was used before a2 was completely generated. Allowing someone to make a statement where the derived-from inference is not valid, at least in the sense of control flow, does not seem to add anything to OPM's capability to represent true provenance.

Comment by Paolo Missier

I am still confused both by the rule and, honestly, by the counter-example that shows its incorrectness (incidentally, shouldn't "whilst a2 was generated by p1 and p1 used a2" be "whilst a1 was generated by p1 and p1 used a2"?). I also have trouble following the reasoning here: "If we had consider stricter constraints for these definitions, such as a1 and a2 needed to be available before p1 and p2 started respectively, it may not have been the case that both a1 and a2 were available before p was able to start."

I am in favour of removing the rule.



Vote

Luc Moreau, yes, the inference is incorrect; I am in favor of ChangeProposalDeleteWasDerivedInference.

Jim Myers, yes

Paolo Missier, yes

Simon Miles, yes

Paul Groth, yes

Natalia Kwasnikowska, yes

Jan Van den Bussche, yes

Outcome

Unanimity in favour of the proposal: Yes: 7/No: 0.

We agree that the inference of an edge wasDerivedFrom is not correct.

-- LucMoreau - 15 Jun 2009 -- LucMoreau - 19 Jun 2009

Copyright © 1999-2012 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.