Open Provenance Model

Change Proposal: Remove Asserted Account Overlaps

Authors

SimonMiles 2009 July 21, extracted from previous discussion in ChangeProposalRemoveNonCore.

Subject

Core OPM specification

Background

Along with other proposed changes, the aim is to remove unnecessary complexity and length from the specification, since both may discourage adoption.

Problem addressed

As above, the problem addressed is that a non-essential feature is increasing the complexity of the core specification. Specifically, asserting account overlaps other than refinement seems not to be essential to most uses of OPM and could be provided in a profile. Furthermore, refinement relations between accounts could be inferred, so asserting them merely adds information to the graph to avoid having to re-infer it. Refinement could therefore be expressed as an annotation to an account, which to me is also the more intuitive form.

Proposed solution

I propose the following changes to the specification document.

  1. Define an annotation with property refines, a range of account and an object of account (following the annotation definition style used in ChangeProposalAnnotations). The explanation of refinement would remain in the specification, but the Refines set would be removed from the formalisation.
  2. Move details on non-refinement overlaps to a profile, where we can define many new annotation types to express those overlap kinds. Specifically, the following would be removed.
    • Point 13 from chapter 4 of OPM v1.1: "Two account views can be declared to be overlapping to express the fact that they represent different descriptions of an execution."
    • The Overlaps sets from the formalisation(s) of OPM.
    • Rule 11 in the formalisation of OPM v1.1.
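
To make item 1 concrete, here is a minimal sketch of how a refines annotation on an account might be represented. The `AccountAnnotation` record, its field names, and the `refinements_of` helper are hypothetical illustrations; they are not taken from the OPM specification or from ChangeProposalAnnotations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccountAnnotation:
    """Hypothetical record: an annotation attached to one account."""
    subject: str   # identifier of the annotated account
    relation: str  # annotation property, e.g. "refines" or "overlaps"
    value: str     # identifier of the account the property points to


def refinements_of(annotations, account):
    """Return the accounts annotated as refining the given account."""
    return {
        a.subject
        for a in annotations
        if a.relation == "refines" and a.value == account
    }


annotations = [
    AccountAnnotation("detailed", "refines", "summary"),
    AccountAnnotation("detailed", "overlaps", "other"),
]
print(refinements_of(annotations, "summary"))
```

Under this sketch, other overlap kinds defined in a profile would only need new values for `relation`, without touching the core model.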

Rationale for the solution

If two accounts overlap, then this appears to be evident from the graph itself, and so does not need to be declared separately. What would be lost by removing the Overlaps set from the formal model?

If we do want to talk about overlapping accounts (whether in the core spec or a profile), e.g. to make legality clear, it would also be good to include an example of non-refinement overlap, to make its meaning and implications clear.

For refinement, I believe it is possible to determine whether one account is a refinement of another without explicit assertion. If a single process P uses A and generates B in one account, and a chain of processes Q initially use A and finally generate B in another account, is it possible for P not to be a refinement of Q? Given that each artifact is generated by only one process, presumably the 'end' of P must be the same as the 'end' of Q?



Comments

The community is invited to provide comments on proposals.

comment 1 by Luc Moreau

It is quite clear to me that the notions of account, refinement and overlap are novelties of OPM that we need to explore more. I think there is a case for overlapping (non-refinement) accounts. I will try to write up the example of a process randomly selecting one of its two inputs. When the actual choice has not been observed by an observer, we can use alternating accounts to describe that one of the inputs was used.

comment 2 by Robert McGrath

WRT accounts: The whole idea of accounts is that they are "hearsay". Note that the definition of an OPM graph is that it is an account. There is no such thing as an OPM graph that is not hearsay.

By definition, there can be many different accounts of the same events. That is the nature of hearsay. Hence, accounts overlap in many ways.

This is critical to the entire OPM!

comment 3 by Simon Miles in response to comments 1, 2

Sorry, I was not clear enough about the change proposed regarding overlapping accounts. I did not mean to say that accounts cannot be about the same occurrences, or that they would not overlap.

The part of the specification I considered removable is the explicit assertion of non-refinement overlaps, and its inclusion in the formalised model. I don't see what is lost, and can see a gain in simplification, by removing declaration of overlaps between accounts. I have tried to clarify this by revising the change proposal above.

comment 4 by Luc Moreau in reply to comment 3

The reason why we have explicit assertions of account relationship (e.g. overlap, refinement, maybe others in the future) is that in the absence of such assertions, an opm graph reader would have to process the whole graph to infer them. It's not impossible, but it's tedious to do. Typically, this information is readily available to the opm graph creator and hence is worth including in the graph.

I think that over time we will come up with new account relationships. Overlapping is a simple property (currently not very useful, I agree). Refinement is more interesting, and more work is required to tighten its definition. I attach an example of overlapping accounts that are non-refinement. We may want to introduce the "mutually exclusive" account relationship, which is another kind of overlap.

Comment 5 by Luc on the revised proposal

I feel that it's important to keep the overlap declaration because it helps a querier/visualisation tool to make sense of the graph, its nesting/overlap, without having to process it! What would be the cost of inferring pairwise overlap relationships in a graph with A accounts, N nodes and E edges? I think in the worst case it's O(A^2 x N^2 x E): for all edges, propagate effective accounts to all nodes, then for each potential pairing of accounts check whether there is overlap.
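
The two-step inference described above can be sketched as follows. This is only an illustration under stated assumptions: edges are given as hypothetical `(source, target, accounts)` triples, a node's effective accounts are taken to be those of the edges touching it, and two accounts are treated as overlapping when they share at least one node. It does not attempt to reproduce the worst-case bound quoted in the comment.

```python
from itertools import combinations


def infer_overlaps(edges):
    """Infer overlapping account pairs from account-labelled edges.

    edges: iterable of (source, target, accounts) triples, where
    accounts is the set of account names the edge belongs to.
    """
    # Step 1: propagate each edge's accounts to its two end nodes.
    nodes_by_account = {}
    for source, target, accounts in edges:
        for account in accounts:
            nodes_by_account.setdefault(account, set()).update((source, target))

    # Step 2: for each pair of accounts, check for a shared node.
    overlaps = set()
    for a, b in combinations(sorted(nodes_by_account), 2):
        if nodes_by_account[a] & nodes_by_account[b]:
            overlaps.add((a, b))
    return overlaps


edges = [
    ("p1", "a1", {"acc1"}),  # p1 used a1, according to acc1
    ("b1", "p1", {"acc1"}),  # b1 was generated by p1, according to acc1
    ("p2", "a1", {"acc2"}),  # p2 used a1, according to acc2
    ("p3", "a9", {"acc3"}),  # unrelated activity in acc3
]
print(infer_overlaps(edges))
```

Even in this simple form, a reader has to visit every edge before any overlap can be reported, which is the work an explicit declaration would save.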

Comment 6 by Simon Miles in reply to comments 4, 5

OK, I accept that there is a value to asserting inferrable information, so removing the cost of inferring it again. I had assumed that only non-inferrable information was captured in the core model, and so it is now this apparent inconsistency that troubles me.

Specifically, why aren't Used*, WasGeneratedBy*, WasDerivedFrom*, WasTriggeredBy*, WasDependentOn*, and MayHaveBeenDerivedFrom part of the formal model? As they can be inferred, can't they be asserted? If so, what difference do they have to the relationships which are part of the formal model (Used, WasGeneratedBy etc.) which mean they should not be included?

Also, I assume there must be a cut-off point where we decide something should not be a core part of the model but instead as an annotation with a defined type and/or in a profile. Otherwise, the core model will expand with ever more complexity. I do not find it that convincing that non-refinement overlaps are so useful that they are worth adding to the complexity of the core model.

Couldn't overlaps, including refinement, be expressed just as well as annotations to accounts? And if so, to keep the effort of adopting OPM to a minimum, couldn't we just define the annotation type for refinement in the core specification, and leave the rest for a profile on other types of account overlaps?

Comment 7 by Luc in response to comment 6

The formal model in the opmv1.01 spec is not complete. We do have Used*, WasGeneratedBy*, WasDerivedFrom*, WasDependentOn* in the fopm paper with Jan and Natalia. WasTriggeredBy is not transitive. WasDerivedFrom* can be asserted, because it says something about the number of steps involved (1 or more). The other edges can be inferred.

It is well understood how to do transitive closures on edges, and databases and triple stores are getting very good at it. It's from that point of view that I find it's ok to infer them.
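
As a minimal sketch of the kind of inference meant here, the following computes the transitive closure of a single edge relation by depth-first traversal. The edge data and function name are illustrative assumptions; a database or triple store would use its own closure mechanism rather than this code.

```python
def transitive_closure(edges):
    """Return all (start, end) pairs connected by one or more edges.

    edges: iterable of (source, target) pairs of a single relation,
    for example one-step derivation edges.
    """
    successors = {}
    for source, target in edges:
        successors.setdefault(source, set()).add(target)

    closure = set()
    for start in successors:
        # Depth-first walk from each node that has outgoing edges.
        stack = list(successors[start])
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            closure.add((start, node))
            stack.extend(successors.get(node, ()))
    return closure


# c was derived from b, and b was derived from a.
one_step = {("c", "b"), ("b", "a")}
print(transitive_closure(one_step))
```

The result contains the two asserted edges plus the inferred pair `("c", "a")`, which corresponds to a multi-step edge that does not need to be asserted.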

In theory, we could compute whether two graphs overlap or not, but as indicated it's expensive. I think that these are there primarily to help users (GUIs) to make sense of a graph, without having to pre-process it fully.

Why not annotations? Hmm, pushing this philosophy to the extreme, we just have rdf. Annotations (though not specified yet) tend to be optional. I didn't feel this was optional.

-- JimMyers - 17 Sep 2009

I guess I'm confused by the comments on whether inferable relations are in the spec - I thought they were in that we defined what could be inferred to create a new account asserting these relationships. Is the issue just that the formal / non-English part of the spec doc does not say this or is there a deeper issue?

As for the overlaps issue - I cannot yet recall the exact discussion that led to its inclusion but I think there were ambiguous cases where we thought an assertion would be needed. Trying to reconstruct - if one account said a1-p1-b1 and the other a2-p2-b2, an assertion of overlap is a statement that although the witnesses do not know common IDs for what they saw, they are claiming to have observed the same thing. I think our discussion captured more subtle cases where the accounts differed in that witnesses disagreed about the number of inputs or outputs of processes and overlaps was trying to assert that the witnesses know they had partial views of what happened (the user didn't see a debug log so it is OK that the sys admin account shows an extra output). Trying to reconstruct further I think the assertion was meant to help decide whether accounts that agreed on some input and output artifacts but differed by having some not in common should be considered to be consistent. I vaguely recall Joe saying "open world assumption" in here somewhere and the overlaps assertion basically being an acknowledgement by the witness that they had incomplete info...

With that said, I don't think we've hit real use cases or challenge questions where this is needed, partly because we have not pushed into the area of reconciling accounts. I think OPM needs to support this and thus, while we could remove this from 1., I think it is a reasonable placeholder that does have some value and it is worthwhile to keep as we explore potentially more powerful alternatives...

Comment 9 from Simon Miles in reply to Jim

I'm afraid I don't really get the use case described. You mean that two OPM sub-graphs could be asserted to be describing the same sequence of processes and artifacts even though they do not appear to be (due to lack of asserter knowledge)? If so, why would you even have to have common inputs/outputs?

With regards to overlaps being a "reasonable placeholder", my feeling is that this (or any) specification will be more readily adopted if there is a minimum in it and that which is there has a clear usefulness and meaning.

Comment 10 from Luc about Jan's comment

We are given a graph of several GB, millions of nodes, thousands of accounts to visualise. Why is it embarrassing to have the overlap declaration? I thought this could be useful to build a graphical representation, possibly by starting with a clustering of accounts.

Now from a practical viewpoint, hardly any participating team in PC3 declared overlapping and refined accounts. Did they produce illegal opm graphs? Probably not. So it seems that the declarations are effectively optional.

Comment 11 from Luc

I do believe in the usefulness of this declaration for very large graphs. I however recognise that it can be inferred. (Note that Jim's point is quite interesting, but it is not the intent of the original declaration.) I also acknowledge that I am not at ease considering a graph illegal if it does not have overlap declarations. Likewise, it would be silly to consider as overlapping only those accounts declared to be.

So, I defend this declaration from a performance viewpoint only. In that case, it should be seen as an optional annotation to graph.

Note, I however propose to keep the concept of overlapping accounts in the specification.

Comment 12 from Simon in reply to Comment 11

Yes, that sounds reasonable. I don't deny that the concept has meaning, and could be worth asserting even though it can be inferred (and I believe the same could be true of other information), and annotations seem a good way to include optional additional information such as this. I've adjusted my vote below accordingly.

Comment 13 by PaoloMissier

The situation with asserted/inferred statements is not unlike that of a set of axioms, say in OWL, on which entailments can be made using a reasoner, which is known to be computationally expensive and does not scale well with the number of axioms, or individuals. Here it is common practice to use the reasoner to check that the new ontology (in the case of OWL) with any new asserted but inferrable statement is indeed consistent. Thus, there is nothing wrong in principle in asserting inferrable statements, as long as one can verify that this does not lead to inconsistencies (i.e., logical contradictions).

The aspect of consistency verification has not been discussed enough IMO.
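
One very small piece of such verification, comparing asserted overlap statements with those inferred from the graph, can be sketched as below. The function name and the treatment of overlap as an unordered pair are assumptions for illustration; this is not a reasoner and does not cover logical contradictions in general.

```python
def check_overlap_assertions(asserted, inferred):
    """Compare asserted overlap pairs against inferred ones.

    Both arguments are iterables of (account, account) pairs.
    Overlap is treated as symmetric, so pair order is ignored.
    """
    asserted_pairs = {frozenset(pair) for pair in asserted}
    inferred_pairs = {frozenset(pair) for pair in inferred}
    return {
        # Asserted but not supported by the graph: worth investigating.
        "unsupported": asserted_pairs - inferred_pairs,
        # Inferable but never asserted: harmless if assertions are optional.
        "undeclared": inferred_pairs - asserted_pairs,
    }


report = check_overlap_assertions(
    asserted=[("acc1", "acc2"), ("acc1", "acc3")],
    inferred=[("acc2", "acc1")],
)
print(report)
```

Whether an "unsupported" assertion counts as an inconsistency depends on the intended meaning of the declaration, which is one of the points under discussion in this thread.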

I also agree with Simon (comment 6), in that I believe the criteria for deciding what to include or exclude from the core are still quite blurry.

Comment 14 by Luc about comment 13.

Paolo, you're making good points here, and we should definitely incorporate this in the specification.

Can we try to formalize the criteria to be in/out the core? It would be good to have them, because they would provide a useful rationale for the spec. I am attempting to describe why the following constructs are in the core:

  • wasDerivedFrom: cannot be inferred
  • Used/generatedBy: cannot be inferred
  • wasTriggeredBy: necessary for process view where the communicated artifact is unknown (cannot be expressed otherwise really, since we have no way of talking about an artifact we do not know)???
  • Time: we want to support one and only one way of describing time (though it is optional to include time information in a graph). It is an integral part of the semantics (time annotations must be consistent with the causality graph for it to be valid). Cannot be inferred.
  • Role: cannot be inferred, discriminator between edges to/from a process
  • process/artifact: no other way of describing these entities
  • agent: we have not discussed them at all for v1.1.
  • Account: seems essential to allow multiple descriptions to coexist. Cannot be inferred.

On the other hand,

  • Overlaps: not in core since it can be inferred; it would be provided as an annotation for optimization purposes.



Vote

Simon Miles, yes, and I agree with Luc's variation on the proposal as articulated below

Jan Van den Bussche, yes, I find it an embarrassing feature of OPM

Luc Moreau, yes for 1) keeping the concept of overlapping account 2) removing the declaration Overlaps 3) introducing an annotation overlaps

PaoloMissier, yes in the sense proposed by Simon and Luc in the latest comments

-- EricStephan - 24 Sep 2009 - yes based on Luc's latest comments

Outcome

The result is: no: 0; unconditional yes: 1; conditional yes: 3.

The recommendation is that the proposal is adopted: we keep the concept of overlapping accounts, we remove the declaration, and we introduce a profile for overlapping accounts.

-- SimonMiles - 21 Jul 2009