
Open Provenance Model



Change Proposal: Allow Multiple Role Labels on Edges

Authors

SimonMiles 2009 June 19.

Subject

Core OPM specification.

Background

Currently, the OPM specification only allows one role label per used or wasGeneratedBy edge. For the reasons given below, I believe this may be detrimentally restrictive.

Problem addressed

The specification allows only one role label per used and wasGeneratedBy edge.

Proposed solution

Allow each used and wasGeneratedBy edge to be labelled with multiple roles, not just one.

Rationale for the solution

Ambiguity of Multiple Edges

A counter-argument to this proposal may be that, where we want to provide multiple role labels, the graph could instead contain multiple edges, each with a different label. However, it is unclear whether multiple used edges, for example, also imply multiple uses of an artifact by a process. If so, this is not what we are trying to express with multiple role labels on one edge.

Multiple Ways to Name a Role

It does not seem unreasonable to expect different asserters to describe the role of the same artifact in the same process in different ways, e.g. because different systems make use of different ontologies. For example, one agent may label the role as a:divisor, while another uses b:divisor. If two accounts are merged, there will be multiple role descriptions per edge.

Multiple Roles in a Single Usage/Generation

There are cases where one asserter would wish to identify multiple independent roles of an artifact in one account. For example, a workflow engine may distinguish between the roles of 'input' and 'parameter' artifacts, but it is also useful to specify what functional relationship these artifacts have to the workflow enacted, e.g. 'dividend' and 'divisor'. These are independent roles, so it is not obviously useful to require one to be a sub-type of another, e.g. dividend as a sub-type of input.

In the case of Java-based provenance, we may describe the role of an argument to a method call as the 'nth' parameter, or by the parameter name. Given that callers may only know the order of arguments in calling (so the index matters) but that APIs may change (so the index is not reliable to interpret the role of the argument in the provenance), it could be helpful to have both.

Lack of Reason to Reject Proposal

Finally, there does not seem to be a strong counter-argument to allowing multiple roles per arc. It does not hinder interoperability: if you are checking for the existence of a role in a query, the presence of other roles does not restrict you. It is not compulsory, so it does not discourage adoption.
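
As a minimal sketch of the querying point above, assume an edge carries a set of role labels. The names `Edge` and `edges_with_role` are hypothetical and are not part of the OPM specification. A query for one role is then unaffected by whatever other roles are present on the edge:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """Hypothetical used/wasGeneratedBy edge carrying a set of role labels."""
    kind: str          # "used" or "wasGeneratedBy"
    process: str
    artifact: str
    roles: frozenset   # one or more role labels


def edges_with_role(edges, role):
    """Return the edges labelled with `role`, ignoring any other roles."""
    return [e for e in edges if role in e.roles]


edges = [
    Edge("used", "divide", "a1", frozenset({"dividend", "input"})),
    Edge("used", "divide", "a2", frozenset({"divisor", "parameter"})),
]

# The extra 'parameter' label does not get in the way of the 'divisor' query.
print([e.artifact for e in edges_with_role(edges, "divisor")])  # prints ['a2']
```

A producer that only ever emits one role per edge would simply use a one-element set, so the single-label case remains expressible.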



Comments

Community is invited to provide comments on proposals.

Comment 1 by Luc Moreau

I think the problem is not correctly posed, and does not expose all the issues to take into consideration.

The example given above may not be correct: "e.g. different systems make use of different ontologies. For example, one agent may label the role as a:divisor, while another uses b:divisor. If two accounts are merged, there will be multiple role descriptions per edge."

Indeed, the meaning of roles (according to OPMv1.01) is defined by the process they are attached to. So, we should consider the more general problems where two asserters a and b define

 pa -> used(ra) -> aa  in account acc_a
 pb -> used(rb) -> ab  in account acc_b
Let us assume that each process is given a type (in an ontology) and a persistent name.
 pa hasType ontoa:ta
 pb hasType ontob:tb 
 pa hasPersistentName na
 pb hasPersistentName nb

The meaning of ra is given by the ontology of pa: ontoa:ra (and likewise for rb).

To understand if and when the problem that Simon describes occurs, I think we need to do a case analysis:

1.

  acc_a <> acc_b
If the two accounts are different, that's fine: the two used edges can coexist because they belong to different accounts.

2.

  acc_a = acc_b
2.a Process identities are different, so they are not merged.
  na <> nb
Again, that's fine, we have two different Used edges.

2.b Process identities are the same, so they can be merged in a union operation.

  na == nb (and likewise for artifact names)

2.b.1 Process has same type

  ontoa:ta == ontob:tb

2.b.1.a Roles are identical

  ra = rb
So fine, the union results in a single edge.

2.b.1.b Roles differ

  ra <> rb
The meaning of the roles is defined by the process ontology. I believe this case is the one that Simon refers to, though I am not sure. I claim that it is fine to have two Used edges with roles ra and rb. The ontology should resolve the "Multiple Roles in a Single Usage/Generation" issue raised by Simon.

For instance, the ontology could declare roles: divisor, dividend, and 1st, 2nd and declare that divisor and 1st are interchangeable (likewise for dividend and 2nd).

2.b.2 Process types differ

  ontoa:ta <> ontob:tb
So necessarily,
  ra <> rb
since roles should be understood in the context of the processes that defined them, so:
  ontoa:ta:ra <> ontob:tb:rb
Hence, we can have two edges, with the respective roles. They offer two separate descriptions in terms of two different ontologies.
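
The case analysis above can be sketched as a plain set union, under the assumption that a used edge is identified by its account, the persistent names of its process and artifact, the process type, and the role. The `UsedEdge` type and `union` helper are hypothetical names for illustration, not OPM definitions:

```python
from typing import NamedTuple


class UsedEdge(NamedTuple):
    account: str
    process_name: str    # persistent name, e.g. na
    process_type: str    # e.g. "ontoa:ta"; roles are read in its context
    role: str
    artifact_name: str


def union(*edge_sets):
    """Set union: edges that match in every field collapse into one."""
    merged = set()
    for edge_set in edge_sets:
        merged |= set(edge_set)
    return merged


# Case 2.b.1.a: same account, names, type and role -> a single edge.
same = union(
    [UsedEdge("acc", "n", "onto:t", "divisor", "x")],
    [UsedEdge("acc", "n", "onto:t", "divisor", "x")],
)

# Case 2.b.1.b: roles differ -> two used edges coexist.
roles_differ = union(
    [UsedEdge("acc", "n", "onto:t", "divisor", "x")],
    [UsedEdge("acc", "n", "onto:t", "1st", "x")],
)

# Case 2.b.2: process types differ, so the roles differ even if spelled alike.
types_differ = union(
    [UsedEdge("acc", "n", "ontoa:ta", "r", "x")],
    [UsedEdge("acc", "n", "ontob:tb", "r", "x")],
)

print(len(same), len(roles_differ), len(types_differ))  # prints 1 2 2
```

Whether two surviving edges in cases 2.b.1.b and 2.b.2 denote one use or two is left to the process ontology in this reading; the sketch only shows which edges remain after the union.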

Conclusion

I don't see the case for multiple roles on edges, given that their meaning is given by the process they relate to.

Comment 2 by Simon Miles in reply to Comment 1

I think I understand the argument, but have a couple of problems with it.

First, it states a particular view of the world, where each process is associated with one ontology. Given that OPM does not prevent multiple independent actors documenting the same process, I don't see why this would necessarily be so. Also, would it have to be a new ontology for every OPM process, i.e. one execution of a procedure, or only one for each procedure of which the process is an execution? In human documentation, people describe the same events in different terms, with those terms having different connotations. Why would this be different for OPM? Doesn't the restriction of one ontology per process require co-ordination between OPM producers beyond what the specification should be prescribing?

Second, with regard to the ontology resolving the "Multiple Roles in a Single Usage/Generation" problem, doesn't requiring this place more of a burden on the ontology developer than is reasonable? Why would it be a better solution than allowing multiple role labels? To return to the example of naming something both an 'input' (as opposed to 'parameter') and a 'dividend' (as opposed to 'divisor'), I would not expect an ontology written along with the application to say that every input is a dividend, or even that every dividend is an input, so we would require a 'dividend-and-input' concept to be created. Would this seem reasonable to users of OPM when the two concepts they wish to use already exist in the domain ontology?

Finally, I still can't think of an argument for not allowing multiple role labels.

Comment 3 by Luc Moreau

I suggest that x -(r1,r2)-> y be regarded as an abbreviation for x-r1-> y and x-r2->y. The latter is already supported by OPM (where both edges can be in a same account or in differing accounts).

If x -(r1,r2)-> y is not an abbreviation, then what would be the meaning of an OPM graph with the following edges:

  1. x -(r1,r2)-> y
  2. x-r1-> y
  3. x-r2-> y
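
Under the abbreviation reading suggested in this comment, the three edges listed above carry no more information than edges 2 and 3 alone. A minimal sketch, assuming edges are represented as (source, roles, target) tuples and using a hypothetical `expand` helper:

```python
def expand(edges):
    """Expand each (source, roles, target) edge into single-role edges."""
    return {(src, role, dst) for src, roles, dst in edges for role in roles}


graph = [
    ("x", ("r1", "r2"), "y"),   # edge 1: x -(r1,r2)-> y
    ("x", ("r1",), "y"),        # edge 2: x -r1-> y
    ("x", ("r2",), "y"),        # edge 3: x -r2-> y
]

expanded = expand(graph)
print(sorted(expanded))  # prints [('x', 'r1', 'y'), ('x', 'r2', 'y')]
```

If the multi-role edge were not an abbreviation, edge 1 would have to mean something that edges 2 and 3 together do not, and that meaning would need to be defined.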



Comment 4 by Luc Moreau

It seems that this proposal is to allow "synonyms" for roles: e.g. divisor and second argument. Why not express such a capability by means of annotation?

Comment 5 by Luc Moreau

I suggest that x -(r1,r2)-> y be regarded as an abbreviation for x-r1-> y and x-r2->y. The latter is already supported by OPM (where both edges can be in a same account or in differing accounts). In that case it is up to the semantics of the process to identify the meanings of r1 and r2, and it should determine whether an artifact was used/generated multiple times or not (for r1 and r2).

I however don't see a strong use case for this.

-- JimMyers - 17 Sep 2009

I think we introduced roles as a way to catch when there was disagreement between accounts despite agreement on what was used - i.e. a and b both used but acc1 says a is the divisor and acc2 says that b is. Thus I think we have some form of constraint that if roles are defined and different for two+ inputs, the role assignments must agree for the accounts to be in agreement.

Given the complexities that could be involved in role ontologies, I don't think OPM should delve into resolving anything but exact matches between accounts, à la the current rules. If so, I don't think that allowing multiple roles for a given used edge, or multiple used edges within one account, makes sense. There's no prohibition now about differences between accounts, so that would still be a way to carry the info around.
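
One possible reading of the exact-match constraint described above, sketched with a hypothetical `accounts_agree` helper. The representation of an account as a mapping from (process, artifact) pairs to a single role is an assumption made for illustration, not something the specification defines:

```python
def accounts_agree(acc1, acc2):
    """Accounts agree if every edge they share carries exactly the same role.

    Each account maps a (process, artifact) pair to one role label.
    Only exact string matches count; no ontology reasoning is attempted.
    """
    shared = acc1.keys() & acc2.keys()
    return all(acc1[key] == acc2[key] for key in shared)


# Both accounts agree that a and b were used, but not on which is the divisor.
acc1 = {("divide", "a"): "divisor", ("divide", "b"): "dividend"}
acc2 = {("divide", "a"): "dividend", ("divide", "b"): "divisor"}

print(accounts_agree(acc1, acc2))  # prints False
print(accounts_agree(acc1, acc1))  # prints True
```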

Comment 6 by Simon Miles in reply to comment 5 and Jim above

The main purpose of role labels I am considering in this proposal is querying: if I want to know about the provenance of the divisor in an operation, I follow the edge labelled "divisor". For the reasons above, particularly "Multiple Roles in a Single Usage/Generation", allowing only one role in one account seems restrictive and unnecessary.

With regard to putting two labels in a merged data structure, (r1,r2), this is fine (and meets the change proposal), as long as it is possible for an OPM-parsing tool to know how to separate them. If nothing is said about multiple roles in the spec and we allow the merged data structure to be used, I can't see how a query can check whether an edge has one particular label.

Comment 7 by Paolo Missier

If the purpose of the multiple labels is to allow synonyms, as it seems from the early justification and comments and as Luc suggests in comment 4, then I would argue that synonymy issues should be resolved elsewhere, rather than at the point of use (i.e., when role labels are used). If the terms belong to some ontology, we have standard vocabulary in place to express that they are the same. So although I can see that there may be additional, unanticipated needs for multiple labels, I would vote no because I think these needs should be discussed when they arise.

Vote

Luc Moreau, No, I don't see a use case for the proposal

Jim Myers, No

Simon Miles, Yes

Paul Groth, No

Paolo Missier, No

-- SimonMiles - 19 Jun 2009

Outcome

The vote is as follows: No: 4/Yes: 1.

There is no majority for this proposal. The issue, however, needs to be monitored. As we discover use cases where multiple roles may be required, we should reopen this proposal and/or find ways of solving the problem.



Copyright © 1999-2012 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.