Open Provenance Model

Change Proposal: Endorse Dublin Core Mapping Profile

Authors

SimonMiles 2009 June 19.

Subject

Dublin Core Mapping profile

Background

Dublin Core metadata is used to represent provenance in a way that can overlap with OPM-based provenance. We want to be able to connect the two, or to allow users of Dublin Core who wish to express richer detail about the provenance of resources to move to OPM where appropriate.

Problem addressed

Mapping Dublin Core metadata terms to OPM graph patterns.

Proposed solution

Please see the current profile attached.

Rationale for the solution



Comments

Community is invited to provide comments on proposals.

Comment 1 by Luc Moreau

I think it's useful and necessary to define this kind of profile for other "notions of provenance". I think it gives us the opportunity to tidy up "Agents" in the OPM spec. Do we really want dc:creator to be mapped to an opm:agent, or would it be better mapped to opm:process or opm:artifact?

Also, dc has the notion of mutable resources, which is not captured well (yet!) in OPM.

Comment 2 by Simon Miles

The profile has been updated (to version 0.3) to include a case study and graph figures. It is also simplified somewhat to rely on the existing OPM specification plus annotations and collections profile. Specifically, it now does not assume any change to the agents model, as this is a separate debate. Finally, it uses Dublin Core terms for annotations wherever possible, as re-use of existing standards may help acceptance of OPM.

This is the version ready for review.

Comment 3 by Paolo to the DC profile spec

Sorry I am coming in so late in discussing this proposal.

There are a few things I am not sure I understand in the mapping that is proposed in the spec. I pick one example here: the dc:provenance property discussed in sec. 2.17.

You write:

"Therefore, to map a Dublin Core relationship A dc:provenance P to OPM, we translate P to an OPM graph in which there is a path from A to every process, with these processes denoting the change in ownership. "

The range of the dc:provenance property is ProvenanceStatement, so when you do your mapping, you essentially map a ProvenanceStatement to a particular OPM graph, and I have trouble visualizing such a graph. In general, shouldn't you also have a section where you map dc:ProvenanceStatement to the graph, and then refer to it in your dc:provenance mapping? I.e., build it from the bottom up.

Comment by JimMyers - 17 Sep 2009

Comments on particular subsections below. Overall, I think this mapping is premature without nailing mutable resources in OPM on which to hang versions. Beyond that, I think discussion is needed about round-tripping - what can be inferred in each direction to/from OPM and DC that is needed to make sure the mapping is reasonable.

1.1 Versioning: OPM artifacts are not versions of each other, artifacts are versions of a non-OPM resource. Resources are things that can have a state that changes. I think a1-p1-a2 OPM statements would be annotated with a1 dc:isVersionOf R and a2 dc:isVersionOf R , not a2 dc:isVersionOf a1.
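
The annotation pattern suggested in 1.1 can be sketched with plain Python tuples standing in for triples. The identifiers a1, p1, a2 and R come from the comment above; the tuple representation and the helper versions_of are assumptions made only for illustration and are not part of the profile.

```python
# Illustrative only: each triple is a (subject, predicate, object) tuple.
# OPM edges for the chain a1 - p1 - a2: p1 used a1, and a2 was generated by p1.
opm_edges = [
    ("p1", "opm:used", "a1"),
    ("a2", "opm:wasGeneratedBy", "p1"),
]

# Annotations as suggested in the comment: both artifacts are versions of the
# non-OPM resource R, rather than a2 being a version of a1.
dc_annotations = [
    ("a1", "dc:isVersionOf", "R"),
    ("a2", "dc:isVersionOf", "R"),
]


def versions_of(resource, annotations):
    """Return the artifacts annotated as versions of `resource` (hypothetical helper)."""
    return sorted(
        s for s, p, o in annotations if p == "dc:isVersionOf" and o == resource
    )


print(versions_of("R", dc_annotations))
```

Keeping the version annotations separate from the OPM edges reflects the point of the comment: derivation between artifacts is stated in OPM, while version membership hangs off the resource.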

2.1 Accrual method – why does accrual method map to an agent rather than a process? I can see a granularity issue – the process in OPM terms that added the item to a collection is probably only a sub-process in a larger accrual method, but I don’t see that accrual method is an agent. An agent may employ the method, but method seems to imply a process. Perhaps method is more analogous to the workflow template (it’s timeless but structures real processes that act on given artifacts)? If so, I would think that some graph pattern where agent employing dc:accrualmethod opm:triggered opm:process would be more appropriate.

2.2 Available – I agree that dc:available is the opm:date of appearance of an artifact generated by an opm:process – makeAvailable. However, in the sense above that dc talks about resources whose state can change, it seems like it would be more appropriate to define a graph in which the dc:available date of a Resource R could be inferred from the creation time of a2 (a dc:isVersionOf R) from an opm:process iff we had a way to state that the process was of type ‘makeavailable’. Otherwise I think the correspondence is fairly loose: the dc:available date should not be before the creation date of some opm:artifact that is asserted to be a version of R.
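
The loose correspondence described in 2.2 amounts to a consistency check rather than a mapping. A minimal sketch, assuming creation dates for the artifacts that are versions of R are already known; the function name and the treatment of an empty list are assumptions for illustration:

```python
from datetime import date


def available_is_consistent(available, version_creation_dates):
    """Check dc:available against the creation dates of the versions of R.

    The constraint from the comment: dc:available should not be before the
    creation date of some artifact asserted to be a version of R, i.e. it
    must be on or after the earliest such creation date.
    """
    if not version_creation_dates:
        # Open world assumption: no recorded versions means nothing to contradict.
        return True
    return available >= min(version_creation_dates)


# Hypothetical dates, chosen only to exercise the check.
print(available_is_consistent(date(2009, 7, 30), [date(2009, 6, 19), date(2009, 9, 1)]))
```

The check can only reject a dc:available date; it cannot derive one unless the generating process is known to be of a 'make available' type, which is the stronger condition the comment asks for.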

2.3 Bibliographic citation: The citation is nominally based on attributes of the resource which could be derived in two ways from OPM graphs – from non-OPM annotations on an artifact representing the current version of the resource, or from inference over an OPM graph (e.g. the agent running a process could be considered a, if not the, creator of the generated artifact). I don’t think the section as written captures that or the subtleties involved – i.e. is the list of dc:creators the superset of the agent running the process and the dc:creators of all used artifacts? If a new version of a resource was not derived from the old one (I start from scratch to update my “OPM Talk”), does OPM info about the first version influence the bibliographic info?

2.4 Contributor: similar issues as above – a contributor could have been an agent running the process that created the latest artifact that is a version of R or it could be someone who contributed farther back in the provenance chain or a dc:contributor to some input artifact – seems like there is some way to put constraints on what patterns in OPM would be consistent with a dc:contributor statement but I don’t think it’s a 1-1 correspondence.

2.5 Creator: I can again see issues with the assumption that creator equates with agent running an opm:process that are tied a bit to the artifact-resource-version issues. If availability is related to an opm process making a version of a resource available, the agent running that process will often be an assistant – the secretary uploads the file for the author – so given the granularity OPM might be used at, I don’t think one can make strong statements. I’d be surprised if a creator was not an agent in some process along the way (perhaps not recorded in the account) but beyond that I don’t think I would assume much…

2.6-9 – Dates: Issues noted above related to versions as well as the fact that I’m not sure most people would create new versions (to a doc for example) to denote change of location, approval, etc. if those don’t change the value of the artifact. I haven’t thought it through, but I almost think that we need a way to talk about the provenance of a resource directly – it has a lifecycle and processes independent of the lifecycle and processing of artifacts that make up its various versions. One option would be to think of a resource as an artifact whose static state/value is the IDs of a version tree and lifecycle tree which then hold versions (other artifacts) and states (accepted, available, deprecated, … - which are also artifacts). Not sure if this holds together (something has to be mutable somewhere for this to work) but I think the flat model discussed in the proposal is not going to work well.

2.10 If we adopt an opm:contains concept, I think this maps reasonably well, but I wonder whether the dc:hasPart refers to the resource (which has artifacts that are versions in my rephrasing) or the version level. Assuming my arguments above, this might need to involve some inference – given an artifact a1 that is a version of R1 that also contains a second artifact a2, one could infer a new resource R2 that has a2 as a version that must also have a dc:hasPart relationship with R1
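
The inference described in 2.10 can be sketched as a rule over triples. This assumes the opm:contains edge that the comment itself treats as hypothetical, and the naming scheme for the inferred resource ("resource-of-...") is invented here purely for illustration:

```python
def infer_has_part(triples):
    """Infer dc:hasPart between resources from opm:contains between artifacts.

    If a1 is a version of R1 and a1 contains a2, infer a resource R2 that has
    a2 as a version, and infer R1 dc:hasPart R2.
    """
    version_of = {s: o for s, p, o in triples if p == "dc:isVersionOf"}
    inferred = set()
    for s, p, o in triples:
        if p == "opm:contains" and s in version_of:
            whole = version_of[s]
            # Reuse the part's resource if one is asserted, else invent a name.
            part = version_of.get(o, f"resource-of-{o}")
            inferred.add((o, "dc:isVersionOf", part))
            inferred.add((whole, "dc:hasPart", part))
    return inferred


example = [
    ("a1", "dc:isVersionOf", "R1"),
    ("a1", "opm:contains", "a2"),  # opm:contains is an assumed edge
]
print(sorted(infer_has_part(example)))
```

The rule only runs from OPM towards Dublin Core; a dc:hasPart statement on its own does not say which versions contain which.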

2.11 Version: Is hasVersion symmetric in Dublin Core or do they just use resource both for the mutable thing and the immutable versions? As above, if we enforce immutability, which I think we need for OPM, we have to distinguish a new type of entity for a mutable resource. Continuing the thought, either mutable resources can’t go through processes or, if they do, perhaps we have the implication that versions are created. (i.e. a graph that said a book went through an opm:process and came out as an output would imply that the book has artifacts that are its versions and that a new one was created in the process…)

2.13 replaces: “in OPM ‘replaces’ is a type of wasderivedfrom”? We define replaces? Or is this just pointing out that replaces has the same directionality as wasderivedfrom and is a more natural direction to put a dc annotation?

2.17 Provenance: I think I agree in general that info in a dc:provenance annotation should imply something that OPM could represent, but is there a real mapping (is there any defined structure for dc:provenance)?

2.18 Publisher: “In OPM, being published is part of the a state which a resource may take, and therefore corresponds to a subset of artifacts corresponding to that resource.” – I don’t understand this sentence. Seems like publisher is an actor somewhere in the chain of provenance leading to an artifact with a published flag as part of its state, but that’s as loose as the other mappings for creator, etc.

2.20 Source: How does DC define this – source refers to a different logical resource rather than a version of this one? Seems like the discussion here would mean that both source and version dc tags would map to the same OPM construct and we’d be stuck with a one way mapping with no info to separate things out in the reverse direction (is this a general issue for other parts of the profile?).

3.0 – Some of these like reference have at least a vague provenance connection…

4.1 Why is there anything except one process with several agents running it? The graph as shown already implies separate artifacts that were somehow merged to create the final… I think this is true in this case but I think the use case should talk about what is derivable from the dc statements – all we know is multiple actors ran a process that created the collection. The next part of the discussion (4.2) is really going the other way – assuming that the OPM graph is really as shown in Fig 10, one can then infer (modulo my comments about parts earlier) some dc:hasPart relations.

Comment by JimMyers - 17 Sep 2009

One more take on creator/contributor: I can see that one can always take a dc:creator tag and infer a graph that either has the creator as an agent or as the dc:creator of an earlier artifact. I think one could make some rules along those lines - dc:creator implies at least one of the two options above, and seeing dc:creator and agents in an OPM graph allows you to infer a dc:creator tag on the output. Both are subject to open world issues: the creator may not show up earlier in OPM accounts as written, and just because OPM implies a causal dc:creator link does not mean that the level of involvement rises to the level that would normally be recognized by dc:creator (Bill Gates is not my co-author because he contributed to the creation of Word...). Formalizing along these lines might be possible/useful.
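
One of the rules suggested above could be sketched as follows: a dc:creator statement on an artifact is supported by a graph if the person controlled a process that generated the artifact, or is the dc:creator of an artifact that process used. The function name and the example graph are hypothetical, and the sketch looks only one step back in the provenance chain.

```python
def creator_supported(artifact, person, triples):
    """Return True if the graph supports `person` as dc:creator of `artifact`."""
    t = set(triples)
    processes = {o for s, p, o in t if s == artifact and p == "opm:wasGeneratedBy"}
    for proc in processes:
        # Option 1: the person is an agent controlling the generating process.
        if (proc, "opm:wasControlledBy", person) in t:
            return True
        # Option 2: the person is dc:creator of an artifact the process used.
        inputs = {o for s, p, o in t if s == proc and p == "opm:used"}
        if any((a, "dc:creator", person) in t for a in inputs):
            return True
    return False


# Hypothetical graph: alice controls the process, bob created an input.
graph = [
    ("report", "opm:wasGeneratedBy", "write"),
    ("write", "opm:wasControlledBy", "alice"),
    ("write", "opm:used", "notes"),
    ("notes", "dc:creator", "bob"),
]
print(creator_supported("report", "alice", graph))
```

Under the open world caveat, a False result means only that this account does not record the involvement, not that the dc:creator statement is wrong. Equally, a True result does not show that the involvement is at the level dc:creator normally implies.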



Comment 6 by Simon Miles in reply to comments above - 18 Sep 2009

Thanks for the recent feedback on the profile. I will reply to and/or address those points later, but this is just a comment on the change proposal itself.

My impression in Amsterdam was that endorsing an OPM profile did not mean that the profile was finished, but that this profile should exist at all and that this should be the 'official' one, presumably with the current document being the start of an eventually finished version. On this basis, I would (as you'd expect) vote in favour of the change proposal. However, if the intention is that endorsement means the profile is complete and we allow competing profiles for the same topic, then this is a different matter and I would vote against this change proposal and those for the other profiles, as they are only just being reviewed now. I'm not sure on which basis others (well, just Jim, at time of writing :-)) are voting.

Maybe Luc, Paul or Yogesh could clarify (or decide) what endorsing the profile would mean?

Comment 7 by JimMyers - 18 Sep 2009

Good point for discussion. Or points. One is what a vote means - good direction or goal reached? (And if yes means good direction, is there another vote required to decide goal reached?) The other is what is a profile. In talking with Joe, I understand that the workshop framed that as just indicating 'advice' - the people working on OPM think that using DC with OPM is a good idea. I think that could be strengthened into some form of compliance level, à la Paolo's comments elsewhere, and I was arguing in some proposals that there should really be some connection with OPM in terms of synonyms or inference rules, etc. before something should be a profile (time annotations which should be constrained by OPM's statements of causal order, dc:creator connections to agents, etc.). To tie that to voting - should yes imply anything about whether the proposal meets one of these definitions (and if so, which one)?

I guess I would argue that we should decide what to do about optional-required and compatible-interacting (i.e. are there inference connections) categories and then limit votes to approving finished work into a particular category (proposed by the profile developers). In some sense votes on whether the direction is good/interesting/important are so non-binding (does a no at this stage mean we'd reject a complete, logically consistent proposal for an optional profile?) that they don't say much beyond what could be done by people commenting. I don't have a strong opinion if others want to use votes as some form of progress indicator/promoter - just thought I'd put out a starting option...

Comment 8 by Simon Miles in reply to Jim - 22 Sep 2009

I expected that a no vote on endorsing a profile would mean, at least, that the topic is out of scope of OPM or that the approach taken is unworkable. It could also mean that we don't think it is something where there will be a single suggested way to do things, i.e. there would be no interoperability or engineering benefit to mapping Dublin Core/collections/time/whatever to OPM in a single way and it would be better to let people find an approach that best suited them.

I was not expecting that it meant it was final for a version 1.0, but was perhaps mistaken.

Comment 9 by Paul Groth

I think the document is a good idea. My impression was that endorsement meant that the profile is on the same "level of acceptance" as the main OPM spec. Lack of endorsement does not in any way mean that work should stop on the profile or that the profile is not useful. Personally, I was also under the impression that endorsement meant fairly wide adoption, which the DC profile has not had yet.

Comment 10 by Luc Moreau

Our aim at this stage is to produce OPM v1.1, and in this context decide what goes in profiles or core specs. I don't think we will have time to complete all profiles by the time OPM v1.1 is released. Profiles will have their own development lifecycle. I believe we need a profile on Dublin Core, and this work needs to continue.

Vote

Jim Myers, No - needs further discussion

Simon Miles, yes if endorsing a profile does not require it to be final (for version 1.0), no otherwise

Paul Groth, No - good profile but not ready to be endorsed

Luc Moreau, this is good work in progress that needs to be continued.

-- SimonMiles - 19 Jun 2009

| Attachment | Size | Date | Who | Comment |
| dcprofile.pdf | 295.9 K | 30 Jul 2009 - 11:42 | SimonMiles | Dublin Core profile version 0.3 - for review |
