
Provenance Challenge


First Provenance Challenge

Aims

The provenance challenge aims to establish an understanding of the capabilities of available provenance-related systems and, in particular, the following details.

  • The representations that systems use to document details of processes that have occurred
  • The capabilities of each system in answering provenance-related queries
  • What each system considers to be within the scope of provenance (regardless of whether the system can yet address all problems in that scope)

To help achieve the aims, we define a simple example workflow that forms the basis of the challenge. It is inspired by a real experiment in the area of Functional Magnetic Resonance Imaging (fMRI). Here, we use the term workflow to denote a series of procedures performed in a system, each taking some data as input and producing other data as output. We do not assume that these procedures use any particular technology (EXE files, Web Services etc.) or that the workflow is explicitly defined in a workflow technology (BPEL, compiled executable, Scufl, batch file etc.); individual participants may adopt their technology of choice.

Our focus in this challenge is on provenance, not on running the experiment. Hence, to facilitate take-up, the procedures, while based on a real experiment, can be implemented as "dummies": we provide the input, output and intermediate data, and participants can use fake procedures that take the right input and produce the right output. Alternatively, participants can execute the real workflow after installing the necessary libraries. In addition, we define a set of core queries, and all participants should show how their systems address them so that we can compare systems.
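To make the idea of a "dummy" procedure concrete, here is a minimal Python sketch. It assumes a hypothetical in-memory record format (ProvenanceRecord and DummyRunner are illustrative names, not part of the challenge): each fake procedure simply returns the names of its expected outputs and logs what it consumed, what it produced, its parameters and when it ran. Data items are identified by the names used in the table of steps.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ProvenanceRecord:
    """What was run, on which data, with which parameters, and when."""
    procedure: str
    inputs: list
    outputs: list
    parameters: list
    ran_at: datetime


@dataclass
class DummyRunner:
    """Executes fake procedures that only pass data-item names along."""
    log: list = field(default_factory=list)

    def run(self, procedure, inputs, outputs, parameters=()):
        # No image processing happens: the dummy just "produces" the expected
        # outputs and logs what it consumed and emitted.
        self.log.append(ProvenanceRecord(procedure, list(inputs), list(outputs),
                                         list(parameters), datetime.now(timezone.utc)))
        return list(outputs)


runner = DummyRunner()
warp = runner.run(
    "align_warp",
    ["Anatomy Image 1", "Anatomy Header 1", "Reference Image", "Reference Header"],
    ["Warp Parameters 1"],
    ["-m", "12", "-q"],
)
print(warp)                                              # ['Warp Parameters 1']
print(runner.log[0].procedure, runner.log[0].parameters)  # align_warp ['-m', '12', '-q']
```

A real system would persist such records in its own representation; the recorded timestamp is the kind of detail that core query 4 ("ran on a Monday") depends on.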

Each participant in the challenge will have their own page on this TWiki, following the ChallengeTemplate, where they can inform the other participants of their efforts in meeting the challenge. During the provenance challenge, we expect the participants to upload the following to their page, to allow comparison.

  • Representations of the workflow in their system
  • Representations of provenance for the example workflow
  • Representations of the result of the core (and other) queries
  • Contributions to a matrix of queries vs systems, indicating for each query whether: (1) the query can be answered by the system, (2) the system cannot answer the query now but considers it relevant, or (3) the query is not relevant to the project.
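One possible shape for a participant's contribution to the matrix, sketched in Python under the assumption that each cell is keyed by core query number. The Support enum, the helper functions and the two systems in the usage example are hypothetical and purely illustrative; they do not describe any real participant.

```python
from enum import Enum


class Support(Enum):
    """The three possible entries in the matrix of queries vs systems."""
    ANSWERED = 1      # (1) the query can be answered by the system
    NOT_YET = 2       # (2) cannot answer now, but considered relevant
    NOT_RELEVANT = 3  # (3) the query is not relevant to the project


CORE_QUERIES = range(1, 10)  # the nine core provenance queries


def missing_entries(row):
    """Core queries for which a system's row has no entry yet."""
    return [q for q in CORE_QUERIES if q not in row]


def systems_answering(matrix, query):
    """Systems that report they can answer `query`."""
    return sorted(name for name, row in matrix.items()
                  if row.get(query) is Support.ANSWERED)


# Hypothetical entries for two made-up systems, for illustration only.
matrix = {
    "SystemA": {q: Support.ANSWERED for q in CORE_QUERIES},
    "SystemB": {1: Support.ANSWERED, 4: Support.NOT_YET, 9: Support.NOT_RELEVANT},
}
print(systems_answering(matrix, 4))        # ['SystemA']
print(missing_entries(matrix["SystemB"]))  # [2, 3, 5, 6, 7, 8]
```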

Optionally, the participants may like to contribute the following.

  • Additional queries (beyond the core queries) that illustrate the scope of their system
  • Extensions to the example workflow to best illustrate the unique aspects of their system
  • Any categorisation of queries that the project considers to have practical value

Participants need not be too concerned about whether extensions to the workflow are scientifically realistic: such extensions may be explicitly contrived to demonstrate aspects of their system.

Example Workflow

We propose an example workflow for creating population-based "brain atlases" from the fMRI Data Center's archive of high-resolution anatomical data. The workflow is shown below; a high-resolution PDF version of the image (BrainAtlas.pdf) is attached to this page.

[Figure: Brain Atlas workflow (BrainAtlas.gif)]

It consists of procedures, shown as orange ovals, and data items flowing between them, shown as rectangles. It can be seen as five stages, each depicted in the figure as a horizontal row of invocations of the same procedure. Note that the term stage is introduced only to help describe the workflow; we do not dictate how stages should be reflected in a concrete implementation. The procedures employ the AIR (automated image registration) suite to create an averaged brain from a collection of high-resolution anatomical data, and the FSL suite to create 2D images across each sliced dimension of the brain. In addition to the data items shown in the figure, some procedures take further inputs (constant string options), which are defined below.

The inputs to the workflow are a set of new brain images (Anatomy Image 1 to 4) and a single reference brain image (Reference Image). All input images are 3D brain scans of varying resolutions, so that different features are evident. For each image, there is the actual image and a header holding its metadata (Anatomy Header 1 to 4, and Reference Header for the reference image). The image data was published with the article "Frontal-Hippocampal Double Dissociation Between Normal Aging and Alzheimer's Disease" by Head, D, Snyder, AZ, Girton, LE, Morris, JC, Buckner, RL (fMRI Data Center Accession Number: 2-2004-1168X).

The stages of the workflow are as follows.

  1. For each new brain image, align_warp compares the new image with the reference image to determine how the new image should be warped, i.e. how its position and shape should be adjusted, to match the reference brain. The output of each procedure in this stage is a warp parameter set defining the spatial transformation to be performed (Warp Parameters 1 to 4).
  2. For each warp parameter set, reslice performs the actual transformation of the image, creating a new version of the original new brain image with the configuration defined in the warp parameter set. The output is a resliced image.
  3. All the resliced images are averaged into one single image using softmean.
  4. For each dimension (x, y and z), slicer slices the averaged image to give a 2D atlas along a plane in that dimension, taken through the centre of the 3D image. The output is an atlas data set. This tool can be downloaded as part of the FSL suite, available at http://www.fmrib.ox.ac.uk/fsl/.
  5. Each atlas data set is converted into a graphical atlas image using convert (an ImageMagick utility).
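The stages differ in how data fans out and in: stages 1, 2, 4 and 5 apply one procedure per data item, while stage 3 combines all resliced images in a single step. The Python sketch below is an illustration only (expand_stages is a hypothetical helper and performs no image processing); it expands the five stages into individual (procedure, inputs, outputs) invocations, naming data items as in the table below.

```python
def expand_stages(n_images=4, dims="XYZ"):
    """Expand the five stages into (procedure, inputs, outputs) invocations.

    Purely illustrative: data items are just names, nothing is computed.
    """
    steps, resliced = [], []
    for i in range(1, n_images + 1):
        # Stage 1: one align_warp per new image, each against the reference.
        steps.append(("align_warp",
                      [f"Anatomy Image {i}", f"Anatomy Header {i}",
                       "Reference Image", "Reference Header"],
                      [f"Warp Parameters {i}"]))
    for i in range(1, n_images + 1):
        # Stage 2: one reslice per warp parameter set.
        out = [f"Resliced Image {i}", f"Resliced Header {i}"]
        steps.append(("reslice", [f"Warp Parameters {i}"], out))
        resliced += out
    # Stage 3: a single softmean step takes every resliced image (fan-in).
    steps.append(("softmean", resliced, ["Atlas Image", "Atlas Header"]))
    for d in dims:
        # Stage 4: one slicer per dimension, all reading the averaged image (fan-out).
        steps.append(("slicer", ["Atlas Image", "Atlas Header"], [f"Atlas {d} Slice"]))
    for d in dims:
        # Stage 5: one convert per atlas data set.
        steps.append(("convert", [f"Atlas {d} Slice"], [f"Atlas {d} Graphic"]))
    return steps


steps = expand_stages()
print(len(steps))                     # 15
print(steps[8][0], len(steps[8][1]))  # softmean 8
```

The fifteen invocations correspond, in order, to steps 1 to 15 of the table.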

The full steps, procedures, data and parameters are enumerated in the table below. The procedure names are linked to the manual pages for those utilities, and the input and output names to the actual data exchanged between procedures.

| Step | Procedure | Inputs | Outputs | Parameters |
|------|-----------|--------|---------|------------|
| 1 | align_warp | Anatomy Image 1, Anatomy Header 1, Reference Image, Reference Header | Warp Parameters 1 | -m 12 -q |
| 2 | align_warp | Anatomy Image 2, Anatomy Header 2, Reference Image, Reference Header | Warp Parameters 2 | -m 12 -q |
| 3 | align_warp | Anatomy Image 3, Anatomy Header 3, Reference Image, Reference Header | Warp Parameters 3 | -m 12 -q |
| 4 | align_warp | Anatomy Image 4, Anatomy Header 4, Reference Image, Reference Header | Warp Parameters 4 | -m 12 -q |
| 5 | reslice | Warp Parameters 1 | Resliced Image 1, Resliced Header 1 | none |
| 6 | reslice | Warp Parameters 2 | Resliced Image 2, Resliced Header 2 | none |
| 7 | reslice | Warp Parameters 3 | Resliced Image 3, Resliced Header 3 | none |
| 8 | reslice | Warp Parameters 4 | Resliced Image 4, Resliced Header 4 | none |
| 9 | softmean | Resliced Image 1, Resliced Header 1, Resliced Image 2, Resliced Header 2, Resliced Image 3, Resliced Header 3, Resliced Image 4, Resliced Header 4 | Atlas Image, Atlas Header | y null |
| 10 | slicer | Atlas Image, Atlas Header | Atlas X Slice | -x .5 |
| 11 | slicer | Atlas Image, Atlas Header | Atlas Y Slice | -y .5 |
| 12 | slicer | Atlas Image, Atlas Header | Atlas Z Slice | -z .5 |
| 13 | convert | Atlas X Slice | Atlas X Graphic | none |
| 14 | convert | Atlas Y Slice | Atlas Y Graphic | none |
| 15 | convert | Atlas Z Slice | Atlas Z Graphic | none |
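Because each step lists its inputs and outputs, the table can be checked mechanically. This Python sketch transcribes the table (the tuple layout, storing parameters as a single string, and the helper names are our own assumptions) and derives the workflow's external inputs, its final outputs, and whether every step uses only data that is already available when it runs.

```python
# Rows transcribed from the table: (step, procedure, inputs, outputs, parameters).
REF = ["Reference Image", "Reference Header"]
ATLAS = ["Atlas Image", "Atlas Header"]
RESLICED = [f"Resliced {kind} {i}" for i in range(1, 5) for kind in ("Image", "Header")]

STEPS = [
    (1, "align_warp", ["Anatomy Image 1", "Anatomy Header 1"] + REF, ["Warp Parameters 1"], "-m 12 -q"),
    (2, "align_warp", ["Anatomy Image 2", "Anatomy Header 2"] + REF, ["Warp Parameters 2"], "-m 12 -q"),
    (3, "align_warp", ["Anatomy Image 3", "Anatomy Header 3"] + REF, ["Warp Parameters 3"], "-m 12 -q"),
    (4, "align_warp", ["Anatomy Image 4", "Anatomy Header 4"] + REF, ["Warp Parameters 4"], "-m 12 -q"),
    (5, "reslice", ["Warp Parameters 1"], ["Resliced Image 1", "Resliced Header 1"], ""),
    (6, "reslice", ["Warp Parameters 2"], ["Resliced Image 2", "Resliced Header 2"], ""),
    (7, "reslice", ["Warp Parameters 3"], ["Resliced Image 3", "Resliced Header 3"], ""),
    (8, "reslice", ["Warp Parameters 4"], ["Resliced Image 4", "Resliced Header 4"], ""),
    (9, "softmean", RESLICED, ATLAS, "y null"),
    (10, "slicer", ATLAS, ["Atlas X Slice"], "-x .5"),
    (11, "slicer", ATLAS, ["Atlas Y Slice"], "-y .5"),
    (12, "slicer", ATLAS, ["Atlas Z Slice"], "-z .5"),
    (13, "convert", ["Atlas X Slice"], ["Atlas X Graphic"], ""),
    (14, "convert", ["Atlas Y Slice"], ["Atlas Y Graphic"], ""),
    (15, "convert", ["Atlas Z Slice"], ["Atlas Z Graphic"], ""),
]


def _produced(steps):
    return {item for _, _, _, outputs, _ in steps for item in outputs}


def _consumed(steps):
    return {item for _, _, inputs, _, _ in steps for item in inputs}


def workflow_inputs(steps):
    """Items that are used but never produced: the data fed into the workflow."""
    return sorted(_consumed(steps) - _produced(steps))


def final_outputs(steps):
    """Items that are produced but never used by another step."""
    return sorted(_produced(steps) - _consumed(steps))


def check_order(steps):
    """Raise ValueError if a step needs an item that no earlier step has produced."""
    available = set(workflow_inputs(steps))
    for number, procedure, inputs, outputs, _ in steps:
        missing = [item for item in inputs if item not in available]
        if missing:
            raise ValueError(f"step {number} ({procedure}) needs {missing}")
        available.update(outputs)


check_order(STEPS)                  # passes silently: the steps are in a valid order
print(final_outputs(STEPS))         # ['Atlas X Graphic', 'Atlas Y Graphic', 'Atlas Z Graphic']
print(len(workflow_inputs(STEPS)))  # 10
```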

Core Provenance Queries

An initial set of provenance-related queries is given below.

  1. Find the process that led to Atlas X Graphic, i.e. everything that caused Atlas X Graphic to be as it is. This should tell us the new brain images from which the averaged atlas was generated, the warping performed, etc.
  2. Find the process that led to Atlas X Graphic, excluding everything prior to the averaging of images with softmean.
  3. Find the Stage 3, 4 and 5 details of the process that led to Atlas X Graphic.
  4. Find all invocations of procedure align_warp using a twelfth-order nonlinear 1365-parameter model (see the model menu describing the possible values of parameter "-m 12" of align_warp) that ran on a Monday.
  5. Find all Atlas Graphic images output from workflows where at least one of the input Anatomy Headers had an entry global maximum=4095. The contents of a header file can be extracted as text using the AIR utility scanheader.
  6. Find all output averaged images of softmean (average) procedures where the warped images taken as input were produced by align_warp using a twelfth-order nonlinear 1365-parameter model, i.e. "where softmean was preceded in the workflow, directly or indirectly, by an align_warp procedure with argument -m 12."
  7. A user has run the workflow twice, in the second instance replacing each procedure (convert) in the final stage with two procedures: pgmtoppm, then pnmtojpeg. Find the differences between the two workflow runs. The exact level of detail in the difference that is detected by a system is up to each participant.
  8. A user has annotated some anatomy images with a key-value pair center=UChicago. Find the outputs of align_warp where the inputs are annotated with center=UChicago.
  9. A user has annotated some atlas graphics with key-value pairs where the key is studyModality. Find all the graphical atlas sets that have metadata annotation studyModality with values speech, visual or audio, and return all other annotations on these files.
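Queries 1, 2 and 6 are all backward traversals over the data flow. The following Python sketch assumes the workflow is known as the list of steps from the table above; brain_atlas_steps, lineage and averaged_images_after_m12 are hypothetical helpers, not any participant's system. It answers the queries by walking from an item to the step that produced it, then to that step's inputs, and so on.

```python
from collections import namedtuple

# One invocation: procedure name, input items, output items, parameter string.
Step = namedtuple("Step", "procedure inputs outputs params")


def brain_atlas_steps():
    """The 15 steps of the example workflow, named as in the table of steps."""
    ref = ["Reference Image", "Reference Header"]
    atlas = ["Atlas Image", "Atlas Header"]
    steps, resliced = [], []
    for i in range(1, 5):
        steps.append(Step("align_warp", [f"Anatomy Image {i}", f"Anatomy Header {i}"] + ref,
                          [f"Warp Parameters {i}"], "-m 12 -q"))
    for i in range(1, 5):
        out = [f"Resliced Image {i}", f"Resliced Header {i}"]
        steps.append(Step("reslice", [f"Warp Parameters {i}"], out, ""))
        resliced += out
    steps.append(Step("softmean", resliced, atlas, "y null"))
    for d in "XYZ":
        steps.append(Step("slicer", atlas, [f"Atlas {d} Slice"], f"-{d.lower()} .5"))
    for d in "XYZ":
        steps.append(Step("convert", [f"Atlas {d} Slice"], [f"Atlas {d} Graphic"], ""))
    return steps


def lineage(item, steps, stop_at=None):
    """Steps that contributed to `item`, found by walking back through producers.

    Invocations of the procedure named by `stop_at` are included, but their
    inputs are not followed further (query 2 uses stop_at="softmean").
    """
    producer = {out: step for step in steps for out in step.outputs}
    found, pending = [], [item]
    while pending:
        step = producer.get(pending.pop())
        if step is None or step in found:  # a workflow input, or already visited
            continue
        found.append(step)
        if step.procedure != stop_at:
            pending.extend(step.inputs)
    return found


def uses_model_12(step):
    """True for an align_warp invocation whose parameters include '-m 12'."""
    tokens = step.params.split()
    return step.procedure == "align_warp" and any(
        tokens[k:k + 2] == ["-m", "12"] for k in range(len(tokens)))


def averaged_images_after_m12(steps):
    """Query 6: softmean outputs preceded, directly or not, by align_warp -m 12."""
    return [out for s in steps if s.procedure == "softmean"
            and any(uses_model_12(p) for p in lineage(s.outputs[0], steps))
            for out in s.outputs]


steps = brain_atlas_steps()
q1 = lineage("Atlas X Graphic", steps)
q2 = lineage("Atlas X Graphic", steps, stop_at="softmean")
print(len(q1))                           # 11
print([s.procedure for s in q2])         # ['convert', 'slicer', 'softmean']
print(averaged_images_after_m12(steps))  # ['Atlas Image', 'Atlas Header']
```

For query 1 the traversal finds 11 steps (one convert, one slicer, the softmean, four reslice and four align_warp invocations); stopping at softmean leaves only the last three. A provenance system would run the same kind of traversal over its recorded provenance rather than over a hard-coded workflow description.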

Participant Instructions

Here we give the specific steps that we expect each participating team to perform in completing the challenge.

  • The participant should determine how they are going to execute the workflow (or a simulation of it) and how their system will record data (provenance) about the execution.
  • The team should add the provenance to their TWiki page and declare the way in which they executed the workflow, e.g. by uploading a workflow script.
  • If the participant has varied the workflow to make it more suitable for their system or to demonstrate an aspect important to their approach, then they should declare what this variation is.
  • The team should then use their system to answer the core provenance queries, and any others that they wish to perform to demonstrate key aspects of their system.
  • The participant then uploads to the TWiki the queries performed, the way in which the queries were expressed/realised, and the answers they got.
  • For core queries that were not performed, the participant should say why, i.e. whether the query is considered out of scope for the system or in scope but not currently possible to answer.
  • For all data uploaded as above, each team should provide a link to an explanation of the representation used so that other participants can interpret it.

Sample Workflow Implementations

As it may be useful to some, we provide sample implementations of the workflow here. This should not preclude the use of any other technology. The implementations assume that the executables referenced above are all installed: align_warp, reslice and softmean are provided by the AIR (automated image registration) suite, slicer by the FSL suite, and convert by ImageMagick.

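Minor caution: the shell script version of the workflow (workflow.sh) is a DOS text file. If it is run on Unix, the extra carriage returns at the ends of lines make their way into the filenames and cause everything to break. Strip the carriage returns with tr before running it, e.g. tr -d '\r' < workflow.sh > workflow-unix.sh.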
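On a machine without tr, the same clean-up can be done in Python. This is a minimal sketch (to_unix, to_unix_file and the output file name in the usage comment are illustrative): it turns each CR LF pair into a single LF, which is what matters for a DOS text file, whereas tr -d '\r' removes every carriage return.

```python
from pathlib import Path


def to_unix(data: bytes) -> bytes:
    """Turn DOS line endings (CR LF) into Unix ones (LF)."""
    return data.replace(b"\r\n", b"\n")


def to_unix_file(src: str, dst: str) -> None:
    """Write a copy of `src` to `dst` with Unix line endings."""
    Path(dst).write_bytes(to_unix(Path(src).read_bytes()))


# Usage (file names are examples):
# to_unix_file("workflow.sh", "workflow-unix.sh")
```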

Timetable

  • 2006-June: Challenge finalised, participants start!
  • 2006-September-13: Deadline for challenge results to be uploaded
  • 2006-September-13 and 2006-September-14: Face-to-face meeting at which results are discussed
  • 2006-October-15: Comparisons performed, minutes of discussion, proposed next steps uploaded

-- SimonMiles - 21 Aug 2006

Attachments:

  • BrainAtlas.png (5.1 K, 16 May 2006 15:02, SimonMiles): Brain Atlas workflow (original vdt display)
  • BrainAtlas.pdf (118.8 K, 30 May 2006 16:40, SimonMiles): Brain Atlas workflow (hi-res)
  • workflow.sh (0.8 K, 31 May 2006 15:07, SimonMiles): Shell script version of workflow
  • BrainAtlas.gif (40.1 K, 06 Jun 2006 17:39, LucMoreau)


Copyright © 1999-2012 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.