Virtual Data Language Preprocessing

To aid the definition and configuration of our derivation workflow, we wrote a pre-processor program, vdlgen.sh. Its inputs are pre-processed versions (that is, versions not yet run through the pre-processor) of the VDL (analysisbase.vdl), the transformation catalog (tc.data) and the pool configuration file (pool.config.kickstart or pool.config.nokickstart). The pre-processor transforms these into versions ready to be used by the abstract DAG generator. An example of a post-processed VDL script is analysis.vdl.

New constructs

We added four simple constructs to VDL to aid the definition of our derivation workflow. The pre-processor extracts them and alters the VDL accordingly.

$${var}
  • expands to the value of the variable var, which is supplied for each particular run of the pre-processor
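
The substitution described above can be sketched as follows. This is a minimal illustration in Python, not the code of vdlgen.sh: the function name expand_variables, the set of characters allowed in a variable name, the error raised for an undefined variable, and the sample input line are all assumptions made for the example.

```python
import re

# Matches the $${name} token; the characters allowed in a name are an assumption.
_VAR_TOKEN = re.compile(r"\$\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_variables(text, values):
    """Replace every $${name} in text with values[name].

    Raises KeyError if the text uses a variable that was not supplied
    (an assumed policy; the original behaviour is not documented here).
    """
    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value supplied for variable {name!r}")
        return str(values[name])

    return _VAR_TOKEN.sub(substitute, text)


# Made-up input text, used only to show the substitution.
line = "output = $${outdir}/result-$${run}.dat"
print(expand_variables(line, {"outdir": "/scratch/soca", "run": 3}))
# prints: output = /scratch/soca/result-3.dat
```

Failing on an undefined variable, rather than leaving the token in place, makes a missing setting visible before the DAG is generated.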

MAP lfn pfn
  • causes the RLS to be checked for a mapping from the logical filename lfn to the physical filename pfn, and adds the mapping if it is not already present
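
The check-then-add behaviour of MAP can be sketched as below. This Python illustration does not talk to a real RLS server: the class InMemoryReplicaCatalog is a hypothetical stand-in that keeps the mappings in a dictionary, and apply_map and the sample file names are likewise invented for the example.

```python
class InMemoryReplicaCatalog:
    """Hypothetical stand-in for an RLS server: maps each lfn to a set of pfns."""

    def __init__(self):
        self._replicas = {}

    def has_mapping(self, lfn, pfn):
        return pfn in self._replicas.get(lfn, set())

    def add_mapping(self, lfn, pfn):
        self._replicas.setdefault(lfn, set()).add(pfn)


def apply_map(line, catalog):
    """Handle one 'MAP lfn pfn' line.

    Returns True if a new mapping was registered and False if it already existed.
    """
    fields = line.split()
    if len(fields) != 3 or fields[0] != "MAP":
        raise ValueError(f"not a MAP line: {line!r}")
    _, lfn, pfn = fields
    if catalog.has_mapping(lfn, pfn):
        return False
    catalog.add_mapping(lfn, pfn)
    return True


catalog = InMemoryReplicaCatalog()
print(apply_map("MAP sample.dat /data/soca/sample.dat", catalog))  # prints: True
print(apply_map("MAP sample.dat /data/soca/sample.dat", catalog))  # prints: False
```

Because the mapping is only added when absent, the same source file can be pre-processed repeatedly without creating duplicate entries.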

FOR var FROM start TO end STEP step
block
END
  • counts from start to end incrementing by step, and outputs block each time
  • if the token %var% appears in block, it is expanded to the value of the counter
  • start, end and step may be calculations
  • these loops can be nested
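
The loop expansion described above can be sketched as follows. This is a Python illustration under stated assumptions, not the code of vdlgen.sh: the end value is treated as inclusive, step must be positive, calculations are limited to integer +, -, * and // with parentheses, and an inner loop may use an outer counter in its bounds. The names evaluate and expand_for_loops are invented for the example.

```python
import ast
import operator
import re

_FOR_HEADER = re.compile(
    r"^\s*FOR\s+(\w+)\s+FROM\s+(.+?)\s+TO\s+(.+?)\s+STEP\s+(.+?)\s*$"
)
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.FloorDiv: operator.floordiv,
}


def evaluate(expression):
    """Evaluate an integer arithmetic expression without using eval()."""
    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, int):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError(f"unsupported calculation: {expression!r}")

    return walk(ast.parse(expression.strip(), mode="eval").body)


def expand_for_loops(lines):
    """Return lines with every FOR ... END block unrolled, including nested ones."""
    output = []
    i = 0
    while i < len(lines):
        header = _FOR_HEADER.match(lines[i])
        if header is None:
            output.append(lines[i])
            i += 1
            continue
        # Find the END that closes this FOR, skipping nested FOR/END pairs.
        depth, j = 1, i + 1
        while j < len(lines) and depth > 0:
            if _FOR_HEADER.match(lines[j]):
                depth += 1
            elif lines[j].strip() == "END":
                depth -= 1
            j += 1
        if depth != 0:
            raise ValueError(f"FOR without matching END: {lines[i]!r}")
        var, start, end, step = header.groups()
        body = lines[i + 1 : j - 1]
        first, last, stride = evaluate(start), evaluate(end), evaluate(step)
        if stride <= 0:
            raise ValueError("STEP must be positive in this sketch")
        for counter in range(first, last + 1, stride):  # end is inclusive (assumption)
            # Substitute the counter first so nested headers can use it.
            unrolled = [line.replace(f"%{var}%", str(counter)) for line in body]
            output.extend(expand_for_loops(unrolled))
        i = j
    return output


source = [
    "FOR row FROM 1 TO 2 STEP 1",
    "FOR col FROM 0 TO %row% * 2 STEP 2",
    "cell_%row%_%col%",
    "END",
    "END",
]
print(expand_for_loops(source))
# prints: ['cell_1_0', 'cell_1_2', 'cell_2_0', 'cell_2_2', 'cell_2_4']
```

Substituting the outer counter before recursing is what lets an inner loop's bounds depend on the outer loop.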

LIST block start end step
  • a single-line convenience form of FOR, where var is always 'item'
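
A possible reading of the single-line form is sketched below in Python. The exact tokenisation is not specified above, so this sketch assumes that the last three whitespace-separated fields are start, end and step (plain integers here, with an inclusive end and a positive step) and that everything between LIST and those fields is the block. The function name expand_list and the sample file names are invented for the example.

```python
def expand_list(line):
    """Expand one 'LIST block start end step' line into the repeated block texts."""
    fields = line.split(None, 1)
    if len(fields) != 2 or fields[0] != "LIST":
        raise ValueError(f"not a LIST line: {line!r}")
    # Assumption: the last three whitespace-separated fields are start, end, step.
    parts = fields[1].rsplit(None, 3)
    if len(parts) != 4:
        raise ValueError(f"expected 'LIST block start end step': {line!r}")
    block, start, end, step = parts
    # The counter is always exposed as %item%; end is treated as inclusive.
    return [
        block.replace("%item%", str(counter))
        for counter in range(int(start), int(end) + 1, int(step))
    ]


print(expand_list("LIST chunk%item%.dat 1 5 2"))
# prints: ['chunk1.dat', 'chunk3.dat', 'chunk5.dat']
```

How the repeated blocks are joined in the output (for example, by commas or newlines) is left to the caller, since the text above does not say.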

Wrapping

Our pre-processor also takes configuration options that determine whether and how to 'wrap' the transformations. We have a wrapper program, like kickstart, that records provenance in our provenance store before, and potentially after, a script is executed. It is called recordProvenance.sh. We could use it by replacing kickstart in the pool.config, but we generally wanted kickstart as well.

If told to, the pre-processor calls a wrapping script, passing it some configuration parameters. The wrapping script changes the VDL transformations to take extra inputs, including the path of the 'unwrapped' program to be executed at that step. It also changes the VDL derivations to pass that extra information. Finally, the wrapping script generates a new transformation catalog in which each transformation's physical location is replaced with that of recordProvenance.sh.

An extra field, delegates-provenance, in the pre-processed transformation catalog can mark an entry as not to be wrapped while still being given the information required to record provenance. This is useful for transformations that are themselves workflow scripts running several smaller activities locally, each of which should record its own provenance (inputs, outputs etc.), rather than the workflow script recording only its overall inputs and outputs. This allows us to control independently the granularity of the distributed workflow, where each task should last about 15 minutes, and the granularity of the workflow of provenance-recording activities, which may be much finer.
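
The wrapping decision, including the delegates-provenance case, can be sketched as follows. This Python illustration does not reproduce the real wrapping script or the tc.data file format: the CatalogEntry and WrappedStep types, the wrap_entry function, the key unwrapped_program, and all paths and configuration values are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CatalogEntry:
    """Hypothetical in-memory view of one transformation catalog entry."""
    transformation: str
    physical_path: str
    delegates_provenance: bool = False


@dataclass(frozen=True)
class WrappedStep:
    entry: CatalogEntry   # entry to write to the generated catalog
    extra_inputs: dict    # extra values the derivation must pass to the step


def wrap_entry(entry, wrapper_path, provenance_config):
    """Decide how one transformation is wrapped.

    Entries marked delegates_provenance keep their own program but still
    receive the provenance configuration. All other entries are redirected
    to the wrapper and told which 'unwrapped' program to run.
    """
    extra = dict(provenance_config)
    if entry.delegates_provenance:
        return WrappedStep(entry, extra)
    extra["unwrapped_program"] = entry.physical_path
    return WrappedStep(replace(entry, physical_path=wrapper_path), extra)


# Hypothetical paths and settings, used only for illustration.
config = {"provenance_store": "http://example.org/store"}
wrapper = "/opt/soca/recordProvenance.sh"

plain = wrap_entry(CatalogEntry("compress", "/opt/soca/compress.sh"), wrapper, config)
print(plain.entry.physical_path)               # prints: /opt/soca/recordProvenance.sh
print(plain.extra_inputs["unwrapped_program"])  # prints: /opt/soca/compress.sh

script = CatalogEntry("measureAll", "/opt/soca/measureAll.sh", delegates_provenance=True)
delegated = wrap_entry(script, wrapper, config)
print(delegated.entry.physical_path)            # prints: /opt/soca/measureAll.sh
```

Keeping the provenance configuration in the extra inputs for both cases is what lets a delegating workflow script record provenance for each of its own activities.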

-- SimonMiles - 23 Feb 2005

Attachment                 Size    Date                  Who          Comment
klausCompressWorkflow.pdf  7.9 K   25 Feb 2005 - 14:08   PaulGroth    Compression Workflow
klausMeasureWorkflow.pdf   5.0 K   25 Feb 2005 - 14:09   PaulGroth    Measurement Workflow
recordProvenance.sh        1.1 K   02 Mar 2005 - 15:52   SimonMiles   PReP provenance recording wrapper script
analysisbase.vdl           2.8 K   02 Mar 2005 - 15:56   SimonMiles   Pre-processed VDL script
tc.data                    1.8 K   02 Mar 2005 - 15:59   SimonMiles   Pre-processed transformation catalogue
vdlgen.sh                  3.7 K   02 Mar 2005 - 16:02   SimonMiles   VDL pre-processor script
pool.config.kickstart      1.0 K   02 Mar 2005 - 16:03   SimonMiles   Pool config file to use with kickstart turned on
pool.config.nokickstart    1.1 K   02 Mar 2005 - 16:03   SimonMiles   Pool config file to use with kickstart turned off
analysis.vdl               5.2 K   06 Mar 2005 - 19:58   SimonMiles   Post-processed VDL

Soca.VirtualDataLanguagePreprocessing moved from Ourpasoa.VirtualDataToolkitExperiment on 23 May 2005 - 12:37 by SimonMiles
Copyright © 2004 by the University of Southampton