Provenance Challenge


Composing a scientific experiment with several different workflows

Scenario Authors: Scientific Workflow Group, COPPE, Federal University of Rio de Janeiro

Brief Summary:

This scenario involves pre-existing workflows that were conceived independently, using different scientific workflow management systems (SWfMS). These independent workflows now need to be integrated into a single complex experiment, which entails additional manual activities that link them. How can two such workflows be related? How can the last activity of workflow 1 be linked to the first activity of workflow 2? In this scenario, each SWfMS may manage provenance information in a decentralized and isolated way: each system captures provenance at its own granularity and stores it in its own language, or, even worse, some SWfMS may not provide a provenance solution at all.

Scenario Diagram:

There are several scenarios of workflow execution in a distributed environment. Each one has its own characteristics that make provenance management difficult. According to Fig. 1, these scenarios can be classified into four types:

  1. remote execution of one or more workflow activities;
  2. remote execution of a sub-workflow;
  3. remote execution of a sub-workflow by another SWfMS;
  4. execution of two or more workflows that are part of the same experiment in distinct SWfMS.

The first type is the simplest scenario, since a SWfMS with good provenance management is able to gather provenance data even when activities are executed remotely. For the remaining scenarios, however, a single SWfMS is not enough to manage all the provenance information, and data can be lost. For example, in the second and third scenarios, the SWfMS that executes the main workflow does not know the details of the remote execution of the sub-workflow. It knows neither which activities were executed nor what data they consumed or generated; it only knows the input and output data of the sub-workflow as a whole. In the third scenario, the scientist can check this provenance information in the secondary SWfMS. Nevertheless, that information is very likely represented differently from the provenance in the main SWfMS, which makes the analysis more complex for the scientist.

The fourth scenario is an extreme situation in which the workflow of an experiment is fragmented into several smaller workflows so that it can be executed in a heterogeneous environment with different SWfMS. In this case, each SWfMS manages provenance information in a decentralized way: each system captures provenance at its own granularity and stores it in its own language, or, even worse, some SWfMS do not provide a provenance solution at all. In such situations, we say that the experiment has heterogeneous provenance support.

Although extreme, this last scenario is becoming common in scientific experiments. The reason is that specific SWfMS have particular properties, so adopting different SWfMS for different regions of the workflow can be more advantageous than adopting a single SWfMS for the workflow as a whole. For example, some workflow regions need to be executed in a SWfMS that supports visualization of results, while others need a SWfMS that provides grid support, and so on. An additional reason may be organizational: the workflow execution may have to take place in several laboratories of a virtual institute.
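
To make the heterogeneity concrete, the following is a minimal sketch in Python of how records from two systems with different granularity might be mapped onto one common form. Everything in it is an assumption made for illustration: the labels `system_a` and `system_b`, the export formats, the field names, and the file names do not come from any particular SWfMS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActivityRecord:
    """Hypothetical common form for one unit of executed work."""
    system: str      # label of the SWfMS that produced the record (assumed)
    workflow: str
    activity: str
    inputs: tuple
    outputs: tuple
    detail: str      # "activity" or "workflow-level"


def from_system_a(row: dict) -> ActivityRecord:
    # Assumed export format: one dict per activity, with explicit data lists.
    return ActivityRecord(
        system="system_a",
        workflow=row["wf"],
        activity=row["task"],
        inputs=tuple(row["in"]),
        outputs=tuple(row["out"]),
        detail="activity",
    )


def from_system_b(workflow: str, inputs: list, outputs: list) -> ActivityRecord:
    # Assumed case: only the inputs and outputs of the whole workflow are
    # visible, so the record is marked as coarse-grained.
    return ActivityRecord(
        system="system_b",
        workflow=workflow,
        activity="<whole workflow>",
        inputs=tuple(inputs),
        outputs=tuple(outputs),
        detail="workflow-level",
    )


def shared_data(producer: ActivityRecord, consumer: ActivityRecord) -> list:
    # Data written by one record and read by another hints at a link
    # between the two workflows.
    return sorted(set(producer.outputs) & set(consumer.inputs))


records = [
    from_system_a({"wf": "wf1", "task": "align",
                   "in": ["raw.fasta"], "out": ["aligned.fasta"]}),
    from_system_b("wf2", ["aligned.fasta"], ["tree.png"]),
]
links = shared_data(records[0], records[1])
print(links)
```

Matching on data names is only a heuristic. It fails when a manual activity renames or converts the data between workflows, which is one reason explicit links between activities may still be needed.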

Users:

Scientists in domains such as bioinformatics and the oil industry

Requirement for provenance:

Since the different workflows are handled by each SWfMS individually and in isolation from each other, it can be impossible to trace back to the first activity of the first workflow. Provenance Questions: What were all the activities of the whole experiment? What was the activity that immediately preceded activity X, where activity X is the first activity of workflow #2 and the user wants to find the last activity of workflow #1 or a manual activity?
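
As an illustration of the provenance questions above, here is a minimal Python sketch of a registry that gives every activity a global identifier and records explicit "ran before" links, including links through manual activities. The class `ExperimentProvenance`, the identifier scheme `workflow/name`, and the activity names are all hypothetical; the scenario does not prescribe any of them.

```python
from collections import defaultdict


class ExperimentProvenance:
    """Hypothetical registry linking activities across workflows."""

    def __init__(self):
        self.activities = {}                    # global id -> description
        self.predecessors = defaultdict(list)   # global id -> ids run just before

    def add_activity(self, workflow, name, kind="automatic"):
        gid = f"{workflow}/{name}"              # assumed global identifier scheme
        self.activities[gid] = {"workflow": workflow, "name": name, "kind": kind}
        return gid

    def link(self, before, after):
        # Record that `before` ran immediately before `after`.
        for gid in (before, after):
            if gid not in self.activities:
                raise KeyError(gid)
        self.predecessors[after].append(before)

    def all_activities(self):
        # "What were all the activities of the whole experiment?"
        return sorted(self.activities)

    def previous(self, gid):
        # "What was the activity that immediately preceded activity X?"
        return list(self.predecessors.get(gid, []))

    def trace_back(self, gid):
        # Walk the links backwards to every earlier activity.
        seen, stack, order = set(), [gid], []
        while stack:
            current = stack.pop()
            for before in self.predecessors.get(current, []):
                if before not in seen:
                    seen.add(before)
                    order.append(before)
                    stack.append(before)
        return order


prov = ExperimentProvenance()
a1 = prov.add_activity("workflow1", "filter")
a2 = prov.add_activity("workflow1", "export")
m = prov.add_activity("manual", "convert_format", kind="manual")
x = prov.add_activity("workflow2", "import")
prov.link(a1, a2)
prov.link(a2, m)
prov.link(m, x)

print(prov.previous(x))      # ['manual/convert_format']
print(prov.trace_back(x))    # ['manual/convert_format', 'workflow1/export', 'workflow1/filter']
```

In this sketch the links are entered by hand. In practice they would have to come from each SWfMS or from the scientist, which is exactly the gap the scenario describes.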

Technologies Used:

databases, web browser, scientific workflow, object identity and global relationships

Background and description:

Any background or other important information about the scenario




Challenge.ComposingMultipleWorkflows moved from Main.ComposingMultipleWorkflows on 08 Jun 2010 - 12:16 by LucMoreau - put it back

Copyright © 1999-2012 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.