Chemical Predictive Modelling Experiment

Lab experimentation produces a large set of data describing properties of molecules, to varying degrees of precision, which is stored in a database. The chemists aim to determine a generally applicable model, in the form of an equation, for calculating one property of a molecule from others. They select a subset of the data for which the calculated property is known and try applying different models to it to see which fits best. The best-fitting model is then used to predict values of the calculated property for molecules where that property is unknown, and the predicted values are put back into the database to be used in further modelling.
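The workflow described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the scenario does not say which models, properties, or database the chemists use, so the Record type, the two candidate models (a straight line and a proportional fit), the source field, and the sample values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Record:
    # Hypothetical database row; all field names are assumptions.
    molecule: str
    x: float                    # a property that is already known
    y: Optional[float]          # the calculated property; None if unknown
    source: str = "experiment"  # "experiment" or "prediction"

def fit_linear(points):
    """Least-squares fit of y = a*x + b; returns a prediction function."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return lambda x: a * x + b

def fit_proportional(points):
    """Least-squares fit of y = a*x (no intercept)."""
    a = sum(x * y for x, y in points) / sum(x * x for x, _ in points)
    return lambda x: a * x

def sum_squared_error(model, points):
    return sum((model(x) - y) ** 2 for x, y in points)

def run_experiment(db):
    # Select the subset for which the calculated property is known.
    known = [(r.x, r.y) for r in db if r.y is not None]
    # Fit every candidate model and keep the one with the smallest error.
    candidates = {"linear": fit_linear, "proportional": fit_proportional}
    fitted = {name: fit(known) for name, fit in candidates.items()}
    best_name = min(fitted, key=lambda name: sum_squared_error(fitted[name], known))
    best = fitted[best_name]
    # Predict the unknown values and write them back, marked as predictions.
    for r in db:
        if r.y is None:
            r.y = best(r.x)
            r.source = "prediction"
    return best_name

# Made-up sample data for illustration only.
db = [
    Record("m1", 1.0, 3.0),
    Record("m2", 2.0, 5.0),
    Record("m3", 3.0, 7.0),
    Record("m4", 4.0, None),
]
best_name = run_experiment(db)
```

Marking each written-back value with its source is the design choice that matters here: without it, predicted and experimental values become indistinguishable once they are in the database.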

Provenance Use Case 1

A chemist performs the experiment and notices that a model fits well apart from a few outlying values. The chemist then determines whether the outlying values were calculated from previously predicted values, which may be less trustworthy than data from experimentation.
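One way to answer this question is to record, for every value, how it was produced and which other values it was calculated from, and then to walk those links backwards from each outlier. The sketch below assumes such a record exists; the provenance dictionary, the value ids, and the labels "experiment", "prediction", and "calculation" are hypothetical and not part of the scenario.

```python
# Hypothetical provenance store:
# value id -> (how the value was produced, ids of the values it was derived from)
provenance = {
    "v1": ("experiment", []),
    "v2": ("experiment", []),
    "v3": ("prediction", ["v1"]),
    "v4": ("calculation", ["v2", "v3"]),  # depends on a predicted value
    "v5": ("calculation", ["v1", "v2"]),  # depends on experimental values only
}

def predicted_ancestors(value_id, provenance):
    """Return the ids of all predicted values that value_id depends on,
    directly or indirectly (including value_id itself if it is a prediction)."""
    found = set()
    seen = set()
    stack = [value_id]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited; also guards against cycles
        seen.add(current)
        kind, inputs = provenance[current]
        if kind == "prediction":
            found.add(current)
        stack.extend(inputs)
    return found

# Outliers the chemist noticed (made-up ids for illustration).
outliers = ["v4", "v5"]
report = {v: sorted(predicted_ancestors(v, provenance)) for v in outliers}
```

Here the report shows that v4 traces back to the predicted value v3, while v5 rests on experimental data only, so only v4 would be a candidate for the "less trustworthy input" explanation.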

Provenance Use Case 2

Experimental data continues to be added to the database, including data provided by different groups. The previously determined models are then retested to see if they still fit the experimental data, and some are found to no longer fit well. The chemist determines whether the data that does not fit the models comes from a particular research group.
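If each measurement carries the research group that provided it, the question can be approached by counting, per group, how many values the retested model fails to fit. The following is a minimal sketch under assumptions: the group names, the sample measurements, the model equation, and the tolerance are all invented for illustration.

```python
from collections import Counter

# Hypothetical measurements: (research group, known property x, measured property y)
measurements = [
    ("group_a", 1.0, 3.1),
    ("group_a", 2.0, 4.9),
    ("group_b", 1.0, 4.2),
    ("group_b", 3.0, 8.5),
    ("group_b", 2.0, 5.1),
]

def model(x):
    # Stand-in for a previously determined model (assumed equation).
    return 2.0 * x + 1.0

TOLERANCE = 0.5  # assumed threshold for "does not fit"

def misfits_by_group(measurements, model, tolerance):
    """Return {group: (number of misfits, number of measurements)}."""
    total = Counter()
    misfit = Counter()
    for group, x, y in measurements:
        total[group] += 1
        if abs(model(x) - y) > tolerance:
            misfit[group] += 1
    return {group: (misfit[group], total[group]) for group in total}

summary = misfits_by_group(measurements, model, TOLERANCE)
```

With this made-up data, all misfits come from group_b (2 of its 3 measurements) and none from group_a, which is the kind of pattern the chemist is looking for. This only works if the providing group is recorded when the data is added to the database.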

